Preface |
|
xiii | |
About the Editors |
|
xvii | |
1 Basic Arithmetic Circuits |
|
1 | (32) |
|
|
|
|
1 | (1) |
|
1.2 Addition and Subtraction |
|
|
1 | (3) |
|
1.2.1 Ripple-Carry Addition |
|
|
2 | (1) |
|
1.2.2 Bit-Serial Addition and Subtraction |
|
|
3 | (1) |
|
1.2.3 Digit-Serial Addition and Subtraction |
|
|
4 | (1) |
|
|
4 | (11) |
|
1.3.1 Partial Product Generation |
|
|
5 | (1) |
|
1.3.2 Avoiding Sign-Extension (the Baugh and Wooley Method) |
|
|
6 | (1) |
|
1.3.3 Reducing the Number of Partial Products |
|
|
6 | (2) |
|
1.3.4 Reducing the Number of Columns |
|
|
8 | (1) |
|
1.3.5 Accumulation Structures |
|
|
8 | (3) |
|
1.3.5.1 Add-and-Shift Accumulation |
|
|
8 | (1) |
|
1.3.5.2 Array Accumulation |
|
|
9 | (1) |
|
1.3.5.3 Tree Accumulation |
|
|
9 | (2) |
|
1.3.5.4 Vector-Merging Adder |
|
|
11 | (1) |
|
1.3.6 Serial/Parallel Multiplication |
|
|
11 | (4) |
|
1.4 Sum-of-Products Circuits |
|
|
15 | (4) |
|
|
17 | (1) |
|
1.4.2 Linear-Phase FIR Filters |
|
|
18 | (1) |
|
1.4.3 Polynomial Evaluation (Horner's Method) |
|
|
18 | (1) |
|
1.4.4 Multiple-Wordlength SOP |
|
|
18 | (1) |
|
|
19 | (5) |
|
|
19 | (2) |
|
|
21 | (2) |
|
1.5.3 Multiplication Through Squaring |
|
|
23 | (1) |
|
1.6 Complex Multiplication |
|
|
24 | (2) |
|
1.6.1 Complex Multiplication Using Real Multipliers |
|
|
24 | (1) |
|
1.6.2 Lifting-Based Complex Multipliers |
|
|
25 | (1) |
|
|
26 | (4) |
|
1.7.1 Square Root Computation |
|
|
26 | (2) |
|
1.7.1.1 Digit-Wise Iterative Square Root Computation |
|
|
27 | (1) |
|
1.7.1.2 Iterative Refinement Square Root Computation |
|
|
27 | (1) |
|
1.7.2 Polynomial and Piecewise Polynomial Approximations |
|
|
28 | (2) |
|
|
30 | (3) |
2 Shift-Add Circuits for Constant Multiplications |
|
33 | (44) |
|
|
|
|
|
|
|
33 | (3) |
|
2.2 Representation of Constants |
|
|
36 | (4) |
|
2.3 Single Constant Multiplication |
|
|
40 | (3) |
|
2.3.1 Direct Simplification from a Given Number Representation |
|
|
40 | (1) |
|
2.3.2 Simplification by Redundant Signed Digit Representation |
|
|
41 | (1) |
|
2.3.3 Simplification by Adder Graph Approach |
|
|
41 | (2) |
|
2.3.4 State of the Art in SCM |
|
|
43 | (1) |
|
2.4 Algorithms for Multiple Constant Multiplications |
|
|
43 | (15) |
|
2.4.1 MCM for FIR Digital Filter and Basic Considerations |
|
|
43 | (2) |
|
2.4.2 The Adder Graph Approach |
|
|
45 | (4) |
|
2.4.2.1 The Early Developments |
|
|
45 | (1) |
|
2.4.2.2 n-Dimensional Reduced Adder Graph (RAG-n) Algorithm |
|
|
46 | (1) |
|
2.4.2.3 Maximum/Cumulative Benefit Heuristic Algorithm |
|
|
47 | (1) |
|
2.4.2.4 Minimization of Logic Depth |
|
|
47 | (1) |
|
2.4.2.5 Illustration and Comparison of AG Approaches |
|
|
48 | (1) |
|
2.4.3 Common Subexpression Elimination Algorithms |
|
|
49 | (7) |
|
2.4.3.1 Basic CSE Techniques in CSD Representation |
|
|
51 | (1) |
|
2.4.3.2 Multidirectional Pattern Search and Elimination |
|
|
52 | (1) |
|
2.4.3.3 CSE in Generalized Radix-2 SD Representation |
|
|
53 | (1) |
|
2.4.3.4 The Contention Resolution Algorithm |
|
|
54 | (1) |
|
2.4.3.5 Examples and Comparison of MCM Approaches |
|
|
54 | (2) |
|
2.4.4 Difference Algorithms |
|
|
56 | (1) |
|
2.4.5 Reconfigurable and Time-Multiplexed Multiple Constant Multiplications |
|
|
56 | (2) |
|
2.5 Optimization Schemes and Optimal Algorithms |
|
|
58 | (4) |
|
2.5.1 Optimal Subexpression Sharing |
|
|
58 | (2) |
|
2.5.2 Representation Independent Formulations |
|
|
60 | (2) |
|
|
62 | (2) |
|
2.6.1 Implementation of FIR Digital Filters and Filter Banks |
|
|
62 | (1) |
|
2.6.2 Implementation of Sinusoidal and Other Linear Transforms |
|
|
63 | (1) |
|
|
63 | (1) |
|
2.7 Pitfalls and Scope for Future Work |
|
|
64 | (2) |
|
2.7.1 Selection of Figure of Merit |
|
|
64 | (1) |
|
2.7.2 Benchmark Suites for Algorithm Evaluation |
|
|
65 | (1) |
|
2.7.3 FPGA-Oriented Design of Algorithms and Architectures |
|
|
65 | (1) |
|
|
66 | (1) |
|
|
67 | (10) |
3 DA-Based Circuits for Inner-Product Computation |
|
77 | (36) |
|
|
|
|
|
77 | (1) |
|
3.2 Mathematical Foundation and Concepts |
|
|
78 | (3) |
|
3.3 Techniques for Area Optimization of DA-Based Implementations |
|
|
81 | (12) |
|
3.3.1 Offset Binary Coding |
|
|
81 | (4) |
|
|
85 | (1) |
|
3.3.3 Coefficient Partitioning |
|
|
85 | (2) |
|
3.3.4 Exploiting Coefficient Symmetry |
|
|
87 | (1) |
|
3.3.5 LUT Implementation Optimization |
|
|
88 | (2) |
|
3.3.6 Adder-Based DA Implementation with Single Adder |
|
|
90 | (1) |
|
3.3.7 LUT Optimization for Fixed Coefficients |
|
|
90 | (2) |
|
3.3.8 Inner-Product with Data and Coefficients Represented as Complex Numbers |
|
|
92 | (1) |
|
3.4 Techniques for Performance Optimization of DA-Based Implementations |
|
|
93 | (5) |
|
3.4.1 Two-Bits-at-a-Time (2-BAAT) Access |
|
|
93 | (1) |
|
3.4.2 Coefficient Distribution over Data |
|
|
93 | (2) |
|
3.4.3 RNS-Based Implementation |
|
|
95 | (3) |
|
3.5 Techniques for Low Power and Reconfigurable Realization of DA-Based Implementations |
|
|
98 | (11) |
|
3.5.1 Adder-Based DA with Fixed Coefficients |
|
|
99 | (1) |
|
3.5.2 Eliminating Redundant LUT Accesses and Additions |
|
|
100 | (2) |
|
3.5.3 Using Zero-Detection to Reduce LUT Accesses and Additions |
|
|
102 | (1) |
|
3.5.4 Nega-Binary Coding for Reducing Input Toggles and LUT Look-Ups |
|
|
103 | (4) |
|
3.5.5 Accuracy versus Power Tradeoff |
|
|
107 | (1) |
|
3.5.6 Reconfigurable DA-Based Implementations |
|
|
108 | (1) |
|
|
108 | (1) |
|
|
109 | (4) |
4 Table-Based Circuits for DSP Applications |
|
113 | (36) |
|
|
|
|
113 | (2) |
|
4.2 LUT Design for Implementation of Boolean Function |
|
|
115 | (2) |
|
4.3 Lookup Table Design for Constant Multiplication |
|
|
117 | (10) |
|
4.3.1 Lookup Table Optimizations for Constant Multiplication |
|
|
117 | (5) |
|
4.3.1.1 Antisymmetric Product Coding for LUT Optimization |
|
|
119 | (1) |
|
4.3.1.2 Odd Multiple-Storage for LUT Optimization |
|
|
119 | (3) |
|
4.3.2 Implementation of LUT-Multiplier using APC for L = 5 |
|
|
122 | (1) |
|
4.3.3 Implementation of Optimized LUT using OMS Technique |
|
|
123 | (1) |
|
4.3.4 Optimized LUT Design for Signed and Unsigned Operands |
|
|
124 | (2) |
|
4.3.5 Input Operand Decomposition for Large Input Width |
|
|
126 | (1) |
|
4.4 Evaluation of Elementary Arithmetic Functions |
|
|
127 | (7) |
|
4.4.1 Piecewise Polynomial Approximation (PPA) Approach for Function Evaluation |
|
|
128 | (3) |
|
4.4.1.1 Accuracy of PPA Method |
|
|
129 | (1) |
|
4.4.1.2 Input Interval Partitioning for PPA |
|
|
130 | (1) |
|
4.4.2 Table-Addition (TA) Approach for Function Evaluation |
|
|
131 | (3) |
|
4.4.2.1 Bipartite Table-Addition Approach for Function Evaluation |
|
|
131 | (1) |
|
4.4.2.2 Accuracy of TA Method |
|
|
132 | (2) |
|
4.4.2.3 Multipartite Table-Addition Approach for Function Evaluation |
|
|
134 | (1) |
|
|
134 | (11) |
|
4.5.1 LUT-Based Implementation of Cyclic Convolution and Orthogonal Transforms |
|
|
135 | (1) |
|
4.5.2 LUT-Based Evaluation of Reciprocals and Division Operation |
|
|
136 | (2) |
|
4.5.2.1 Mathematical Formulation |
|
|
136 | (1) |
|
4.5.2.2 LUT-Based Structure for Evaluation of Reciprocal |
|
|
137 | (1) |
|
4.5.2.3 Reduction of LUT Size |
|
|
137 | (1) |
|
4.5.3 LUT-Based Design for Evaluation of Sigmoid Function |
|
|
138 | (5) |
|
4.5.3.1 LUT Optimization Strategy using Properties of Sigmoid Function |
|
|
139 | (2) |
|
4.5.3.2 Design Example for epsilon = 0.2 |
|
|
141 | (1) |
|
4.5.3.3 Implementation of Design Example for epsilon = 0.2 |
|
|
142 | (1) |
|
|
143 | (2) |
|
|
145 | (4) |
5 CORDIC Circuits |
|
149 | (37) |
|
|
|
|
|
|
|
149 | (2) |
|
5.2 Basic CORDIC Techniques |
|
|
151 | (5) |
|
5.2.1 The CORDIC Algorithm |
|
|
151 | (3) |
|
5.2.1.1 Iterative Decomposition of Angle of Rotation |
|
|
152 | (1) |
|
5.2.1.2 Avoidance of Scaling |
|
|
153 | (1) |
|
5.2.2 Generalization of the CORDIC Algorithm |
|
|
154 | (1) |
|
5.2.3 Multidimensional CORDIC |
|
|
155 | (1) |
|
5.3 Advanced CORDIC Algorithms and Architectures |
|
|
156 | (12) |
|
5.3.1 High-Radix CORDIC Algorithm |
|
|
157 | (1) |
|
5.3.2 Angle Recoding Methods |
|
|
158 | (3) |
|
5.3.2.1 Elementary Angle Set Recoding |
|
|
158 | (1) |
|
5.3.2.2 Extended Elementary Angle Set Recoding |
|
|
159 | (1) |
|
5.3.2.3 Parallel Angle Recoding |
|
|
159 | (2) |
|
5.3.3 Hybrid or Coarse-Fine Rotation CORDIC |
|
|
161 | (3) |
|
5.3.3.1 Coarse-Fine Angular Decomposition |
|
|
161 | (1) |
|
5.3.3.2 Implementation of Hybrid CORDIC |
|
|
162 | (1) |
|
5.3.3.3 Shift-Add Implementation of Coarse Rotation |
|
|
163 | (1) |
|
5.3.3.4 Parallel CORDIC Based on Coarse-Fine Decomposition |
|
|
164 | (1) |
|
5.3.4 Redundant Number-Based CORDIC Implementation |
|
|
164 | (2) |
|
5.3.5 Pipelined CORDIC Architecture |
|
|
166 | (1) |
|
5.3.6 Differential CORDIC Algorithm |
|
|
167 | (1) |
|
5.4 Scaling, Quantization, and Accuracy Issues |
|
|
168 | (4) |
|
5.4.1 Implementation of Mixed-Scaling Rotation |
|
|
168 | (1) |
|
5.4.2 Low-Complexity Scaling |
|
|
169 | (1) |
|
5.4.3 Quantization and Numerical Accuracy |
|
|
170 | (1) |
|
5.4.4 Area-Delay-Accuracy Trade-off |
|
|
170 | (2) |
|
5.5 Applications of CORDIC |
|
|
172 | (6) |
|
|
172 | (1) |
|
|
172 | (1) |
|
5.5.1.2 Singular Value Decomposition and Eigenvalue Estimation |
|
|
173 | (1) |
|
5.5.2 Signal Processing and Image Processing Applications |
|
|
173 | (1) |
|
5.5.3 Applications to Communication |
|
|
174 | (2) |
|
5.5.3.1 Direct Digital Synthesis |
|
|
175 | (1) |
|
5.5.3.2 Analog and Digital Modulation |
|
|
175 | (1) |
|
5.5.3.3 Other Communication Applications |
|
|
176 | (1) |
|
5.5.4 Applications of CORDIC to Robotics and Graphics |
|
|
176 | (13) |
|
5.5.4.1 Direct Kinematics Solution (DKS) for Serial Robot Manipulators |
|
|
176 | (1) |
|
5.5.4.2 Inverse Kinematics for Robot Manipulators |
|
|
177 | (1) |
|
5.5.4.3 CORDIC for Other Robotics Applications |
|
|
177 | (1) |
|
5.5.4.4 CORDIC for 3D Graphics |
|
|
177 | (1) |
|
|
178 | (1) |
|
|
178 | (8) |
6 RNS-Based Arithmetic Circuits and Applications |
|
186 | (51) |
|
|
|
186 | (3) |
|
6.2 Modulo Addition and Subtraction |
|
|
189 | (4) |
|
6.2.1 Modulo (2^n - 1) Adders |
|
|
189 | (2) |
|
6.2.2 Modulo (2^n + 1) Adders |
|
|
191 | (2) |
|
6.3 Modulo Multiplication and Modulo Squaring |
|
|
193 | (7) |
|
6.3.1 Multipliers for General Moduli |
|
|
194 | (1) |
|
6.3.2 Multipliers mod (2^n - 1) |
|
|
195 | (1) |
|
6.3.3 Multipliers mod (2^n + 1) |
|
|
196 | (3) |
|
|
199 | (1) |
|
6.4 Forward (Binary to RNS) Conversion |
|
|
200 | (3) |
|
6.5 RNS to Binary Conversion |
|
|
203 | (7) |
|
6.5.1 CRT-Based RNS to Binary Conversion |
|
|
203 | (3) |
|
6.5.2 Mixed Radix Conversion |
|
|
206 | (1) |
|
6.5.3 RNS to Binary Conversion using New CRT |
|
|
207 | (1) |
|
6.5.4 RNS to Binary Conversion using Core Function |
|
|
208 | (2) |
|
6.6 Scaling and Base Extension |
|
|
210 | (3) |
|
6.7 Magnitude Comparison and Sign Detection |
|
|
213 | (1) |
|
6.8 Error Correction and Detection |
|
|
214 | (2) |
|
|
216 | (10) |
|
|
216 | (2) |
|
6.9.2 RNS in Cryptography |
|
|
218 | (7) |
|
6.9.2.1 Montgomery Modular Multiplication |
|
|
218 | (3) |
|
6.9.2.2 Elliptic Curve Cryptography using RNS |
|
|
221 | (2) |
|
6.9.2.3 Pairing Processors using RNS |
|
|
223 | (2) |
|
6.9.3 RNS in Digital Communication Systems |
|
|
225 | (1) |
|
|
226 | (11) |
7 Logarithmic Number System |
|
237 | (36) |
|
|
|
|
237 | (1) |
|
7.1.1 The Logarithmic Number System |
|
|
237 | (1) |
|
7.1.2 Organization of the Chapter |
|
|
237 | (1) |
|
7.2 Basics of LNS Representation |
|
|
238 | (2) |
|
7.2.1 LNS and Equivalence to Linear Representation |
|
|
238 | (2) |
|
7.3 Fundamental Arithmetic Operations |
|
|
240 | (9) |
|
7.3.1 Multiplication, Division, Roots, and Powers |
|
|
240 | (1) |
|
7.3.2 Addition and Subtraction |
|
|
241 | (10) |
|
7.3.2.1 Direct Techniques and Approximations |
|
|
242 | (1) |
|
7.3.2.2 Cancellation of Singularities |
|
|
242 | (7) |
|
7.4 Forward and Inverse Conversion |
|
|
249 | (1) |
|
7.5 Complex Arithmetic in LNS |
|
|
250 | (1) |
|
|
251 | (6) |
|
7.6.1 A VLIW LNS Processor |
|
|
252 | (5) |
|
|
253 | (1) |
|
7.6.1.2 The Arithmetic Logic Units |
|
|
253 | (3) |
|
|
256 | (1) |
|
7.6.1.4 Memory Organization |
|
|
256 | (1) |
|
7.6.1.5 Applications and the Proposed Architecture |
|
|
257 | (1) |
|
7.6.1.6 Performance and Comparisons |
|
|
257 | (1) |
|
7.7 LNS for Low-Power Dissipation |
|
|
257 | (8) |
|
7.7.1 Impact of LNS Encoding on Signal Activity |
|
|
258 | (3) |
|
7.7.2 Power Dissipation and LNS Architecture |
|
|
261 | (4) |
|
|
265 | (3) |
|
7.8.1 Signal Processing and Communications |
|
|
265 | (2) |
|
|
267 | (1) |
|
|
268 | (1) |
|
|
268 | (1) |
|
|
269 | (4) |
8 Redundant Number System-Based Arithmetic Circuits |
|
273 | (40) |
|
|
|
273 | (5) |
|
8.1.1 Introductory Definitions and Examples |
|
|
274 | (4) |
|
8.1.1.1 Posibits and Negabits |
|
|
275 | (3) |
|
8.2 Fundamentals of Redundant Number Systems |
|
|
278 | (2) |
|
8.2.1 Redundant Digit Sets |
|
|
278 | (2) |
|
8.3 Redundant Number Systems |
|
|
280 | (7) |
|
8.3.1 Constant Time Addition |
|
|
281 | (2) |
|
8.3.2 Carry-Save Addition |
|
|
283 | (2) |
|
8.3.3 Borrow Free Subtraction |
|
|
285 | (2) |
|
8.4 Basic Arithmetic Circuits for Redundant Number Systems |
|
|
287 | (10) |
|
8.4.1 Circuit Realization of Carry-Free Adders |
|
|
287 | (1) |
|
8.4.2 Fast Maximally Redundant Carry-Free Adders |
|
|
288 | (2) |
|
8.4.3 Carry-Free Addition of Symmetric Maximally Redundant Numbers |
|
|
290 | (2) |
|
8.4.4 Addition and Subtraction of Stored-Carry Encoded Redundant Operands |
|
|
292 | (5) |
|
8.4.4.1 Stored Unibit Carry Free Adders |
|
|
294 | (2) |
|
8.4.4.2 Stored Unibit Subtraction |
|
|
296 | (1) |
|
8.5 Binary to Redundant Conversion and the Reverse |
|
|
297 | (2) |
|
8.5.1 Binary to MRSD Conversion and the Reverse |
|
|
297 | (1) |
|
8.5.2 Binary to Stored Unibit Conversion and the Reverse |
|
|
298 | (1) |
|
8.6 Special Arithmetic Circuits for Redundant Number Systems |
|
|
299 | (4) |
|
8.6.1 Radix-2^h MRSD Arithmetic Shifts |
|
|
299 | (1) |
|
8.6.2 Stored Unibit Arithmetic Shifts |
|
|
300 | (3) |
|
|
303 | (1) |
|
|
303 | (5) |
|
8.7.1 Redundant Representation of Partial Products |
|
|
305 | (1) |
|
8.7.2 Recoding the Multiplier to a Redundant Representation |
|
|
305 | (1) |
|
8.7.3 Use of Redundant Number Systems in Digit Recurrence Algorithms |
|
|
306 | (1) |
|
8.7.4 Transcendental Functions and Redundant Number Systems |
|
|
306 | (1) |
|
8.7.5 RDNS and Fused Multiply-Add Operation |
|
|
307 | (1) |
|
8.7.6 RDNS and Floating Point Arithmetic |
|
|
307 | (1) |
|
8.7.7 RDNS and RNS Arithmetic |
|
|
308 | (1) |
|
8.8 Summary and Further Reading |
|
|
308 | (1) |
|
|
309 | (4) |
Index |
|
313 | |