List of Tables |
|
xiii | |
List of Figures |
|
xv | |
List of Listings |
|
xvii | |
Preface |
|
xxi | |
Companion Website |
|
xxv | |
Acknowledgments |
|
xxvii | |
Part I Assembly as a Language |
|
1 | (168) |
|
|
3 | (32) |
|
1.1 Reasons to Learn Assembly |
|
|
4 | (4) |
|
|
8 | (1) |
|
|
9 | (19) |
|
1.3.1 Representing Natural Numbers |
|
|
9 | (2) |
|
|
11 | (4) |
|
1.3.3 Representing Integers |
|
|
15 | (5) |
|
1.3.4 Representing Characters |
|
|
20 | (8) |
|
1.4 Memory Layout of an Executing Program |
|
|
28 | (3) |
|
|
31 | (4) |
|
Chapter 2 GNU Assembly Syntax |
|
|
35 | (18) |
|
2.1 Structure of an Assembly Program |
|
|
36 | (2) |
|
|
37 | (1) |
|
|
37 | (1) |
|
|
37 | (1) |
|
2.1.4 Assembly Instructions |
|
|
38 | (1) |
|
2.2 What the Assembler Does |
|
|
38 | (2) |
|
2.3 GNU Assembly Directives |
|
|
40 | (10) |
|
2.3.1 Selecting the Current Section |
|
|
40 | (1) |
|
2.3.2 Allocating Space for Variables and Constants |
|
|
41 | (2) |
|
2.3.3 Filling and Aligning |
|
|
43 | (2) |
|
2.3.4 Setting and Manipulating Symbols |
|
|
45 | (1) |
|
2.3.5 Conditional Assembly |
|
|
46 | (1) |
|
2.3.6 Including Other Source Files |
|
|
47 | (1) |
|
|
48 | (2) |
|
|
50 | (3) |
|
Chapter 3 Load/Store and Branch Instructions |
|
|
53 | (26) |
|
3.1 CPU Components and Data Paths |
|
|
54 | (1) |
|
|
55 | (3) |
|
3.3 Instruction Components |
|
|
58 | (2) |
|
3.3.1 Setting and Using Condition Flags |
|
|
58 | (1) |
|
|
59 | (1) |
|
3.4 Load/Store Instructions |
|
|
60 | (10) |
|
|
61 | (3) |
|
3.4.2 Load/Store Single Register |
|
|
64 | (1) |
|
3.4.3 Load/Store Multiple Registers |
|
|
65 | (3) |
|
|
68 | (1) |
|
3.4.5 Exclusive Load/Store |
|
|
69 | (1) |
|
|
70 | (3) |
|
|
70 | (1) |
|
|
71 | (2) |
|
|
73 | (3) |
|
|
73 | (2) |
|
|
75 | (1) |
|
|
76 | (3) |
|
Chapter 4 Data Processing and Other Instructions |
|
|
79 | (20) |
|
4.1 Data Processing Instructions |
|
|
79 | (11) |
|
|
80 | (1) |
|
4.1.2 Comparison Operations |
|
|
81 | (2) |
|
4.1.3 Arithmetic Operations |
|
|
83 | (2) |
|
|
85 | (1) |
|
4.1.5 Data Movement Operations |
|
|
86 | (1) |
|
4.1.6 Multiply Operations with 32-bit Results |
|
|
87 | (1) |
|
4.1.7 Multiply Operations with 64-bit Results |
|
|
88 | (1) |
|
4.1.8 Division Operations |
|
|
89 | (1) |
|
|
90 | (3) |
|
4.2.1 Count Leading Zeros |
|
|
90 | (1) |
|
4.2.2 Accessing the CPSR and SPSR |
|
|
91 | (1) |
|
|
91 | (1) |
|
|
92 | (1) |
|
|
93 | (2) |
|
|
93 | (1) |
|
|
94 | (1) |
|
4.4 Alphabetized List of ARM Instructions |
|
|
95 | (1) |
|
|
96 | (3) |
|
Chapter 5 Structured Programming |
|
|
99 | (38) |
|
|
100 | (1) |
|
|
101 | (3) |
|
5.2.1 Using Conditional Execution |
|
|
101 | (1) |
|
5.2.2 Using Branch Instructions |
|
|
102 | (1) |
|
|
103 | (1) |
|
|
104 | (4) |
|
|
105 | (1) |
|
|
106 | (1) |
|
|
106 | (2) |
|
|
108 | (15) |
|
5.4.1 Advantages of Subroutines |
|
|
109 | (1) |
|
5.4.2 Disadvantages of Subroutines |
|
|
110 | (1) |
|
5.4.3 Standard C Library Functions |
|
|
110 | (1) |
|
|
110 | (3) |
|
5.4.5 Calling Subroutines |
|
|
113 | (4) |
|
5.4.6 Writing Subroutines |
|
|
117 | (1) |
|
5.4.7 Automatic Variables |
|
|
118 | (1) |
|
5.4.8 Recursive Functions |
|
|
119 | (4) |
|
|
123 | (8) |
|
|
124 | (1) |
|
|
124 | (2) |
|
5.5.3 Arrays of Structured Data |
|
|
126 | (5) |
|
|
131 | (6) |
|
Chapter 6 Abstract Data Types |
|
|
137 | (32) |
|
6.1 ADTs in Assembly Language |
|
|
138 | (1) |
|
6.2 Word Frequency Counts |
|
|
139 | (22) |
|
6.2.1 Sorting by Word Frequency |
|
|
147 | (3) |
|
|
150 | (11) |
|
6.3 Ethics Case Study: Therac-25 |
|
|
161 | (4) |
|
6.3.1 History of the Therac-25 |
|
|
162 | (1) |
|
6.3.2 Overview of Design Flaws |
|
|
163 | (2) |
|
|
165 | (4) |
Part II Performance Mathematics |
|
169 | (194) |
|
Chapter 7 Integer Mathematics |
|
|
171 | (48) |
|
7.1 Subtraction by Addition |
|
|
172 | (1) |
|
7.2 Binary Multiplication |
|
|
172 | (9) |
|
7.2.1 Multiplication by a Power of Two |
|
|
173 | (1) |
|
7.2.2 Multiplication of Two Variables |
|
|
173 | (4) |
|
7.2.3 Multiplication of a Variable by a Constant |
|
|
177 | (1) |
|
7.2.4 Signed Multiplication |
|
|
178 | (1) |
|
7.2.5 Multiplying Large Numbers |
|
|
179 | (2) |
|
|
181 | (14) |
|
7.3.1 Division by a Power of Two |
|
|
181 | (1) |
|
7.3.2 Division by a Variable |
|
|
182 | (8) |
|
7.3.3 Division by a Constant |
|
|
190 | (4) |
|
7.3.4 Dividing Large Numbers |
|
|
194 | (1) |
|
|
195 | (21) |
|
|
216 | (3) |
|
Chapter 8 Non-Integral Mathematics |
|
|
219 | (46) |
|
8.1 Base Conversion of Fractional Numbers |
|
|
220 | (3) |
|
8.1.1 Arbitrary Base to Decimal |
|
|
220 | (1) |
|
8.1.2 Decimal to Arbitrary Base |
|
|
220 | (3) |
|
|
223 | (3) |
|
|
226 | (5) |
|
8.3.1 Interpreting Fixed-Point Numbers |
|
|
226 | (4) |
|
|
230 | (1) |
|
8.3.3 Properties of Fixed-Point Numbers |
|
|
230 | (1) |
|
8.4 Fixed-Point Operations |
|
|
231 | (11) |
|
8.4.1 Fixed-Point Addition and Subtraction |
|
|
231 | (1) |
|
8.4.2 Fixed-Point Multiplication |
|
|
232 | (2) |
|
8.4.3 Fixed-Point Division |
|
|
234 | (2) |
|
8.4.4 Division by a Constant |
|
|
236 | (6) |
|
8.5 Floating Point Numbers |
|
|
242 | (4) |
|
8.5.1 IEEE 754 Half-Precision |
|
|
243 | (2) |
|
8.5.2 IEEE 754 Single-Precision |
|
|
245 | (1) |
|
8.5.3 IEEE 754 Double-Precision |
|
|
245 | (1) |
|
8.5.4 IEEE 754 Quad-Precision |
|
|
246 | (1) |
|
8.6 Floating Point Operations |
|
|
246 | (1) |
|
8.6.1 Floating Point Addition and Subtraction |
|
|
246 | (1) |
|
8.6.2 Floating Point Multiplication and Division |
|
|
247 | (1) |
|
8.7 Computing Sine and Cosine |
|
|
247 | (14) |
|
8.7.1 Formats for the Powers of x |
|
|
248 | (1) |
|
8.7.2 Formats and Constants for the Factorial Terms |
|
|
249 | (2) |
|
8.7.3 Putting it All Together |
|
|
251 | (8) |
|
8.7.4 Performance Comparison |
|
|
259 | (2) |
|
8.8 Ethics Case Study: Patriot Missile Failure |
|
|
261 | (2) |
|
|
263 | (2) |
|
Chapter 9 The ARM Vector Floating Point Coprocessor |
|
|
265 | (32) |
|
9.1 Vector Floating Point Overview |
|
|
266 | (2) |
|
9.2 Floating Point Status and Control Register |
|
|
268 | (5) |
|
9.2.1 Performance Versus Compliance |
|
|
271 | (1) |
|
|
272 | (1) |
|
|
273 | (1) |
|
9.4 Load/Store Instructions |
|
|
274 | (3) |
|
9.4.1 Load/Store Single Register |
|
|
274 | (1) |
|
9.4.2 Load/Store Multiple Registers |
|
|
275 | (2) |
|
9.5 Data Processing Instructions |
|
|
277 | (2) |
|
9.5.1 Copy, Absolute Value, Negate, and Square Root |
|
|
277 | (1) |
|
9.5.2 Add, Subtract, Multiply, and Divide |
|
|
278 | (1) |
|
|
279 | (1) |
|
9.6 Data Movement Instructions |
|
|
279 | (3) |
|
9.6.1 Moving Between Two VFP Registers |
|
|
279 | (1) |
|
9.6.2 Moving Between VFP Register and One Integer Register |
|
|
280 | (1) |
|
9.6.3 Moving Between VFP Register and Two Integer Registers |
|
|
281 | (1) |
|
9.6.4 Move Between ARM Register and VFP System Register |
|
|
282 | (1) |
|
9.7 Data Conversion Instructions |
|
|
282 | (3) |
|
9.7.1 Convert Between Floating Point and Integer |
|
|
282 | (2) |
|
9.7.2 Convert Between Fixed Point and Single Precision |
|
|
284 | (1) |
|
9.8 Floating Point Sine Function |
|
|
285 | (7) |
|
9.8.1 Sine Function Using Scalar Mode |
|
|
285 | (2) |
|
9.8.2 Sine Function Using Vector Mode |
|
|
287 | (4) |
|
9.8.3 Performance Comparison |
|
|
291 | (1) |
|
9.9 Alphabetized List of VFP Instructions |
|
|
292 | (1) |
|
|
293 | (4) |
|
Chapter 10 The ARM NEON Extensions |
|
|
297 | (66) |
|
|
299 | (1) |
|
|
299 | (3) |
|
10.3 Load and Store Instructions |
|
|
302 | (7) |
|
10.3.1 Load or Store Single Structure Using One Lane |
|
|
303 | (2) |
|
10.3.2 Load Copies of a Structure to All Lanes |
|
|
305 | (2) |
|
10.3.3 Load or Store Multiple Structures |
|
|
307 | (2) |
|
10.4 Data Movement Instructions |
|
|
309 | (12) |
|
10.4.1 Moving Between NEON Scalar and Integer Register |
|
|
309 | (1) |
|
10.4.2 Move Immediate Data |
|
|
310 | (1) |
|
10.4.3 Change Size of Elements in a Vector |
|
|
311 | (1) |
|
|
312 | (1) |
|
|
313 | (1) |
|
|
314 | (1) |
|
|
315 | (1) |
|
|
316 | (1) |
|
|
317 | (2) |
|
10.4.10 Zip or Unzip Vectors |
|
|
319 | (2) |
|
|
321 | (1) |
|
10.5.1 Convert Between Fixed Point and Single-Precision |
|
|
321 | (1) |
|
10.5.2 Convert Between Half-Precision and Single-Precision |
|
|
322 | (1) |
|
10.6 Comparison Operations |
|
|
322 | (4) |
|
|
323 | (1) |
|
10.6.2 Vector Absolute Compare |
|
|
324 | (1) |
|
|
325 | (1) |
|
10.7 Bitwise Logical Operations |
|
|
326 | (3) |
|
10.7.1 Bitwise Logical Operations |
|
|
326 | (1) |
|
10.7.2 Bitwise Logical Operations with Immediate Data |
|
|
327 | (1) |
|
10.7.3 Bitwise Insertion and Selection |
|
|
328 | (1) |
|
|
329 | (6) |
|
10.8.1 Shift Left by Immediate |
|
|
329 | (1) |
|
10.8.2 Shift Left or Right by Variable |
|
|
330 | (1) |
|
10.8.3 Shift Right by Immediate |
|
|
331 | (1) |
|
10.8.4 Saturating Shift Right by Immediate |
|
|
332 | (1) |
|
|
333 | (2) |
|
10.9 Arithmetic Instructions |
|
|
335 | (8) |
|
10.9.1 Vector Add and Subtract |
|
|
335 | (1) |
|
10.9.2 Vector Add and Subtract with Narrowing |
|
|
336 | (1) |
|
10.9.3 Add or Subtract and Divide by Two |
|
|
337 | (1) |
|
10.9.4 Add Elements Pairwise |
|
|
338 | (1) |
|
10.9.5 Absolute Difference |
|
|
339 | (1) |
|
10.9.6 Absolute Value and Negate |
|
|
340 | (1) |
|
10.9.7 Get Maximum or Minimum Elements |
|
|
341 | (1) |
|
|
342 | (1) |
|
10.10 Multiplication and Division |
|
|
343 | (8) |
|
|
343 | (2) |
|
10.10.2 Multiply by Scalar |
|
|
345 | (1) |
|
10.10.3 Fused Multiply Accumulate |
|
|
346 | (1) |
|
10.10.4 Saturating Multiply and Double (Low) |
|
|
347 | (1) |
|
10.10.5 Saturating Multiply and Double (High) |
|
|
348 | (1) |
|
10.10.6 Estimate Reciprocals |
|
|
348 | (1) |
|
|
349 | (2) |
|
10.11 Pseudo-Instructions |
|
|
351 | (3) |
|
|
351 | (1) |
|
10.11.2 Bitwise Logical Operations with Immediate Data |
|
|
352 | (1) |
|
10.11.3 Vector Absolute Compare |
|
|
353 | (1) |
|
10.12 Performance Mathematics: A Final Look at Sine |
|
|
354 | (4) |
|
|
354 | (1) |
|
|
355 | (2) |
|
10.12.3 Performance Comparison |
|
|
357 | (1) |
|
10.13 Alphabetized List of NEON Instructions |
|
|
358 | (3) |
|
|
361 | (2) |
Part III Accessing Devices |
|
363 | (104) |
|
|
365 | (30) |
|
11.1 Accessing Devices Directly Under Linux |
|
|
365 | (11) |
|
11.2 General Purpose Digital Input/Output |
|
|
376 | (16) |
|
|
378 | (4) |
|
|
382 | (10) |
|
|
392 | (3) |
|
Chapter 12 Pulse Modulation |
|
|
395 | (10) |
|
12.1 Pulse Density Modulation |
|
|
396 | (1) |
|
12.2 Pulse Width Modulation |
|
|
397 | (1) |
|
12.3 Raspberry Pi PWM Device |
|
|
398 | (2) |
|
|
400 | (3) |
|
|
403 | (2) |
|
Chapter 13 Common System Devices |
|
|
405 | (26) |
|
13.1 Clock Management Device |
|
|
405 | (4) |
|
13.1.1 Raspberry Pi Clock Manager |
|
|
406 | (3) |
|
13.1.2 pcDuino Clock Control Unit |
|
|
409 | (1) |
|
13.2 Serial Communications |
|
|
409 | (20) |
|
|
410 | (3) |
|
13.2.2 Raspberry Pi UART0 |
|
|
413 | (5) |
|
13.2.3 Basic Programming for the Raspberry Pi UART |
|
|
418 | (4) |
|
|
422 | (7) |
|
|
429 | (2) |
|
Chapter 14 Running Without an Operating System |
|
|
431 | (36) |
|
|
432 | (2) |
|
14.2 Exception Processing |
|
|
434 | (8) |
|
14.2.1 Handling Exceptions |
|
|
438 | (4) |
|
|
442 | (1) |
|
14.4 Writing a Bare-Metal Program |
|
|
442 | (7) |
|
|
443 | (2) |
|
|
445 | (2) |
|
|
447 | (2) |
|
14.4.4 Putting it All Together |
|
|
449 | (1) |
|
|
449 | (12) |
|
|
449 | (1) |
|
14.5.2 Interrupt Controllers |
|
|
449 | (9) |
|
|
458 | (3) |
|
14.5.4 Exception Handling |
|
|
461 | (1) |
|
14.5.5 Building the Interrupt-Driven Program |
|
|
461 | (1) |
|
14.6 ARM Processor Profiles |
|
|
461 | (3) |
|
|
464 | (3) |
Index |
|
467 | |