Preface  xvii
Acknowledgments  xxiii
Author  xxv
|
1.1.3 Mainframe Computers  2
1.3 Hardware and Software Logical Equivalence  7
1.4 Ladder of Abstraction  8
1.4.1 Modeling-Level Architecture  9
1.4.2 Algorithm-Level Architecture  9
1.4.3 High-Level Architecture  11
1.4.4 Assembly-Level Architecture  12
1.4.5 System or Instruction Set Architecture-Level Architecture  12
1.4.6 Machine or Microarchitecture-Level Architecture  12
1.4.7 Control or Logic-Level Architecture  13
1.4.8 Device-Level Architecture  13
1.5 Application Programming Interfaces  14
|
2 Processor Physics and Moore's Law  17
2.1 Speed of Processing and Power Problem  17
2.2 Area, Delay, and Power Consumption  19
2.3 Area, Latency, and Power Trade-offs  24
2.3.1 Area versus Delay Trade-off  24
2.3.2 Delay versus Power Trade-off  26
2.3.3 Area versus Delay versus Power Trade-off  27
2.4.1 Leveraging Moore's Law  30
2.4.1.1 Reconfigurable Computing  31
2.5.3 Instruction-Level Parallelism Wall  32
|
Section I Genesis of Parallel Computing

3.2 Aspects of Processor Performance  38
3.2.1 Potential for Speedup  38
3.2.3 Speedup versus Communication Overhead  40
3.3 Enhancing Uniprocessor Performance  42
3.3.1 Improving CPU Performance  43
3.3.2 Increasing Processor Clock Frequency  43
3.3.3 Parallelizing Arithmetic Logic Unit (ALU) Structure  44
3.3.6 Very Long Instruction Word (VLIW) Processors  46
3.3.8 Instruction-Level Parallelism  47
3.3.9 Multicore Architectures  48
|
|
4.2.1 Personal Area Networks  54
4.2.2 Local Area Networks  55
4.2.3 Metropolitan Area Networks  56
4.3.1 OSI Reference Model  57
4.3.2 TCP/IP Reference Model  59
4.3.2.4 Application Layer  61
4.4 Interconnection Networks  61
|
5 Distributed Systems Basics  65
5.1.1 Distributed Computing  66
5.1.1.1 System Architectural Styles  68
5.1.1.2 Software Architectural Styles  69
5.1.1.3 Technologies for Distributed Computing  74
5.2 Distributed System Benefits  77
5.3 Distributed Computation Systems  78
|
Section II Road to Parallel Computing

6.1 Flynn's Taxonomy for Parallel Computer Architectures  83
6.2 Types of Parallel Computers  86
6.2.1 Shared Memory Multiprocessor Systems  86
6.2.2 Distributed Memory Multicomputers  88
6.2.2.1 Interconnection Network (IN)  88
6.3 Characteristics of Parallel Systems  92
6.3.1 Coupling, Parallelism, Concurrency, and Granularity  92
6.3.2 Shared Memory Systems versus Message-Passing Systems  93
6.3.3 Distributed Communication  94
6.3.3.1 Blocking/Non-blocking, Synchronous/Asynchronous Primitives  94
6.3.3.2 Processor Synchrony  96
6.3.4 Synchronous versus Asynchronous Executions  96
|
7 Parallel Computing Models  99
7.1.2.1 Bulk Synchronous Parallel (BSP) Model  101
7.2 Interconnection Network Models  104
7.2.1.4 Cube-Connected Cycles  108
|
|
8.1 Classes of Problems Solvable through Parallelization  115
8.1.1 Parallelizable Tasks  116
8.2 Types of Parallelization  118
8.2.1 Functional Parallelization  118
8.2.2 Data Parallelization  119
8.2.3 Recursive Parallelization  120
8.2.4 Exploratory Parallelization  121
8.2.5 Speculative Parallelization  121
8.3 Granularity of Parallelization  122
8.4 Assigning Computational Tasks to Processors  123
8.5 Illustrating Design of a Parallel Algorithm  124
8.6 Parallel Algorithms for Conventional Computations  125
8.6.1 Parallel Prefix and Suffix Computations on a Linked List  125
8.7 Parallel Algorithms for Unconventional Computations  128
|
Section III Parallel Computing Architectures

9 Parallel Computing Architecture Basics  131
9.1 High-Performance Distributed Computing  131
9.2 Performance Evaluation  132
9.3 Application and Architecture  135
9.3.1 Multiprocessor Architectures  136
9.4 Maximum Performance Computing Approach  137
9.5 Parallel Computing Basics  138
9.6 Parallel Computing Paradigms  140
|
10 Shared Memory Architecture  141
10.1 Shared Memory Paradigm  141
10.1.1 Multicore Architecture  141
10.1.2 Multi-Socket Multicore Architecture  142
10.1.3 Asymmetric Multicore Architecture  142
10.2.3 Mapping of Memory Blocks to Cache Blocks  145
10.3.1 Write-Through Policy  148
10.4.1 Snooping Protocols  150
10.4.2 Directory-Based Protocols  151
10.5.1 Sequential Consistency  154
|
11 Message-Passing Architecture  157
11.1 Message-Passing Paradigm  157
11.1.1 Tightly Coupled Distributed Memory Architecture  157
11.1.2 Loosely Coupled Distributed Memory Architecture  157
11.2.1 Routing Algorithms for Broadcasting and Multicasting  159
11.2.2 Deadlocks and Routing Algorithms  160
11.3.2.1 Store-and-Forward Routing  163
11.3.2.2 Cut-Through Routing  165
|
12 Stream Processing Architecture  167
12.2 Parallel Accelerators  171
12.3.1 Stream Processor Architecture  173
12.3.2 Execution Overview  174
12.3.3 Locality and Bandwidth Hierarchy  176
|
Section IV Parallel Computing Programming

13 Parallel Computing Programming Basics  181
13.1 Shared-Memory Programming  181
13.1.2 Programming Languages  184
13.2 Message-Passing Programming  186
13.2.2 Programming Languages  191
13.3.2 Programming Languages  195
Appendix 13.A Functional Programming  199
13.A.1 Characteristics of Functional Programming  200
13.A.2 Advantages of Functional Programming  201
13.A.3 Disadvantages of Functional Programming  202
Appendix 13.B Hadoop MapReduce  202
13.B.1 MapReduce Processing  204
13.B.2 MapReduce Enhancements and Extensions  206
13.B.2.1 Supporting Iterative Processing  206
|
14 Shared-Memory Parallel Programming with OpenMP  213
14.2 Overview of Features  214
14.3 Additional Feature Details  218
14.3.1.1 Parallel Region Construct  218
14.3.1.2 Work-Sharing Constructs  218
14.3.1.3 Directive Clauses  219
14.3.3 Runtime Library Routines  220
14.3.3.1 Execution Environment Routines  220
|
15 Message-Passing Parallel Programming with MPI  223
15.2 Basic Point-to-Point Communication Routines  224
15.3 Basic MPI Collective Communication Routines  225
15.4 Environment Management Routines  229
15.5 Point-to-Point Communication Routines  231
15.5.1 Blocking Message-Passing Routines  231
15.5.2 Non-blocking Message-Passing Routines  234
15.6 Collective Communication Routines  236
|
16 Stream Processing Programming with CUDA, OpenCL, and OpenACC  239
16.2.1 Synchronization Functions  247
16.3.3 Asynchronous Processing and Synchronization  251
|
Section V Internet of Things Big Data Stream Processing

17 Internet of Things (IoT) Technologies  255
17.1.1 IoT Building Blocks  258
17.1.3 Widened Address Space with IPv6  261
17.2 RFID (Radio Frequency Identification)  264
17.3.3.1 WSN Characteristics  275
Appendix 17.A Internet of Things (IoT) in 5G Mobile Technologies  277
17.A.4 5G-Enabling Technologies  280
Appendix 17.B Edge Computing and Fog Computing  282
17.B.1 Implementation of Edge Computing  284
17.B.1.1 Data Reduction Techniques  284
|
18 Sensor Data Processing  287
18.1 Sensor Data-Gathering and Data-Dissemination Mechanisms  287
18.1.1 Mechanisms Based on Storage Location  288
18.1.1.1 Database with Geographic Information  288
18.1.2 Classification of Data-Gathering Mechanisms Based on the Direction of Diffusion  289
18.1.2.1 Directed Diffusion  290
18.1.3 Mechanisms Based on the Structure of Dissemination  291
18.4 Data-Fusion Mechanisms  297
18.4.1 Classification of Data-Fusion Mechanisms Based on Functions  298
18.4.2 System Architectures of Data Fusion  298
18.4.3 Trade-offs of Resources  299
18.5 Data-Fusion Techniques, Methods, and Algorithms  299
18.6 Data-Fusion Architectures and Models  301
18.6.2 Activity-Based Models  303
Appendix 18.A Wireless Sensor Network Anomalies  307
18.A.1 Architectural Design Guidelines  310
|
|
19.1 Introduction to Big Data  314
19.1.2 Common Characteristics of Big Data Computing Systems  319
19.2 Tools and Techniques of Big Data  320
19.2.1 Processing Approach  320
19.2.2 Big Data System Architecture  322
19.2.2.1 BASE (Basically Available, Soft State, Eventual Consistency)  322
19.2.2.2 Functional Decomposition  323
19.2.2.3 Master-Slave Replication  323
19.2.3 Row Partitioning or Sharding  324
19.2.4 Row- versus Column-Oriented Data Layouts  324
19.2.5 NoSQL Data Management  325
19.3.1 Column-Oriented Stores or Databases  327
19.3.2 Key-Value Stores (K-V Stores) or Databases  327
19.3.3 Document-Oriented Databases  328
19.3.4 Graph Stores or Databases  329
Appendix 19.A Compute-Intensive Big Compute versus Data-Intensive Big Data Computing  332
|
20 Big Data Stream Processing  335
20.1 Big Data Stream Processing  335
20.1.1 History of Data Stream Processing  338
20.1.2 Data Stream Processing  340
20.1.2.1 Data Stream Processing Systems  340
20.1.2.4 Stream Processing Platforms/Engines  345
20.2 Stream Processing System Implementations  346
20.2.1 Academic or Research Prototype Systems  346
20.2.2 Open-Source Systems  351
20.2.3 Commercial Systems  352
20.2.3.2 IBM InfoSphere Streams  353
20.2.3.3 TIBCO BusinessEvents  354

Epilogue: Quantum Computing  361
References  365
Index  367