Preface |
|
xiii | |
List of Contributors |
|
xv | |
List of Figures |
|
xvii | |
List of Tables |
|
xxi | |
List of Abbreviations |
|
xxiii | |
1 Introduction |
|
1 | (14) |
|
|
|
|
|
|
|
|
|
1 | (5) |
|
1.1.1 The Convergence of High-performance and Embedded Computing Domains |
|
|
3 | (2) |
|
1.1.2 Parallelization Challenge |
|
|
5 | (1) |
|
1.2 The P-SOCRATES Project |
|
|
6 | (2) |
|
1.3 Challenges Addressed in This Book |
|
|
8 | (2) |
|
1.3.1 Compiler Analysis of Parallel Programs |
|
|
8 | (1) |
|
1.3.2 Predictable Scheduling of Parallel Tasks on Many-core Systems |
|
|
9 | (1) |
|
1.3.3 Methodology for Measurement-based Timing Analysis |
|
|
9 | (1) |
|
1.3.4 Optimized OpenMP Tasking Runtime System |
|
|
9 | (1) |
|
1.3.5 Real-time Operating Systems |
|
|
10 | (1) |
|
|
10 | (1) |
|
|
11 | (1) |
|
|
12 | (3) |
2 Manycore Platforms |
|
15 | (18) |
|
|
|
|
|
15 | (2) |
|
2.2 Manycore Architectures |
|
|
17 | (13) |
|
|
17 | (1) |
|
|
18 | (1) |
|
|
19 | (2) |
|
|
21 | (1) |
|
2.2.5 STMicroelectronics STHORM |
|
|
22 | (1) |
|
|
23 | (1) |
|
|
24 | (1) |
|
|
25 | (8) |
|
2.2.8.1 The I/O subsystem |
|
|
26 | (1) |
|
2.2.8.2 The Network-on-Chip (NoC) |
|
|
26 | (2) |
|
2.2.8.3 The Host-to-IOS communication protocol |
|
|
28 | (1) |
|
2.2.8.4 Internal architecture of the compute clusters |
|
|
28 | (1) |
|
2.2.8.5 The shared memory |
|
|
29 | (1) |
|
|
30 | (1) |
|
|
31 | (2) |
3 Predictable Parallel Programming with OpenMP |
|
33 | (30) |
|
|
|
|
|
|
33 | (4) |
|
3.1.1 Introduction to Parallel Programming Models |
|
|
34 | (3) |
|
|
35 | (1) |
|
|
35 | (1) |
|
|
36 | (1) |
|
3.1.1.4 Intel® CiIk™ Plus |
|
|
36 | (1) |
|
|
36 | (1) |
|
|
37 | (1) |
|
3.2 The OpenMP Parallel Programming Model |
|
|
37 | (6) |
|
3.2.1 Introduction and Evolution of OpenMP |
|
|
37 | (2) |
|
3.2.2 Parallel Model of OpenMP |
|
|
39 | (3) |
|
|
39 | (1) |
|
3.2.2.2 Acceleration model |
|
|
40 | (1) |
|
|
41 | (1) |
|
|
42 | (1) |
|
3.3 Timing Properties of OpenMP Tasking Model |
|
|
43 | (8) |
|
3.3.1 Sporadic DAG Scheduling Model of Parallel Applications |
|
|
43 | (1) |
|
3.3.2 Understanding the OpenMP Tasking Model |
|
|
44 | (2) |
|
3.3.3 OpenMP and Timing Predictability |
|
|
46 | (5) |
|
3.3.3.1 Extracting the DAG of an OpenMP program |
|
|
47 | (1) |
|
3.3.3.2 WCET analysis is applied to tasks and tasks parts |
|
|
48 | (1) |
|
3.3.3.3 DAG-based scheduling must not violate the TSCs |
|
|
49 | (2) |
|
3.4 Extracting the Timing Information of an OpenMP Program |
|
|
51 | (7) |
|
3.4.1 Parallel Structure Stage |
|
|
52 | (2) |
|
3.4.1.1 Parallel control flow analysis |
|
|
53 | (1) |
|
3.4.1.2 Induction variables analysis |
|
|
53 | (1) |
|
3.4.1.3 Reaching definitions and range analysis |
|
|
53 | (1) |
|
3.4.1.4 Putting all together: The wave-front example |
|
|
53 | (1) |
|
3.4.2 Task Expansion Stage |
|
|
54 | (4) |
|
3.4.2.1 Control flow expansion and synchronization predicate resolution |
|
|
54 | (2) |
|
3.4.2.2 tid: A unique task instance identifier |
|
|
56 | (1) |
|
3.4.2.3 Missing information when deriving the DAG |
|
|
57 | (1) |
|
3.4.3 Compiler Complexity |
|
|
58 | (1) |
|
|
58 | (1) |
|
|
59 | (4) |
4 Mapping, Scheduling, and Schedulability Analysis |
|
63 | (50) |
|
|
|
|
|
|
|
63 | (1) |
|
|
64 | (2) |
|
4.3 Partitioned Scheduler |
|
|
66 | (4) |
|
4.3.1 The Optimality of EDF on Preemptive Uniprocessors |
|
|
66 | (1) |
|
4.3.2 FP-scheduling Algorithms |
|
|
67 | (1) |
|
4.3.3 Limited Preemption Scheduling |
|
|
68 | (1) |
|
4.3.4 Limited Preemption Schedulability Analysis |
|
|
69 | (1) |
|
4.4 Global Scheduler with Migration Support |
|
|
70 | (5) |
|
4.4.1 Migration-based Scheduler |
|
|
70 | (2) |
|
4.4.2 Putting All Together |
|
|
72 | (1) |
|
4.4.3 Implementation of a Limited Preemption Scheduler |
|
|
73 | (2) |
|
4.5 Overall Schedulability Analysis |
|
|
75 | (12) |
|
4.5.1 Model Formalization |
|
|
75 | (3) |
|
4.5.2 Critical Interference of cp-tasks |
|
|
78 | (2) |
|
4.5.3 Response Time Analysis |
|
|
80 | (6) |
|
4.5.3.1 Inter-task interference |
|
|
80 | (2) |
|
4.5.3.2 Intra-task interference |
|
|
82 | (2) |
|
4.5.3.3 Computation of cp-task parameters |
|
|
84 | (2) |
|
4.5.4 Non-conditional DAG Tasks |
|
|
86 | (1) |
|
4.5.5 Series-Parallel Conditional DAG Tasks |
|
|
86 | (1) |
|
4.5.6 Schedulability Condition |
|
|
86 | (1) |
|
4.6 Specializing Analysis for Limited Pre-emption Global/Dynamic Approach |
|
|
87 | (9) |
|
4.6.1 Blocking Impact of the Largest NPRs (LP-max) |
|
|
88 | (1) |
|
4.6.2 Blocking Impact of the Largest Parallel NPRs (LP-ILP) |
|
|
88 | (4) |
|
4.6.2.1 LP worst-case workload of a task executing on c cores |
|
|
89 | (1) |
|
4.6.2.2 Overall LP worst-case workload |
|
|
90 | (1) |
|
4.6.2.3 Lower-priority interference |
|
|
91 | (1) |
|
4.6.3 Computation of Response Time Factors of LP-ILP |
|
|
92 | (3) |
|
4.6.3.1 Worst-case workload of Ti executing on c cores: µi[ c] |
|
|
92 | (2) |
|
4.6.3.2 Overall LP worst-case workload of lp(k) per execution scenario s1: pk[ sl] |
|
|
94 | (1) |
|
|
95 | (1) |
|
4.7 Specializing Analysis for the Partitioned/Static Approach |
|
|
96 | (11) |
|
|
96 | (4) |
|
|
97 | (2) |
|
|
99 | (1) |
|
|
100 | (1) |
|
4.7.2 Heuristic Approaches |
|
|
100 | (3) |
|
|
101 | (2) |
|
|
103 | (1) |
|
4.7.3 Integrating Interference from Additional RT Tasks |
|
|
103 | (1) |
|
|
104 | (1) |
|
4.7.5 Response-time Upper Bound |
|
|
105 | (2) |
|
4.8 Scheduling for I/O Cores |
|
|
107 | (1) |
|
|
107 | (2) |
|
|
109 | (4) |
5 Timing Analysis Methodology |
|
113 | (32) |
|
|
|
|
|
113 | (8) |
|
5.1.1 Static WCET Analysis Techniques |
|
|
115 | (3) |
|
5.1.2 Measurement-based WCET Analysis Techniques |
|
|
118 | (1) |
|
5.1.3 Hybrid WCET Techniques |
|
|
119 | (1) |
|
5.1.4 Measurement-based Probabilistic Techniques |
|
|
120 | (1) |
|
5.2 Our Choice of Methodology for WCET Estimation |
|
|
121 | (6) |
|
5.2.1 Why Not Use Static Approaches? |
|
|
122 | (2) |
|
5.2.2 Why Use Measurement-based Techniques? |
|
|
124 | (3) |
|
5.3 Description of Our Timing Analysis Methodology |
|
|
127 | (16) |
|
5.3.1 Intrinsic vs. Extrinsic Execution Times |
|
|
127 | (1) |
|
5.3.2 The Concept of Safety Margins |
|
|
128 | (2) |
|
5.3.3 Our Proposed Timing Methodology at a Glance |
|
|
130 | (1) |
|
5.3.4 Overview of the Application Structure |
|
|
131 | (2) |
|
5.3.5 Automatic Insertion and Removal of the Trace-points |
|
|
133 | (3) |
|
5.3.5.1 How to insert the trace-points |
|
|
133 | (2) |
|
5.3.5.2 How to remove the trace-points |
|
|
135 | (1) |
|
5.3.6 Extract the Intrinsic Execution Time: The Isolation Mode |
|
|
136 | (1) |
|
5.3.7 Extract the Extrinsic Execution Time: The Contention Mode |
|
|
137 | (4) |
|
5.3.8 Extract the Execution Time in Real Situation: The Deployment Mode |
|
|
141 | (1) |
|
5.3.9 Derive WCET Estimates |
|
|
141 | (2) |
|
|
143 | (1) |
|
|
143 | (2) |
6 OpenMP Runtime |
|
145 | (28) |
|
|
|
|
|
145 | (1) |
|
6.2 Offloading Library Design |
|
|
146 | (2) |
|
|
148 | (10) |
|
6.3.1 Task Dependency Management |
|
|
155 | (3) |
|
|
158 | (13) |
|
|
159 | (1) |
|
|
160 | (7) |
|
6.4.2.1 Applications with a linear generation pattern |
|
|
160 | (2) |
|
6.4.2.2 Applications with a recursive generation pattern |
|
|
162 | (1) |
|
6.4.2.3 Applications with mixed patterns |
|
|
163 | (2) |
|
6.4.2.4 Impact of cutoff on LINEAR and RECURSIVE applications |
|
|
165 | (1) |
|
6.4.2.5 Real applications |
|
|
166 | (1) |
|
6.4.3 Evaluation of the Task Dependency Mechanism |
|
|
167 | (8) |
|
6.4.3.1 Performance speedup and memory usage |
|
|
168 | (2) |
|
6.4.3.2 The task dependency mechanism on the MPPA |
|
|
170 | (1) |
|
|
171 | (1) |
|
|
171 | (2) |
7 Embedded Operating Systems |
|
173 | (30) |
|
|
|
|
|
|
|
|
173 | (2) |
|
|
175 | (12) |
|
7.2.1 Real-time Support in Linux |
|
|
175 | (5) |
|
7.2.1.1 Hard real-time support |
|
|
176 | (2) |
|
7.2.1.2 Latency reduction |
|
|
178 | (2) |
|
7.2.1.3 Real-time CPU scheduling |
|
|
180 | (1) |
|
7.2.2 Survey of Existing Embedded RTOSs |
|
|
180 | (6) |
|
7.2.3 Classification of Embedded RTOSs |
|
|
186 | (1) |
|
7.3 Requirements for The Choice of The Run Time System |
|
|
187 | (3) |
|
|
187 | (1) |
|
|
187 | (1) |
|
|
188 | (1) |
|
7.3.4 Scheduling Characteristics |
|
|
188 | (1) |
|
|
188 | (2) |
|
|
190 | (1) |
|
|
190 | (1) |
|
|
190 | (1) |
|
7.5 Operating System Support |
|
|
191 | (9) |
|
|
191 | (1) |
|
7.5.2 ERIKA Enterprise Support |
|
|
191 | (9) |
|
7.5.2.1 Exokernel support |
|
|
191 | (1) |
|
7.5.2.2 Single-ELF multicore ERIKA Enterprise |
|
|
192 | (1) |
|
7.5.2.3 Support for limited preemption, job, and global scheduling |
|
|
192 | (1) |
|
7.5.2.4 New ERIKA Enterprise primitives |
|
|
193 | (1) |
|
7.5.2.5 New data structures |
|
|
194 | (2) |
|
7.5.2.6 Dynamic task creation |
|
|
196 | (1) |
|
7.5.2.7 IRQ handlers as tasks |
|
|
196 | (1) |
|
|
197 | (1) |
|
7.5.2.9 Early performance estimation |
|
|
197 | (3) |
|
|
200 | (1) |
|
|
200 | (3) |
Index |
|
203 | (2) |
About the Editors |
|
205 | |