Authors |
|
xv | |
Preface |
|
xvii | |
Acknowledgments |
|
xxi | |
|
I Software Bloat, Lost Throughput, and Wasted Joules |
|
|
1 | (64) |
|
|
3 | (26) |
|
1.1 Green Software for the Expanding Digital Universe: Designing with a Sense of Proportion |
|
|
4 | (2) |
|
1.2 The "Challenge Posed by Emerging Systems: Why Hardware Advancements Are Not Enough |
|
|
6 | (8) |
|
1.2.1 Runtime bloat in framework based software |
|
|
7 | (1) |
|
1.2.2 Software interaction with systems |
|
|
8 | (2) |
|
1.2.3 Impact of non-volatile memory and low-latency fabrics |
|
|
10 | (2) |
|
1.2.4 Large scale connected architectures from edge to cloud |
|
|
12 | (1) |
|
1.2.5 Emerging software models and data centricity |
|
|
13 | (1) |
|
1.3 The Heart of the Matter: Why a Plea for Lean Software Is Not Enough |
|
|
14 | (3) |
|
1.3.1 The flexibility, productivity, and efficiency trade-off |
|
|
14 | (1) |
|
1.3.2 Unsustainability of tightly coupled hardware-software abstractions |
|
|
15 | (1) |
|
1.3.3 Traditional performance optimization is not enough |
|
|
16 | (1) |
|
1.3.4 Difficulty in quantifying the opportunity and impact of software bloat reduction |
|
|
16 | (1) |
|
1.4 The Resource Proportional Software Design Principle |
|
|
17 | (4) |
|
1.4.1 How to assess the propensity for bloat in a software component? |
|
|
17 | (1) |
|
1.4.2 How to broaden supported features without a runtime overhead? |
|
|
18 | (1) |
|
1.4.3 Can software be designed to cope with changing trade-offs as technology evolves? |
|
|
19 | (1) |
|
1.4.4 Can we anticipate what proportion of data processed by application is truly useful? |
|
|
20 | (1) |
|
1.5 Dimensions of Resource Proportional Design |
|
|
21 | (5) |
|
1.5.1 Resource proportional code features |
|
|
22 | (1) |
|
1.5.2 Resource proportional response to technology and system evolution |
|
|
23 | (2) |
|
1.5.3 Resource proportional data processing |
|
|
25 | (1) |
|
1.6 Approach in This Book |
|
|
26 | (3) |
|
2 The Problem of Software Bloat |
|
|
29 | (20) |
|
2.1 The Notion of Software Bloat |
|
|
31 | (1) |
|
2.2 Software Bloat: Causes and Consequences |
|
|
32 | (5) |
|
2.2.1 Principal aspects of bloat |
|
|
32 | (1) |
|
2.2.2 Definitions of software runtime bloat relevant for this book |
|
|
33 | (1) |
|
2.2.3 Systemic practices in framework based development attributed as causes of bloat |
|
|
34 | (3) |
|
2.3 Bloat in Containerized Software |
|
|
37 | (1) |
|
2.3.1 Application containers vs. virtual machines |
|
|
37 | (1) |
|
2.3.2 Container image bloat |
|
|
37 | (1) |
|
2.3.3 Runtime bloat in serverless computing |
|
|
38 | (1) |
|
2.4 Different Forms of Software Runtime Bloat |
|
|
38 | (2) |
|
2.4.1 Runtime bloat categories in Java applications |
|
|
38 | (1) |
|
2.4.2 Relationship between the various runtime manifestations of bloat |
|
|
39 | (1) |
|
2.5 Progress in Bloat Characterization, Measurement, and Mitigation |
|
|
40 | (7) |
|
2.5.1 Modeling and measuring bloat |
|
|
40 | (4) |
|
2.5.2 Mitigating and avoiding bloat |
|
|
44 | (1) |
|
2.5.2.1 Semi-automated approaches |
|
|
44 | (1) |
|
2.5.2.2 Automated code optimization |
|
|
44 | (2) |
|
2.5.2.3 Can runtime bloat be avoided by construction? |
|
|
46 | (1) |
|
|
47 | (2) |
|
3 Does Lean Imply Green? How Bloat in Software Impacts System Power Performance |
|
|
49 | (16) |
|
3.1 The Effects of Java Runtime Bloat on System Resources |
|
|
51 | (2) |
|
3.1.1 Allocation wall effect |
|
|
51 | (1) |
|
3.1.2 Heap pressure effect |
|
|
52 | (1) |
|
3.1.3 Object construction computation overhead |
|
|
52 | (1) |
|
3.1.4 Influence of system configuration |
|
|
52 | (1) |
|
3.2 Insights from an Experimental Study |
|
|
53 | (5) |
|
3.2.1 Multi-platform experiments and results |
|
|
54 | (1) |
|
3.2.2 Single platform experiment variations: Cache pressure and power management |
|
|
55 | (2) |
|
3.2.3 Key experimental observations |
|
|
57 | (1) |
|
3.3 Analyzing the Interplay of Bloat, Energy Proportionality, and System Bottlenecks |
|
|
58 | (6) |
|
3.3.1 Power efficiency impact quantified using a simple abstract model |
|
|
58 | (2) |
|
3.3.2 Effect of degrees of energy proportionality |
|
|
60 | (1) |
|
3.3.3 System bottlenecks and bloat: A curious interaction |
|
|
60 | (1) |
|
3.3.3.1 Bloat at non-bottleneck resource |
|
|
61 | (1) |
|
3.3.3.2 Bloat at bottleneck resource |
|
|
61 | (1) |
|
3.3.3.3 Bloat reduction shifts bottleneck |
|
|
62 | (1) |
|
|
62 | (1) |
|
3.3.5 Model predictions seen in experimental observations |
|
|
62 | (2) |
|
|
64 | (1) |
|
II The Antidote: Resource Proportional Software Design |
|
|
65 | (90) |
|
4 Resource Proportional Software Design Principles to Reduce Propensity for Bloat |
|
|
67 | (18) |
|
4.1 Insights from Energy Proportional Hardware Design |
|
|
68 | (1) |
|
4.2 Resource Proportional Design of Software Features |
|
|
68 | (2) |
|
4.3 How Software Becomes Non-resource Proportional |
|
|
70 | (4) |
|
4.4 Defining Resource Proportionality with Respect to Feature Utilization to Predict Bloat Propensity |
|
|
74 | (7) |
|
4.4.1 Effect of using a generalized component in this scenario |
|
|
76 | (1) |
|
4.4.2 Weighted RP accounting for scenario distribution |
|
|
76 | (1) |
|
4.4.3 Effect of adding features not required for a scenario |
|
|
77 | (1) |
|
4.4.4 Bloat relative to actual resource consumed by a component |
|
|
78 | (1) |
|
4.4.5 Computing bloat propensity when Rspecialized is not directly available |
|
|
78 | (1) |
|
4.4.6 Trade-off between feature exploitation and provisioning overhead |
|
|
79 | (1) |
|
4.4.7 Resource proportionality characteristics |
|
|
80 | (1) |
|
4.4.7.1 Scenarios with montonically ordered feature spaces |
|
|
80 | (1) |
|
4.4.7.2 General scenarios (unordered feature spaces) |
|
|
80 | (1) |
|
4.5 Resource Proportional Optimization Control Points for Bloat Mitigation |
|
|
81 | (2) |
|
|
83 | (2) |
|
5 Resource Proportional Design Strategies I: What Component and Tool Developers Can Do |
|
|
85 | (36) |
|
5.1 Strategy 1: Minimize Interactions between Independently Usable Features without Sacrificing Efficient Reuse |
|
|
86 | (17) |
|
5.1.1 Development practice: Abstracting a minimal core of base features - Lessons from the Linux kernel |
|
|
87 | (2) |
|
5.1.2 A formal discipline for labeling feature interactions due to optional features: Insights from FOP (feature oriented programming) |
|
|
89 | (4) |
|
5.1.3 RPD analysis tool: Aid detection of structural interactions using Concern Augmented Program Analysis (CAPA) |
|
|
93 | (2) |
|
5.1.3.1 Computing microslices |
|
|
95 | (2) |
|
5.1.3.2 Computing the microslice interaction graph |
|
|
97 | (1) |
|
5.1.3.3 Computing the Concern Augmented microslice interaction graph |
|
|
97 | (3) |
|
5.1.3.4 Putting it together: The CAPA tool |
|
|
100 | (2) |
|
5.1.3.5 Example: Big endian to little endian conversion |
|
|
102 | (1) |
|
5.1.4 Interactions due to hidden features |
|
|
102 | (1) |
|
5.2 Strategy 2: Reduce Recurring Overheads due to Incidental Sources of Bloat |
|
|
103 | (3) |
|
5.2.1 Object reuse and result caching (amortize data construction overheads) |
|
|
103 | (2) |
|
5.2.2 Adaptive selection and replacement of collection data structures |
|
|
105 | (1) |
|
5.3 Strategy 3: Activate or Deactivate High Overhead Features On-demand |
|
|
106 | (2) |
|
5.3.1 Insights from AOP (aspect oriented programming) |
|
|
106 | (1) |
|
5.3.2 Practical considerations (80-20% rule vs. pure RPD) |
|
|
107 | (1) |
|
5.4 Strategy 4: Design Programming Constructs and Runtimes with Resource Proportionality Awareness |
|
|
108 | (9) |
|
5.4.1 Annotating code with line of sight into sources of overheads |
|
|
109 | (1) |
|
5.4.2 Alternate data and program representations |
|
|
109 | (1) |
|
5.4.2.1 Separating code bloat |
|
|
110 | (1) |
|
5.4.2.2 Separating data structure bloat |
|
|
110 | (2) |
|
5.4.2.3 New programming construct to represent associative pointers |
|
|
112 | (1) |
|
5.4.2.4 Research topic: Content addressable data layout for associative structures |
|
|
113 | (4) |
|
|
117 | (4) |
|
6 Resource Proportional Design Strategies II: Refactoring Existing Software for Improved Resource Proportionality |
|
|
121 | (10) |
|
6.1 Strategy 1: Whole System Impact Analysis to Identify Candidate Resources and Indicators of Bloat that Are Likely to Matter the Most |
|
|
122 | (3) |
|
6.1.1 Resource utilization, bottleneck analysis and power, performance models |
|
|
122 | (2) |
|
6.1.2 Measure indicators of bloat |
|
|
124 | (1) |
|
6.2 Strategy 2: Replacing Entire Components or Features with a Different Implementation |
|
|
125 | (1) |
|
6.2.1 Example: Serialization-Deserialization |
|
|
125 | (1) |
|
6.2.2 Example: Collection replacement |
|
|
126 | (1) |
|
6.2.3 Example: Trimming unused code (bloatware mitigation) |
|
|
126 | (1) |
|
6.3 Strategy 3: Reduce Recurring Overheads Due to Incidental Sources of Bloat |
|
|
126 | (1) |
|
6.3.1 Object reuse and memoization |
|
|
126 | (1) |
|
6.4 Strategy 4: Refactor Code to Minimize Structural Interactions |
|
|
127 | (1) |
|
6.4.1 Using optional feature indicators |
|
|
127 | (1) |
|
6.4.2 Using concern analysis tools |
|
|
128 | (1) |
|
|
128 | (3) |
|
7 Implications of a Resource Proportional Design |
|
|
131 | (24) |
|
|
131 | (10) |
|
7.1.1 Resource proportionality and high level design: Internal vs. external brokering |
|
|
133 | (1) |
|
7.1.2 A simple model for systems resource proportionality |
|
|
134 | (1) |
|
7.1.2.1 A simple regression based model |
|
|
135 | (1) |
|
7.1.3 Steering a system towards RP |
|
|
136 | (2) |
|
7.1.4 Difficulties in realizing k-RPD |
|
|
138 | (2) |
|
7.1.4.1 Searching a large RP design space |
|
|
140 | (1) |
|
|
141 | (1) |
|
|
141 | (4) |
|
7.2.1 Impact on "optimal" RPDs |
|
|
141 | (1) |
|
7.2.1.1 Impact on "microservices"-based RPDs |
|
|
142 | (1) |
|
7.2.2 RPD in the context of rapid change |
|
|
143 | (2) |
|
7.3 RPD and Security: Resource Usage as a Side Channel |
|
|
145 | (5) |
|
7.3.1 RP remediation/countermeasures |
|
|
148 | (2) |
|
7.4 RPD and Other Systemwide Concerns |
|
|
150 | (4) |
|
|
150 | (2) |
|
|
152 | (2) |
|
|
154 | (1) |
|
III Responding to Emerging Technologies: Designing Resource Proportional Systems |
|
|
155 | (116) |
|
8 Resource Proportional Programming for Persistent Memory |
|
|
157 | (34) |
|
8.1 Characteristics of Emerging Persistent Memory Technologies |
|
|
157 | (3) |
|
8.2 System Implications of PM Technology |
|
|
160 | (14) |
|
8.2.1 Data flow implications |
|
|
160 | (1) |
|
|
161 | (1) |
|
8.2.1.2 Flushing and fencing |
|
|
162 | (1) |
|
8.2.1.3 Data recoverability |
|
|
163 | (2) |
|
8.2.2 Process flow implications |
|
|
165 | (1) |
|
8.2.2.1 Context switch elimination |
|
|
165 | (2) |
|
8.2.2.2 Perturbation of CPU utilization |
|
|
167 | (2) |
|
8.2.3 Code reuse implications |
|
|
169 | (3) |
|
8.2.3.1 Data access granularity |
|
|
172 | (1) |
|
|
173 | (1) |
|
8.3 Storage Stack Bloat in the Dual Stack Scenario |
|
|
174 | (8) |
|
8.3.1 Analytic model of the dual stack scenario |
|
|
176 | (2) |
|
|
178 | (1) |
|
|
178 | (1) |
|
|
179 | (1) |
|
|
179 | (1) |
|
8.3.1.5 Dual stack RP baseline |
|
|
180 | (2) |
|
8.4 Multi-Layer Storage Stack Bloat Related to Atomicity |
|
|
182 | (2) |
|
8.5 Resource Proportional High Availability |
|
|
184 | (2) |
|
8.6 HA and Atomicity Function Deployment Scenarios |
|
|
186 | (2) |
|
8.7 Keeping up with the Evolution of Persistent Memory |
|
|
188 | (3) |
|
9 Resource Proportionality in Memory Interconnects |
|
|
191 | (24) |
|
9.1 Characteristics of Memory Interconnects |
|
|
192 | (4) |
|
9.2 Resource Proportionality and the Separation of Media Controllers from Memory Controllers |
|
|
196 | (5) |
|
9.2.1 Cost of asymmetric R/W latency with asynchronous MI |
|
|
197 | (4) |
|
9.3 Efficiency Model of Memory Fabric |
|
|
201 | (3) |
|
9.4 Resource Proportional Capacity Scaling |
|
|
204 | (2) |
|
9.5 PM Related Functionality Placement |
|
|
206 | (5) |
|
9.5.1 PM related functions |
|
|
206 | (1) |
|
9.5.1.1 Multi-phase write |
|
|
206 | (1) |
|
9.5.1.2 Atomic rewrite in place |
|
|
207 | (1) |
|
9.5.1.3 Memory interleave |
|
|
207 | (1) |
|
|
207 | (1) |
|
9.5.2 Functionality placement given split vs. monolithic memory controllers |
|
|
207 | (4) |
|
9.6 Resource Proportionality and Memory Centric System Architecture |
|
|
211 | (4) |
|
10 Applying Resource Proportional Design Principles to a Deeply Layered or Complex Software Stack |
|
|
215 | (34) |
|
|
215 | (21) |
|
10.1.1 Simple examples of RPD in systems design |
|
|
217 | (1) |
|
|
218 | (3) |
|
10.1.1.2 Resource rate mismatch costs |
|
|
221 | (1) |
|
10.1.2 Copy elimination in RDMA based storage stacks |
|
|
222 | (1) |
|
10.1.3 High level RP systems design |
|
|
223 | (3) |
|
10.1.3.1 Proportional design at different levels of the stack |
|
|
226 | (3) |
|
10.1.4 RPD by mixing analog and digital components then and now |
|
|
229 | (2) |
|
10.1.5 Blockchains and the proportional heuristic |
|
|
231 | (2) |
|
|
233 | (1) |
|
|
233 | (3) |
|
10.2 Some High-level Recurring Patterns in RPD |
|
|
236 | (4) |
|
|
236 | (1) |
|
10.2.2 Crosslayer optimization |
|
|
237 | (1) |
|
10.2.3 Memoization, checkpointing, and lazy/deferred designs |
|
|
238 | (1) |
|
10.2.4 Managing configuration space |
|
|
238 | (1) |
|
10.2.5 Using virtualization |
|
|
238 | (1) |
|
10.2.6 Applying the 80% 20% rule where possible |
|
|
238 | (1) |
|
10.2.7 Some theoretical insights useful for RPD |
|
|
239 | (1) |
|
10.3 General Design Principles and Observations for RPD |
|
|
240 | (5) |
|
10.4 What Is Feasible Theoretically? |
|
|
245 | (3) |
|
|
248 | (1) |
|
11 Data Centric Resource Proportional System Design |
|
|
249 | (22) |
|
11.1 Characteristics of Data Centric Workloads |
|
|
250 | (2) |
|
11.1.1 Data intensive rather than compute intensive |
|
|
250 | (1) |
|
11.1.2 Analytics oriented: Emphasis on insight derivation rather than data serving |
|
|
251 | (1) |
|
11.2 Data Centric Frameworks and Stack Evolution |
|
|
252 | (1) |
|
11.3 Sources and Impact of Non-resource Proportionality |
|
|
253 | (4) |
|
11.3.1 Characteristic resource cost amplifiers in data centric applications |
|
|
253 | (1) |
|
11.3.1.1 Data movement expense |
|
|
254 | (1) |
|
11.3.1.2 Fault tolerance, check-pointing, and lineage |
|
|
254 | (1) |
|
11.3.2 Characteristic sources of overheads in data centric applications |
|
|
255 | (1) |
|
11.3.2.1 Impedance mismatch between workload locality patterns and page locality of the system |
|
|
255 | (1) |
|
11.3.2.2 Computation on data that does not produce additional insight |
|
|
255 | (2) |
|
11.3.2.3 Unnecessary synchronization |
|
|
257 | (1) |
|
11.4 Graph Analytics Case Study |
|
|
257 | (5) |
|
11.4.1 Reducing the size of the input graph data processed |
|
|
259 | (1) |
|
11.4.2 Enhancing the proportion of relevant data paged in by the system |
|
|
260 | (2) |
|
11.5 Data Mining Case Study (Map-reduce/Spark) |
|
|
262 | (4) |
|
11.5.1 RPD aware storage systems for insight-centric applications |
|
|
263 | (1) |
|
11.5.1.1 Cross-layer insight reuse |
|
|
264 | (1) |
|
11.5.1.2 Semantic similarity detection |
|
|
265 | (1) |
|
11.5.1.3 Approximation reuse |
|
|
265 | (1) |
|
11.5.1.4 Proactive data gradation |
|
|
266 | (1) |
|
11.6 Streaming IoT Analytics Case Study |
|
|
266 | (3) |
|
|
269 | (2) |
|
|
271 | (52) |
|
12 Adapting the Systems Software Stack to a Radically Non-Uniform Memory System |
|
|
273 | (20) |
|
12.1 Resource Proportionality of Memory Resource Allocation |
|
|
274 | (6) |
|
12.1.1 The evolution of memory pooling |
|
|
275 | (3) |
|
12.1.2 Allocation system model |
|
|
278 | (2) |
|
12.2 Comparison of Guided vs. Automated Allocation Policies |
|
|
280 | (7) |
|
12.2.1 Class of Service driven allocation |
|
|
282 | (3) |
|
12.2.2 Automated caching and phase change |
|
|
285 | (2) |
|
12.3 Resource Proportionality of Waiting, Polling, and Context Switching |
|
|
287 | (3) |
|
12.4 Managing Non-uniformity Using Work Flow Scheduling and Classes of Service |
|
|
290 | (3) |
|
13 Bridging the Gap from What We Know Today to Open Challenges and Research Topics |
|
|
293 | (20) |
|
|
293 | (1) |
|
13.2 Approximate Computing |
|
|
294 | (4) |
|
13.3 Stateless Designs Revisited, or Reducing "State Spill" |
|
|
298 | (9) |
|
13.3.1 "Spill free designs" and failures |
|
|
301 | (2) |
|
13.3.2 "Spill free designs" and "serverless" computing/microservices |
|
|
303 | (1) |
|
13.3.3 Pointer management |
|
|
303 | (1) |
|
13.3.3.1 Memory management, pointer management, and read-copy-update |
|
|
304 | (2) |
|
|
306 | (1) |
|
13.3.3.3 Systems design vs. ML-based methods |
|
|
306 | (1) |
|
|
307 | (3) |
|
|
309 | (1) |
|
13.4.2 Homomorphic computation |
|
|
309 | (1) |
|
13.4.3 Intermittent computation |
|
|
310 | (1) |
|
13.5 When Is Resource Proportionality Not Applicable? |
|
|
310 | (2) |
|
|
312 | (1) |
|
|
313 | (10) |
|
14.1 Applicability to Large Scale Solution Stacks, Rising Complexity, and Software Evolution |
|
|
314 | (3) |
|
14.2 Resilience to Future Changes as Hardware Technology Advancements Are Raising the Bar |
|
|
317 | (1) |
|
14.3 RP Impact on Productivity and Other Metrics |
|
|
318 | (2) |
|
14.4 Future Systems Design |
|
|
320 | (3) |
Glossary |
|
323 | (14) |
Bibliography |
|
337 | (40) |
Index |
|
377 | |