|
|
xv | |
|
|
xxi | |
|
|
1 | (6) |
|
|
1 | (2) |
|
|
3 | (1) |
|
|
3 | (4) |
|
|
7 | (58) |
|
2 Application Service Quality |
|
|
9 | (20) |
|
2.1 Simple Application Model |
|
|
9 | (2) |
|
|
11 | (1) |
|
2.3 Key Quality and Performance Indicators |
|
|
12 | (3) |
|
2.4 Key Application Characteristics |
|
|
15 | (2) |
|
2.4.1 Service Criticality |
|
|
15 | (1) |
|
2.4.2 Application Interactivity |
|
|
16 | (1) |
|
2.4.3 Tolerance to Network Traffic Impairments |
|
|
17 | (1) |
|
2.5 Application Service Quality Metrics |
|
|
17 | (10) |
|
2.5.1 Service Availability |
|
|
18 | (1) |
|
2.5.2 Service Latency |
|
|
19 | (5) |
|
2.5.3 Service Reliability |
|
|
24 | (1) |
|
2.5.4 Service Accessibility |
|
|
25 | (1) |
|
2.5.5 Service Retainability |
|
|
25 | (1) |
|
2.5.6 Service Throughput |
|
|
25 | (1) |
|
2.5.7 Service Timestamp Accuracy |
|
|
26 | (1) |
|
2.5.8 Application-Specific Service Quality Measurements |
|
|
26 | (1) |
|
2.6 Technical Service versus Support Service |
|
|
27 | (1) |
|
2.6.1 Technical Service Quality |
|
|
27 | (1) |
|
2.6.2 Support Service Quality |
|
|
27 | (1) |
|
2.7 Security Considerations |
|
|
28 | (1) |
|
|
29 | (20) |
|
3.1 Roles in Cloud Computing |
|
|
30 | (1) |
|
|
30 | (1) |
|
3.3 Cloud Essential Characteristics |
|
|
31 | (2) |
|
3.3.1 On-Demand Self-Service |
|
|
31 | (1) |
|
3.3.2 Broad Network Access |
|
|
31 | (1) |
|
|
32 | (1) |
|
|
32 | (1) |
|
|
33 | (1) |
|
3.4 Simplified Cloud Architecture |
|
|
33 | (3) |
|
3.4.1 Application Software |
|
|
34 | (1) |
|
3.4.2 Virtual Machine Servers |
|
|
35 | (1) |
|
3.4.3 Virtual Machine Server Controllers |
|
|
35 | (1) |
|
3.4.4 Cloud Operations Support Systems |
|
|
36 | (1) |
|
3.4.5 Cloud Technology Components Offered "as-a-Service" |
|
|
36 | (1) |
|
3.5 Elasticity Measurements |
|
|
36 | (8) |
|
|
37 | (1) |
|
3.5.2 Provisioning Interval |
|
|
37 | (2) |
|
|
39 | (1) |
|
|
40 | (1) |
|
3.5.5 Scaling Up and Down |
|
|
41 | (1) |
|
|
42 | (1) |
|
3.5.7 Slew Rate and Linearity |
|
|
43 | (1) |
|
|
44 | (1) |
|
|
44 | (1) |
|
|
45 | (4) |
|
4 Virtualized Infrastructure Impairments |
|
|
49 | (16) |
|
4.1 Service Latency, Virtualization, and the Cloud |
|
|
50 | (4) |
|
4.1.1 Virtualization and Cloud Causes of Latency Variation |
|
|
51 | (1) |
|
4.1.2 Virtualization Overhead |
|
|
52 | (1) |
|
4.1.3 Increased Variability of Infrastructure Performance |
|
|
53 | (1) |
|
|
54 | (1) |
|
4.3 Nondelivery of Configured VM Capacity |
|
|
54 | (3) |
|
4.4 Delivery of Degraded VM Capacity |
|
|
57 | (2) |
|
|
59 | (1) |
|
|
60 | (1) |
|
|
61 | (1) |
|
4.8 Failed or Slow Allocation and Startup of VM Instance |
|
|
62 | (1) |
|
4.9 Outlook for Virtualized Infrastructure Impairments |
|
|
63 | (2) |
|
|
65 | (126) |
|
5 Application Redundancy and Cloud Computing |
|
|
67 | (30) |
|
5.1 Failures, Availability, and Simplex Architectures |
|
|
68 | (2) |
|
5.2 Improving Software Repair Times via Virtualization |
|
|
70 | (2) |
|
5.3 Improving Infrastructure Repair Times via Virtualization |
|
|
72 | (3) |
|
5.3.1 Understanding Hardware Repair |
|
|
72 | (1) |
|
5.3.2 VM Repair-as-a-Service |
|
|
72 | (2) |
|
|
74 | (1) |
|
5.4 Redundancy and Recoverability |
|
|
75 | (5) |
|
5.4.1 Improving Recovery Times via Virtualization |
|
|
79 | (1) |
|
5.5 Sequential Redundancy and Concurrent Redundancy |
|
|
80 | (4) |
|
5.5.1 Hybrid Concurrent Strategy |
|
|
83 | (1) |
|
5.6 Application Service Impact of Virtualization Impairments |
|
|
84 | (6) |
|
5.6.1 Service Impact for Simplex Architectures |
|
|
85 | (1) |
|
5.6.2 Service Impact for Sequential Redundancy Architectures |
|
|
85 | (2) |
|
5.6.3 Service Impact for Concurrent Redundancy Architectures |
|
|
87 | (1) |
|
5.6.4 Service Impact for Hybrid Concurrent Architectures |
|
|
88 | (2) |
|
|
90 | (2) |
|
5.7.1 Data Storage Strategies |
|
|
90 | (1) |
|
5.7.2 Data Consistency Strategies |
|
|
91 | (1) |
|
5.7.3 Data Architecture Considerations |
|
|
92 | (1) |
|
|
92 | (5) |
|
5.8.1 Service Quality Impact |
|
|
93 | (1) |
|
5.8.2 Concurrency Control |
|
|
93 | (1) |
|
|
94 | (1) |
|
|
94 | (1) |
|
5.8.5 Other Considerations |
|
|
95 | (2) |
|
6 Load Distribution and Balancing |
|
|
97 | (14) |
|
6.1 Load Distribution Mechanisms |
|
|
97 | (2) |
|
6.2 Load Distribution Strategies |
|
|
99 | (1) |
|
|
99 | (2) |
|
6.4 Nonproxy Load Distribution |
|
|
101 | (1) |
|
6.5 Hierarchy of Load Distribution |
|
|
102 | (1) |
|
6.6 Cloud-Based Load Balancing Challenges |
|
|
103 | (1) |
|
6.7 The Role of Load Balancing in Support of Redundancy |
|
|
103 | (1) |
|
6.8 Load Balancing and Availability Zones |
|
|
104 | (1) |
|
6.9 Workload Service Measurements |
|
|
104 | (1) |
|
6.10 Operational Considerations |
|
|
105 | (2) |
|
6.10.1 Load Balancing and Elasticity |
|
|
105 | (1) |
|
6.10.2 Load Balancing and Overload |
|
|
106 | (1) |
|
6.10.3 Load Balancing and Release Management |
|
|
107 | (1) |
|
6.11 Load Balancing and Application Service Quality |
|
|
107 | (4) |
|
6.11.1 Service Availability |
|
|
107 | (1) |
|
6.11.2 Service Latency |
|
|
108 | (1) |
|
6.11.3 Service Reliability |
|
|
108 | (1) |
|
6.11.4 Service Accessibility |
|
|
109 | (1) |
|
6.11.5 Service Retainability |
|
|
109 | (1) |
|
6.11.6 Service Throughput |
|
|
109 | (1) |
|
6.11.7 Service Timestamp Accuracy |
|
|
109 | (2) |
|
|
111 | (16) |
|
|
111 | (5) |
|
|
112 | (1) |
|
7.1.2 Failure Containment and Recovery |
|
|
112 | (2) |
|
7.1.3 Failure Containment and Virtualization |
|
|
114 | (2) |
|
|
116 | (6) |
|
7.2.1 Single Points of Failure |
|
|
116 | (1) |
|
7.2.2 Single Points of Failure and Virtualization |
|
|
117 | (2) |
|
7.2.3 Affinity and Anti-affinity Considerations |
|
|
119 | (1) |
|
7.2.4 No SPOF Assurance in Cloud Computing |
|
|
120 | (1) |
|
7.2.5 No SPOF and Application Data |
|
|
121 | (1) |
|
7.3 Extreme Solution Coresidency |
|
|
122 | (2) |
|
7.3.1 Extreme Solution Coresidency Risks |
|
|
123 | (1) |
|
7.4 Multitenancy and Solution Containers |
|
|
124 | (3) |
|
|
127 | (18) |
|
|
128 | (1) |
|
8.2 Traditional Capacity Management |
|
|
129 | (1) |
|
8.3 Traditional Overload Control |
|
|
129 | (2) |
|
8.4 Capacity Management and Virtualization |
|
|
131 | (2) |
|
8.5 Capacity Management in Cloud |
|
|
133 | (2) |
|
8.6 Storage Elasticity Considerations |
|
|
135 | (1) |
|
8.7 Elasticity and Overload |
|
|
136 | (1) |
|
8.8 Operational Considerations |
|
|
137 | (1) |
|
|
138 | (2) |
|
8.10 General Elasticity Risks |
|
|
140 | (1) |
|
8.11 Elasticity Failure Scenarios |
|
|
141 | (4) |
|
8.11.1 Elastic Growth Failure Scenarios |
|
|
141 | (2) |
|
8.11.2 Elastic Capacity Degrowth Failure Scenarios |
|
|
143 | (2) |
|
|
145 | (18) |
|
|
145 | (1) |
|
9.2 Traditional Software Upgrade Strategies |
|
|
146 | (7) |
|
9.2.1 Software Upgrade Requirements |
|
|
146 | (2) |
|
9.2.2 Maintenance Windows |
|
|
148 | (1) |
|
9.2.3 Client Considerations for Application Upgrade |
|
|
149 | (1) |
|
9.2.4 Traditional Offline Software Upgrade |
|
|
150 | (1) |
|
9.2.5 Traditional Online Software Upgrade |
|
|
151 | (2) |
|
|
153 | (1) |
|
9.3 Cloud-Enabled Software Upgrade Strategies |
|
|
153 | (5) |
|
9.3.1 Type I Cloud-Enabled Upgrade Strategy: Block Party |
|
|
154 | (2) |
|
9.3.2 Type II Cloud-Enabled Upgrade Strategy: One Driver per Bus |
|
|
156 | (1) |
|
|
157 | (1) |
|
|
158 | (1) |
|
9.5 Role of Service Orchestration in Software Upgrade |
|
|
159 | (2) |
|
9.5.1 Solution-Level Software Upgrade |
|
|
160 | (1) |
|
|
161 | (2) |
|
10 End-to-End Considerations |
|
|
163 | (28) |
|
10.1 End-to-End Service Context |
|
|
163 | (6) |
|
10.2 Three-Layer End-to-End Service Model |
|
|
169 | (8) |
|
10.2.1 Estimating Service Impairments via the Three-Layer Model |
|
|
171 | (1) |
|
10.2.2 End-to-End Service Availability |
|
|
172 | (1) |
|
10.2.3 End-to-End Service Latency |
|
|
173 | (1) |
|
10.2.4 End-to-End Service Reliability |
|
|
174 | (1) |
|
10.2.5 End-to-End Service Accessibility |
|
|
175 | (1) |
|
10.2.6 End-to-End Service Retainability |
|
|
176 | (1) |
|
10.2.7 End-to-End Service Throughput |
|
|
176 | (1) |
|
10.2.8 End-to-End Service Timestamp Accuracy |
|
|
177 | (1) |
|
|
177 | (1) |
|
10.3 Distributed and Centralized Cloud Data Centers |
|
|
177 | (6) |
|
10.3.1 Centralized Cloud Data Centers |
|
|
178 | (1) |
|
10.3.2 Distributed Cloud Data Centers |
|
|
178 | (1) |
|
10.3.3 Service Availability Considerations |
|
|
179 | (2) |
|
10.3.4 Service Latency Considerations |
|
|
181 | (1) |
|
10.3.5 Service Reliability Considerations |
|
|
182 | (1) |
|
10.3.6 Service Accessibility Considerations |
|
|
182 | (1) |
|
10.3.7 Service Retainability Considerations |
|
|
182 | (1) |
|
10.3.8 Resource Distribution Considerations |
|
|
182 | (1) |
|
10.4 Multitiered Solution Architectures |
|
|
183 | (1) |
|
10.5 Disaster Recovery and Geographic Redundancy |
|
|
184 | (7) |
|
10.5.1 Disaster Recovery Objectives |
|
|
184 | (1) |
|
10.5.2 Georedundant Architectures |
|
|
185 | (1) |
|
10.5.3 Service Quality Considerations |
|
|
186 | (1) |
|
10.5.4 Recovery Point Considerations |
|
|
187 | (2) |
|
10.5.5 Mitigating Impact of Disasters with Georedundancy and Availability Zones |
|
|
189 | (2) |
|
|
191 | (112) |
|
11 Accountabilities for Service Quality |
|
|
193 | (20) |
|
11.1 Traditional Accountability |
|
|
193 | (1) |
|
11.2 The Cloud Service Delivery Path |
|
|
194 | (3) |
|
11.3 Cloud Accountability |
|
|
197 | (3) |
|
11.4 Accountability Case Studies |
|
|
200 | (5) |
|
11.4.1 Accountability and Technology Components |
|
|
201 | (2) |
|
11.4.2 Accountability and Elasticity |
|
|
203 | (2) |
|
11.5 Service Quality Gap Model |
|
|
205 | (5) |
|
11.5.1 Application's Resource Facing Service Gap Analysis |
|
|
206 | (2) |
|
11.5.2 Application's Customer Facing Service Gap Analysis |
|
|
208 | (2) |
|
11.6 Service Level Agreements |
|
|
210 | (3) |
|
12 Service Availability Measurement |
|
|
213 | (20) |
|
12.1 Parsimonious Service Measurements |
|
|
214 | (1) |
|
12.2 Traditional Service Availability Measurement |
|
|
215 | (2) |
|
12.3 Evolving Service Availability Measurements |
|
|
217 | (9) |
|
12.3.1 Analyzing Application Evolution |
|
|
218 | (5) |
|
12.3.2 Technology Components |
|
|
223 | (1) |
|
12.3.3 Leveraging Storage-as-a-Service |
|
|
224 | (2) |
|
12.4 Evolving Hardware Reliability Measurement |
|
|
226 | (2) |
|
12.4.1 Virtual Machine Failure Lifecycle |
|
|
226 | (2) |
|
12.5 Evolving Elasticity Service Availability Measurements |
|
|
228 | (1) |
|
12.6 Evolving Release Management Service Availability Measurement |
|
|
229 | (2) |
|
12.7 Service Measurement Outlook |
|
|
231 | (2) |
|
13 Application Service Quality Requirements |
|
|
233 | (10) |
|
13.1 Service Availability Requirements |
|
|
234 | (3) |
|
13.2 Service Latency Requirements |
|
|
237 | (1) |
|
13.3 Service Reliability Requirements |
|
|
237 | (1) |
|
13.4 Service Accessibility Requirements |
|
|
238 | (1) |
|
13.5 Service Retainability Requirements |
|
|
239 | (1) |
|
13.6 Service Throughput Requirements |
|
|
239 | (1) |
|
13.7 Timestamp Accuracy Requirements |
|
|
240 | (1) |
|
13.8 Elasticity Requirements |
|
|
240 | (1) |
|
13.9 Release Management Requirements |
|
|
241 | (1) |
|
13.10 Disaster Recovery Requirements |
|
|
241 | (2) |
|
14 Virtualized Infrastructure Measurement and Management |
|
|
243 | (12) |
|
14.1 Business Context for Infrastructure Service Quality Measurements |
|
|
244 | (1) |
|
14.2 Cloud Consumer Measurement Options |
|
|
245 | (2) |
|
14.3 Impairment Measurement Strategies |
|
|
247 | (5) |
|
14.3.1 Measurement of VM Failure |
|
|
247 | (2) |
|
14.3.2 Measurement of Nondelivery of Configured VM Capacity |
|
|
249 | (1) |
|
14.3.3 Measurement of Delivery of Degraded VM Capacity |
|
|
249 | (1) |
|
14.3.4 Measurement of Tail Latency |
|
|
249 | (1) |
|
14.3.5 Measurement of Clock Event Jitter |
|
|
250 | (1) |
|
14.3.6 Measurement of Clock Drift |
|
|
250 | (1) |
|
14.3.7 Measurement of Failed or Slow Allocation and Startup of VM Instance |
|
|
250 | (1) |
|
14.3.8 Measurements Summary |
|
|
251 | (1) |
|
14.4 Managing Virtualized Infrastructure Impairments |
|
|
252 | (3) |
|
14.4.1 Minimize Application's Sensitivity to Infrastructure Impairments |
|
|
252 | (1) |
|
14.4.2 VM-Level Congestion Detection and Control |
|
|
252 | (1) |
|
14.4.3 Allocate More Virtual Resource Capacity |
|
|
253 | (1) |
|
14.4.4 Terminate Poorly Performing VM Instances |
|
|
253 | (1) |
|
14.4.5 Accept Degraded Performance |
|
|
253 | (1) |
|
14.4.6 Proactive Supplier Management |
|
|
254 | (1) |
|
14.4.7 Reset End Users' Service Quality Expectations |
|
|
254 | (1) |
|
14.4.8 SLA Considerations |
|
|
254 | (1) |
|
14.4.9 Changing Cloud Service Providers |
|
|
254 | (1) |
|
15 Analysis of Cloud-Based Applications |
|
|
255 | (18) |
|
15.1 Reliability Block Diagrams and Side-by-Side Analysis |
|
|
256 | (1) |
|
15.2 IaaS Impairment Effects Analysis |
|
|
257 | (2) |
|
15.3 PaaS Failure Effects Analysis |
|
|
259 | (1) |
|
15.4 Workload Distribution Analysis |
|
|
260 | (2) |
|
15.4.1 Service Quality Analysis |
|
|
261 | (1) |
|
15.4.2 Overload Control Analysis |
|
|
261 | (1) |
|
15.5 Anti-Affinity Analysis |
|
|
262 | (1) |
|
|
263 | (4) |
|
15.6.1 Service Capacity Growth Scenarios |
|
|
264 | (1) |
|
15.6.2 Service Capacity Growth Action Analysis |
|
|
264 | (1) |
|
15.6.3 Service Capacity Degrowth Action Analysis |
|
|
265 | (1) |
|
15.6.4 Storage Capacity Growth Scenarios |
|
|
265 | (1) |
|
15.6.5 Online Storage Capacity Growth Action Analysis |
|
|
266 | (1) |
|
15.6.6 Online Storage Capacity Degrowth Action Analysis |
|
|
266 | (1) |
|
15.7 Release Management Impact Effects Analysis |
|
|
267 | (1) |
|
15.7.1 Service Availability Impact |
|
|
267 | (1) |
|
15.7.2 Service Reliability Impact |
|
|
267 | (1) |
|
15.7.3 Service Accessibility Impact |
|
|
267 | (1) |
|
15.7.4 Service Retainability Impact |
|
|
267 | (1) |
|
15.7.5 Service Throughput Impact |
|
|
267 | (1) |
|
15.8 Recovery Point Objective Analysis |
|
|
268 | (2) |
|
15.9 Recovery Time Objective Analysis |
|
|
270 | (3) |
|
16 Testing Considerations |
|
|
273 | (14) |
|
|
273 | (1) |
|
|
274 | (3) |
|
|
275 | (1) |
|
16.2.2 Application Capacity under Test |
|
|
275 | (1) |
|
16.2.3 Statistical Confidence |
|
|
276 | (1) |
|
16.2.4 Service Disruption Time |
|
|
276 | (1) |
|
16.3 Simulating Infrastructure Impairments |
|
|
277 | (1) |
|
|
278 | (9) |
|
16.4.1 Service Reliability and Latency Testing |
|
|
279 | (1) |
|
16.4.2 Impaired Infrastructure Testing |
|
|
280 | (1) |
|
16.4.3 Robustness Testing |
|
|
280 | (2) |
|
16.4.4 Endurance/Stability Testing |
|
|
282 | (2) |
|
16.4.5 Application Elasticity Testing |
|
|
284 | (1) |
|
|
285 | (1) |
|
16.4.7 Disaster Recovery Testing |
|
|
285 | (1) |
|
16.4.8 Extreme Coresidency Testing |
|
|
286 | (1) |
|
16.4.9 PaaS Technology Component Testing |
|
|
286 | (1) |
|
16.4.10 Automated Regression Testing |
|
|
286 | (1) |
|
16.4.11 Canary Release Testing |
|
|
286 | (1) |
|
|
287 | (16) |
|
17.1 The Application Service Quality Challenge |
|
|
287 | (2) |
|
17.2 Redundancy and Robustness |
|
|
289 | (3) |
|
17.3 Design for Scalability |
|
|
292 | (1) |
|
17.4 Design for Extensibility |
|
|
292 | (1) |
|
|
293 | (1) |
|
17.6 Planning Considerations |
|
|
294 | (2) |
|
17.7 Evolving Traditional Applications |
|
|
296 | (5) |
|
17.7.1 Phase 0: Traditional Application |
|
|
298 | (1) |
|
17.7.2 Phase I: High Service Quality on Virtualized Infrastructure |
|
|
298 | (1) |
|
17.7.3 Phase II: Manual Application Elasticity |
|
|
299 | (1) |
|
17.7.4 Phase III: Automated Release Management |
|
|
299 | (1) |
|
17.7.5 Phase IV: Automated Application Elasticity |
|
|
300 | (1) |
|
17.7.6 Phase V: VM Migration |
|
|
300 | (1) |
|
|
301 | (2) |
Abbreviations |
|
303 | (4) |
References |
|
307 | (4) |
About the Authors |
|
311 | (2) |
Index |
|
313 | |