Service Quality of Cloud-Based Applications [Hardback]

Eric Bauer, Randee Adams (Alcatel-Lucent Reliability)
  • Format: Hardback, 344 pages, height x width x thickness: 241x163x23 mm, weight: 576 g
  • Publication date: 07-Feb-2014
  • Publisher: Wiley-IEEE Press
  • ISBN-10: 1118763297
  • ISBN-13: 9781118763292
As cloud architecture is taking the Internet by storm, new applications are created by the thousands and the issue of service quality should never be far from the mind of a putative app designer. This general overview of the cloud infrastructure and its effects on the specific app service quality is written by two experts in the field and authors of numerous previously published books on related subjects. The book is divided into three broad sections: context, analysis and recommendations. The first part describes the concept of service quality and its metrics, the cloud model itself and various infrastructure impairments. The second part, on analysis, examines specific considerations: failure containment, capacity and release management, load balancing, etc. Finally, the part on recommendations provides an extensive list of suggestions for both developers and managers, including measurements of service availability and virtualized infrastructure, application analysis, and testing considerations. The book concludes with a chapter connecting all the previous conclusions and recommendations into a workable whole and providing more general recommendations for project management. Annotation ©2014 Ringgold, Inc., Portland, OR (protoview.com)

This book explains why applications running on cloud might not deliver the same service reliability, availability, latency and overall quality to end users as they do when the applications are running on traditional (non-virtualized, non-cloud) configurations, and explains what can be done to mitigate that risk.
Figures
Tables and Equations
1 Introduction
1.1 Approach
1.2 Target Audience
1.3 Organization
I Context
2 Application Service Quality
2.1 Simple Application Model
2.2 Service Boundaries
2.3 Key Quality and Performance Indicators
2.4 Key Application Characteristics
2.4.1 Service Criticality
2.4.2 Application Interactivity
2.4.3 Tolerance to Network Traffic Impairments
2.5 Application Service Quality Metrics
2.5.1 Service Availability
2.5.2 Service Latency
2.5.3 Service Reliability
2.5.4 Service Accessibility
2.5.5 Service Retainability
2.5.6 Service Throughput
2.5.7 Service Timestamp Accuracy
2.5.8 Application-Specific Service Quality Measurements
2.6 Technical Service versus Support Service
2.6.1 Technical Service Quality
2.6.2 Support Service Quality
2.7 Security Considerations
3 Cloud Model
3.1 Roles in Cloud Computing
3.2 Cloud Service Models
3.3 Cloud Essential Characteristics
3.3.1 On-Demand Self-Service
3.3.2 Broad Network Access
3.3.3 Resource Pooling
3.3.4 Rapid Elasticity
3.3.5 Measured Service
3.4 Simplified Cloud Architecture
3.4.1 Application Software
3.4.2 Virtual Machine Servers
3.4.3 Virtual Machine Server Controllers
3.4.4 Cloud Operations Support Systems
3.4.5 Cloud Technology Components Offered "as-a-Service"
3.5 Elasticity Measurements
3.5.1 Density
3.5.2 Provisioning Interval
3.5.3 Release Interval
3.5.4 Scaling In and Out
3.5.5 Scaling Up and Down
3.5.6 Agility
3.5.7 Slew Rate and Linearity
3.5.8 Elasticity Speedup
3.6 Regions and Zones
3.7 Cloud Awareness
4 Virtualized Infrastructure Impairments
4.1 Service Latency, Virtualization, and the Cloud
4.1.1 Virtualization and Cloud Causes of Latency Variation
4.1.2 Virtualization Overhead
4.1.3 Increased Variability of Infrastructure Performance
4.2 VM Failure
4.3 Nondelivery of Configured VM Capacity
4.4 Delivery of Degraded VM Capacity
4.5 Tail Latency
4.6 Clock Event Jitter
4.7 Clock Drift
4.8 Failed or Slow Allocation and Startup of VM Instance
4.9 Outlook for Virtualized Infrastructure Impairments
II Analysis
5 Application Redundancy and Cloud Computing
5.1 Failures, Availability, and Simplex Architectures
5.2 Improving Software Repair Times via Virtualization
5.3 Improving Infrastructure Repair Times via Virtualization
5.3.1 Understanding Hardware Repair
5.3.2 VM Repair-as-a-Service
5.3.3 Discussion
5.4 Redundancy and Recoverability
5.4.1 Improving Recovery Times via Virtualization
5.5 Sequential Redundancy and Concurrent Redundancy
5.5.1 Hybrid Concurrent Strategy
5.6 Application Service Impact of Virtualization Impairments
5.6.1 Service Impact for Simplex Architectures
5.6.2 Service Impact for Sequential Redundancy Architectures
5.6.3 Service Impact for Concurrent Redundancy Architectures
5.6.4 Service Impact for Hybrid Concurrent Architectures
5.7 Data Redundancy
5.7.1 Data Storage Strategies
5.7.2 Data Consistency Strategies
5.7.3 Data Architecture Considerations
5.8 Discussion
5.8.1 Service Quality Impact
5.8.2 Concurrency Control
5.8.3 Resource Usage
5.8.4 Simplicity
5.8.5 Other Considerations
6 Load Distribution and Balancing
6.1 Load Distribution Mechanisms
6.2 Load Distribution Strategies
6.3 Proxy Load Balancers
6.4 Nonproxy Load Distribution
6.5 Hierarchy of Load Distribution
6.6 Cloud-Based Load Balancing Challenges
6.7 The Role of Load Balancing in Support of Redundancy
6.8 Load Balancing and Availability Zones
6.9 Workload Service Measurements
6.10 Operational Considerations
6.10.1 Load Balancing and Elasticity
6.10.2 Load Balancing and Overload
6.10.3 Load Balancing and Release Management
6.11 Load Balancing and Application Service Quality
6.11.1 Service Availability
6.11.2 Service Latency
6.11.3 Service Reliability
6.11.4 Service Accessibility
6.11.5 Service Retainability
6.11.6 Service Throughput
6.11.7 Service Timestamp Accuracy
7 Failure Containment
7.1 Failure Containment
7.1.1 Failure Cascades
7.1.2 Failure Containment and Recovery
7.1.3 Failure Containment and Virtualization
7.2 Points of Failure
7.2.1 Single Points of Failure
7.2.2 Single Points of Failure and Virtualization
7.2.3 Affinity and Anti-affinity Considerations
7.2.4 No SPOF Assurance in Cloud Computing
7.2.5 No SPOF and Application Data
7.3 Extreme Solution Coresidency
7.3.1 Extreme Solution Coresidency Risks
7.4 Multitenancy and Solution Containers
8 Capacity Management
8.1 Workload Variations
8.2 Traditional Capacity Management
8.3 Traditional Overload Control
8.4 Capacity Management and Virtualization
8.5 Capacity Management in Cloud
8.6 Storage Elasticity Considerations
8.7 Elasticity and Overload
8.8 Operational Considerations
8.9 Workload Whipsaw
8.10 General Elasticity Risks
8.11 Elasticity Failure Scenarios
8.11.1 Elastic Growth Failure Scenarios
8.11.2 Elastic Capacity Degrowth Failure Scenarios
9 Release Management
9.1 Terminology
9.2 Traditional Software Upgrade Strategies
9.2.1 Software Upgrade Requirements
9.2.2 Maintenance Windows
9.2.3 Client Considerations for Application Upgrade
9.2.4 Traditional Offline Software Upgrade
9.2.5 Traditional Online Software Upgrade
9.2.6 Discussion
9.3 Cloud-Enabled Software Upgrade Strategies
9.3.1 Type I Cloud-Enabled Upgrade Strategy: Block Party
9.3.2 Type II Cloud-Enabled Upgrade Strategy: One Driver per Bus
9.3.3 Discussion
9.4 Data Management
9.5 Role of Service Orchestration in Software Upgrade
9.5.1 Solution-Level Software Upgrade
9.6 Conclusion
10 End-To-End Considerations
10.1 End-to-End Service Context
10.2 Three-Layer End-to-End Service Model
10.2.1 Estimating Service Impairments via the Three-Layer Model
10.2.2 End-to-End Service Availability
10.2.3 End-to-End Service Latency
10.2.4 End-to-End Service Reliability
10.2.5 End-to-End Service Accessibility
10.2.6 End-to-End Service Retainability
10.2.7 End-to-End Service Throughput
10.2.8 End-to-End Service Timestamp Accuracy
10.2.9 Reality Check
10.3 Distributed and Centralized Cloud Data Centers
10.3.1 Centralized Cloud Data Centers
10.3.2 Distributed Cloud Data Centers
10.3.3 Service Availability Considerations
10.3.4 Service Latency Considerations
10.3.5 Service Reliability Considerations
10.3.6 Service Accessibility Considerations
10.3.7 Service Retainability Considerations
10.3.8 Resource Distribution Considerations
10.4 Multitiered Solution Architectures
10.5 Disaster Recovery and Geographic Redundancy
10.5.1 Disaster Recovery Objectives
10.5.2 Georedundant Architectures
10.5.3 Service Quality Considerations
10.5.4 Recovery Point Considerations
10.5.5 Mitigating Impact of Disasters with Georedundancy and Availability Zones
III Recommendations
11 Accountabilities for Service Quality
11.1 Traditional Accountability
11.2 The Cloud Service Delivery Path
11.3 Cloud Accountability
11.4 Accountability Case Studies
11.4.1 Accountability and Technology Components
11.4.2 Accountability and Elasticity
11.5 Service Quality Gap Model
11.5.1 Application's Resource Facing Service Gap Analysis
11.5.2 Application's Customer Facing Service Gap Analysis
11.6 Service Level Agreements
12 Service Availability Measurement
12.1 Parsimonious Service Measurements
12.2 Traditional Service Availability Measurement
12.3 Evolving Service Availability Measurements
12.3.1 Analyzing Application Evolution
12.3.2 Technology Components
12.3.3 Leveraging Storage-as-a-Service
12.4 Evolving Hardware Reliability Measurement
12.4.1 Virtual Machine Failure Lifecycle
12.5 Evolving Elasticity Service Availability Measurements
12.6 Evolving Release Management Service Availability Measurement
12.7 Service Measurement Outlook
13 Application Service Quality Requirements
13.1 Service Availability Requirements
13.2 Service Latency Requirements
13.3 Service Reliability Requirements
13.4 Service Accessibility Requirements
13.5 Service Retainability Requirements
13.6 Service Throughput Requirements
13.7 Timestamp Accuracy Requirements
13.8 Elasticity Requirements
13.9 Release Management Requirements
13.10 Disaster Recovery Requirements
14 Virtualized Infrastructure Measurement and Management
14.1 Business Context for Infrastructure Service Quality Measurements
14.2 Cloud Consumer Measurement Options
14.3 Impairment Measurement Strategies
14.3.1 Measurement of VM Failure
14.3.2 Measurement of Nondelivery of Configured VM Capacity
14.3.3 Measurement of Delivery of Degraded VM Capacity
14.3.4 Measurement of Tail Latency
14.3.5 Measurement of Clock Event Jitter
14.3.6 Measurement of Clock Drift
14.3.7 Measurement of Failed or Slow Allocation and Startup of VM Instance
14.3.8 Measurements Summary
14.4 Managing Virtualized Infrastructure Impairments
14.4.1 Minimize Application's Sensitivity to Infrastructure Impairments
14.4.2 VM-Level Congestion Detection and Control
14.4.3 Allocate More Virtual Resource Capacity
14.4.4 Terminate Poorly Performing VM Instances
14.4.5 Accept Degraded Performance
14.4.6 Proactive Supplier Management
14.4.7 Reset End Users' Service Quality Expectations
14.4.8 SLA Considerations
14.4.9 Changing Cloud Service Providers
15 Analysis of Cloud-Based Applications
15.1 Reliability Block Diagrams and Side-by-Side Analysis
15.2 IaaS Impairment Effects Analysis
15.3 PaaS Failure Effects Analysis
15.4 Workload Distribution Analysis
15.4.1 Service Quality Analysis
15.4.2 Overload Control Analysis
15.5 Anti-Affinity Analysis
15.6 Elasticity Analysis
15.6.1 Service Capacity Growth Scenarios
15.6.2 Service Capacity Growth Action Analysis
15.6.3 Service Capacity Degrowth Action Analysis
15.6.4 Storage Capacity Growth Scenarios
15.6.5 Online Storage Capacity Growth Action Analysis
15.6.6 Online Storage Capacity Degrowth Action Analysis
15.7 Release Management Impact Effects Analysis
15.7.1 Service Availability Impact
15.7.2 Server Reliability Impact
15.7.3 Service Accessibility Impact
15.7.4 Service Retainability Impact
15.7.5 Service Throughput Impact
15.8 Recovery Point Objective Analysis
15.9 Recovery Time Objective Analysis
16 Testing Considerations
16.1 Context for Testing
16.2 Test Strategy
16.2.1 Cloud Test Bed
16.2.2 Application Capacity under Test
16.2.3 Statistical Confidence
16.2.4 Service Disruption Time
16.3 Simulating Infrastructure Impairments
16.4 Test Planning
16.4.1 Service Reliability and Latency Testing
16.4.2 Impaired Infrastructure Testing
16.4.3 Robustness Testing
16.4.4 Endurance/Stability Testing
16.4.5 Application Elasticity Testing
16.4.6 Upgrade Testing
16.4.7 Disaster Recovery Testing
16.4.8 Extreme Coresidency Testing
16.4.9 PaaS Technology Component Testing
16.4.10 Automated Regression Testing
16.4.11 Canary Release Testing
17 Connecting the Dots
17.1 The Application Service Quality Challenge
17.2 Redundancy and Robustness
17.3 Design for Scalability
17.4 Design for Extensibility
17.5 Design for Failure
17.6 Planning Considerations
17.7 Evolving Traditional Applications
17.7.1 Phase 0: Traditional Application
17.7.2 Phase I: High Service Quality on Virtualized Infrastructure
17.7.3 Phase II: Manual Application Elasticity
17.7.4 Phase III: Automated Release Management
17.7.5 Phase IV: Automated Application Elasticity
17.7.6 Phase V: VM Migration
17.8 Concluding Remarks
Abbreviations
References
About the Authors
Index
ERIC BAUER, MS, is Reliability Engineering Manager in the IP Platforms CTO of Alcatel-Lucent. The holder of more than a dozen U.S. patents, Mr. Bauer is the author or coauthor of Reliability and Availability of Cloud Computing; Beyond Redundancy: How Geographic Redundancy Can Improve Service Availability and Reliability of Computer-Based Systems; Design for Reliability: Information and Computer-Based Systems; and Practical System Reliability.

RANDEE ADAMS, MS, is a Consulting Member of the Technical Staff in the IP Platforms CTO of Alcatel-Lucent. She is the coauthor of two books: Beyond Redundancy: How Geographic Redundancy Can Improve Service Availability and Reliability of Computer-Based Systems; and Reliability and Availability of Cloud Computing. Ms. Adams has worked on many projects, including software development, delivery, engineering, architecture, and design.