E-book: High Availability IT Services

Terry Critchley (IT Consultant, Manchester, UK)
  • Format: 537 pages
  • Publication date: 17-Dec-2014
  • Publisher: Apple Academic Press Inc.
  • Language: English
  • ISBN-13: 9781040177761
  • Format: EPUB+DRM
  • Price: €74.09*
  • * The price is final, i.e., no further discounts apply.
  • This e-book is for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not permitted

  • Printing:

    not permitted

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions. (This is a free application designed specifically for reading e-books. It should not be confused with Adobe Reader, which is probably already installed on your computer.)

    This e-book cannot be read on Amazon Kindle.

This book starts with the basic premise that a service is composed of the 3Ps: products, processes, and people. Moreover, these entities and their sub-entities interlink to support the services that end users require to run and support a business. This widens the scope of any availability design far beyond hardware and software. It also increases the potential for service failure for reasons other than hardware and software faults, which gives rise to the concept of logical outages.
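
The claim that more interlinked entities means more ways to fail can be made concrete with a little arithmetic. Suppose a service needs every one of its supporting entities to be working at the same time (a series arrangement) and their failures are independent. In that case the service's availability is the product of the individual availabilities, a rule often called Lusser's law. The sketch below makes exactly those simplifying assumptions. The function name and the availability figures for the 3Ps are hypothetical placeholders, not figures from the book.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a chain in which every element must be working.

    Assumes failures are independent, so the combined availability is
    simply the product of the individual availabilities.
    """
    values = list(availabilities)
    for a in values:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be between 0 and 1, got {a}")
    return prod(values)


# Hypothetical figures for the 3Ps behind a single service (not from the book).
entities = {
    "products": 0.999,   # hardware and software
    "processes": 0.995,  # e.g. change management, operational procedures
    "people": 0.99,      # e.g. operator error causing a logical outage
}

service = series_availability(entities.values())
for name, a in entities.items():
    print(f"{name:<10} {a:.3%}")
print(f"{'service':<10} {service:.3%}")
```

With these placeholder numbers the service comes out at roughly 98.4%, which is below even its weakest single entity (99%). Every extra element that must be up at the same time pulls the service figure down further.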

High Availability IT Services details the considerations for designing and running highly available "services" and not just the systems infrastructure that supports those services. Providing an overview of virtualization and cloud computing, it supplies a detailed look at availability, redundancy, fault tolerance, and security. It also stresses the importance of human factors.

The book starts off by providing an availability primer and detailing the reasons why you need to be concerned with high availability. Next, it outlines the theory of reliability and availability and the elements of actual practice in high availability (HA), including Service Level Agreements (SLAs) and Change Management.
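
Two basic quantities connect the theory to SLAs. The first is steady-state availability from mean time between failures and mean time to repair, A = MTBF / (MTBF + MTTR). The second is the yearly downtime that an availability percentage actually permits. The sketch below uses only that textbook formula and assumes a 365-day (8,760-hour) year. The book's own availability equations may account for further time factors in an outage. The function names and the MTBF/MTTR figures are illustrative assumptions.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year, i.e. 8,760 hours


def availability_from_mtbf(mtbf_hours, mttr_hours):
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(availability):
    """Expected hours of nonavailability per year at a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


# What common SLA availability targets allow in downtime per year.
for pct in (99.0, 99.9, 99.99):
    hours = annual_downtime_hours(pct / 100)
    print(f"{pct}% available -> {hours:.2f} h/year ({hours * 60:.1f} min)")

# Hypothetical component: fails on average every 2,000 h, takes 4 h to repair.
a = availability_from_mtbf(2000, 4)
print(f"A = {a:.5f}")
```

A 99% target still permits about 87.6 hours of downtime a year, whereas 99.99% permits under an hour (about 52.6 minutes). This is why the exact figure written into an SLA matters so much.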

Examining what the major hardware and software vendors have to offer in the HA world, the book considers the ubiquitous world of clouds and virtualization as well as the availability considerations they present.

The book examines high availability concepts and architectures such as reliability, availability, and serviceability (RAS); clusters; grids; and redundant arrays of independent disks (RAID) storage. It also covers the role of security in providing high availability, cluster offerings, emergent Linux clusters, online transaction processing (OLTP), and relational databases.

Foreword
Preface
Acknowledgments
Author
SECTION I AN AVAILABILITY PRIMER
1 Preamble: A View from 30,000 Feet
Do You Know...?
Availability in Perspective
Murphy's Law of Availability
Availability Drivers in Flux: What Percentage of Business Is Critical?
Historical View of Availability: The First 7 × 24 Requirements?
Historical Availability Scenarios
Planar Technology
Power-On Self-Test
Other Diagnostics
Component Repair
In-Flight Diagnostics
Summary
2 Reliability and Availability
Introduction to Reliability, Availability, and Serviceability
RAS Moves Beyond Hardware
Availability: An Overview
Some Definitions
Quantitative Availability
Availability: 7 R's (SNIA)
Availability and Change
Change All around Us
Software: Effect of Change
Operations: Effect of Change
Monitoring and Change
Automation: The Solution?
Data Center Automation
Network Change/Configuration Automation
Automation Vendors
Types of Availability
Binary Availability
Duke of York Availability
Hierarchy of Failures
Hierarchy Example
State Parameters
Types of Nonavailability (Outages)
Logical Outage Examples
Summary
Planning for Availability and Recovery
Why Bother?
What Is a Business Continuity Plan?
What Is a BIA?
What Is DR?
Relationships: BC, BIA, and DR
Recovery Logistics
Business Continuity
Downtime: Who or What Is to Blame?
Elements of Failure: Interaction of the Wares
Summary
DR/BC Source Documents
3 Reliability: Background and Basics
Introduction
IT Structure: Schematic
IT Structure: Hardware Overview
Service Level Agreements
Service Level Agreements: The Dawn of Realism
What Is an SLA?
Why Is an SLA Important?
Service Life Cycle
Concept of User Service
Elements of Service Management
Introduction
Scope of Service Management
User Support
Operations Support
Systems Management
Service Management Hierarchy
The Effective Service
Services versus Systems
Availability Concepts
First Dip in the Water
Availability Parameters
Summary
4 What Is High Availability?
IDC and Availability
Availability Classification
Availability: Outage Analogy
A Recovery Analogy
Availability: Redundancy
Availability: Fault Tolerance
Sample List of Availability Requirements
System Architecture
Availability: Single Node
Dynamic Reconfiguration/Hot Repair of System Components
Disaster Backup and Recovery
System Administration Facilities
HA Costs Money, So Why Bother?
Cost Impact Analysis
HA: Cost versus Benefit
Penalty for Nonavailability
Organizations: Attitude toward HA
Aberdeen Group Study: February 2012
Outage Loss Factors (Percentage of Loss)
Software Failure Costs
Assessing the Cost of HA
Performance and Availability
HA Design: Top 10 Mistakes
The Development of HA
Servers
Systems and Subsystems Development
Production Clusters
Availability Architectures
RAS Features
Hot-Plug Hardware
Processors
Memory
Input/Output
Storage
Power/Cooling
Fault Tolerance
Outline of Server Domain Architecture
Introduction
Domain/LPAR Structure
Outline of Cluster Architecture
Cluster Configurations: Commercial Cluster
Cluster Components
Hardware
Software
Commercial LB
Commercial Performance
Commercial HA
HPC Clusters
Generic HPC Cluster
HPC Cluster: Oscar Configuration
HPC Cluster: Availability
HPC Cluster: Applications
HA in Scientific Computing
Topics in HPC Reliability: Summary
Errors in Cluster HA Design
Outline of Grid Computing
Grid Availability
Commercial Grid Computing
Outline of RAID Architecture
Origins of RAID
RAID Architecture and Levels
Hardware
Software
Hardware versus Software RAID
RAID Striping: Fundamental to RAID
RAID Configurations
RAID Components
ECC
Parity
RAID Level 0
RAID Level 1
RAID Level 3
RAID Level 5
RAID Level 6
RAID Level 10
RAID 0 + 1 Schematic
RAID 10 Schematic
RAID Level 30
RAID Level 50
RAID Level 51
RAID Level 60
RAID Level 100
Less Relevant RAIDs
RAID Level 2
RAID Level 4
RAID Level 7
Standard RAID Storage Efficiency
SSDs and RAID
SSD Longevity
Hybrid RAID: SSD and HDD
SSD References
Post-RAID Environment
Big Data: The Issue
Data Loss Overview
Big Data: Solutions?
Non-RAID RAID
Erasure Codes
RAID Successor Qualifications
EC Overview
EC Recovery Scope
Self-Healing Storage
Summary
SECTION II AVAILABILITY THEORY AND PRACTICE
5 High Availability: Theory
Some Math
Guide to Reliability Graphs
Probability Density Function
Cumulative Distribution Function
Availability Probabilities
Lusser's Law
Availability Concepts
Hardware Reliability: The Bathtub Curve
Software Reliability: The Bathtub Curve
Simple Math of Availability
Availability
Nonavailability
Mean Time between Failures
Mean Time to Repair
Online Availability Tool
Availability Equation I: Time Factors in an Outage
Availability Equation II
Effect of Redundant Blocks on Availability
Parallel (Redundant) Components
Two Parallel Blocks: Example
Combinations of Series and Parallel Blocks
Complex Systems
System Failure Combinations
Complex Systems Solution Methods
Real-Life Example: Cisco Network Configuration
Configuration A
Configuration B
Summary of Block Considerations
Sample Availability Calculations versus Costs
Calculation 1: Server Is 99% Available
Calculation 2: Server Is 99.99% Available
Availability: MTBFs and Failure Rate
Availability Factors
Planned versus Unplanned Outages
Planned Downtime: Planned Downtime Breakdown
Unplanned Downtime
Security: The New Downtime
Disasters: Breakdown of Causes
Power: Downtime Causes
Power Issues Addenda
So What?
External Electromagnetic Radiation Addendum
Power: Recovery Timescales for Uninterruptible Power Supply
Causes of Data Loss
Pandemics? Disaster Waiting to Happen?
Disasters: Learning the Hard Way
Other Downtime Gotchas
Downtime Gotchas: Survey Paper
Downtime Reduction Initiatives
Low Impact Outages
Availability: A Lesson in Design
Availability: Humor in an Outage (Part I)
Availability: Humor in an Outage (Part II)
So What?
Application Nonavailability
Traditional Outage Reasons
Modern Outage Reasons
Summary
6 High Availability: Practice
Central Site
Service Domain Concept
Sample Domain Architecture
Planning for Availability: Starting Point
The HA Design Spectrum
Availability by Systems Design/Modification
Availability by Engineering Design
Self-Healing Hardware and Software
Self-Healing and Other Items
Availability by Application Design: Poor Application Design
Conventional Programs
Web Applications
Availability by Configuration
Hardware
Data
Networks
Operating System
Environment
Availability by Outside Consultancy
Availability by Vendor Support
Availability by Proactive Monitoring
Availability by Technical Support Excellence
Availability by Operations Excellence
First Class Runbook
Software Level Issues
System Time
Performance and Capacity
Data Center Efficiency
Availability by Retrospective Analysis
Availability by Application Monitoring
Availability by Automation
Availability by Reactive Recovery
Availability by Partnerships
Availability by Change Management
Availability by Performance/Capacity Management
Availability by Monitoring
Availability by Cleanliness
Availability by Anticipation
Predictive Maintenance
Availability by Teamwork
Availability by Organization
Availability by Eternal Vigilance
Availability by Location
A Word on Documentation
Network Reliability/Availability
Protocols and Redundancy
Network Types
Network Outages
Network Design for Availability
Network Security
File Transfer Reliability
Network DR
Software Reliability
Software Quality
Software: Output Verification
Example 1
Example 2
Example 3
Software Reliability: Problem Flow
Software Testing Steps
Software Documentation
Software Testing Model
Software Reliability: Models
The Software Scenario
SRE Models
Model Entities
SRE Models: Shape Characterization
SRE Models: Time-Based versus Defect-Based
Software Reliability Growth Model
Software Reliability Model: Defect Count
Software Reliability: IEEE Standard 1633-2008
Software Reliability: Hardening
Software Reliability: Installation
Software Reliability: Version Control
Software: Penetration Testing
Software: Fault Tolerance
Software Error Classification
Heisenbug
Bohrbug
Reliability Properties of Software
ACID Properties
Two-Phase Commit
Software Reliability: Current Status
Software Reliability: Assessment Questions
Software Universe and Summary
Subsystem Reliability
Hardware Outside the Server
Disk Subsystem Reliability
Disk Subsystem RAS
Tape Reliability/RAS
Availability: Other Peripherals
Attention to Detail
Liveware Reliability
Summary
Be Prepared for Big Brother!
7 High Availability: SLAs, Management, and Methods
Introduction
Preliminary Activities
Pre-Production Activities
BC Plan
BC: Best Practice
Management Disciplines
Service Level Agreements
SLA Introduction
SLA: Availability and QoS
Elements of SLAs
Types of SLAs
Potential Business Benefits of SLAs
Potential IT Benefits of SLAs
IT Service Delivery
SLA: Structure and Samples
SLA: How Do We Quantify Availability?
SLA: Reporting of Availability
Reneging on SLAs
HA Management: The Project
Start-Up and Design Phase
The Management Flow
The Management Framework
Project Definition Workshop
Outline of the PDW
PDW Method Overview
Project Initiation Document
PID Structure and Purpose
Multistage PDW
Delphi Techniques and Intensive Planning
Delphi Technique
Delphi: The Steps
Intensive Planning
FMEA Process
FMEA: An Analogy
FMEA: The Steps
FMECA = FMEA + Criticality
Risk Evaluation and Priority: Risk Evaluation Methods
Component Failure Impact Analysis
CFIA Development: A Walkthrough and Risk Analysis
CFIA Table: Schematic
Quantitative CFIA
CFIA: Other Factors
Management of Operations Phase
Failure Reporting and Corrective Action System
Introduction
FRACAS: Steps for Handling Failures
HA Operations: Supporting Disciplines
War Room
War Room Location
Documentation
Change/Configuration Management
Change Management and Control: Best Practice
Change Operations
Patch Management
Performance Management
Introduction
Overview
Security Management
Security: Threats or Posturing?
Security: Best Practice
Problem Determination
Problems: Short Term
Problems: After the Event
Event Management
Fault Management
Faults and What to Do about Them
System Failure: The Response Stages
HA Plan B: What's That?
Plan B: Example I
Plan B: Example II
What? IT Problem Recovery without IT?
Faults and What Not to Do
Outages: Areas for Inaction
Problem Management
Managing Problems
Problems: Best Practice
Help Desk Architecture and Implementation
Escalation Management
Resource Management
Service Monitors
Availability Measurement
Monitor Layers
System Resource Monitors
Synthetic Workload: Generic Requirements
Availability Monitors
General EUE Tools
Availability Benchmarks
Availability: Related Monitors
Disaster Recovery
The Viewpoint Approach to Documentation
Summary
SECTION III VENDORS AND HIGH AVAILABILITY
8 High Availability: Vendor Products
IBM Availability and Reliability
IBM Hardware
Virtualization
IBM PowerVM
IBM Series x
IBM Clusters
Z Series Parallel Sysplex
Sysplex Structure and Purpose
Parallel Sysplex Schematic
IBM: High Availability Services
IBM Future Series/System
Oracle Sun HA
Sun HA
Hardware Range
Super Cluster
Oracle Sun M5-32
Oracle HA Clusters
Oracle RAC 12c
Hewlett-Packard HA
HP Hardware and Software
Servers
Software
Services
Servers: Integrity Servers
HP NonStop Integrity Servers
NonStop Architecture and Stack
NonStop Stack Functions
Stratus Fault Tolerance
Automated Uptime Layer
ActiveService Architecture
Other Clusters
Veritas Clusters (Symantec)
Supported Platforms
Databases, Applications, and Replicators
Linux Clusters
Overview
Oracle Clusterware
SUSE Linux Clustering
Red Hat Linux Clustering
Linux in the Clouds
Linux HPC HA
Linux-HA
Carrier Grade Linux
VMware Clusters
The Web and HA
Service Availability Software
Continuity Software
Continuity Software: Services
Summary
9 High Availability: Transaction Processing and Databases
Transaction Processing Systems
Some TP Systems: OLTP Availability Requirements
TP Systems with Databases
The X/Open Distributed Transaction Processing Model: XA and XA+ Concepts
CICS and RDBMS
Relational Database Systems
Some Database History
Early RDBMS
SQL Server and HA
Microsoft SQL Server 2014 Community Technology Preview 1
SQL Server HA Basics
SQL Server AlwaysOn Solutions
Failover Cluster Instances
Availability Groups
Database Mirroring
Log Shipping
References
Oracle Database and HA
Introduction
Oracle Databases
Oracle 11g (R2.1) HA
Oracle 12c
Oracle MAA
Oracle High Availability Playing Field
MySQL
MySQL: HA Features
MySQL: HA Services and Support
IBM DB2 Database and HA
DB2 for Windows, UNIX, and Linux
DB2 HA Feature
High Availability DR
DB2 Replication: SQL and Q Replication
DB2 for i
DB2 10 for z/OS
DB2 pureScale
InfoSphere Replication Server for z/OS
DB2 Cross Platform Development
IBM Informix Database and HA
Introduction (Informix 11.70)
Availability Features
Fault Tolerance
Informix MACH 11 Clusters
Connection Manager
Informix 12.1
Ingres Database and HA
Ingres RDBMS
Ingres High Availability Option
Sybase Database and HA
Sybase High Availability Option
Terminology
Use of SAP ASE
Vendor Availability
ASE Cluster Requirements
Business Continuity with SAP Sybase
NoSQL
NonStopSQL Database
Summary
SECTION IV CLOUDS AND VIRTUALIZATION
10 High Availability: The Cloud and Virtualization
Introduction
What Is Cloud Computing?
Cloud Characteristics
Functions of the Cloud
Cloud Service Models
Cloud Deployment Models
Resource Management in the Cloud
SLAs and the Cloud
Cloud Availability and Security
Cloud Availability
Cloud Outages: A Review
Aberdeen: Cloud Storage Outages
Cloud Security
Virtualization
What Is Virtualization?
Full Virtualization
Paravirtualization
Security Risks in Virtual Environments
Vendors and Virtualization
IBM PowerVM
IBM z/VM
VMware vSphere, ESX, and ESXi
Microsoft Hyper-V
HP Integrity Virtual Machines
Linux KVM
Solaris Zones
Xen
Virtualization and HA
Virtualization Information Sources
Summary
11 Disaster Recovery Overview
DR Background
A DR Lesson from Space
Disasters Are Rare, Aren't They?
Key Message: Be Prepared
DR Invocation Reasons: Forrester Survey
DR Testing: Kaseya Survey
DR: A Point to B Point
Backup/Restore
Overview
Backup Modes
Cold (Offline)
Warm (Online)
Hot (Online)
Backup Types
Full Backup
Incremental Backup
Multilevel Incremental Backup
Differential Backup
Synthetic Backup
Progressive Backup
Data Deduplication
Data Replication
Replication Agents
Asynchronous Replication
Synchronous Replication
Heterogeneous Replication
Other Types of Backup
DR Recovery Time Objective: WAN Optimization
Backup Product Assessments
Virtualization Review
Gartner Quadrant Analysis
Backup/Archive: Tape or Disk?
Bit Rot
Tape Costs
DR Concepts and Considerations
The DR Scenario
Who Is Involved?
DR Objectives
Recovery Factors
Tiers of DR Availability
DR and Data Tiering
A Key Factor
The DR Planning Process
DR: The Steps Involved
In-House DR
DR Requirements in Operations
Hardware
Software
Applications
Data
DR Cost Considerations
The Backup Site
Third-Party DR (Outsourcing)
DR and the Cloud
HA/DR Options Described
Disaster Recovery Templates
Summary
SECTION V APPENDICES AND HARD SUMS
Appendix 1
Reliability and Availability: Terminology
Summary
Appendix 2
Availability: MTBF/MTTF/MTTR Discussion
Interpretation of MTTR
Interpretation of MTTF
Interpretation of MTBF
MTTF and MTBF: The Difference
MTTR: Ramp-Up Time
Serial Blocks and Availability: NB
Typical MTBF Figures
Gathering MTTF/MTBF Figures
Outage Records and MTTx Figures
MTTF and MTTR Interpretation
MTTF versus Lifetime
Some MTxx Theory
MTBF/MTTF Analogy
Final Word on MTxx
Forrester/Zenoss MTxx Definitions
Summary
Appendix 3
Your HA/DR Route Map and Kitbag
Road to HA/DR
The Stages
A Short DR Case Study
HA and DR: Total Cost of Ownership
TCO Factors
Cloud TCO
TCO Summary
Risk Assessment and Management
Who Are the Risk Stakeholders?
Where Are the Risks?
How Is Risk Managed?
Availability: Project Risk Management
Availability: Deliverables Risk Management
Deliverables Risk Management Plan: Specific Risk Areas
The IT Role in All This
Summary
Appendix 4
Availability: Math and Other Topics
Lesson 1: Multiplication, Summation, and Integration Symbols
Mathematical Distributions
Lesson 2: General Theory of Reliability and Availability
Reliability Distributions
Lesson 3: Parallel Components (Blocks)
Availability: m-from-n Components
m-from-n Examples
m-from-n Theory
m-from-n Redundant Blocks
Active and Standby Redundancy
Introduction
Summary of Redundancy Systems
Types of Redundancy
Real m-from-n Example
Math of m-from-n Configurations
Standby Redundancy
An Example of These Equations
Online Tool for Parallel Components: Typical Calculation
NB: Realistic IT Redundancy
Overall Availability Graphs
Try This Availability Test
Lesson 4: Cluster Speedup Formulae
Amdahl's Law
Gunther's Law
Gustafson's Law
Amdahl versus Gunther
Speedup: Sun-Ni Law
Lesson 5: Some RAID and EC Math
RAID Configurations
Erasure Codes
Lesson 6: Math of Monitoring
Ping: Useful Aside
Ping Sequence Sample
Lesson 7: Software Reliability/Availability
Overview
Software Reliability Theory
The Failure/Defect Density Models
Lesson 8: Additional RAS Features
Upmarket RAS Features
Processor
I/O Subsystem
Memory Availability
Fault Detection and Isolation
Clocks and Service Processor
Serviceability
Predictive Failure Analysis
Lesson 9: Triple Modular Redundancy
Lesson 10: Cyber Crime, Security, and Availability
The Issue
The Solution
Security Analytics
Zero Trust Security Model
Security Information Event Management
Security Management Flow
SIEM Best Practices
Security: Denial of Service
Security: Insider Threats
Security: Mobile Devices (BYOD)
BYOD Security Steps
Security: WiFi in the Enterprise
Security: The Database
Distributed DoS
Security: DNS Servers
Cost of Cyber Crime
Cost of Cyber Crime Prevention versus Risk
Security Literature
Summary
Appendix 5
Availability: Organizations and References
Reliability/Availability Organizations
Reliability Information Analysis Center
Uptime Institute
IEEE Reliability Society
Storage Networking Industry Association
Availability Digest
Service Availability Forum
Carnegie Mellon Software Engineering Institute
ROC Project: Software Resilience
Business Continuity Today
Disaster Recovery Institute
Business Continuity Institute
Information Availability Institute
International Working Group on Cloud Computing Resiliency
TMMi Foundation
Center for Software Reliability
CloudTweaks
Security Organizations
Security? I Can't Be Bothered
Cloud Security Alliance
CSO Online
Dark READING
Cyber Security and Information Systems IAC
Center for International Security and Cooperation
Other Reliability/Security Resources
Books, Articles, and Websites
Major Reliability/Availability Information Sources
Other Information Sources
Appendix 6
Service Management: Where Next?
Information Technology Infrastructure Library
ITIL Availability Management
Service Architectures
Architectures
Availability Architectures: HA Documentation
Clouds and Architectures
Appendix 7
Index

Dr. Terry Critchley is a retired IT consultant living near Manchester in the United Kingdom. He studied physics at Manchester University (using some of Rutherford's original equipment!), gaining an honours degree in physics and, 5 years later, a PhD in nuclear physics. He then joined IBM as a Systems Engineer and spent 24 years there in a variety of accounts and specializations, and later spent 3 years at Oracle. Terry joined his last company, Sun Microsystems, in 1996 and left in 2001, after planning and running Sun's European Y2000 education program, and then spent a year at a major UK bank.

In 1993 he initiated and coauthored a book on Open Systems for the British Computer Society (Open Systems: The Reality), and he has recently written this book, High Availability IT Services. He is also mining swathes of his old material for his next book, Service Performance and Management.