E-book: Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

  • Format: 562 pages, black & white illustrations
  • Publication date: 16-Mar-2017
  • Publisher: O'Reilly Media, Inc, USA
  • ISBN-13: 9781491903100
  • Format: EPUB+DRM
  • Price: 39.05 EUR*
  • * The price is final, i.e. no further discounts apply.
  • This e-book is intended for personal use only.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    To read the e-book, you need to create an Adobe ID and download Adobe Digital Editions to your computer. The e-book can be read and downloaded on up to 6 devices.
    The e-book cannot be read on an Amazon Kindle. The other e-readers offered in our online store can read e-books protected with an Adobe ID.

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?

In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.

  • Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
  • Make informed decisions by identifying the strengths and weaknesses of different tools
  • Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
  • Understand the distributed systems research upon which modern databases are built
  • Peek behind the scenes of major online services, and learn from their architecture

Preface
Part I Foundations of Data Systems
1 Reliable, Scalable, and Maintainable Applications
  Thinking About Data Systems
  Reliability
    Hardware Faults
    Software Errors
    Human Errors
    How Important Is Reliability?
  Scalability
    Describing Load
    Describing Performance
    Approaches for Coping with Load
  Maintainability
    Operability: Making Life Easy for Operations
    Simplicity: Managing Complexity
    Evolvability: Making Change Easy
  Summary
2 Data Models and Query Languages
  Relational Model Versus Document Model
    The Birth of NoSQL
    The Object-Relational Mismatch
    Many-to-One and Many-to-Many Relationships
    Are Document Databases Repeating History?
    Relational Versus Document Databases Today
  Query Languages for Data
    Declarative Queries on the Web
    MapReduce Querying
  Graph-Like Data Models
    Property Graphs
    The Cypher Query Language
    Graph Queries in SQL
    Triple-Stores and SPARQL
    The Foundation: Datalog
  Summary
3 Storage and Retrieval
  Data Structures That Power Your Database
    Hash Indexes
    SSTables and LSM-Trees
    B-Trees
    Comparing B-Trees and LSM-Trees
    Other Indexing Structures
  Transaction Processing or Analytics?
    Data Warehousing
    Stars and Snowflakes: Schemas for Analytics
  Column-Oriented Storage
    Column Compression
    Sort Order in Column Storage
    Writing to Column-Oriented Storage
    Aggregation: Data Cubes and Materialized Views
  Summary
4 Encoding and Evolution
  Formats for Encoding Data
    Language-Specific Formats
    JSON, XML, and Binary Variants
    Thrift and Protocol Buffers
    Avro
    The Merits of Schemas
  Modes of Dataflow
    Dataflow Through Databases
    Dataflow Through Services: REST and RPC
    Message-Passing Dataflow
  Summary
Part II Distributed Data
5 Replication
  Leaders and Followers
    Synchronous Versus Asynchronous Replication
    Setting Up New Followers
    Handling Node Outages
    Implementation of Replication Logs
  Problems with Replication Lag
    Reading Your Own Writes
    Monotonic Reads
    Consistent Prefix Reads
    Solutions for Replication Lag
  Multi-Leader Replication
    Use Cases for Multi-Leader Replication
    Handling Write Conflicts
    Multi-Leader Replication Topologies
  Leaderless Replication
    Writing to the Database When a Node Is Down
    Limitations of Quorum Consistency
    Sloppy Quorums and Hinted Handoff
    Detecting Concurrent Writes
  Summary
6 Partitioning
  Partitioning and Replication
  Partitioning of Key-Value Data
    Partitioning by Key Range
    Partitioning by Hash of Key
    Skewed Workloads and Relieving Hot Spots
  Partitioning and Secondary Indexes
    Partitioning Secondary Indexes by Document
    Partitioning Secondary Indexes by Term
  Rebalancing Partitions
    Strategies for Rebalancing
    Operations: Automatic or Manual Rebalancing
  Request Routing
    Parallel Query Execution
  Summary
7 Transactions
  The Slippery Concept of a Transaction
    The Meaning of ACID
    Single-Object and Multi-Object Operations
  Weak Isolation Levels
    Read Committed
    Snapshot Isolation and Repeatable Read
    Preventing Lost Updates
    Write Skew and Phantoms
  Serializability
    Actual Serial Execution
    Two-Phase Locking (2PL)
    Serializable Snapshot Isolation (SSI)
  Summary
8 The Trouble with Distributed Systems
  Faults and Partial Failures
    Cloud Computing and Supercomputing
  Unreliable Networks
    Network Faults in Practice
    Detecting Faults
    Timeouts and Unbounded Delays
    Synchronous Versus Asynchronous Networks
  Unreliable Clocks
    Monotonic Versus Time-of-Day Clocks
    Clock Synchronization and Accuracy
    Relying on Synchronized Clocks
    Process Pauses
  Knowledge, Truth, and Lies
    The Truth Is Defined by the Majority
    Byzantine Faults
    System Model and Reality
  Summary
9 Consistency and Consensus
  Consistency Guarantees
  Linearizability
    What Makes a System Linearizable?
    Relying on Linearizability
    Implementing Linearizable Systems
    The Cost of Linearizability
  Ordering Guarantees
    Ordering and Causality
    Sequence Number Ordering
    Total Order Broadcast
  Distributed Transactions and Consensus
    Atomic Commit and Two-Phase Commit (2PC)
    Distributed Transactions in Practice
    Fault-Tolerant Consensus
    Membership and Coordination Services
  Summary
Part III Derived Data
10 Batch Processing
  Batch Processing with Unix Tools
    Simple Log Analysis
    The Unix Philosophy
  MapReduce and Distributed Filesystems
    MapReduce Job Execution
    Reduce-Side Joins and Grouping
    Map-Side Joins
    The Output of Batch Workflows
    Comparing Hadoop to Distributed Databases
  Beyond MapReduce
    Materialization of Intermediate State
    Graphs and Iterative Processing
    High-Level APIs and Languages
  Summary
11 Stream Processing
  Transmitting Event Streams
    Messaging Systems
    Partitioned Logs
  Databases and Streams
    Keeping Systems in Sync
    Change Data Capture
    Event Sourcing
    State, Streams, and Immutability
  Processing Streams
    Uses of Stream Processing
    Reasoning About Time
    Stream Joins
    Fault Tolerance
  Summary
12 The Future of Data Systems
  Data Integration
    Combining Specialized Tools by Deriving Data
    Batch and Stream Processing
  Unbundling Databases
    Composing Data Storage Technologies
    Designing Applications Around Dataflow
    Observing Derived State
  Aiming for Correctness
    The End-to-End Argument for Databases
    Enforcing Constraints
    Timeliness and Integrity
    Trust, but Verify
  Doing the Right Thing
    Predictive Analytics
    Privacy and Tracking
  Summary
Glossary
Index
Martin Kleppmann is a Senior Software Engineer at LinkedIn. He is a co-founder of Rapportive, a startup that was acquired by LinkedIn.