Foreword |
|
xvii | |
Preface |
|
xix | |
Acknowledgments |
|
xxi | |
About This Book |
|
xxii | |
Part 1 Introduction |
|
1 | (34) |
|
1 NoSQL: It's about making intelligent choices |
|
|
3 | (12) |
|
|
4 | (2) |
|
1.2 NoSQL business drivers |
|
|
6 | (2) |
|
|
7 | (1) |
|
|
7 | (1) |
|
|
7 | (1) |
|
|
8 | (1) |
|
|
8 | (5) |
|
Case study: LiveJournal's Memcache |
|
|
9 | (1) |
|
Case study: Google's MapReduce - use commodity hardware to create search indexes |
|
|
10 | (1) |
|
Case study: Google's Bigtable - a table with a billion rows and a million columns |
|
|
11 | (1) |
|
Case study: Amazon's Dynamo - accept an order 24 hours a day, 7 days a week |
|
|
11 | (1) |
|
|
12 | (1) |
|
|
12 | (1) |
|
|
13 | (2) |
|
|
15 | (20) |
|
2.1 Keeping components simple to promote reuse |
|
|
15 | (2) |
|
2.2 Using application tiers to simplify design |
|
|
17 | (4) |
|
2.3 Speeding performance by strategic use of RAM, SSD and disk |
|
|
21 | (1) |
|
2.4 Using consistent hashing to keep your cache current |
|
|
22 | (2) |
|
2.5 Comparing ACID and BASE - two methods of reliable database transactions |
|
|
24 | (4) |
|
RDBMS transaction control using ACID |
|
|
25 | (2) |
|
Non-RDBMS transaction control using BASE |
|
|
27 | (1) |
|
2.6 Achieving horizontal scalability with database sharding |
|
|
28 | (2) |
|
2.7 Understanding trade-offs with Brewer's CAP theorem |
|
|
30 | (2) |
|
|
32 | (1) |
|
|
33 | (1) |
|
|
33 | (2) |
Part 2 Database Patterns |
|
35 | (90) |
|
3 Foundational data architecture patterns |
|
|
37 | (25) |
|
3.1 What is a data architecture pattern? |
|
|
38 | (1) |
|
3.2 Understanding the row-store design pattern used in RDBMSs |
|
|
39 | (4) |
|
|
39 | (2) |
|
|
41 | (1) |
|
Analyzing the strengths and weaknesses of the row-store pattern |
|
|
42 | (1) |
|
3.3 Example: Using joins in a sales order |
|
|
43 | (2) |
|
3.4 Reviewing RDBMS implementation features |
|
|
45 | (6) |
|
|
45 | (2) |
|
Fixed data definition language and typed columns |
|
|
47 | (1) |
|
Using RDBMS views for security and access control |
|
|
48 | (1) |
|
RDBMS replication and synchronization |
|
|
49 | (2) |
|
3.5 Analyzing historical data with OLAP, data warehouse and business intelligence systems |
|
|
51 | (6) |
|
How data flows from operational systems to analytical systems |
|
|
52 | (2) |
|
Getting familiar with OLAP concepts |
|
|
54 | (1) |
|
Ad hoc reporting using aggregates |
|
|
55 | (2) |
|
3.6 Incorporating high availability and read-mostly systems |
|
|
57 | (1) |
|
3.7 Using hash trees in revision control systems and database synchronization |
|
|
58 | (2) |
|
|
60 | (1) |
|
|
60 | (1) |
|
|
61 | (1) |
|
4 NoSQL data architecture patterns |
|
|
62 | (34) |
|
|
63 | (9) |
|
What is a key-value store? |
|
|
64 | (1) |
|
Benefits of using a key-value store |
|
|
65 | (3) |
|
|
68 | (2) |
|
Use case: storing web pages in a key-value store |
|
|
70 | (1) |
|
Use case: Amazon Simple Storage Service (S3) |
|
|
71 | (1) |
|
|
72 | (9) |
|
Overview of a graph store |
|
|
72 | (2) |
|
Linking external data with the RDF standard |
|
|
74 | (1) |
|
Use cases for graph stores |
|
|
75 | (6) |
|
4.3 Column family (Bigtable) stores |
|
|
81 | (5) |
|
|
82 | (1) |
|
Understanding column family keys |
|
|
82 | (1) |
|
Benefits of column family systems |
|
|
83 | (2) |
|
Case study: storing analytical information in Bigtable |
|
|
85 | (1) |
|
Case study: Google Maps stores geographic information in Bigtable |
|
|
85 | (1) |
|
Case study: using a column family to store user preferences |
|
|
86 | (1) |
|
|
86 | (5) |
|
|
87 | (1) |
|
|
88 | (1) |
|
|
88 | (1) |
|
|
89 | (1) |
|
Document store implementations |
|
|
89 | (1) |
|
Case study: ad server with MongoDB |
|
|
90 | (1) |
|
Case study: CouchDB, a large-scale object database |
|
|
91 | (1) |
|
4.5 Variations of NoSQL architectural patterns |
|
|
91 | (4) |
|
Customization for RAM or SSD stores |
|
|
92 | (1) |
|
|
92 | (1) |
|
|
93 | (2) |
|
|
95 | (1) |
|
|
95 | (1) |
|
|
96 | (29) |
|
5.1 What is a native XML database? |
|
|
97 | (3) |
|
5.2 Building applications with a native XML database |
|
|
100 | (10) |
|
Loading data can be as simple as drag-and-drop |
|
|
101 | (1) |
|
Using collections to group your XML documents |
|
|
102 | (3) |
|
Applying simple queries to transform complex data with XPath |
|
|
105 | (1) |
|
Transforming your data with XQuery |
|
|
106 | (3) |
|
Updating documents with XQuery updates |
|
|
109 | (1) |
|
XQuery full-text search standards |
|
|
110 | (1) |
|
5.3 Using XML standards within native XML databases |
|
|
110 | (2) |
|
5.4 Designing and validating your data with XML Schema and Schematron |
|
|
112 | (3) |
|
|
112 | (1) |
|
Using Schematron to check document rules |
|
|
113 | (2) |
|
5.5 Extending XQuery with custom modules |
|
|
115 | (1) |
|
5.6 Case study: using NoSQL at the Office of the Historian at the Department of State |
|
|
115 | (4) |
|
5.7 Case study: managing financial derivatives with MarkLogic |
|
|
119 | (3) |
|
Why financial derivatives are difficult to store in RDBMSs |
|
|
119 | |
|
An investment bank switches from 20 RDBMSs to one native XML system |
|
|
119 | (2) |
|
Business benefits of moving to a native XML document store |
|
|
121 | (1) |
|
|
122 | (1) |
|
|
122 | (1) |
|
|
123 | (2) |
Part 3 NoSQL Solutions |
|
125 | (82) |
|
6 Using NoSQL to manage big data |
|
|
127 | (27) |
|
6.1 What is a big data NoSQL solution? |
|
|
128 | (4) |
|
6.2 Getting linear scaling in your data center |
|
|
132 | (1) |
|
6.3 Understanding linear scalability and expressivity |
|
|
133 | (2) |
|
6.4 Understanding the types of big data problems |
|
|
135 | (1) |
|
6.5 Analyzing big data with a shared-nothing architecture |
|
|
136 | (1) |
|
6.6 Choosing distribution models: master-slave versus peer-to-peer |
|
|
137 | (2) |
|
6.7 Using MapReduce to transform your data over distributed systems |
|
|
139 | (4) |
|
MapReduce and distributed filesystems |
|
|
140 | (2) |
|
How MapReduce allows efficient transformation of big data problems |
|
|
142 | (1) |
|
6.8 Four ways that NoSQL systems handle big data problems |
|
|
143 | (3) |
|
Moving queries to the data, not data to the queries |
|
|
143 | (1) |
|
Using hash rings to evenly distribute data on a cluster |
|
|
144 | (1) |
|
Using replication to scale reads |
|
|
145 | (1) |
|
Letting the database distribute queries evenly to data nodes |
|
|
146 | (1) |
|
6.9 Case study: event log processing with Apache Flume |
|
|
146 | (4) |
|
Challenges of event log data analysis |
|
|
147 | (1) |
|
How Apache Flume works to gather distributed event data |
|
|
148 | (1) |
|
|
149 | (1) |
|
6.10 Case study: computer-aided discovery of health care fraud |
|
|
150 | (2) |
|
What is health care fraud detection? |
|
|
150 | (1) |
|
Using graphs and custom shared-memory hardware to detect health care fraud |
|
|
151 | (1) |
|
|
152 | (1) |
|
|
153 | (1) |
|
7 Finding information with NoSQL search |
|
|
154 | (18) |
|
7.1 What is NoSQL search? |
|
|
155 | (1) |
|
|
155 | (3) |
|
Comparing Boolean, full-text keyword and structured search models |
|
|
155 | (1) |
|
Examining the most common types of search |
|
|
156 | (2) |
|
7.3 Strategies and methods that make NoSQL search effective |
|
|
158 | (3) |
|
7.4 Using document structure to improve search quality |
|
|
161 | (1) |
|
7.5 Measuring search quality |
|
|
162 | (1) |
|
7.6 In-node indexes versus remote search services |
|
|
163 | (1) |
|
7.7 Case study: using MapReduce to create reverse indexes |
|
|
164 | (2) |
|
7.8 Case study: searching technical documentation |
|
|
166 | (2) |
|
What is technical document search? |
|
|
166 | (1) |
|
Retaining document structure in a NoSQL document store |
|
|
167 | (1) |
|
7.9 Case study: searching domain-specific languages - findability and reuse |
|
|
168 | (2) |
|
7.10 Apply your knowledge |
|
|
170 | (1) |
|
|
170 | (1) |
|
|
171 | (1) |
|
8 Building high-availability solutions with NoSQL |
|
|
172 | (20) |
|
8.1 What is a high-availability NoSQL database? |
|
|
173 | (1) |
|
8.2 Measuring availability of NoSQL databases |
|
|
174 | (4) |
|
Case study: the Amazon S3 SLA |
|
|
176 | (1) |
|
Predicting system availability |
|
|
176 | (1) |
|
|
177 | (1) |
|
8.3 NoSQL strategies for high availability |
|
|
178 | (6) |
|
Using a load balancer to direct traffic to the least busy node |
|
|
178 | (1) |
|
Using high-availability distributed filesystems with NoSQL databases |
|
|
179 | (1) |
|
Case study: using HDFS as a high-availability filesystem to store master data |
|
|
180 | (2) |
|
Using a managed NoSQL service |
|
|
182 | (1) |
|
Case study: using Amazon DynamoDB for a high-availability data store |
|
|
182 | (2) |
|
8.4 Case study: using Apache Cassandra as a high-availability column family store |
|
|
184 | (3) |
|
Configuring data to node mappings with Cassandra |
|
|
185 | (2) |
|
8.5 Case study: using Couchbase as a high-availability document store |
|
|
187 | (2) |
|
|
189 | (1) |
|
|
190 | (2) |
|
9 Increasing agility with NoSQL |
|
|
192 | (15) |
|
9.1 What is software agility? |
|
|
193 | (3) |
|
Apply your knowledge: local or cloud-based deployment? |
|
|
195 | (1) |
|
|
196 | (3) |
|
9.3 Using document stores to avoid object-relational mapping |
|
|
199 | (2) |
|
9.4 Case study: using XRX to manage complex forms |
|
|
201 | (4) |
|
What are complex business forms? |
|
|
201 | (1) |
|
Using XRX to replace client JavaScript and object-relational mapping |
|
|
202 | (3) |
|
Understanding the impact of XRX on agility |
|
|
205 | (1) |
|
|
205 | (1) |
|
|
206 | (1) |
Part 4 Advanced Topics |
|
207 | (68) |
|
10 NoSQL and functional programming |
|
|
209 | (23) |
|
10.1 What is functional programming? |
|
|
210 | (9) |
|
Imperative programming is managing program state |
|
|
211 | (2) |
|
Functional programming is parallel transformation without side effects |
|
|
213 | (3) |
|
Comparing imperative and functional programming at scale |
|
|
216 | (1) |
|
Using referential transparency to avoid recalculating transforms |
|
|
217 | (2) |
|
10.2 Case study: using NetKernel to optimize web page content assembly |
|
|
219 | (3) |
|
Assembling nested content and tracking component dependencies |
|
|
219 | (1) |
|
Using NetKernel to optimize component regeneration |
|
|
220 | (2) |
|
10.3 Examples of functional programming languages |
|
|
222 | (1) |
|
10.4 Making the transition from imperative to functional programming |
|
|
223 | (3) |
|
Using functions as a parameter of a function |
|
|
223 | (1) |
|
Using recursion to process unstructured document data |
|
|
224 | (1) |
|
Moving from mutable to immutable variables |
|
|
224 | (1) |
|
Removing loops and conditionals |
|
|
224 | (1) |
|
The new cognitive style: from capturing state to isolated transforms |
|
|
225 | (1) |
|
Quality, validation and consistent unit testing |
|
|
225 | (1) |
|
Concurrency in functional programming |
|
|
226 | (1) |
|
10.5 Case study: building NoSQL systems with Erlang |
|
|
226 | (3) |
|
10.6 Apply your knowledge |
|
|
229 | (1) |
|
|
230 | (1) |
|
|
231 | (1) |
|
11 Security: protecting data in your NoSQL systems |
|
|
232 | (22) |
|
11.1 A security model for NoSQL databases |
|
|
233 | (4) |
|
Using services to mitigate the need for in-database security |
|
|
235 | (1) |
|
Using data warehouses and OLAP to mitigate the need for in-database security |
|
|
235 | (1) |
|
Summary of application versus database-layer security benefits |
|
|
236 | (1) |
|
11.2 Gathering your security requirements |
|
|
237 | (9) |
|
|
237 | (3) |
|
|
240 | (2) |
|
|
242 | (1) |
|
Encryption and digital signatures |
|
|
243 | (2) |
|
Protecting public websites from denial of service and injection attacks |
|
|
245 | (1) |
|
11.3 Case study: access controls on key-value store - Amazon S3 |
|
|
246 | (3) |
|
Identity and Access Management (IAM) |
|
|
247 | (1) |
|
Access-control lists (ACL) |
|
|
247 | (1) |
|
|
248 | (1) |
|
11.4 Case study: using key visibility with Apache Accumulo |
|
|
249 | (1) |
|
11.5 Case study: using MarkLogic's RBAC model in secure publishing |
|
|
250 | (2) |
|
Using the MarkLogic RBAC security model to protect documents |
|
|
250 | (1) |
|
Using MarkLogic in secure publishing |
|
|
251 | (1) |
|
Benefits of the MarkLogic security model |
|
|
252 | (1) |
|
|
252 | (1) |
|
|
253 | (1) |
|
12 Selecting the right NoSQL solution |
|
|
254 | (21) |
|
12.1 What is architecture trade-off analysis? |
|
|
255 | (2) |
|
12.2 Team dynamics of database architecture selection |
|
|
257 | (3) |
|
|
258 | (1) |
|
Accounting for experience bias |
|
|
259 | (1) |
|
Using outside consultants |
|
|
259 | (1) |
|
12.3 Steps in architectural trade-off analysis |
|
|
260 | (3) |
|
12.4 Analysis through decomposition: quality trees |
|
|
263 | (4) |
|
Sample quality attributes |
|
|
264 | (2) |
|
Evaluating hybrid and cloud architectures |
|
|
266 | (1) |
|
12.5 Communicating the results to stakeholders |
|
|
267 | (4) |
|
Using quality trees as navigational maps |
|
|
267 | (2) |
|
|
269 | (1) |
|
Using quality trees to communicate project risks |
|
|
270 | (1) |
|
12.6 Finding the right proof-of-architecture pilot project |
|
|
271 | (2) |
|
|
273 | (1) |
|
|
274 | (1) |
Index |
|
275 | |