Preface |
|
xi | |
|
|
1 | (20) |
|
Why Now? Putting Database Technologies in Context |
|
|
2 | (7) |
|
1960s–1980s: Hierarchical Data |
|
|
3 | (1) |
|
1980s–2000s: Entity-Relationship |
|
|
4 | (1) |
|
|
5 | (2) |
|
|
7 | (2) |
|
|
9 | (3) |
|
Complex Problems and Complex Systems |
|
|
10 | (1) |
|
Complex Problems in Business |
|
|
10 | (2) |
|
Making Technology Decisions to Solve Complex Problems |
|
|
12 | (8) |
|
So You Have Graph Data. What's Next? |
|
|
15 | (4) |
|
Seeing the Bigger Picture |
|
|
19 | (1) |
|
Getting Started on Your Journey with Graph Thinking |
|
|
20 | (1) |
|
2 Evolving from Relational to Graph Thinking |
|
|
21 | (26) |
|
Chapter Preview: Translating Relational Concepts to Graph Terminology |
|
|
21 | (1) |
|
Relational Versus Graph: What's the Difference? |
|
|
22 | (3) |
|
Data for Our Running Example |
|
|
23 | (2) |
|
|
25 | (3) |
|
|
26 | (1) |
|
|
27 | (1) |
|
|
28 | (5) |
|
Fundamental Elements of a Graph |
|
|
28 | (1) |
|
|
29 | (1) |
|
|
30 | (1) |
|
|
30 | (1) |
|
|
31 | (2) |
|
The Graph Schema Language |
|
|
33 | (10) |
|
Vertex Labels and Edge Labels |
|
|
33 | (1) |
|
|
34 | (1) |
|
|
35 | (3) |
|
Self-Referencing Edge Labels |
|
|
38 | (1) |
|
Multiplicity of Your Graph |
|
|
38 | (3) |
|
|
41 | (2) |
|
Relational Versus Graph: Decisions to Consider |
|
|
43 | (1) |
|
|
43 | (1) |
|
|
43 | (1) |
|
Mixing Database Design with Application Purpose |
|
|
44 | (1) |
|
|
44 | (3) |
|
3 Getting Started: A Simple Customer 360 |
|
|
47 | (34) |
|
Chapter Preview: Relational Versus Graph |
|
|
48 | (1) |
|
The Foundational Use Case for Graph Data: C360 |
|
|
48 | (3) |
|
Why Do Businesses Care About C360? |
|
|
50 | (1) |
|
Implementing a C360 Application in a Relational System |
|
|
51 | (10) |
|
|
51 | (3) |
|
Relational Implementation |
|
|
54 | (4) |
|
|
58 | (3) |
|
Implementing a C360 Application in a Graph System |
|
|
61 | (14) |
|
|
62 | (1) |
|
|
63 | (7) |
|
|
70 | (5) |
|
Relational Versus Graph: How to Choose? |
|
|
75 | (3) |
|
Relational Versus Graph: Data Modeling |
|
|
75 | (1) |
|
Relational Versus Graph: Representing Relationships |
|
|
76 | (1) |
|
Relational Versus Graph: Query Languages |
|
|
76 | (1) |
|
Relational Versus Graph: Main Points |
|
|
77 | (1) |
|
|
78 | (3) |
|
|
79 | (1) |
|
Making a Technology Choice for Your C360 Application |
|
|
79 | (2) |
|
4 Exploring Neighborhoods in Development |
|
|
81 | (36) |
|
Chapter Preview: Building a More Realistic Customer 360 |
|
|
81 | (1) |
|
|
82 | (13) |
|
Should This Be a Vertex or an Edge? |
|
|
83 | (3) |
|
Lost Yet? Let Us Walk You Through Direction |
|
|
86 | (3) |
|
A Graph Has No Name: Common Mistakes in Naming |
|
|
89 | (2) |
|
Our Full Development Graph Model |
|
|
91 | (2) |
|
|
93 | (1) |
|
Our Thoughts on the Importance of Data, Queries, and the End User |
|
|
94 | (1) |
|
Implementation Details for Exploring Neighborhoods in Development |
|
|
95 | (2) |
|
Generating More Data for Our Expanded Example |
|
|
97 | (1) |
|
|
97 | (9) |
|
Advanced Gremlin: Shaping Your Query Results |
|
|
106 | (9) |
|
Shaping Query Results with the project(), fold(), and unfold() Steps |
|
|
107 | (3) |
|
Removing Data from the Results with the where(neq()) Pattern |
|
|
110 | (2) |
|
Planning for Robust Result Payloads with the coalesce() Step |
|
|
112 | (3) |
|
Moving from Development into Production |
|
|
115 | (2) |
|
5 Exploring Neighborhoods in Production |
|
|
117 | (38) |
|
Chapter Preview: Understanding Distributed Graph Data in Apache Cassandra |
|
|
119 | (1) |
|
Working with Graph Data in Apache Cassandra |
|
|
120 | (16) |
|
The Most Important Topic to Understand About Data Modeling: Primary Keys |
|
|
120 | (1) |
|
Partition Keys and Data Locality in a Distributed Environment |
|
|
121 | (5) |
|
Understanding Edges, Part 1: Edges in Adjacency Lists |
|
|
126 | (2) |
|
Understanding Edges, Part 2: Clustering Columns |
|
|
128 | (4) |
|
Understanding Edges, Part 3: Materialized Views for Traversals |
|
|
132 | (4) |
|
|
136 | (6) |
|
Finding Indexes with an Intelligent Index Recommendation System |
|
|
140 | (2) |
|
Production Implementation Details |
|
|
142 | (10) |
|
Materialized Views and Adding Time onto Edges |
|
|
142 | (2) |
|
Our Final C360 Production Schema |
|
|
144 | (2) |
|
|
146 | (3) |
|
Updating Our Gremlin Queries to Use Time on Edges |
|
|
149 | (3) |
|
Moving On to More Complex, Distributed Graph Problems |
|
|
152 | (3) |
|
Our First 10 Tips to Get from Development to Production |
|
|
152 | (3) |
|
6 Using Trees in Development |
|
|
155 | (36) |
|
Chapter Preview: Navigating Trees, Hierarchical Data, and Cycles |
|
|
155 | (1) |
|
Seeing Hierarchies and Nested Data: Three Examples |
|
|
156 | (3) |
|
Hierarchical Data in a Bill of Materials |
|
|
156 | (1) |
|
Hierarchical Data in Version Control Systems |
|
|
157 | (1) |
|
Hierarchical Data in Self-Organizing Networks |
|
|
157 | (1) |
|
Why Graph Technology for Hierarchical Data? |
|
|
158 | (1) |
|
Finding Your Way Through a Forest of Terminology |
|
|
159 | (3) |
|
|
159 | (1) |
|
Depth in Walks, Paths, and Cycles |
|
|
160 | (2) |
|
Understanding Hierarchies with Our Sensor Data |
|
|
162 | (12) |
|
|
163 | (7) |
|
Conceptual Model Using the GSL Notation |
|
|
170 | (1) |
|
|
171 | (3) |
|
Before We Build Our Queries |
|
|
174 | (1) |
|
Querying from Leaves to Roots in Development |
|
|
174 | (10) |
|
Where Has This Sensor Sent Information To? |
|
|
175 | (3) |
|
From This Sensor, What Was Its Path to Any Tower? |
|
|
178 | (6) |
|
From Bottom Up to Top Down |
|
|
184 | (1) |
|
Querying from Roots to Leaves in Development |
|
|
184 | (6) |
|
Setup Query: Which Tower Has the Most Sensor Connections So That We Could Explore It for Our Example? |
|
|
185 | (1) |
|
Which Sensors Have Connected Directly to Georgetown? |
|
|
186 | (1) |
|
Find All Sensors That Connected to Georgetown |
|
|
187 | (2) |
|
Depth Limiting in Recursion |
|
|
189 | (1) |
|
|
190 | (1) |
|
7 Using Trees in Production |
|
|
191 | (34) |
|
Chapter Preview: Understanding Branching Factor, Depth, and Time on Edges |
|
|
191 | (1) |
|
Understanding Time in the Sensor Data |
|
|
192 | (8) |
|
Final Thoughts on Time Series Data in Graphs |
|
|
200 | (1) |
|
Understanding Branching Factor in Our Example |
|
|
200 | (3) |
|
What Is Branching Factor? |
|
|
201 | (1) |
|
How Do We Get Around Branching Factor? |
|
|
202 | (1) |
|
Production Schema for Our Sensor Data |
|
|
203 | (2) |
|
Querying from Leaves to Roots in Production |
|
|
205 | (8) |
|
Where Has This Sensor Sent Information to, and at What Time? |
|
|
205 | (1) |
|
From This Sensor, Find All Trees up to a Tower by Time |
|
|
206 | (3) |
|
From This Sensor, Find a Valid Tree |
|
|
209 | (2) |
|
Advanced Gremlin: Understanding the where().by() Pattern |
|
|
211 | (2) |
|
Querying from Roots to Leaves in Production |
|
|
213 | (5) |
|
Which Sensors Have Connected to Georgetown Directly, by Time? |
|
|
214 | (1) |
|
What Valid Paths Can We Find from Georgetown Down to All Sensors? |
|
|
215 | (3) |
|
Applying Your Queries to Tower Failure Scenarios |
|
|
218 | (5) |
|
Applying the Final Results of Our Complex Problem |
|
|
223 | (1) |
|
Seeing the Forest for the Trees |
|
|
223 | (2) |
|
8 Finding Paths in Development |
|
|
225 | (36) |
|
Chapter Preview: Quantifying Trust in Networks |
|
|
226 | (1) |
|
Thinking About Trust: Three Examples |
|
|
226 | (3) |
|
How Much Do You Trust That Open Invitation? |
|
|
226 | (1) |
|
How Defensible Is an Investigator's Story? |
|
|
227 | (1) |
|
How Do Companies Model Package Delivery? |
|
|
228 | (1) |
|
Fundamental Concepts About Paths |
|
|
229 | (5) |
|
|
230 | (2) |
|
Depth-First Search and Breadth-First Search |
|
|
232 | (1) |
|
Learning to See Application Features as Different Path Problems |
|
|
233 | (1) |
|
Finding Paths in a Trust Network |
|
|
234 | (6) |
|
|
234 | (2) |
|
A Brief Primer on Bitcoin Terminology |
|
|
236 | (1) |
|
Creating Our Development Schema |
|
|
236 | (1) |
|
|
237 | (1) |
|
Exploring Communities of Trust |
|
|
238 | (2) |
|
Understanding Traversals with Our Bitcoin Trust Network |
|
|
240 | (6) |
|
Which Addresses Are in the First Neighborhood? |
|
|
240 | (1) |
|
Which Addresses Are in the Second Neighborhood? |
|
|
241 | (1) |
|
Which Addresses Are in the Second Neighborhood, but Not the First? |
|
|
242 | (2) |
|
Evaluation Strategies with the Gremlin Query Language |
|
|
244 | (1) |
|
Pick a Random Address to Use for Our Example |
|
|
245 | (1) |
|
|
246 | (15) |
|
Finding Paths of a Fixed Length |
|
|
247 | (3) |
|
Finding Paths of Any Length |
|
|
250 | (3) |
|
Augmenting Our Paths with the Trust Scores |
|
|
253 | (6) |
|
Do You Trust This Person? |
|
|
259 | (2) |
|
9 Finding Paths in Production |
|
|
261 | (30) |
|
Chapter Preview: Understanding Weights, Distance, and Pruning |
|
|
262 | (1) |
|
Weighted Paths and Search Algorithms |
|
|
262 | (5) |
|
Shortest Weighted Path Problem Definition |
|
|
263 | (1) |
|
Shortest Weighted Path Search Optimizations |
|
|
264 | (3) |
|
Normalization of Edge Weights for Shortest Path Problems |
|
|
267 | (10) |
|
Normalizing the Edge Weights |
|
|
267 | (5) |
|
|
272 | (1) |
|
Exploring the Normalized Edge Weights |
|
|
273 | (4) |
|
Some Thoughts Before Moving On to Shortest Weighted Path Queries |
|
|
277 | (1) |
|
Shortest Weighted Path Queries |
|
|
277 | (11) |
|
Building a Shortest Weighted Path Query for Production |
|
|
278 | (10) |
|
Weighted Paths and Trust in Production |
|
|
288 | (3) |
|
10 Recommendations in Development |
|
|
291 | (34) |
|
Chapter Preview: Collaborative Filtering for Movie Recommendations |
|
|
292 | (1) |
|
Recommendation System Examples |
|
|
292 | (3) |
|
How We Give Recommendations in Healthcare |
|
|
292 | (1) |
|
How We Experience Recommendations in Social Media |
|
|
293 | (1) |
|
How We Use Deeply Connected Data for Recommendations in Ecommerce |
|
|
294 | (1) |
|
An Introduction to Collaborative Filtering |
|
|
295 | (8) |
|
Understanding the Problem and Domain |
|
|
295 | (2) |
|
Collaborative Filtering with Graph Data |
|
|
297 | (1) |
|
Recommendations via Item-Based Collaborative Filtering with Graph Data |
|
|
298 | (1) |
|
Three Different Models for Ranking Recommendations |
|
|
299 | (4) |
|
Movie Data: Schema, Loading, and Query Review |
|
|
303 | (1) |
|
Data Model for Movie Recommendations |
|
|
303 | (15) |
|
Schema Code for Movie Recommendations |
|
|
305 | (2) |
|
|
307 | (4) |
|
Neighborhood Queries in the Movie Data |
|
|
311 | (3) |
|
Tree Queries in the Movie Data |
|
|
314 | (2) |
|
Path Queries in the Movie Data |
|
|
316 | (2) |
|
Item-Based Collaborative Filtering in Gremlin |
|
|
318 | (7) |
|
Model 1: Counting Paths in the Recommendation Set |
|
|
318 | (1) |
|
|
319 | (3) |
|
|
322 | (2) |
|
Choosing Your Own Adventure: Movies and Graph Problems Edition |
|
|
324 | (1) |
|
11 Simple Entity Resolution in Graphs |
|
|
325 | (24) |
|
Chapter Preview: Merging Multiple Datasets into One Graph |
|
|
325 | (1) |
|
Defining a Different Complex Problem: Entity Resolution |
|
|
326 | (3) |
|
Seeing the Complex Problem |
|
|
328 | (1) |
|
Analyzing the Two Movie Datasets |
|
|
329 | (11) |
|
|
329 | (7) |
|
|
336 | (3) |
|
|
339 | (1) |
|
Matching and Merging the Movie Data |
|
|
340 | (3) |
|
|
340 | (3) |
|
Resolving False Positives |
|
|
343 | (6) |
|
False Positives Found in the MovieLens Dataset |
|
|
343 | (1) |
|
Additional Errors Discovered in the Entity Resolution Process |
|
|
344 | (2) |
|
Final Analysis of the Merging Process |
|
|
346 | (1) |
|
The Role of Graph Structure in Merging Movie Data |
|
|
347 | (2) |
|
12 Recommendations in Production |
|
|
349 | (28) |
|
Chapter Preview: Understanding Shortcut Edges, Precomputation, and Advanced Pruning Techniques |
|
|
350 | (1) |
|
Shortcut Edges for Recommendations in Real Time |
|
|
350 | (1) |
|
Where Our Development Process Doesn't Scale |
|
|
351 | (6) |
|
How We Fix Scaling Issues: Shortcut Edges |
|
|
352 | (1) |
|
Seeing What We Designed to Deliver in Production |
|
|
353 | (1) |
|
Pruning: Different Ways to Precompute Shortcut Edges |
|
|
354 | (2) |
|
Considerations for Updating Your Recommendations |
|
|
356 | (1) |
|
Calculating Shortcut Edges for Our Movie Data |
|
|
357 | (1) |
|
Breaking Down the Complex Problem of Precalculating Shortcut Edges |
|
|
357 | (7) |
|
Addressing the Elephant in the Room: Batch Computation |
|
|
362 | (1) |
|
Production Schema and Data Loading for Movie Recommendations |
|
|
363 | (1) |
|
Production Schema for Movie Recommendations |
|
|
364 | (3) |
|
Production Data Loading for Movie Recommendations |
|
|
365 | (1) |
|
Recommendation Queries with Shortcut Edges |
|
|
366 | (1) |
|
Confirming Our Edges Loaded Correctly |
|
|
367 | (10) |
|
Production Recommendations for Our User |
|
|
368 | (4) |
|
Understanding Response Time in Production by Counting Edge Partitions |
|
|
372 | (3) |
|
Final Thoughts on Reasoning About Distributed Graph Query Performance |
|
|
375 | (2) |
|
|
377 | (6) |
|
|
378 | (4) |
|
|
378 | (1) |
|
|
379 | (1) |
|
|
380 | (1) |
|
|
380 | (2) |
|
|
382 | (1) |
Index |
|
383 | |