
Practitioner's Guide to Graph Data: Applying Graph Thinking and Graph Technologies to Solve Complex Problems [Paperback]

  • Format: Paperback / softback, 420 pages, height x width: 233x178 mm, Illustrations, unspecified
  • Publication date: 09-Apr-2020
  • Publisher: O'Reilly Media
  • ISBN-10: 1492044075
  • ISBN-13: 9781492044079
  • Price: 75,81 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 89,19 €
  • You save 15%
  • Delivery from the publisher takes approximately 2-4 weeks
  • Free delivery
Graph data closes the gap between the way humans and computers view the world. While computers rely on static rows and columns of data, people navigate and reason about life through relationships. This practical guide demonstrates how graph data brings these two approaches together. By working with concepts from graph theory, database schema, distributed systems, and data analysis, you'll arrive at a unique intersection known as graph thinking.

Authors Denise Koessler Gosnell and Matthias Broecheler show data engineers, data scientists, and data analysts how to solve complex problems with graph databases. You'll explore templates for building with graph technology, along with examples that demonstrate how teams think about graph data within an application.

  • Build an example application architecture with relational and graph technologies
  • Use graph technology to build a Customer 360 application, the most popular graph data pattern today
  • Dive into hierarchical data and troubleshoot a new paradigm that comes from working with graph data
  • Find paths in graph data and learn why your trust in different paths motivates and informs your preferences
  • Use collaborative filtering to design a Netflix-inspired recommendation system
Preface
1 Graph Thinking
  Why Now? Putting Database Technologies in Context
  1960s–1980s: Hierarchical Data
  1980s–2000s: Entity-Relationship
  2000s–2020s: NoSQL
  2020s–?: Graph
  What Is Graph Thinking?
  Complex Problems and Complex Systems
  Complex Problems in Business
  Making Technology Decisions to Solve Complex Problems
  So You Have Graph Data. What's Next?
  Seeing the Bigger Picture
  Getting Started on Your Journey with Graph Thinking
2 Evolving from Relational to Graph Thinking
  Chapter Preview: Translating Relational Concepts to Graph Terminology
  Relational Versus Graph: What's the Difference?
  Data for Our Running Example
  Relational Data Modeling
  Entities and Attributes
  Building Up to an ERD
  Concepts in Graph Data
  Fundamental Elements of a Graph
  Adjacency
  Neighborhoods
  Distance
  Degree
  The Graph Schema Language
  Vertex Labels and Edge Labels
  Properties
  Edge Direction
  Self-Referencing Edge Labels
  Multiplicity of Your Graph
  Full Example Graph Model
  Relational Versus Graph: Decisions to Consider
  Data Modeling
  Understanding Graph Data
  Mixing Database Design with Application Purpose
  Summary
3 Getting Started: A Simple Customer 360
  Chapter Preview: Relational Versus Graph
  The Foundational Use Case for Graph Data: C360
  Why Do Businesses Care About C360?
  Implementing a C360 Application in a Relational System
  Data Models
  Relational Implementation
  Example C360 Queries
  Implementing a C360 Application in a Graph System
  Data Models
  Graph Implementation
  Example C360 Queries
  Relational Versus Graph: How to Choose?
  Relational Versus Graph: Data Modeling
  Relational Versus Graph: Representing Relationships
  Relational Versus Graph: Query Languages
  Relational Versus Graph: Main Points
  Summary
  Why Not Relational?
  Making a Technology Choice for Your C360 Application
4 Exploring Neighborhoods in Development
  Chapter Preview: Building a More Realistic Customer 360
  Graph Data Modeling 101
  Should This Be a Vertex or an Edge?
  Lost Yet? Let Us Walk You Through Direction
  A Graph Has No Name: Common Mistakes in Naming
  Our Full Development Graph Model
  Before We Start Building
  Our Thoughts on the Importance of Data, Queries, and the End User
  Implementation Details for Exploring Neighborhoods in Development
  Generating More Data for Our Expanded Example
  Basic Gremlin Navigation
  Advanced Gremlin: Shaping Your Query Results
  Shaping Query Results with the project(), fold(), and unfold() Steps
  Removing Data from the Results with the where(neq()) Pattern
  Planning for Robust Result Payloads with the coalesce() Step
  Moving from Development into Production
5 Exploring Neighborhoods in Production
  Chapter Preview: Understanding Distributed Graph Data in Apache Cassandra
  Working with Graph Data in Apache Cassandra
  The Most Important Topic to Understand About Data Modeling: Primary Keys
  Partition Keys and Data Locality in a Distributed Environment
  Understanding Edges, Part 1: Edges in Adjacency Lists
  Understanding Edges, Part 2: Clustering Columns
  Understanding Edges, Part 3: Materialized Views for Traversals
  Graph Data Modeling 201
  Finding Indexes with an Intelligent Index Recommendation System
  Production Implementation Details
  Materialized Views and Adding Time onto Edges
  Our Final C360 Production Schema
  Bulk Loading Graph Data
  Updating Our Gremlin Queries to Use Time on Edges
  Moving On to More Complex, Distributed Graph Problems
  Our First 10 Tips to Get from Development to Production
6 Using Trees in Development
  Chapter Preview: Navigating Trees, Hierarchical Data, and Cycles
  Seeing Hierarchies and Nested Data: Three Examples
  Hierarchical Data in a Bill of Materials
  Hierarchical Data in Version Control Systems
  Hierarchical Data in Self-Organizing Networks
  Why Graph Technology for Hierarchical Data?
  Finding Your Way Through a Forest of Terminology
  Trees, Roots, and Leaves
  Depth in Walks, Paths, and Cycles
  Understanding Hierarchies with Our Sensor Data
  Understand the Data
  Conceptual Model Using the GSL Notation
  Implement Schema
  Before We Build Our Queries
  Querying from Leaves to Roots in Development
  Where Has This Sensor Sent Information To?
  From This Sensor, What Was Its Path to Any Tower?
  From Bottom Up to Top Down
  Querying from Roots to Leaves in Development
  Setup Query: Which Tower Has the Most Sensor Connections So That We Could Explore It for Our Example?
  Which Sensors Have Connected Directly to Georgetown?
  Find All Sensors That Connected to Georgetown
  Depth Limiting in Recursion
  Going Back in Time
7 Using Trees in Production
  Chapter Preview: Understanding Branching Factor, Depth, and Time on Edges
  Understanding Time in the Sensor Data
  Final Thoughts on Time Series Data in Graphs
  Understanding Branching Factor in Our Example
  What Is Branching Factor?
  How Do We Get Around Branching Factor?
  Production Schema for Our Sensor Data
  Querying from Leaves to Roots in Production
  Where Has This Sensor Sent Information to, and at What Time?
  From This Sensor, Find All Trees up to a Tower by Time
  From This Sensor, Find a Valid Tree
  Advanced Gremlin: Understanding the where().by() Pattern
  Querying from Roots to Leaves in Production
  Which Sensors Have Connected to Georgetown Directly, by Time?
  What Valid Paths Can We Find from Georgetown Down to All Sensors?
  Applying Your Queries to Tower Failure Scenarios
  Applying the Final Results of Our Complex Problem
  Seeing the Forest for the Trees
8 Finding Paths in Development
  Chapter Preview: Quantifying Trust in Networks
  Thinking About Trust: Three Examples
  How Much Do You Trust That Open Invitation?
  How Defensible Is an Investigator's Story?
  How Do Companies Model Package Delivery?
  Fundamental Concepts About Paths
  Shortest Paths
  Depth-First Search and Breadth-First Search
  Learning to See Application Features as Different Path Problems
  Finding Paths in a Trust Network
  Source Data
  A Brief Primer on Bitcoin Terminology
  Creating Our Development Schema
  Loading Data
  Exploring Communities of Trust
  Understanding Traversals with Our Bitcoin Trust Network
  Which Addresses Are in the First Neighborhood?
  Which Addresses Are in the Second Neighborhood?
  Which Addresses Are in the Second Neighborhood, but Not the First?
  Evaluation Strategies with the Gremlin Query Language
  Pick a Random Address to Use for Our Example
  Shortest Path Queries
  Finding Paths of a Fixed Length
  Finding Paths of Any Length
  Augmenting Our Paths with the Trust Scores
  Do You Trust This Person?
9 Finding Paths in Production
  Chapter Preview: Understanding Weights, Distance, and Pruning
  Weighted Paths and Search Algorithms
  Shortest Weighted Path Problem Definition
  Shortest Weighted Path Search Optimizations
  Normalization of Edge Weights for Shortest Path Problems
  Normalizing the Edge Weights
  Updating Our Graph
  Exploring the Normalized Edge Weights
  Some Thoughts Before Moving On to Shortest Weighted Path Queries
  Shortest Weighted Path Queries
  Building a Shortest Weighted Path Query for Production
  Weighted Paths and Trust in Production
10 Recommendations in Development
  Chapter Preview: Collaborative Filtering for Movie Recommendations
  Recommendation System Examples
  How We Give Recommendations in Healthcare
  How We Experience Recommendations in Social Media
  How We Use Deeply Connected Data for Recommendations in Ecommerce
  An Introduction to Collaborative Filtering
  Understanding the Problem and Domain
  Collaborative Filtering with Graph Data
  Recommendations via Item-Based Collaborative Filtering with Graph Data
  Three Different Models for Ranking Recommendations
  Movie Data: Schema, Loading, and Query Review
  Data Model for Movie Recommendations
  Schema Code for Movie Recommendations
  Loading the Movie Data
  Neighborhood Queries in the Movie Data
  Tree Queries in the Movie Data
  Path Queries in the Movie Data
  Item-Based Collaborative Filtering in Gremlin
  Model 1: Counting Paths in the Recommendation Set
  Model 2: NPS-Inspired
  Model 3: Normalized NPS
  Choosing Your Own Adventure: Movies and Graph Problems Edition
11 Simple Entity Resolution in Graphs
  Chapter Preview: Merging Multiple Datasets into One Graph
  Defining a Different Complex Problem: Entity Resolution
  Seeing the Complex Problem
  Analyzing the Two Movie Datasets
  MovieLens Dataset
  Kaggle Dataset
  Development Schema
  Matching and Merging the Movie Data
  Our Matching Process
  Resolving False Positives
  False Positives Found in the MovieLens Dataset
  Additional Errors Discovered in the Entity Resolution Process
  Final Analysis of the Merging Process
  The Role of Graph Structure in Merging Movie Data
12 Recommendations in Production
  Chapter Preview: Understanding Shortcut Edges, Precomputation, and Advanced Pruning Techniques
  Shortcut Edges for Recommendations in Real Time
  Where Our Development Process Doesn't Scale
  How We Fix Scaling Issues: Shortcut Edges
  Seeing What We Designed to Deliver in Production
  Pruning: Different Ways to Precompute Shortcut Edges
  Considerations for Updating Your Recommendations
  Calculating Shortcut Edges for Our Movie Data
  Breaking Down the Complex Problem of Precalculating Shortcut Edges
  Addressing the Elephant in the Room: Batch Computation
  Production Schema and Data Loading for Movie Recommendations
  Production Schema for Movie Recommendations
  Production Data Loading for Movie Recommendations
  Recommendation Queries with Shortcut Edges
  Confirming Our Edges Loaded Correctly
  Production Recommendations for Our User
  Understanding Response Time in Production by Counting Edge Partitions
  Final Thoughts on Reasoning About Distributed Graph Query Performance
13 Epilogue
  Where to Go from Here?
  Graph Algorithms
  Distributed Graphs
  Graph Theory
  Network Theory
  Stay in Touch
Index
Dr. Denise Gosnell's passion for examining, applying, and evangelizing the applications of graph data was ignited during her apprenticeship under Dr. Teresa Haynes and Dr. Debra Knisley during her first NSF Fellowship. This group's work was one of the earliest applications of neural networks and graph theoretic structure in predictive computational biology. Since then, Dr. Gosnell has built, published, patented, and spoken on dozens of topics related to graph theory, graph algorithms, graph databases, and applications of graph data across all industry verticals.

Currently, Dr. Gosnell is with DataStax, where she aspires to build upon her experiences as a data scientist and graph architect. Prior to her role with DataStax, she built software solutions for, and spoke at over a dozen conferences on, permissioned blockchains, machine learning applications of graph analytics, and data science within the healthcare industry.

Dr. Matthias Broecheler is a technologist and entrepreneur with substantial research and development experience who is focused on disruptive software technologies and understanding complex systems. Dr. Broecheler is known as an industry expert in graph databases, relational machine learning, and big data analysis in general. He is a practitioner of lean methodologies and experimentation to drive continuous improvement. Dr. Broecheler is the inventor of the Titan graph database and a founder of Aurelius.