Designing and Building Enterprise Knowledge Graphs [Paperback]

This book is a guide to designing and building knowledge graphs from enterprise relational databases in practice. It presents a principled framework centered on mapping patterns to connect relational databases with knowledge graphs, the roles within an organization responsible for the knowledge graph, and the process that combines data and people. The content of this book is applicable to knowledge graphs being built either with property graph or RDF graph technologies.

Knowledge graphs are fulfilling the vision of creating intelligent systems that integrate knowledge and data at large scale. Tech giants have adopted knowledge graphs as the foundation of next-generation enterprise data and metadata management, search, recommendation, analytics, intelligent agents, and more. We are now observing an increasing number of enterprises that seek to adopt knowledge graphs to develop a competitive edge.

In order for enterprises to design and build knowledge graphs, they need to understand the critical data stored in relational databases. How can enterprises successfully adopt knowledge graphs to integrate data and knowledge, without boiling the ocean? This book provides the answers.

Preface
Foreword (Anonymous)
Foreword (Tom Plasterer)
Acknowledgments
Disclaimer
1 Introduction
1.1 What is the Problem?
1.1.1 Spreadsheet Approach
1.1.2 Query Approach
1.1.3 Data Warehouse Approach
1.1.4 Data Lake Approach
1.1.5 Data Wrangling Approach
1.1.6 So What?
1.2 Knowledge Graphs
1.2.1 What is a Knowledge Graph?
1.2.2 Why Knowledge Graphs?
1.2.3 Why Now?
1.3 Background
1.3.1 History of Knowledge Graphs
1.3.2 Semantics
1.3.3 Semantic Web
1.3.4 Models, Ontologies, and Schemata
1.4 Why This Book?
2 Designing Enterprise Knowledge Graphs
2.1 Source: Relational Databases
2.2 Target: Knowledge Graph
2.2.1 RDF Graph
2.2.2 Property Graph
2.2.3 Knowledge Graph Schema
2.2.4 An Abstract Graph Notation Used in This Book
2.2.5 Graph Query Languages
2.2.6 Identifiers
2.2.7 Modeling
2.3 Mappings: Relational Database to Knowledge Graph
2.3.1 Direct Mapping
2.3.2 Custom Mapping
2.3.3 Mapping Languages
3 Mapping Design Patterns
3.1 Direct Custom Mapping Patterns
3.1.1 Direct Concept
3.1.2 Direct Concept Attribute
3.1.3 Direct Relationship
3.1.4 Direct Relationship Attribute
3.2 Complex Custom Concept Mapping Patterns
3.2.1 Complex Concept: Conditions
3.2.2 Complex Concept: Data as a Concept
3.2.3 Complex Concept: Join
3.2.4 Complex Concept: Distinct
3.3 Complex Custom Attribute Mapping Patterns
3.3.1 Complex Concept Attribute: CONCAT
3.3.2 Complex Concept Attribute: Math
3.3.3 Complex Concept Attribute: CASE
3.3.4 Complex Concept Attribute: NULL
3.3.5 Complex Concept Attribute: JOIN
3.3.6 Complex Concept Attribute: LEFT JOIN
3.3.7 Complex Concept Attribute: Duplicate
3.3.8 Complex Concept Attribute: Constant Table
3.3.9 Complex Concept Attribute: Constant Attribute
3.3.10 Complex Concept Attribute: Constant Value
3.3.11 Complex Concept Attribute: EAV
3.4 Complex Custom Relationship Mapping Patterns
3.4.1 Relationship: Many to Many
3.4.2 Relationship: One to Many without Duplicates
3.4.3 Relationship: One to Many with Duplicates
3.4.4 Relationship: One to One with Duplicates
3.4.5 Relationship: Constant Table
3.4.6 Relationship: Constant Attribute
3.4.7 Relationship: Constant Value
3.4.8 Relationship: Bidirectional
4 Building Enterprise Knowledge Graphs
4.1 People
4.1.1 Data Producers and Consumers
4.1.2 Data Product Manager
4.2 Process
4.2.1 Phase 1: Knowledge Capture
4.2.2 Phase 2: Knowledge Implementation
4.2.3 Phase 3: Knowledge Access
4.2.4 An E-Commerce Use Case
4.3 Tools
4.3.1 Metadata Management
4.3.2 Knowledge Management
4.3.3 Data Management
4.3.4 Additional Tools
5 What's Next?
5.1 Couldn't I Have Done This with a Relational Database?
5.2 Isn't this Just Master Data Management?
5.3 Knowledge Graphs and AI
5.3.1 Symbolic Reasoning
5.3.2 Non-Symbolic Reasoning
6 Conclusions
6.1 It's All a Graph!
6.2 Mapping Patterns
6.3 You Need a Data Team
6.4 Be Agile, Start Small, Don't Boil the Ocean
Bibliography
Authors' Biographies
Juan Sequeda is the Principal Scientist at data.world. He joined through the acquisition of Capsenta, a company he founded as a spin-off from his research. Juan's goal is to reliably create knowledge from inscrutable data. His academic and industry work has been on designing and building knowledge graphs for enterprise data integration, where he has researched and developed technologies for semantic and graph data virtualization, ontology and graph data modeling and schema mapping, and data integration methodologies. Juan serves as a bridge between academia and industry through standardization committees: he is co-chair of the Property Graph Schema Working Group and a past member of the Graph Query Languages task force of the Linked Data Benchmark Council (LDBC), as well as a past invited expert member and standards editor at the World Wide Web Consortium (W3C).

Juan holds a Ph.D. in Computer Science from The University of Texas at Austin. He is the recipient of the NSF Graduate Research Fellowship, 2nd place in the 2013 Semantic Web Challenge for his work on ConstituteProject.org, Best Student Research Paper at the International Semantic Web Conference 2014, and the 2015 Best Transfer and Innovation Project awarded by the Institute for Applied Informatics.

Ora Lassila is a Principal Graph Technologist in the Amazon Neptune graph database team. Earlier, he was a Managing Director at State Street, heading efforts to adopt ontologies and graph databases. Before that, he worked as a technology architect at Pegasystems, as an architect and technology strategist at Nokia Location & Commerce (later renamed HERE), and prior to that as a Research Fellow at the Nokia Research Center. He was an elected member of the Advisory Board of the World Wide Web Consortium (W3C) in 1998–2013, and represented Nokia in the W3C Advisory Committee in 1998–2002. In 1996–1997 he was a Visiting Scientist at MIT Laboratory for Computer Science, working with W3C and launching the Resource Description Framework (RDF) standard; he served as a co-editor of the original RDF Model and Syntax specification. Much of his research work at the Nokia Research Center focused on the Semantic Web and particularly its applications to mobile and ubiquitous computing. He collaborated with several U.S. universities, and was an active participant in the DARPA Agent Markup Language (DAML) program.

His positions before that include Project Manager at the Robotics Institute of Carnegie Mellon University and Research Scientist at the Computer Science Laboratory of Helsinki University of Technology. He has also worked as a software engineer in several companies (including his own start-up). He is the author of more than 100 conference papers and journal articles. He holds a Ph.D. in Computer Science from the Helsinki University of Technology (renamed Aalto University some years ago). Ora is the recipient of the Best Student Paper award of the 1989 Scandinavian Conference on AI, the Grand Prize of the 1989 Usenix Obfuscated C Code Contest, and the Semantic Web Science Association's 10-year award.