| Preface | ix | |
| 1 Introduction to Data Lakes | 1 | (24) |
| | 3 | (4) |
| | 5 | (1) |
| | 6 | (1) |
| Creating a Successful Data Lake | 7 | (5) |
| | 7 | (1) |
| | 8 | (1) |
| | 9 | (2) |
| | 11 | (1) |
| Roadmap to Data Lake Success | 12 | (8) |
| | 13 | (1) |
| | 14 | (1) |
| Setting Up the Data Lake for Self-Service | 15 | (5) |
| | 20 | (4) |
| Data Lakes in the Public Cloud | 20 | (1) |
| | 21 | (3) |
| | 24 | (1) |
| | 25 | (24) |
| The Drive for Self-Service Data---The Birth of Databases | 25 | (3) |
| The Analytics Imperative---The Birth of Data Warehousing | 28 | (1) |
| The Data Warehouse Ecosystem | 29 | (18) |
| Storing and Querying the Data | 31 | (6) |
| Loading the Data---Data Integration Tools | 37 | (4) |
| Organizing and Managing the Data | 41 | (5) |
| | 46 | (1) |
| | 47 | (2) |
| 3 Introduction to Big Data and Data Science | 49 | (14) |
| Hadoop Leads the Historic Shift to Big Data | 50 | (5) |
| | 50 | (1) |
| How Processing and Storage Interact in a MapReduce Job | 51 | (2) |
| | 53 | (1) |
| | 53 | (2) |
| | 55 | (1) |
| What Should Your Analytics Organization Focus On? | 56 | (3) |
| | 59 | (3) |
| | 60 | (1) |
| | 61 | (1) |
| | 62 | (1) |
| | 63 | (12) |
| The What and Why of Hadoop | 63 | (3) |
| Preventing Proliferation of Data Puddles | 66 | (1) |
| Taking Advantage of Big Data | 67 | (7) |
| Leading with Data Science | 67 | (3) |
| Strategy 1: Offload Existing Functionality | 70 | (1) |
| Strategy 2: Data Lakes for New Projects | 71 | (1) |
| Strategy 3: Establish a Central Point of Governance | 72 | (1) |
| Which Way Is Right for You? | 73 | (1) |
| | 74 | (1) |
| 5 From Data Ponds/Big Data Warehouses to Data Lakes | 75 | (22) |
| Essential Functions of a Data Warehouse | 76 | (3) |
| Dimensional Modeling for Analytics | 77 | (1) |
| Integrating Data from Disparate Sources | 78 | (1) |
| Preserving History Using Slowly Changing Dimensions | 78 | (1) |
| Limitations of the Data Warehouse as a Historical Repository | 78 | (1) |
| | 79 | (4) |
| Keeping History in a Data Pond | 79 | (2) |
| Implementing Slowly Changing Dimensions in a Data Pond | 81 | (2) |
| Growing Data Ponds into a Data Lake---Loading Data That's Not in the Data Warehouse | 83 | (4) |
| | 83 | (1) |
| | 84 | (2) |
| Internet of Things (IoT) and Other Streaming Data | 86 | (1) |
| | 87 | (2) |
| | 89 | (1) |
| | 90 | (2) |
| | 92 | (3) |
| | 93 | (1) |
| | 93 | (1) |
| Real-Time Applications and Data Products | 93 | (2) |
| | 95 | (2) |
| 6 Optimizing for Self-Service | 97 | (24) |
| The Beginnings of Self-Service | 98 | (2) |
| | 100 | (13) |
| Finding and Understanding Data---Documenting the Enterprise | 101 | (2) |
| | 103 | (7) |
| | 110 | (2) |
| Preparing Data for Analysis | 112 | (1) |
| Data Wrangling in the Data Lake | 113 | (3) |
| Situating Data Preparation in Hadoop | 113 | (1) |
| Common Use Cases for Data Preparation | 114 | (2) |
| Analyzing and Visualizing | 116 | (1) |
| The New World of Self-Service Business Intelligence | 116 | (4) |
| The New Analytic Workflow | 117 | (1) |
| Gatekeepers to Shopkeepers | 118 | (1) |
| | 119 | (1) |
| | 120 | (1) |
| 7 Architecting the Data Lake | 121 | (16) |
| | 121 | (6) |
| | 123 | (1) |
| | 123 | (2) |
| | 125 | (1) |
| | 125 | (2) |
| | 127 | (2) |
| Advantages of Keeping Data Lakes Separate | 127 | (1) |
| Advantages of Merging the Data Lakes | 128 | (1) |
| | 129 | (2) |
| | 131 | (5) |
| | 131 | (1) |
| | 132 | (2) |
| | 134 | (2) |
| | 136 | (1) |
| 8 Cataloging the Data Lake | 137 | (20) |
| | 137 | (8) |
| | 138 | (5) |
| | 143 | (2) |
| | 145 | (2) |
| | 146 | (1) |
| | 147 | (4) |
| Sensitive Data Management and Access Control | 147 | (2) |
| | 149 | (2) |
| | 151 | (1) |
| | 152 | (1) |
| | 153 | (1) |
| Tools for Building a Catalog | 154 | (2) |
| | 155 | (1) |
| | 156 | (1) |
| | 156 | (1) |
| | 157 | (22) |
| Authorization or Access Control | 158 | (1) |
| Tag-Based Data Access Policies | 159 | (3) |
| Deidentifying Sensitive Data | 162 | (5) |
| Data Sovereignty and Regulatory Compliance | 165 | (2) |
| Self-Service Access Management | 167 | (10) |
| | 171 | (6) |
| | 177 | (2) |
| 10 Industry-Specific Perspectives | 179 | (18) |
| Big Data in Financial Services | 180 | (10) |
| Consumers, Digitization, and Data Are Changing Finance as We Know It | 180 | (2) |
| | 182 | (3) |
| New Opportunities Offered by New Data | 185 | (3) |
| Key Processes in Making Use of the Data Lake | 188 | (2) |
| Value Added by Data Lakes in Financial Services | 190 | (2) |
| Data Lakes in the Insurance Industry | 192 | (1) |
| | 193 | (2) |
| | 195 | (2) |
| Index | 197 | |