| xi | |
| xvii | |
The Authors | xix | |
Acknowledgments | xxi | |
Foreword | xxiii | |
Notation | xxv | |
| 1 | (8) |
1.1 Objectives of this Book | 4 | (3) |
| 7 | (1) |
| 7 | (2) |
2 Big Data Concepts, Techniques, and Technologies | 9 | (28) |
| 10 | (2) |
2.2 Big Data Characteristics | 12 | (4) |
| 16 | (5) |
2.3.1 Big Data General Dilemmas | 16 | (1) |
2.3.2 Challenges in the Big Data Life Cycle | 17 | (2) |
2.3.3 Big Data in Secure, Private, and Monitored Environments | 19 | (1) |
2.3.4 Organizational Change | 20 | (1) |
2.4 Techniques for Big Data Solutions | 21 | (9) |
2.4.1 Big Data Life Cycle and Requirements | 23 | (1) |
2.4.1.1 General Steps to Process and Analyze Big Data | 23 | (2) |
2.4.1.2 Architectural and Infrastructural Requirements | 25 | (2) |
2.4.2 The Lambda Architecture | 27 | (1) |
2.4.3 Towards Standardization: the NIST Reference Architecture | 28 | (2) |
2.5 Big Data Technologies | 30 | (7) |
2.5.1 Hadoop and Related Projects | 30 | (2) |
2.5.2 Landscape of Distributed SQL Engines | 32 | (3) |
2.5.3 Other Technologies for Big Data Analytics | 35 | (2) |
3 OLTP-oriented Databases for Big Data Environments | 37 | (56) |
3.1 NoSQL and NewSQL: an Overview | 38 | (3) |
| 41 | (47) |
3.2.1 Key-value Databases | 41 | (1) |
| 41 | (1) |
| 42 | (7) |
3.2.2 Column-oriented Databases | 49 | (1) |
| 50 | (1) |
| 51 | (6) |
3.2.2.3 From Relational Models to HBase Data Models | 57 | (12) |
3.2.3 Document-oriented Databases | 69 | (1) |
| 69 | (2) |
| 71 | (8) |
| 79 | (1) |
| 79 | (3) |
| 82 | (6) |
3.3 NewSQL Databases and Translytical Databases | 88 | (5) |
4 OLAP-oriented Databases for Big Data Environments | 93 | (50) |
4.1 Hive: the De Facto SQL-on-Hadoop Engine | 94 | (25) |
4.1.1 Data Storage Formats | 98 | (1) |
| 99 | (1) |
| 100 | (5) |
| 105 | (2) |
| 107 | (4) |
| 111 | (1) |
| 112 | (1) |
4.1.2 Partitions and Buckets | 113 | (6) |
4.2 From Dimensional Models to Tabular Models | 119 | (12) |
4.2.1 Primary Data Tables | 121 | (4) |
4.2.2 Derived Data Tables | 125 | (6) |
4.3 Optimizing OLAP workloads with Druid | 131 | (12) |
5 Design and Implementation of Big Data Warehouses | 143 | (34) |
5.1 Big Data Warehousing: an Overview | 144 | (3) |
5.2 Model of Logical Components and Data Flows | 147 | (11) |
5.2.1 Data Provider and Data Consumer | 149 | (1) |
5.2.2 Big Data Application Provider | 149 | (2) |
5.2.3 Big Data Framework Provider | 151 | (1) |
5.2.3.1 Messaging/Communications, Resource Management, and Infrastructures | 152 | (1) |
| 153 | (1) |
5.2.3.3 Storage: Data Organization and Distribution | 154 | (3) |
5.2.4 System Orchestrator and Security, Privacy, and Management | 157 | (1) |
5.3 Model of Technological Infrastructure | 158 | (5) |
5.4 Method for Data Modeling | 163 | (14) |
5.4.1 Analytical Objects and their Related Concepts | 164 | (3) |
5.4.2 Joining, Uniting, and Materializing Analytical Objects | 167 | (2) |
5.4.3 Dimensional Big Data with Outsourced Descriptive Families | 169 | (2) |
5.4.4 Data Modeling Best Practices | 171 | (1) |
5.4.4.1 Using Null Values | 171 | (1) |
5.4.4.2 Date, Time, and Spatial Objects vs. Separate Temporal and Spatial Attributes | 172 | (1) |
5.4.4.3 Immutable vs. Mutable Records | 173 | (1) |
5.4.5 Data Modeling Advantages and Disadvantages | 174 | (3) |
6 Big Data Warehouses Modeling: From Theory to Practice | 177 | (20) |
6.1 Multinational Bicycle Wholesale and Manufacturing | 178 | (5) |
6.1.1 Fully Flat or Fully Dimensional Data Models | 180 | (1) |
| 181 | (1) |
6.1.3 Streaming and Random Access on Mutable Analytical Objects | 182 | (1) |
| 183 | (5) |
6.2.1 Unnecessary Complementary Analytical Objects and Update Problems | 183 | (2) |
6.2.1.1 The Traditional Way of Handling SCD-like Scenarios | 185 | (1) |
6.2.1.2 A New Way of Handling SCD-like Scenarios | 185 | (1) |
6.2.2 Joining Complementary Analytical Objects | 186 | (1) |
6.2.3 Data Science Models and Insights as a Core Value | 186 | (1) |
6.2.4 Partition Keys for Streaming and Batch Analytical Objects | 187 | (1) |
| 188 | (4) |
6.3.1 Simpler Data Models: Dynamic Partitioning Schemas | 189 | (1) |
6.3.2 Considerations for Spatial Objects | 189 | (1) |
6.3.3 Analyzing Non-Existing Events | 190 | (1) |
6.3.4 Wide Descriptive Families | 190 | (1) |
6.3.5 The Need for Joins in Data CPE Workloads | 191 | (1) |
6.4 Code Version Control System | 192 | (1) |
6.5 A Global Database of Society - The GDELT Project | 193 | (1) |
| 194 | (3) |
7 Fueling Analytical Objects in Big Data Warehouses | 197 | (22) |
7.1 From Traditional Data Warehouses | 198 | (2) |
7.2 From OLTP NoSQL Databases | 200 | (2) |
7.3 From Semi-structured Data Sources | 202 | (2) |
7.4 From Streaming Data Sources | 204 | (6) |
7.5 Using Data Science Models | 210 | (9) |
7.5.1 Data Mining/Machine Learning Models for Structured Data | 211 | (5) |
7.5.2 Text Mining, Image Mining, and Video Mining Models | 216 | (3) |
8 Evaluating the Performance of Big Data Warehouses | 219 | (26) |
| 220 | (3) |
8.1.1 Data Model and Queries | 220 | (1) |
8.1.2 System Architecture and Infrastructure | 221 | (2) |
| 223 | (13) |
8.2.1 Comparing Flat Analytical Objects with Star Schemas | 223 | (4) |
8.2.2 Improving Performance with Adequate Data Partitioning | 227 | (3) |
8.2.3 The Impact of Dimensions' Size in Star Schemas | 230 | (2) |
8.2.4 The Impact of Nested Structures in Analytical Objects | 232 | (2) |
8.2.5 Drill Across Queries and Window and Analytics Functions | 234 | (2) |
| 236 | (6) |
8.3.1 The Impact of Data Volume in the Streaming Storage Component | 236 | (3) |
8.3.2 Considerations for Effective and Efficient Streaming OLAP | 239 | (3) |
8.4 SQL-on-Hadoop Systems under Multi-User Environments | 242 | (3) |
9 Big Data Warehousing in Smart Cities | 245 | (18) |
9.1 Logical Components, Data Flows, and Technological Infrastructure | 246 | (5) |
9.1.1 SusCity Architecture | 247 | (3) |
9.1.2 SusCity Infrastructure | 250 | (1) |
| 251 | (4) |
9.2.1 Buildings Characteristics as an Outsourced Descriptive Family | 254 | (1) |
9.2.2 Nested Structures in Analytical Objects | 255 | (1) |
9.3 The Inter-storage Pipeline | 255 | (1) |
9.4 The SusCity Data Visualization Platform | 256 | (7) |
9.4.1 City's Energy Consumption | 257 | (1) |
9.4.2 City's Energy Grid Simulations | 258 | (1) |
9.4.3 Buildings' Performance Analysis and Simulation | 258 | (2) |
9.4.4 Mobility Patterns Analysis | 260 | (3) |
| 263 | (8) |
10.1 Synopsis of the Book | 265 | (5) |
10.2 Contributions to the State of the Art | 270 | (1) |
References | 271 | (10) |
Index | 281 | |