| Introduction |
|
xv | |
|
|
|
1 | (36) |
|
Chapter 1 Industry Needs and Solutions |
|
|
3 | (16) |
|
What's So Big About Big Data? |
|
|
4 | (1) |
|
A Brief History of Hadoop |
|
|
5 | (1) |
|
|
|
5 | (1) |
|
|
|
6 | (1) |
|
|
|
6 | (13) |
|
Derivative Works and Distributions |
|
|
7 | (1) |
|
|
|
8 | (1) |
|
|
|
9 | (2) |
|
Important Apache Projects for Hadoop |
|
|
11 | (6) |
|
|
|
17 | (1) |
|
|
|
17 | (2) |
|
Chapter 2 Microsoft's Approach to Big Data |
|
|
19 | (18) |
|
A Story of "Better Together" |
|
|
19 | (1) |
|
Competition in the Ecosystem |
|
|
20 | (5) |
|
|
|
21 | (1) |
|
|
|
21 | (2) |
|
|
|
23 | (2) |
|
Microsoft's Contribution to SQL in Hadoop |
|
|
25 | (1) |
|
|
|
25 | (12) |
|
|
|
26 | (3) |
|
|
|
29 | (4) |
|
|
|
33 | (3) |
|
|
|
36 | (1) |
|
Part II Setting Up for Big Data with Microsoft |
|
|
37 | (28) |
|
Chapter 3 Configuring Your First Big Data Environment |
|
|
39 | (26) |
|
|
|
39 | (1) |
|
|
|
40 | (1) |
|
|
|
40 | (15) |
|
On-Premise Installation: Single-Node Installation |
|
|
41 | (10) |
|
HDInsight Service: Installing in the Cloud |
|
|
51 | (1) |
|
Windows Azure Storage Explorer Options |
|
|
52 | (3) |
|
Validating Your New Cluster |
|
|
55 | (3) |
|
Logging into HDInsight Service |
|
|
55 | (2) |
|
Verify HDP Functionality in the Logs |
|
|
57 | (1) |
|
|
|
58 | (7) |
|
|
|
58 | (2) |
|
|
|
60 | (3) |
|
|
|
63 | (2) |
|
Part III Storing and Managing Big Data |
|
|
65 | (86) |
|
Chapter 4 HDFS, Hive, HBase, and HCatalog |
|
|
67 | (18) |
|
Exploring the Hadoop Distributed File System |
|
|
68 | (7) |
|
Explaining the HDFS Architecture |
|
|
69 | (3) |
|
|
|
72 | (3) |
|
Exploring Hive: The Hadoop Data Warehouse Platform |
|
|
75 | (3) |
|
Designing, Building, and Loading Tables |
|
|
76 | (1) |
|
|
|
77 | (1) |
|
Configuring the Hive ODBC Driver |
|
|
77 | (1) |
|
Exploring HCatalog: HDFS Table and Metadata Management |
|
|
78 | (2) |
|
Exploring HBase: An HDFS Column-Oriented Database |
|
|
80 | (5) |
|
|
|
81 | (1) |
|
Defining and Populating an HBase Table |
|
|
82 | (1) |
|
|
|
83 | (1) |
|
|
|
84 | (1) |
|
Chapter 5 Storing and Managing Data in HDFS |
|
|
85 | (20) |
|
Understanding the Fundamentals of HDFS |
|
|
86 | (6) |
|
|
|
87 | (2) |
|
|
|
89 | (1) |
|
|
|
90 | (2) |
|
Using Common Commands to Interact with HDFS |
|
|
92 | (1) |
|
Interfaces for Working with HDFS |
|
|
92 | (8) |
|
File Manipulation Commands |
|
|
94 | (3) |
|
Administrative Functions in HDFS |
|
|
97 | (3) |
|
Moving and Organizing Data in HDFS |
|
|
100 | (5) |
|
|
|
100 | (1) |
|
Implementing Data Structures for Easier Management |
|
|
101 | (1) |
|
|
|
102 | (1) |
|
|
|
103 | (2) |
|
Chapter 6 Adding Structure with Hive |
|
|
105 | (28) |
|
Understanding Hive's Purpose and Role |
|
|
106 | (11) |
|
Providing Structure for Unstructured Data |
|
|
107 | (7) |
|
Enabling Data Access and Transformation |
|
|
114 | (1) |
|
Differentiating Hive from Traditional RDBMS Systems |
|
|
115 | (1) |
|
|
|
116 | (1) |
|
Creating and Querying Basic Tables |
|
|
117 | (9) |
|
|
|
117 | (1) |
|
|
|
118 | (3) |
|
|
|
121 | (2) |
|
|
|
123 | (3) |
|
Using Advanced Data Structures with Hive |
|
|
126 | (7) |
|
Setting Up Partitioned Tables |
|
|
126 | (2) |
|
Loading Partitioned Tables |
|
|
128 | (1) |
|
|
|
129 | (1) |
|
Creating Indexes for Tables |
|
|
130 | (1) |
|
|
|
131 | (2) |
|
Chapter 7 Expanding Your Capability with HBase and HCatalog |
|
|
133 | (18) |
|
|
|
134 | (6) |
|
|
|
134 | (2) |
|
Loading Data into an HBase Table |
|
|
136 | (2) |
|
|
|
138 | (1) |
|
Loading and Querying HBase |
|
|
139 | (1) |
|
Managing Data with HCatalog |
|
|
140 | (3) |
|
Working with HCatalog and Hive |
|
|
140 | (1) |
|
|
|
141 | (2) |
|
|
|
143 | (1) |
|
|
|
143 | (2) |
|
Integrating HCatalog with Pig and Hive |
|
|
145 | (4) |
|
Using HBase or Hive as a Data Warehouse |
|
|
149 | (2) |
|
|
|
150 | (1) |
|
Part IV Working with Your Big Data |
|
|
151 | (52) |
|
Chapter 8 Effective Big Data ETL with SSIS, Pig, and Sqoop |
|
|
153 | (24) |
|
Combining Big Data and SQL Server Tools for Better Solutions |
|
|
154 | (2) |
|
|
|
154 | (1) |
|
Transferring Data Between Hadoop and SQL Server |
|
|
155 | (1) |
|
Working with SSIS and Hive |
|
|
156 | (5) |
|
|
|
157 | (4) |
|
Configuring Your Packages |
|
|
161 | (6) |
|
|
|
165 | (2) |
|
Getting the Best Performance from SSIS |
|
|
167 | (1) |
|
Transferring Data with Sqoop |
|
|
167 | (4) |
|
Copying Data from SQL Server |
|
|
168 | (2) |
|
Copying Data to SQL Server |
|
|
170 | (1) |
|
Using Pig for Data Movement |
|
|
171 | (4) |
|
Transforming Data with Pig |
|
|
171 | (3) |
|
Using Pig and SSIS Together |
|
|
174 | (1) |
|
|
|
175 | (2) |
|
|
|
175 | (1) |
|
|
|
175 | (1) |
|
|
|
176 | (1) |
|
|
|
176 | (1) |
|
Chapter 9 Data Research and Advanced Data Cleansing with Pig and Hive |
|
|
177 | (26) |
|
|
|
178 | (14) |
|
|
|
178 | (1) |
|
Taking Advantage of Built-in Functions |
|
|
179 | (1) |
|
Executing User-defined Functions |
|
|
180 | (2) |
|
|
|
182 | (7) |
|
Building Your Own UDFs for Pig |
|
|
189 | (3) |
|
|
|
192 | (11) |
|
|
|
192 | (1) |
|
|
|
192 | (3) |
|
Extending Hive with Map-reduce Scripts |
|
|
195 | (3) |
|
Creating a Custom Map-reduce Script |
|
|
198 | (1) |
|
Creating Your Own UDFs for Hive |
|
|
199 | (2) |
|
|
|
201 | (2) |
|
Part V Big Data and SQL Server Together |
|
|
203 | (132) |
|
Chapter 10 Data Warehouses and Hadoop Integration |
|
|
205 | (52) |
|
|
|
206 | (1) |
|
Challenges Faced by Traditional Data Warehouse Architectures |
|
|
207 | (9) |
|
|
|
207 | (6) |
|
|
|
213 | (3) |
|
Hadoop's Impact on the Data Warehouse Market |
|
|
216 | (4) |
|
|
|
216 | (1) |
|
Code First (Schema Later) |
|
|
217 | (1) |
|
|
|
218 | (1) |
|
Throw Compute at the Problem |
|
|
218 | (2) |
|
Introducing Parallel Data Warehouse (PDW) |
|
|
220 | (15) |
|
|
|
221 | (1) |
|
|
|
222 | (2) |
|
|
|
224 | (11) |
|
|
|
235 | (22) |
|
|
|
235 | (14) |
|
Business Use Cases for Polybase Today |
|
|
249 | (2) |
|
Speculating on the Future for Polybase |
|
|
251 | (4) |
|
|
|
255 | (2) |
|
Chapter 11 Visualizing Big Data with Microsoft BI |
|
|
257 | (28) |
|
|
|
258 | (5) |
|
|
|
258 | (1) |
|
|
|
258 | (1) |
|
|
|
259 | (2) |
|
|
|
261 | (1) |
|
|
|
261 | (2) |
|
Self-service Big Data with PowerPivot |
|
|
263 | (14) |
|
Setting Up the ODBC Driver |
|
|
263 | (2) |
|
|
|
265 | (7) |
|
|
|
272 | (1) |
|
|
|
273 | (1) |
|
|
|
274 | (3) |
|
Rapid Big Data Exploration with Power View |
|
|
277 | (4) |
|
Spatial Exploration with Power Map |
|
|
281 | (4) |
|
|
|
283 | (2) |
|
Chapter 12 Big Data Analytics |
|
|
285 | (12) |
|
Data Science, Data Mining, and Predictive Analytics |
|
|
286 | (2) |
|
|
|
286 | (1) |
|
|
|
287 | (1) |
|
|
|
288 | (1) |
|
Building a Recommendation Engine |
|
|
289 | (8) |
|
|
|
291 | (1) |
|
Running a User-to-user Recommendation Job |
|
|
292 | (3) |
|
Running an Item-to-item Recommendation Job |
|
|
295 | (1) |
|
|
|
296 | (1) |
|
Chapter 13 Big Data and the Cloud |
|
|
297 | (26) |
|
|
|
298 | (1) |
|
Exploring Big Data Cloud Providers |
|
|
299 | (1) |
|
|
|
299 | (1) |
|
|
|
300 | (1) |
|
Setting Up a Big Data Sandbox in the Cloud |
|
|
300 | (15) |
|
Getting Started with Amazon EMR |
|
|
301 | (6) |
|
Getting Started with HDInsight |
|
|
307 | (8) |
|
Storing Your Data in the Cloud |
|
|
315 | (8) |
|
|
|
316 | (1) |
|
|
|
317 | (1) |
|
Exploring Big Data Storage Tools |
|
|
318 | (1) |
|
|
|
319 | (2) |
|
|
|
321 | (1) |
|
|
|
321 | (2) |
|
Chapter 14 Big Data in the Real World |
|
|
323 | (12) |
|
Common Industry Analytics |
|
|
324 | (3) |
|
|
|
324 | (1) |
|
|
|
325 | (1) |
|
|
|
325 | (1) |
|
|
|
326 | (1) |
|
|
|
326 | (1) |
|
Marketing Social Sentiment |
|
|
327 | (1) |
|
|
|
327 | (8) |
|
|
|
328 | (1) |
|
A New Ecosystem of Technologies |
|
|
328 | (2) |
|
|
|
330 | (3) |
|
|
|
333 | (2) |
|
Part VI Moving Your Big Data Forward |
|
|
335 | (44) |
|
Chapter 15 Building and Executing Your Big Data Plan |
|
|
337 | (14) |
|
Gaining Sponsor and Stakeholder Buy-In |
|
|
338 | (4) |
|
|
|
338 | (1) |
|
|
|
339 | (2) |
|
|
|
341 | (1) |
|
Defining the Criteria for Success |
|
|
342 | (1) |
|
Identifying Technical Challenges |
|
|
342 | (3) |
|
|
|
342 | (2) |
|
|
|
344 | (1) |
|
Identifying Operational Challenges |
|
|
345 | (3) |
|
Planning for Setup/Configuration |
|
|
345 | (2) |
|
Planning for Ongoing Maintenance |
|
|
347 | (1) |
|
|
|
348 | (3) |
|
The Hand Off to Operations |
|
|
348 | (1) |
|
|
|
349 | (1) |
|
|
|
350 | (1) |
|
Chapter 16 Operational Big Data Management |
|
|
351 | (28) |
|
Hybrid Big Data Environments: Cloud and On-Premise Solutions Working Together |
|
|
352 | (1) |
|
Ongoing Data Integration with Cloud and On-Premise Solutions |
|
|
353 | (1) |
|
Integration Thoughts for Big Data |
|
|
354 | (2) |
|
Backups and High Availability in Your Big Data Environment |
|
|
356 | (3) |
|
|
|
356 | (2) |
|
|
|
358 | (1) |
|
Big Data Solution Governance |
|
|
359 | (1) |
|
Creating Operational Analytics |
|
|
360 | (19) |
|
System Center Operations Manager for HDP |
|
|
361 | (1) |
|
Installing the Ambari SCOM Management Pack |
|
|
362 | (9) |
|
Monitoring with the Ambari SCOM Management Pack |
|
|
371 | (6) |
|
|
|
377 | (2) |
| Index |
|
379 | |