Preface | xi
1 Introduction to Apache Drill | 1-8
  … | 2-8
  … | 2
  … | 3
  A Word About Drill's Performance | 4
  A Very Brief History of Big Data | 5-6
  Drill in the Big Data Ecosystem | 7
  Comparing Drill with Similar Tools | 7-8
2 Installing and Running Drill | 9-16
  Preparing Your Machine for Drill | 10-11
  Special Configuration Instructions for Windows Installations | 10-11
  Installing Drill on Windows | 12
  Starting Drill on a Windows Machine | 12
  Installing Drill in Embedded Mode on macOS or Linux | 13
  Starting Drill on macOS or Linux in Embedded Mode | 13
  Installing Drill in Distributed Mode on macOS or Linux | 14-15
  Preparing Your Cluster for Drill | 15
  Starting Drill in Distributed Mode | 16
  Connecting to the Cluster | 16
  … | 16
3 Overview of Apache Drill | 17-32
  The Apache Hadoop Ecosystem | 17-20
  Drill Is a Low-Latency Query Engine | 18
  Distributed Processing with HDFS | 18
  Elements of a Drill System | 19
  Drill Operation: The 30,000-Foot View | 20
  Drill Is a Query Engine, Not a Database | 20
  … | 21-29
  … | 21
  … | 22
  … | 22-25
  … | 26-27
  … | 28-29
  … | 30-32
4 Querying Delimited Data | 33-64
  Ways of Querying Data with Drill | 33
  … | 34
  … | 34-43
  … | 35
  … | 36
  Specifying a Default Data Source | 37-38
  Accessing Columns in a Query | 39-40
  Delimited Data with Column Headers | 41
  … | 42
  … | 43
  Understanding Drill Data Types | 44-45
  Cleaning and Preparing Data Using String Manipulation Functions | 46-48
  Complex Data Conversion Functions | 48
  Working with Dates and Times in Drill | 49-52
  Converting Strings to Dates | 50
  … | 51
  Date Arithmetic and Manipulation | 51
  Date and Time Functions in Drill | 52
  … | 53
  Data Analysis Using Drill | 54-61
  Summarizing Data with Aggregate Functions | 55-61
  Common Problems in Querying Delimited Data | 62-63
  … | 62
  Illegal Characters in Column Headers | 63
  Reserved Words in Column Names | 63
  … | 64
5 Analyzing Complex and Nested Data | 65-86
  … | 65-76
  … | 66-67
  Accessing Maps (Key-Value Pairs) in Drill | 68
  … | 69-76
  Analyzing Log Files with Drill | 77-84
  Configuring Drill to Read HTTPD Web Server Logs | 77
  … | 78-81
  Other Log Analysis with Drill | 82-84
  … | 85-86
6 Connecting Drill to Data Sources | 87-106
  Querying Multiple Data Sources | 88-105
  Configuring a New Storage Plug-in | 88
  Connecting Drill to a Relational Database | 89-92
  Querying Data in Hadoop from Drill | 93
  Connecting to and Querying HBase from Drill | 93-94
  Querying Hive Data from Drill | 95-96
  Connecting to and Querying Streaming Data with Drill and Kafka | 97-98
  Connecting to and Querying Kudu | 99
  Connecting to and Querying MongoDB from Drill | 100
  Connecting Drill to Cloud Storage | 100-103
  Querying Time Series Data from Drill and OpenTSDB | 104-105
  … | 106
7 Connecting to Drill | 107-132
  Understanding Drill's Interfaces | 107-111
  … | 108
  … | 109-110
  … | 111
  Connecting to Drill with Python | 112-115
  Using drillpy to Query Drill | 112
  Connecting to Drill Using pydrill | 113
  Other Ways of Connecting to Drill from Python | 114-115
  Connecting to Drill Using R | 116-117
  Querying Drill from R Using sergeant | 116-117
  Connecting to Drill Using Java | 118
  … | 119-120
  … | 119
  … | 120
  Interacting with Drill from PHP | 120
  Querying Drill Using Node.js | 121
  Using Drill as a Data Source in BI Tools | 121-131
  Exploring Data with Apache Zeppelin and Drill | 122-126
  Exploring Data with Apache Superset | 127-131
  … | 132
8 Data Engineering with Drill | 133-174
  … | 133-135
  … | 133
  Data Life Cycle: Data Exploration to Production | 134
  … | 135
  … | 136-139
  … | 136
  … | 136
  … | 137-138
  … | 139
  … | 139
  … | 140-141
  Format Plug-ins and Format Configuration | 140
  … | 141
  … | 141
  Schema Inference Overview | 142-143
  … | 144-159
  Schema Inference for Delimited Data | 146-150
  … | 151-152
  Schema Inference for JSON | 153-154
  Ambiguous Numeric Schemas | 155-159
  Aligning Schemas Across Files | 160-161
  … | 162-167
  … | 164-166
  … | 167
  Using Drill with the Parquet File Format | 168
  Schema Evolution in Parquet | 169
  Partitioning Data Directories | 169-172
  Defining a Table Workspace | 172
  Working with Queries in Production | 173
  Capturing Schema Mapping in Views | 173
  Running Challenging Queries in Scripts | 173
  … | 174
9 Deploying Drill in Production | 175-196
  … | 175-183
  … | 176
  … | 176-177
  … | 178
  … | 179
  … | 180
  … | 181
  Distributing Drill Binaries and Configuration | 182
  Starting the Drill Cluster | 183
  … | 184-187
  Working with Apache Hadoop HDFS | 184
  … | 185-187
  … | 188-189
  … | 190-192
  User-Defined Functions and Custom Plug-ins | 190
  … | 190
  … | 191
  … | 192
  … | 193-194
  Monitoring the Drill Process | 193
  … | 194
  … | 194
  … | 195
  … | 195
  … | 195
  … | 196
  … | 196
10 Setting Up Your Development Environment | 197-200
  … | 197
  Creating the Drill Build Environment | 198
  Setting Up Git and Getting the Source Code | 198
  Building Drill from Source | 199
  … | 199
  … | 200
11 Writing Drill User-Defined Functions | 201-220
  Use Case: Finding and Filtering Valid Credit Card Numbers | 201
  How User-Defined Functions Work in Drill | 202
  Structure of a Simple Drill UDF | 203-210
  … | 203-204
  … | 205-208
  … | 209
  … | 209-210
  Building and Installing Your UDF | 211
  Statically Installing a UDF | 211
  Dynamically Installing a UDF | 211
  Complex Functions: UDFs That Return Maps or Arrays | 212-214
  Example: Extracting User Agent Metadata | 213
  … | 213-214
  Writing Aggregate User-Defined Functions | 215-219
  The Aggregate Function API | 216
  Example Aggregate UDF: Kendall's Rank Correlation Coefficient | 217-219
  … | 220
12 Writing a Format Plug-in | 221-258
  The Example Regex Format Plug-in | 221
  Creating the "Easy" Format Plug-in | 222-226
  Creating the Maven pom.xml File | 223-224
  Creating the Plug-in Package | 225
  Drill Module Configuration | 225
  Format Plug-in Configuration | 226
  Cautions Before Getting Started | 226
  Creating the Regex Plug-in Configuration Class | 227-229
  Copyright Headers and Code Format | 228
  Testing the Configuration | 228
  Fixing Configuration Problems | 229
  … | 230
  Creating the Format Plug-in Class | 230-235
  … | 232
  … | 233
  … | 233
  … | 234
  How Drill Finds Your Plug-in | 235
  … | 236-249
  … | 238
  … | 239
  … | 239
  … | 240
  … | 240
  … | 241
  … | 241
  Column Projection Accounting | 242
  … | 243
  … | 243
  … | 243-244
  … | 245
  … | 246
  Drill's Columnar Structure | 246
  … | 247
  … | 248
  Loading Data into Vectors | 248
  … | 249
  … | 250-252
  Testing the Wildcard Case | 250
  Testing Explicit Projection | 251
  … | 251
  … | 252
  … | 253-256
  … | 253
  Default Format Configuration | 253
  … | 254
  … | 255
  Contributing to Drill: The Pull Request | 255
  … | 255
  … | 256
  … | 257-258
13 Unique Uses of Drill | 259-276
  Finding Photos Taken Within a Geographic Region | 259
  … | 260-265
  … | 261
  The Excel Custom Record Reader | 262-265
  Using the Excel Format Plug-in | 266
  Network Packet Analysis (PCAP) with Drill | 266-270
  Examples of Queries Using PCAP Data Files | 267-270
  Analyzing Twitter Data with Drill | 271
  Using Drill in a Machine Learning Pipeline | 272-275
  Making Predictions Within Drill | 272
  Building and Serializing a Model | 272
  … | 273-274
  Making Predictions Using the UDF | 275
  … | 276
A List of Drill Functions | 277-290
B Drill Formatting Strings | 291-292
Index | 293