Foreword | xiii
Preface | xv
Acknowledgments | xvii
About this book | xix
About the author | xxiv
About the cover illustration | xxv
|
| 1 | (18)
1.1 What is Julia and why is it useful? | 2 | (4)
1.2 Key features of Julia from a data scientist's perspective | 6 | (4)
Julia is fast because it is a compiled language | 6 | (2)
Julia provides full support for interactive workflows | 8 | (1)
Julia programs are highly reusable and easy to compose together | 8 | (1)
Julia has a built-in state-of-the-art package manager | 9 | (1)
It is easy to integrate existing code with Julia | 10 | (1)
1.3 Usage scenarios of tools presented in the book | 10 | (1)
| 11 | (2)
1.5 What data analysis skills will you learn? | 13 | (1)
1.6 How can Julia be used for data analysis? | 13 | (6)
|
PART 1 Getting started with Julia | 19 | (164)
| 20 | (3)
| 23 | (3)
2.3 Using the most important control-flow constructs | 26 | (10)
Computations depending on a Boolean condition | 26 | (6)
| 32 | (1)
| 33 | (2)
A first approach to calculating the winsorized mean | 35 | (1)
| 36 | (8)
Defining functions using the function keyword | 37 | (1)
Positional and keyword arguments of functions | 37 | (2)
Rules for passing arguments to functions | 39 | (1)
Short syntax for defining simple functions | 39 | (1)
| 40 | (1)
| 41 | (1)
Function-naming convention in Julia | 42 | (1)
A simplified definition of a function computing the winsorized mean | 43 | (1)
2.5 Understanding variable scoping rules | 44 | (5)
|
3 Julia's support for scaling projects | 49 | (21)
3.1 Understanding Julia's type system | 50 | (5)
A single function in Julia may have multiple methods | 50 | (1)
Types in Julia are arranged in a hierarchy | 51 | (1)
Finding all supertypes of a type | 52 | (1)
Finding all subtypes of a type | 52 | (1)
| 53 | (1)
Deciding what type restrictions to put in method signature | 54 | (1)
3.2 Using multiple dispatch in Julia | 55 | (4)
Rules for defining methods of a function | 55 | (1)
| 56 | (1)
Improved implementation of winsorized mean | 57 | (2)
3.3 Working with packages and modules | 59 | (6)
What is a module in Julia? | 59 | (2)
How can packages be used in Julia? | 61 | (2)
Using StatsBase.jl to compute the winsorized mean | 63 | (2)
| 65 | (5)
|
4 Working with collections in Julia | 70 | (53)
| 70 | (18)
Getting the data into a matrix | 72 | (4)
Computing basic statistics of the data stored in a matrix | 76 | (2)
| 78 | (3)
Performance considerations of copying vs. making a view | 81 | (1)
Calculating correlations between variables | 82 | (1)
Fitting a linear regression | 83 | (3)
Plotting the Anscombe's quartet data | 86 | (2)
4.2 Mapping key-value pairs with dictionaries | 88 | (5)
4.3 Structuring your data by using named tuples | 93 | (8)
Defining named tuples and accessing their contents | 94 | (1)
Analyzing Anscombe's quartet data stored in a named tuple | 95 | (1)
Understanding composite types and mutability of values in Julia | 96 | (4)
|
5 Advanced topics on handling collections | 100 | (1)
|
5.1 Vectorizing your code using broadcasting | 101 | (11)
Understanding syntax and meaning of broadcasting in Julia | 101 | (2)
Expanding length-1 dimensions in broadcasting | 103 | (3)
Protecting collections from being broadcasted over | 106 | (3)
Analyzing Anscombe's quartet data using broadcasting | 109 | (3)
5.2 Defining methods with parametric types | 112 | (5)
Most collection types in Julia are parametric | 112 | (2)
Rules for subtyping of parametric types | 114 | (2)
Using subtyping rules to define the covariance function | 116 | (1)
5.3 Integrating with Python | 117 | (6)
Preparing data for dimensionality reduction using t-SNE | 117 | (1)
Calling Python from Julia | 118 | (2)
Visualizing the results of the t-SNE algorithm | 120 | (3)
|
| 123 | (31)
6.1 Getting and inspecting the data | 124 | (4)
Downloading files from the web | 125 | (1)
Using common techniques of string construction | 125 | (2)
Reading the contents of a file | 127 | (1)
| 128 | (2)
6.3 Using regular expressions to work with strings | 130 | (2)
Working with regular expressions | 130 | (1)
Writing a parser of a single line of movies.dat file | 131 | (1)
6.4 Extracting a subset from a string with indexing | 132 | (3)
UTF-8 encoding of strings in Julia | 132 | (1)
Character vs. byte indexing of strings | 133 | (1)
| 134 | (1)
| 135 | (1)
6.5 Analyzing genre frequency in movies.dat | 135 | (5)
Finding common movie genres | 135 | (2)
Understanding genre popularity evolution over the years | 137 | (3)
| 140 | (3)
| 140 | (1)
| 141 | (2)
6.7 Using fixed-width string types to improve performance | 143 | (3)
Available fixed-width strings | 143 | (1)
Performance of fixed-width strings | 144 | (2)
6.8 Compressing vectors of strings with PooledArrays.jl | 146 | (5)
Creating a file containing flower names | 146 | (1)
Reading in the data to a vector and compressing it | 147 | (1)
Understanding the internal design of PooledArray | 148 | (3)
6.9 Choosing appropriate storage for collections of strings | 151 | (3)
|
7 Handling time-series data and missing values | 154 | (29)
7.1 Understanding the NBP Web API | 155 | (8)
Getting the data via a web browser | 155 | (2)
Getting the data by using Julia | 157 | (2)
Handling cases when an NBP Web API query fails | 159 | (4)
7.2 Working with missing data in Julia | 163 | (6)
Definition of the missing value | 163 | (1)
Working with missing values | 164 | (5)
7.3 Getting time-series data from the NBP Web API | 169 | (4)
| 170 | (2)
Fetching data from the NBP Web API for a range of dates | 172 | (1)
7.4 Analyzing data fetched from the NBP Web API | 173 | (10)
Computing summary statistics | 174 | (1)
Finding which days of the week have the most missing values | 174 | (1)
Plotting the PLN/USD exchange rate | 175 | (8)
|
PART 2 Toolbox for data analysis | 183 | (210)
8 First steps with data frames | 185 | (24)
8.1 Fetching, unpacking, and inspecting the data | 187 | (3)
Downloading the file from the web | 187 | (1)
Working with bzip2 archives | 188 | (2)
| 190 | (1)
8.2 Loading the data to a data frame | 190 | (6)
Reading a CSV file into a data frame | 190 | (2)
Inspecting the contents of a data frame | 192 | (3)
Saving a data frame to a CSV file | 195 | (1)
8.3 Getting a column out of a data frame | 196 | (7)
Understanding the data frame's storage model | 196 | (1)
Treating a data frame column as a property | 197 | (3)
Getting a column by using data frame indexing | 200 | (2)
Visualizing data stored in columns of a data frame | 202 | (1)
8.4 Reading and writing data frames using different formats | 203 | (6)
| 204 | (1)
| 205 | (4)
|
9 Getting data from a data frame | 209 | (24)
9.1 Advanced data frame indexing | 210 | (15)
Getting a reduced puzzles data frame | 212 | (3)
Overview of allowed column selectors | 215 | (5)
Overview of allowed row-subsetting values | 220 | (3)
Making views of data frame objects | 223 | (2)
9.2 Analyzing the relationship between puzzle difficulty and popularity | 225 | (8)
Calculating mean puzzle popularity by its rating | 225 | (4)
| 229 | (4)
|
10 Creating data frame objects | 233 | (32)
10.1 Reviewing the most important ways to create a data frame | 234 | (14)
Creating a data frame from a matrix | 235 | (2)
Creating a data frame from vectors | 237 | (7)
Creating a data frame using a Tables.jl interface | 244 | (2)
Plotting a correlation matrix of data stored in a data frame | 246 | (2)
10.2 Creating data frames incrementally | 248 | (17)
Vertically concatenating data frames | 248 | (5)
Appending a table to a data frame | 253 | (3)
Adding a new row to an existing data frame | 256 | (1)
Storing simulation results in a data frame | 257 | (8)
|
11 Converting and grouping data frames | 265 | (26)
11.1 Converting a data frame to other value types | 266 | (14)
| 268 | (1)
Conversion to a named tuple of vectors | 269 | (7)
| 276 | (4)
11.2 Grouping data frame objects | 280 | (11)
Preparing the source data frame | 280 | (1)
| 281 | (1)
Getting group keys of a grouped data frame | 282 | (1)
Indexing a grouped data frame with a single value | 283 | (2)
Comparing performance of indexing methods | 285 | (1)
Indexing a grouped data frame with multiple values | 286 | (2)
Iterating a grouped data frame | 288 | (3)
|
12 Mutating and transforming data frames | 291 | (36)
12.1 Getting and loading the GitHub developers data set | 292 | (11)
| 293 | (1)
Fetching GitHub developer data from the web | 294 | (2)
Implementing a function that extracts data from a ZIP file | 296 | (2)
Reading the GitHub developer data into a data frame | 298 | (5)
12.2 Computing additional node features | 303 | (8)
Creating a SimpleGraph object | 303 | (2)
Computing features of nodes by using the Graphs.jl package | 305 | (2)
Counting a node's web and machine learning neighbors | 307 | (4)
12.3 Using the split-apply-combine approach to predict the developer's type | 311 | (10)
Computing summary statistics of web and machine learning developer features | 311 | (4)
Visualizing the relationship between the number of web and machine learning neighbors of a node | 315 | (4)
Fitting a logistic regression model predicting developer type | 319 | (2)
12.4 Reviewing data frame mutation operations | 321 | (6)
Performing low-level API operations | 321 | (2)
Using the insertcols! function to mutate a data frame | 323 | (4)
|
13 Advanced transformations of data frames | 327 | (66)
13.1 Getting and preprocessing the police stop data set | 328 | (9)
Loading all required packages | 328 | (1)
Introducing the @chain macro | 329 | (2)
Getting the police stop data set | 331 | (2)
Comparing functions that perform operations on columns | 333 | (3)
Using short forms of operation specification syntax | 336 | (1)
13.2 Investigating the violation column | 337 | (8)
Finding the most frequent violations | 337 | (3)
Vectorizing functions by using the ByRow wrapper | 340 | (1)
| 341 | (1)
Using convenience syntax to get the number of rows of a data frame | 341 | (1)
| 342 | (1)
Using advanced functionalities of DataFramesMeta.jl | 343 | (2)
13.3 Preparing data for making predictions | 345 | (9)
Performing initial transformation of the data | 345 | (2)
Working with categorical data | 347 | (2)
| 349 | (1)
| 350 | (3)
Dropping rows of a data frame that hold missing values | 353 | (1)
13.4 Building a predictive model of arrest probability | 354 | (7)
Splitting the data into train and test data sets | 354 | (2)
Fitting a logistic regression model | 356 | (1)
Evaluating the quality of a model's predictions | 357 | (4)
13.5 Reviewing functionalities provided by DataFrames.jl | 361 | (5)
|
14 Creating web services for sharing data analysis results | 365 | (1)
|
14.1 Pricing financial options by using a Monte Carlo simulation | 366 | (6)
Calculating the payoff of an Asian option definition | 366 | (2)
Computing the value of an Asian option | 368 | (1)
| 369 | (1)
Using a numerical approach to computing the Asian option value | 370 | (2)
14.2 Implementing the option pricing simulator | 372 | (7)
Starting Julia with multiple-thread support | 372 | (1)
Computing the option payoff for a single sample of stock prices | 373 | (2)
Computing the option value | 375 | (4)
14.3 Creating a web service serving the Asian option valuation | 379 | (4)
A general approach to building a web service | 379 | (2)
Creating a web service using Genie.jl | 381 | (2)
| 383 | (1)
14.4 Using the Asian option pricing web service | 383 | (10)
Sending a single request to the web service | 384 | (2)
Collecting responses to multiple requests from a web service in a data frame | 386 | (1)
Unnesting a column of a data frame | 387 | (2)
Plotting the results of Asian option pricing | 389 | (4)

Appendix A First steps with Julia | 393 | (12)
Appendix B Solutions to exercises | 405 | (22)
Appendix C Julia packages for data science | 427 | (4)
Index | 431