E-book: Julia for Data Analysis

  • Format: 472 pages
  • Publication date: 14-Feb-2023
  • Publisher: Manning Publications
  • Language: English
  • ISBN-13: 9781638351788
  • Format: EPUB+DRM
  • Price: 51,64 €*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Master core data analysis skills using Julia. In Julia for Data Analysis, fascinating hands-on projects guide you through time series data, predictive models, popularity ranking, and more.

With this book, you will learn how to:

  • Read and write data in various formats
  • Work with tabular data, including subsetting, grouping, and transforming
  • Visualise your data using plots
  • Perform statistical analysis
  • Build predictive models
  • Create complex data processing pipelines

Julia was designed for the unique needs of data scientists: it's expressive and easy to use whilst also delivering super-fast code execution.

Julia for Data Analysis teaches you how to perform core data science tasks with this amazing language. It is written by Bogumił Kamiński, a top contributor to Julia, #1 Julia answerer on StackOverflow, and a lead developer of Julia's core data package DataFrames.jl.

You will learn how to write production-quality code in Julia, and utilize Julia's core features for data gathering, visualisation, and working with data frames. Plus, the engaging hands-on projects get you into the action quickly.

About the technology

Julia is a huge step forward for data science and scientific computing. It is a powerful high-performance programming language with many developer-friendly features like garbage collection, dynamic typing, just-in-time compilation, and a flexible approach to concurrent, parallel, and distributed computing. Although Julia's strong numerical programming features make it a favorite of data scientists, it is also an awesome general-purpose programming language.





About the reader

For data scientists familiar with Python or R. No experience with Julia required.

Reviews

"A brilliant guide to data analysis with Julia." Kevin Cheung, Carleton University

"One of the best structured and well-written presentations of language fundamentals and analysis concepts that I've encountered. I highly recommend this book." Maureen Metzger, Kumanu

"A solid, hands-on, and really enjoyable introduction to Julia." Sonja Krause-Harder, Elastic

Foreword
Preface
Acknowledgments
About this book
About the author
About the cover illustration
1 Introduction
  1.1 What is Julia and why is it useful?
  1.2 Key features of Julia from a data scientist's perspective
    Julia is fast because it is a compiled language
    Julia provides full support for interactive workflows
    Julia programs are highly reusable and easy to compose together
    Julia has a built-in state-of-the-art package manager
    It is easy to integrate existing code with Julia
  1.3 Usage scenarios of tools presented in the book
  1.4 Julia's drawbacks
  1.5 What data analysis skills will you learn?
  1.6 How can Julia be used for data analysis?
PART 1 Essential Julia skills
2 Getting started with Julia
  2.1 Representing values
  2.2 Defining variables
  2.3 Using the most important control-flow constructs
    Computations depending on a Boolean condition
    Loops
    Compound expressions
    A first approach to calculating the winsorized mean
  2.4 Defining functions
    Defining functions using the function keyword
    Positional and keyword arguments of functions
    Rules for passing arguments to functions
    Short syntax for defining simple functions
    Anonymous functions
    Do blocks
    Function-naming convention in Julia
    A simplified definition of a function computing the winsorized mean
  2.5 Understanding variable scoping rules
3 Julia's support for scaling projects
  3.1 Understanding Julia's type system
    A single function in Julia may have multiple methods
    Types in Julia are arranged in a hierarchy
    Finding all supertypes of a type
    Finding all subtypes of a type
    Union of types
    Deciding what type restrictions to put in method signature
  3.2 Using multiple dispatch in Julia
    Rules for defining methods of a function
    Method ambiguity problem
    Improved implementation of winsorized mean
  3.3 Working with packages and modules
    What is a module in Julia?
    How can packages be used in Julia?
    Using StatsBase.jl to compute the winsorized mean
  3.4 Using macros
4 Working with collections in Julia
  4.1 Working with arrays
    Getting the data into a matrix
    Computing basic statistics of the data stored in a matrix
    Indexing into arrays
    Performance considerations of copying vs. making a view
    Calculating correlations between variables
    Fitting a linear regression
    Plotting the Anscombe's quartet data
  4.2 Mapping key-value pairs with dictionaries
  4.3 Structuring your data by using named tuples
    Defining named tuples and accessing their contents
    Analyzing Anscombe's quartet data stored in a named tuple
    Understanding composite types and mutability of values in Julia
5 Advanced topics on handling collections
  5.1 Vectorizing your code using broadcasting
    Understanding syntax and meaning of broadcasting in Julia
    Expanding length-1 dimensions in broadcasting
    Protecting collections from being broadcasted over
    Analyzing Anscombe's quartet data using broadcasting
  5.2 Defining methods with parametric types
    Most collection types in Julia are parametric
    Rules for subtyping of parametric types
    Using subtyping rules to define the covariance function
  5.3 Integrating with Python
    Preparing data for dimensionality reduction using t-SNE
    Calling Python from Julia
    Visualizing the results of the t-SNE algorithm
6 Working with strings
  6.1 Getting and inspecting the data
    Downloading files from the web
    Using common techniques of string construction
    Reading the contents of a file
  6.2 Splitting strings
  6.3 Using regular expressions to work with strings
    Working with regular expressions
    Writing a parser of a single line of movies.dat file
  6.4 Extracting a subset from a string with indexing
    UTF-8 encoding of strings in Julia
    Character vs. byte indexing of strings
    ASCII strings
    The Char type
  6.5 Analyzing genre frequency in movies.dat
    Finding common movie genres
    Understanding genre popularity evolution over the years
  6.6 Introducing symbols
    Creating symbols
    Using symbols
  6.7 Using fixed-width string types to improve performance
    Available fixed-width strings
    Performance of fixed-width strings
  6.8 Compressing vectors of strings with PooledArrays.jl
    Creating a file containing flower names
    Reading in the data to a vector and compressing it
    Understanding the internal design of PooledArray
  6.9 Choosing appropriate storage for collections of strings
7 Handling time-series data and missing values
  7.1 Understanding the NBP Web API
    Getting the data via a web browser
    Getting the data by using Julia
    Handling cases when an NBP Web API query fails
  7.2 Working with missing data in Julia
    Definition of the missing value
    Working with missing values
  7.3 Getting time-series data from the NBP Web API
    Working with dates
    Fetching data from the NBP Web API for a range of dates
  7.4 Analyzing data fetched from the NBP Web API
    Computing summary statistics
    Finding which days of the week have the most missing values
    Plotting the PLN/USD exchange rate
PART 2 Toolbox for data analysis
8 First steps with data frames
  8.1 Fetching, unpacking, and inspecting the data
    Downloading the file from the web
    Working with bzip2 archives
    Inspecting the CSV file
  8.2 Loading the data to a data frame
    Reading a CSV file into a data frame
    Inspecting the contents of a data frame
    Saving a data frame to a CSV file
  8.3 Getting a column out of a data frame
    Understanding the data frame's storage model
    Treating a data frame column as a property
    Getting a column by using data frame indexing
    Visualizing data stored in columns of a data frame
  8.4 Reading and writing data frames using different formats
    Apache Arrow
    SQLite
9 Getting data from a data frame
  9.1 Advanced data frame indexing
    Getting a reduced puzzles data frame
    Overview of allowed column selectors
    Overview of allowed row-subsetting values
    Making views of data frame objects
  9.2 Analyzing the relationship between puzzle difficulty and popularity
    Calculating mean puzzle popularity by its rating
    Fitting LOESS regression
10 Creating data frame objects
  10.1 Reviewing the most important ways to create a data frame
    Creating a data frame from a matrix
    Creating a data frame from vectors
    Creating a data frame using a Tables.jl interface
    Plotting a correlation matrix of data stored in a data frame
  10.2 Creating data frames incrementally
    Vertically concatenating data frames
    Appending a table to a data frame
    Adding a new row to an existing data frame
    Storing simulation results in a data frame
11 Converting and grouping data frames
  11.1 Converting a data frame to other value types
    Conversion to a matrix
    Conversion to a named tuple of vectors
    Other common conversions
  11.2 Grouping data frame objects
    Preparing the source data frame
    Grouping a data frame
    Getting group keys of a grouped data frame
    Indexing a grouped data frame with a single value
    Comparing performance of indexing methods
    Indexing a grouped data frame with multiple values
    Iterating a grouped data frame
12 Mutating and transforming data frames
  12.1 Getting and loading the GitHub developers data set
    Understanding graphs
    Fetching GitHub developer data from the web
    Implementing a function that extracts data from a ZIP file
    Reading the GitHub developer data into a data frame
  12.2 Computing additional node features
    Creating a SimpleGraph object
    Computing features of nodes by using the Graphs.jl package
    Counting a node's web and machine learning neighbors
  12.3 Using the split-apply-combine approach to predict the developer's type
    Computing summary statistics of web and machine learning developer features
    Visualizing the relationship between the number of web and machine learning neighbors of a node
    Fitting a logistic regression model predicting developer type
  12.4 Reviewing data frame mutation operations
    Performing low-level API operations
    Using the insertcols! function to mutate a data frame
13 Advanced transformations of data frames
  13.1 Getting and preprocessing the police stop data set
    Loading all required packages
    Introducing the @chain macro
    Getting the police stop data set
    Comparing functions that perform operations on columns
    Using short forms of operation specification syntax
  13.2 Investigating the violation column
    Finding the most frequent violations
    Vectorizing functions by using the ByRow wrapper
    Flattening data frames
    Using convenience syntax to get the number of rows of a data frame
    Sorting data frames
    Using advanced functionalities of DataFramesMeta.jl
  13.3 Preparing data for making predictions
    Performing initial transformation of the data
    Working with categorical data
    Joining data frames
    Reshaping data frames
    Dropping rows of a data frame that hold missing values
  13.4 Building a predictive model of arrest probability
    Splitting the data into train and test data sets
    Fitting a logistic regression model
    Evaluating the quality of a model's predictions
  13.5 Reviewing functionalities provided by DataFrames.jl
14 Creating web services for sharing data analysis results
  14.1 Pricing financial options by using a Monte Carlo simulation
    Calculating the payoff of an Asian option definition
    Computing the value of an Asian option
    Understanding GBM
    Using a numerical approach to computing the Asian option value
  14.2 Implementing the option pricing simulator
    Starting Julia with multiple-thread support
    Computing the option payoff for a single sample of stock prices
    Computing the option value
  14.3 Creating a web service serving the Asian option valuation
    A general approach to building a web service
    Creating a web service using Genie.jl
    Running the web service
  14.4 Using the Asian option pricing web service
    Sending a single request to the web service
    Collecting responses to multiple requests from a web service in a data frame
    Unnesting a column of a data frame
    Plotting the results of Asian option pricing
Appendix A First steps with Julia
Appendix B Solutions to exercises
Appendix C Julia packages for data science
Index
Bogumił Kamiński is one of the lead developers of DataFrames.jl, the core package for data manipulation in the Julia ecosystem. He has over 20 years of experience delivering data science projects for corporate customers. He has been teaching data science at the undergraduate and graduate levels for two decades.