Beginning Data Science in R 4: Data Analysis, Visualization, and Modelling for the Data Scientist, 2nd ed. [Paperback]

  • Format: Paperback / softback, 511 pages, height x width: 254x178 mm, weight: 1016 g, 100 Illustrations, black and white; XXVIII, 511 p. 100 illus., 1 Paperback / softback
  • Publication date: 24-Jun-2022
  • Publisher: APress
  • ISBN-10: 1484281543
  • ISBN-13: 9781484281543
  • Paperback
  • Price: 53,33 €*
  • * the price is final, i.e., no further discounts apply
  • Regular price: 62,74 €
  • You save 15%
  • Delivery from the publisher takes approximately 2-4 weeks
  • Free delivery
  • Order lead time: 2-4 weeks
Discover best practices for data analysis and software development in R and start on the path to becoming a fully-fledged data scientist. Updated for the R 4.0 release, this book teaches you techniques for both data manipulation and visualization and shows you the best way to develop new software packages for R.

Beginning Data Science in R 4, Second Edition details how data science is a combination of statistics, computational science, and machine learning. You’ll see how to efficiently structure and mine data to extract useful patterns and build mathematical models. This requires computational methods and programming, and R is an ideal programming language for this. 

Modern data analysis requires computational skills and usually a minimum of programming. After reading and using this book, you'll have what you need to get started with R programming for data science applications. Source code will be available to support your next projects as well.


What You Will Learn
  • Perform data science and analytics using statistics and the R programming language
  • Visualize and explore data, including working with large data sets found in big data
  • Build an R package
  • Test and check your code
  • Practice version control
  • Profile and optimize your code

Who This Book Is For

Those with some data science or analytics background, but not necessarily experience with the R programming language.

Table of Contents

About the Author xv
About the Technical Reviewer xvii
Acknowledgments xix
Introduction xxi
Chapter 1 Introduction to R Programming 1(50)
Basic Interaction with R 1(2)
Using R As a Calculator 3(10)
Simple Expressions 4(2)
Assignments 6(3)
Indexing Vectors 9(2)
Vectorized Expressions 11(2)
Comments 13(1)
Functions 13(13)
Getting Documentation for Functions 14(2)
Writing Your Own Functions 16(1)
Summarizing and Vector Functions 17(3)
A Quick Look at Control Flow 20(6)
Factors 26(6)
Data Frames 32(4)
Using R Packages 36(1)
Dealing with Missing Values 37(1)
Data Pipelines 38(12)
Writing Pipelines of Function Calls 39(2)
Writing Functions That Work with Pipelines 41(1)
The Magical "." Argument 42(5)
Other Pipeline Operations 47(2)
Coding and Naming Conventions 49(1)
Exercises 50(1)
Mean of Positive Values 50(1)
Root Mean Square Error 50(1)
Chapter 2 Reproducible Analysis 51(22)
Literate Programming and Integration of Workflow and Documentation 52(1)
Creating an R Markdown/knitr Document in RStudio 53(4)
The YAML Language 57(2)
The Markdown Language 59(7)
Formatting Text 60(4)
Cross-Referencing 64(1)
Bibliographies 65(1)
Controlling the Output (Templates/Stylesheets) 66(1)
Running R Code in Markdown Documents 66(6)
Using chunks when analyzing data (without compiling documents) 69(1)
Caching Results 70(1)
Displaying Data 71(1)
Exercises 72(1)
Create an R Markdown Document 72(1)
Different Output 72(1)
Caching 72(1)
Chapter 3 Data Manipulation 73(48)
Data Already in R 73(2)
Quickly Reviewing Data 75(2)
Reading Data 77(2)
Examples of Reading and Formatting Data Sets 79(13)
Breast Cancer Data set 79(8)
Boston Housing Data Set 87(3)
The readr Package 90(2)
Manipulating Data with dplyr 92(26)
Some Useful dplyr Functions 94(12)
Breast Cancer Data Manipulation 106(4)
Tidying Data with tidyr 110(8)
Exercises 118(3)
Importing Data 118(1)
Using dplyr 119(1)
Using tidyr 119(2)
Chapter 4 Visualizing Data 121(40)
Basic Graphics 121(7)
The Grammar of Graphics and the ggplot2 Package 128(13)
Using qplot() 129(4)
Using Geometries 133(8)
Facets 141(4)
Scaling 145(11)
Themes and Other Graphics Transformations 151(5)
Figures with Multiple Plots 156(4)
Exercises 160(1)
Chapter 5 Working with Large Data Sets 161(18)
Subsample Your Data Before You Analyze the Full Data Set 162(2)
Running Out of Memory During an Analysis 164(2)
Too Large to Plot 166(5)
Too Slow to Analyze 171(2)
Too Large to Load 173(4)
Exercises 177(2)
Subsampling 177(1)
Hex and 2D Density Plots 177(2)
Chapter 6 Supervised Learning 179(60)
Machine Learning 179(1)
Supervised Learning 180(3)
Regression vs. Classification 181(1)
Inference vs. Prediction 182(1)
Specifying Models 183(21)
Linear Regression 183(6)
Logistic Regression (Classification, Really) 189(5)
Model Matrices and Formula 194(10)
Validating Models 204(14)
Evaluating Regression Models 206(3)
Evaluating Classification Models 209(1)
Confusion Matrix 210(3)
Accuracy 213(2)
Sensitivity and Specificity 215(1)
Other Measures 216(2)
More Than Two Classes 218(1)
Sampling Approaches 218(11)
Random Permutations of Your Data 219(4)
Cross-Validation 223(4)
Selecting Random Training and Testing Data 227(2)
Examples of Supervised Learning Packages 229(6)
Decision Trees 230(2)
Random Forests 232(1)
Neural Networks 233(2)
Support Vector Machines 235(1)
Naive Bayes 235(1)
Exercises 236(3)
Fitting Polynomials 236(1)
Evaluating Different Classification Measures 236(1)
Breast Cancer Classification 237(1)
Leave-One-Out Cross-Validation (Slightly More Difficult) 237(1)
Decision Trees 237(1)
Random Forests 237(1)
Neural Networks 238(1)
Support Vector Machines 238(1)
Compare Classification Algorithms 238(1)
Chapter 7 Unsupervised Learning 239(36)
Dimensionality Reduction 239(16)
Principal Component Analysis 240(10)
Multidimensional Scaling 250(5)
Clustering 255(12)
K-means Clustering 255(8)
Hierarchical Clustering 263(4)
Association Rules 267(6)
Exercises 273(2)
Dealing with Missing Data in the HouseVotes84 Data 273(1)
K-means 274(1)
Chapter 8 Project 1: Hitting the Bottle 275(12)
Importing Data 275(1)
Exploring the Data 276(6)
Distribution of Quality Scores 276(1)
Is This Wine Red or White? 277(5)
Fitting Models 282(3)
Exercises 285(2)
Exploring Other Formulas 285(1)
Exploring Different Models 285(1)
Analyzing Your Own Data Set 285(2)
Chapter 9 Deeper into R Programming 287(42)
Expressions 287(3)
Arithmetic Expressions 287(2)
Boolean Expressions 289(1)
Basic Data Types 290(4)
Numeric 291(1)
Integer 291(1)
Complex 292(1)
Logical 292(1)
Character 293(1)
Data Structures 294(12)
Vectors 294(2)
Matrix 296(2)
Lists 298(2)
Indexing 300(4)
Named Values 304(1)
Factors 305(1)
Formulas 305(1)
Control Structures 306(5)
Selection Statements 306(1)
Loops 307(4)
Functions 311(11)
Named Arguments 312(1)
Default Parameters 313(1)
Return Values 314(1)
Lazy Evaluation 315(2)
Scoping 317(5)
Function Names Are Different from Variable Names 322(1)
Recursive Functions 322(3)
Exercises 325(4)
Fibonacci Numbers 325(1)
Outer Product 325(1)
Linear Time Merge 325(1)
Binary Search 326(1)
More Sorting 326(1)
Selecting the K Smallest Element 327(2)
Chapter 10 Working with Vectors and Lists 329(20)
Working with Vectors and Vectorizing Functions 329(13)
Ifelse 332(1)
Vectorizing Functions 332(3)
The apply Family 335(1)
Apply 336(3)
Nothing Good, It Would Seem 339(1)
Lapply 340(2)
Sapply and vapply 342(1)
Advanced Functions 342(5)
Special Names 342(1)
Infix Operators 343(1)
Replacement Functions 344(3)
How Mutable Is Data Anyway? 347(1)
Exercises 348(1)
Between 348(1)
Rmq 348(1)
Chapter 11 Functional Programming 349(24)
Anonymous Functions 349(2)
Higher-Order Functions 351(6)
Functions Taking Functions As Arguments 351(1)
Functions Returning Functions (and Closures) 352(5)
Filter, Map, and Reduce 357(3)
Functional Programming with purrr 360(3)
Functions As Both Input and Output 363(7)
Ellipsis Parameters 368(2)
Exercises 370(3)
Apply_if 370(1)
Power 370(1)
Row and Column Sums 370(1)
Factorial Again 370(1)
Function Composition 371(1)
Implement This Operator 371(2)
Chapter 12 Object-Oriented Programming 373(18)
Immutable Objects and Polymorphic Functions 373(1)
Data Structures 374(2)
Example: Bayesian Linear Model Fitting 374(2)
Classes 376(3)
Polymorphic Functions 379(3)
Defining Your Own Polymorphic Functions 380(2)
Class Hierarchies 382(6)
Specialization As Interface 383(1)
Specialization in Implementations 384(4)
Exercises 388(3)
Shapes 388(1)
Polynomials 389(2)
Chapter 13 Building an R Package 391(18)
Creating an R Package 391(2)
Package Names 392(1)
The Structure of an R Package 392(1)
.Rbuildignore 393(1)
Description 393(6)
Title 394(1)
Version 394(1)
Description 395(1)
Author and Maintainer 395(1)
License 396(1)
Type, Date, Lazy Data 396(1)
URL and BugReports 396(1)
Dependencies 396(1)
Using an Imported Package 397(1)
Using a Suggested Package 398(1)
Namespace 399(1)
R/ and man/ 400(1)
Checking the Package 400(1)
Roxygen 401(4)
Documenting Functions 401(1)
Import and Export 402(2)
Package Scope vs. Global Scope 404(1)
Internal Functions 404(1)
File Load Order 404(1)
Adding Data to Your Package 405(2)
Null 406(1)
Building an R Package 407(1)
Exercises 407(2)
Chapter 14 Testing and Package Checking 409(10)
Unit Testing 409(3)
Automating Testing 411(1)
Using testthat 412(5)
Writing Good Tests 414(1)
Using Random Numbers in Tests 415(1)
Testing Random Results 416(1)
Checking a Package for Consistency 417(1)
Exercise 417(2)
Chapter 15 Version Control 419(22)
Version Control and Repositories 419(1)
Using Git in RStudio 420(14)
Installing Git 421(1)
Making Changes to Files, Staging Files, and Committing Changes 422(2)
Adding Git to an Existing Project 424(1)
Bare Repositories and Cloning Repositories 425(1)
Pushing Local Changes and Fetching and Pulling Remote Changes 426(2)
Handling Conflicts 428(1)
Working with Branches 429(3)
Typical Workflows Involve Lots of Branches 432(1)
Pushing Branches to the Global Repository 433(1)
GitHub 434(3)
Moving an Existing Repository to GitHub 436(1)
Installing Packages from GitHub 437(1)
Collaborating on GitHub 437(3)
Pull Requests 438(1)
Forking Repositories Instead of Cloning 438(2)
Exercises 440(1)
Chapter 16 Profiling and Optimizing 441(30)
Profiling 441(15)
A Graph-Flow Algorithm 442(14)
Speeding Up Your Code 456(5)
Parallel Execution 461(5)
Switching to C++ 466(3)
Exercises 469(2)
Chapter 17 Project 2: Bayesian Linear Regression 471(30)
Bayesian Linear Regression 471(7)
Exercises: Priors and Posteriors 473(3)
Predicting Target Variables for New Predictor Values 476(2)
Formulas and Their Model Matrix 478(9)
Working with Model Matrices in R 480(5)
Exercises 485(1)
Model Matrices Without Response Variables 485(2)
Exercises 487(1)
Interface to a blm Class 487(10)
Constructor 488(1)
Updating Distributions: An Example Interface 489(5)
Designing Your blm Class 494(1)
Model Methods 494(3)
Building an R Package for blm 497(3)
Deciding on the Package Interface 497(1)
Organization of Source Files 498(1)
Document Your Package Interface Well 498(1)
Adding README and NEWS Files to Your Package 499(1)
Testing 500(1)
GitHub 500(1)
Conclusions 501(4)
Data Science 501(1)
Machine Learning 501(1)
Data Analysis 502(1)
R Programming 502(1)
The End 503(2)
Index 505

About the Author

Thomas Mailund is an associate professor in bioinformatics at Aarhus University, Denmark. His background is in math and computer science, but for the last decade his main focus has been on genetics and evolutionary studies, particularly comparative genomics, speciation, and gene flow between emerging species.