Beginning Data Science in R 4: Data Analysis, Visualization, and Modelling for the Data Scientist, 2nd ed. [Paperback]

3.50/5 (4 ratings from Goodreads)

Thomas Mailund

Format: Paperback / softback, 511 pages, height x width: 254x178 mm, weight: 1016 g, 100 illustrations, black and white; XXVIII, 511 p.
Publication date: 24-Jun-2022
Publisher: Apress
ISBN-10: 1484281543
ISBN-13: 9781484281543


Paperback
Price: 53,33 €*
* the price is final, i.e. no further discounts apply
List price: 62,74 €
You save 15%
Delivery of the book from the publisher takes approximately 2-4 weeks
Free delivery

Permalink: https://www.kriso.ee/db/9781484281543.html

Discover best practices for data analysis and software development in R and start on the path to becoming a fully fledged data scientist. Updated for the R 4.0 release, this book teaches you techniques for both data manipulation and visualization and shows you how best to develop new software packages for R.

Beginning Data Science in R 4, Second Edition details how data science is a combination of statistics, computational science, and machine learning. You’ll see how to efficiently structure and mine data to extract useful patterns and build mathematical models. This requires computational methods and programming, and R is an ideal programming language for this.

Modern data analysis requires computational skills and usually a minimum of programming. After reading and using this book, you'll have what you need to get started with R programming with data science applications. Source code will be available to support your next projects as well.

What You Will Learn

Perform data science and analytics using statistics and the R programming language
Visualize and explore data, including working with large data sets found in big data
Build an R package
Test and check your code
Practice version control
Profile and optimize your code

Who This Book Is For

Those with some data science or analytics background, but not necessarily experience with the R programming language.

Table of Contents

About the Author
About the Technical Reviewer
Acknowledgments
Introduction

Chapter 1 Introduction to R Programming
Basic Interaction with R
Using R As a Calculator
Simple Expressions
Assignments
Indexing Vectors
Vectorized Expressions
Comments
Functions
Getting Documentation for Functions
Writing Your Own Functions
Summarizing and Vector Functions
A Quick Look at Control Flow
Factors
Data Frames
Using R Packages
Dealing with Missing Values
Data Pipelines
Writing Pipelines of Function Calls
Writing Functions That Work with Pipelines
The Magical "." Argument
Other Pipeline Operations
Coding and Naming Conventions
Exercises
Mean of Positive Values
Root Mean Square Error

Chapter 2 Reproducible Analysis
Literate Programming and Integration of Workflow and Documentation
Creating an R Markdown/knitr Document in RStudio
The YAML Language
The Markdown Language
Formatting Text
Cross-Referencing
Bibliographies
Controlling the Output (Templates/Stylesheets)
Running R Code in Markdown Documents
Using Chunks When Analyzing Data (Without Compiling Documents)
Caching Results
Displaying Data
Exercises
Create an R Markdown Document
Different Output
Caching

Chapter 3 Data Manipulation
Data Already in R
Quickly Reviewing Data
Reading Data
Examples of Reading and Formatting Data Sets
Breast Cancer Data Set
Boston Housing Data Set
The readr Package
Manipulating Data with dplyr
Some Useful dplyr Functions
Breast Cancer Data Manipulation
Tidying Data with tidyr
Exercises
Importing Data
Using dplyr
Using tidyr

Chapter 4 Visualizing Data
Basic Graphics
The Grammar of Graphics and the ggplot2 Package
Using qplot()
Using Geometries
Facets
Scaling
Themes and Other Graphics Transformations
Figures with Multiple Plots
Exercises

Chapter 5 Working with Large Data Sets
Subsample Your Data Before You Analyze the Full Data Set
Running Out of Memory During an Analysis
Too Large to Plot
Too Slow to Analyze
Too Large to Load
Exercises
Subsampling
Hex and 2D Density Plots

Chapter 6 Supervised Learning
Machine Learning
Supervised Learning
Regression vs. Classification
Inference vs. Prediction
Specifying Models
Linear Regression
Logistic Regression (Classification, Really)
Model Matrices and Formula
Validating Models
Evaluating Regression Models
Evaluating Classification Models
Confusion Matrix
Accuracy
Sensitivity and Specificity
Other Measures
More Than Two Classes
Sampling Approaches
Random Permutations of Your Data
Cross-Validation
Selecting Random Training and Testing Data
Examples of Supervised Learning Packages
Decision Trees
Random Forests
Neural Networks
Support Vector Machines
Naive Bayes
Exercises
Fitting Polynomials
Evaluating Different Classification Measures
Breast Cancer Classification
Leave-One-Out Cross-Validation (Slightly More Difficult)
Decision Trees
Random Forests
Neural Networks
Support Vector Machines
Compare Classification Algorithms

Chapter 7 Unsupervised Learning
Dimensionality Reduction
Principal Component Analysis
Multidimensional Scaling
Clustering
K-means Clustering
Hierarchical Clustering
Association Rules
Exercises
Dealing with Missing Data in the HouseVotes84 Data
K-means

Chapter 8 Project 1: Hitting the Bottle
Importing Data
Exploring the Data
Distribution of Quality Scores
Is This Wine Red or White?
Fitting Models
Exercises
Exploring Other Formulas
Exploring Different Models
Analyzing Your Own Data Set

Chapter 9 Deeper into R Programming
Expressions
Arithmetic Expressions
Boolean Expressions
Basic Data Types
Numeric
Integer
Complex
Logical
Character
Data Structures
Vectors
Matrix
Lists
Indexing
Named Values
Factors
Formulas
Control Structures
Selection Statements
Loops
Functions
Named Arguments
Default Parameters
Return Values
Lazy Evaluation
Scoping
Function Names Are Different from Variable Names
Recursive Functions
Exercises
Fibonacci Numbers
Outer Product
Linear Time Merge
Binary Search
More Sorting
Selecting the K Smallest Element

Chapter 10 Working with Vectors and Lists
Working with Vectors and Vectorizing Functions
Ifelse
Vectorizing Functions
The apply Family
Apply
Nothing Good, It Would Seem
Lapply
Sapply and vapply
Advanced Functions
Special Names
Infix Operators
Replacement Functions
How Mutable Is Data Anyway?
Exercises
Between
Rmq

Chapter 11 Functional Programming
Anonymous Functions
Higher-Order Functions
Functions Taking Functions As Arguments
Functions Returning Functions (and Closures)
Filter, Map, and Reduce
Functional Programming with purrr
Functions As Both Input and Output
Ellipsis Parameters
Exercises
Apply_if
Power
Row and Column Sums
Factorial Again
Function Composition
Implement This Operator

Chapter 12 Object-Oriented Programming
Immutable Objects and Polymorphic Functions
Data Structures
Example: Bayesian Linear Model Fitting
Classes
Polymorphic Functions
Defining Your Own Polymorphic Functions
Class Hierarchies
Specialization As Interface
Specialization in Implementations
Exercises
Shapes
Polynomials

Chapter 13 Building an R Package
Creating an R Package
Package Names
The Structure of an R Package
.Rbuildignore
Description
Title
Version
Description
Author and Maintainer
License
Type, Date, Lazy Data
URL and BugReports
Dependencies
Using an Imported Package
Using a Suggested Package
Namespace
R/ and man/
Checking the Package
Roxygen
Documenting Functions
Import and Export
Package Scope vs. Global Scope
Internal Functions
File Load Order
Adding Data to Your Package
Null
Building an R Package
Exercises

Chapter 14 Testing and Package Checking
Unit Testing
Automating Testing
Using testthat
Writing Good Tests
Using Random Numbers in Tests
Testing Random Results
Checking a Package for Consistency
Exercise

Chapter 15 Version Control
Version Control and Repositories
Using Git in RStudio
Installing Git
Making Changes to Files, Staging Files, and Committing Changes
Adding Git to an Existing Project
Bare Repositories and Cloning Repositories
Pushing Local Changes and Fetching and Pulling Remote Changes
Handling Conflicts
Working with Branches
Typical Workflows Involve Lots of Branches
Pushing Branches to the Global Repository
GitHub
Moving an Existing Repository to GitHub
Installing Packages from GitHub
Collaborating on GitHub
Pull Requests
Forking Repositories Instead of Cloning
Exercises

Chapter 16 Profiling and Optimizing
Profiling
A Graph-Flow Algorithm
Speeding Up Your Code
Parallel Execution
Switching to C++
Exercises

Chapter 17 Project 2: Bayesian Linear Regression
Bayesian Linear Regression
Exercises: Priors and Posteriors
Predicting Target Variables for New Predictor Values
Formulas and Their Model Matrix
Working with Model Matrices in R
Exercises
Model Matrices Without Response Variables
Exercises
Interface to a blm Class
Constructor
Updating Distributions: An Example Interface
Designing Your blm Class
Model Methods
Building an R Package for blm
Deciding on the Package Interface
Organization of Source Files
Document Your Package Interface Well
Adding README and NEWS Files to Your Package
Testing
GitHub

Conclusions
Data Science
Machine Learning
Data Analysis
R Programming
The End

Index

Thomas Mailund is an associate professor in bioinformatics at Aarhus University, Denmark. His background is in math and computer science, but for the last decade his main focus has been on genetics and evolutionary studies, particularly comparative genomics, speciation, and gene flow between emerging species.
