Modern Data Science with R, 2nd edition [Hardcover]

3.00/5 (2 ratings from Goodreads)

Nicholas J. Horton (Amherst College, Amherst, MA), Daniel T. Kaplan (Smith College, Northampton, MA), Benjamin S. Baumer (Smith College, Northampton, MA)

Format: Hardback, 632 pages, height x width: 254x178 mm, weight: 1300 g
Series: Chapman & Hall/CRC Texts in Statistical Science
Publication date: 14-Apr-2021
Publisher: Chapman & Hall/CRC
ISBN-10: 0367191490
ISBN-13: 9780367191498

Other books on the subject:

Probability & statistics - (Currently in store: 2 titles)
Automatic control engineering
Mathematical & statistical software - (Currently in store: 1 title)
Economic statistics
Computer science - (Currently in store: 7 titles)
Science: general issues - (Currently in store: 21 titles)

Hardcover
Price: 124,29 €
Delivery from the publisher takes approximately 2-4 weeks
Free shipping

Permalink: https://www.kriso.ee/db/9780367191498.html

From a review of the first edition: "Modern Data Science with R… is rich with examples and is guided by a strong narrative voice. What’s more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics" (The American Statistician).

Modern Data Science with R is a comprehensive data science textbook for undergraduates that incorporates statistical and computational thinking to solve real-world data problems. Rather than focus exclusively on case studies or programming syntax, this book illustrates how statistical programming in the state-of-the-art R/RStudio computing environment can be leveraged to extract meaningful information from a variety of data in the service of addressing compelling questions.

The second edition is updated to reflect the growing influence of the tidyverse set of packages. All code in the book has been revised and styled to be more readable and easier to understand. New functionality from packages like sf, purrr, tidymodels, and tidytext is now integrated into the text. All chapters have been revised, and several have been split, re-organized, or re-imagined to meet the shifting landscape of best practice.

Reviews

"This text continues to be fantastic! There are a number of courses for which I would require this book and others that I would recommend it as a supplement. I would likely require it for courses focused on computing in R or courses in data science. I would include it as a recommended text in introductory and other statistics courses that used R as the software of choice, where this text could be used as a supplemental resource in how to use R to work with data." -Hunter Glanz, Cal Poly San Luis Obispo

"Easy for students to read and relate to the exercises and examples. Many questions and hands-on activities with data sets to practice skills." -Lynn Collen, St. Cloud State University

"I used the first edition of this book as the primary text for an intermediate data science course a few years ago and I liked it very much. I think that the technical breadth, writing style, and level of difficulty are very clear strengths. Also, my students and I found the `tidyverse` approach to be particularly well-suited for teaching and learning R, and I love that the MDSR book includes such complete code. Students can program everything they see in the book, and often times there are tips & tricks for them to discover along the way just by studying expert code provided by the authors. This really sets MDSR apart from other books I considered for the course." -Matthew Beckman, Penn State University

"[...] To answer a wide range of modern research questions, this book by Baumer, Kaplan, and Horton features an excellent introduction to data wrangling, visualization, statistical modeling, machine learning, and other advanced statistical applications through the RStudio environment following the tidyverse syntax. [...] Overall, Modern Data Science with R, 2nd edition serves as an excellent introductory resource to help develop techniques to extract, transform, visualize, and learn from datasets through the R environment. It focuses on implementing those techniques in R and does not provide a theoretical background for the discussed methods. The book will be a perfect reference for a broad audience ranging from undergraduates in data science courses to advanced graduate students and professionals from a variety of research fields." -Kohma Arai and Vyacheslav Lyubchich, in Technometrics, July 2022

"Overall, I enjoyed reading this book. The authors were very good at creating a complete tool for studying data science. Therefore, I recommend this book, for its content, writing, and organization, to graduate students in data science and statistics. I also recommend the book to professionals who should prepare themselves for the challenges they are going to face in the future with the voluminous and heterogenous amount of data that should be timely analyzed to extract meaningful information to guide action." -Georgios Nikolopoulos, in ISCB News, June 2022

"The authors have successfully completed the job of choosing the content with relevant topics and, deciding the extent of knowledge to be delivered, and finally, putting them in an understandable sequence. This is a well-written book and does not cover much theory. ... The book's second edition contents are updated, expanded, revised, split, rewritten and rearranged compared to the first edition. The key changes are the use of recently developed R packages, ... (and) updated exercises in the chapters ..." -Shalabh, in Journal of the Royal Statistical Society Series A, August 2021

"[This book] provides an excellent basis for statisticians who want to dig deeper into, for example, data handling, for computer scientists who aim to strengthen their knowledge of statistical methods as well as for all other researchers who are interested in data science in general. ... Each section is structured as an interplay between R-code and explanatory text for understanding. The division into several stand-alone segments is an advantage, because the reader may easily choose the section she or he is interested in without missing relevant information. A key feature of the book is its focus on different example data sets that are available via R-packages or from URLs that are embedded in the text. These data sets are used to illustrate the methodology presented using R-code. Their availability allows the reader to reproduce the code while working with the book. ... It can be warmly recommended to practical researchers who seek a comprehensive overview of different topics in data science with focus on implementations in R." -Annika Hoyer, in Biometrical Journal, August 2021

"The authors have covered almost all aspects of data science, a revolutionary field that marries elements of computational thinking and traditional statistical theory. The book can thus equip the readers with the necessary knowledge and skills to extract data from a variety of sources, restructure observations in a form that allows analysis, store data in efficient databases, and work effectively on massive and complex data sets in order to produce actionable information." - Georgios Nikolopoulos, University of Cyprus, ISCB Book Reviews, June 2022.

About the Authors
Preface

Part I: Introduction to Data Science

1 Prologue: Why data science?
1.1 What is data science?
1.2 Case study: The evolution of sabermetrics
1.3 Datasets
1.4 Further resources

2 Data visualization
2.1 The 2012 federal election cycle
2.2 Composing data graphics
2.3 Importance of data graphics: Challenger
2.4 Creating effective presentations
2.5 The wider world of data visualization
2.6 Further resources
2.7 Exercises
2.8 Supplementary exercises

3 A grammar for graphics
3.1 A grammar for data graphics
3.2 Canonical data graphics in R
3.3 Extended example: Historical baby names
3.4 Further resources
3.5 Exercises
3.6 Supplementary exercises

4 Data wrangling on one table
4.1 A grammar for data wrangling
4.2 Extended example: Ben's time with the Mets
4.3 Further resources
4.4 Exercises
4.5 Supplementary exercises

5 Data wrangling on multiple tables
5.1 inner_join()
5.2 left_join()
5.3 Extended example: Manny Ramirez
5.4 Further resources
5.5 Exercises
5.6 Supplementary exercises

6 Tidy data
6.1 Tidy data
6.2 Reshaping data
6.3 Naming conventions
6.4 Data intake
6.5 Further resources
6.6 Exercises
6.7 Supplementary exercises

7 Iteration
7.1 Vectorized operations
7.2 Using across() with dplyr functions
7.3 The map() family of functions
7.4 Iterating over a one-dimensional vector
7.5 Iteration over subgroups
7.6 Simulation
7.7 Extended example: Factors associated with BMI
7.8 Further resources
7.9 Exercises
7.10 Supplementary exercises

8 Data science ethics
8.1 Introduction
8.2 Truthful falsehoods
8.3 Role of data science in society
8.4 Some settings for professional ethics
8.5 Some principles to guide ethical action
8.6 Algorithmic bias
8.7 Data and disclosure
8.8 Reproducibility
8.9 Ethics, collectively
8.10 Professional guidelines for ethical conduct
8.11 Further resources
8.12 Exercises
8.13 Supplementary exercises

Part II: Statistics and Modeling

9 Statistical foundations
9.1 Samples and populations
9.2 Sample statistics
9.3 The bootstrap
9.4 Outliers
9.5 Statistical models: Explaining variation
9.6 Confounding and accounting for other factors
9.7 The perils of p-values
9.8 Further resources
9.9 Exercises
9.10 Supplementary exercises

10 Predictive modeling
10.1 Predictive modeling
10.2 Simple classification models
10.3 Evaluating models
10.4 Extended example: Who has diabetes?
10.5 Further resources
10.6 Exercises
10.7 Supplementary exercises

11 Supervised learning
11.1 Non-regression classifiers
11.2 Parameter tuning
11.3 Example: Evaluation of income models redux
11.4 Extended example: Who has diabetes this time?
11.5 Regularization
11.6 Further resources
11.7 Exercises
11.8 Supplementary exercises

12 Unsupervised learning
12.1 Clustering
12.2 Dimension reduction
12.3 Further resources
12.4 Exercises
12.5 Supplementary exercises

13 Simulation
13.1 Reasoning in reverse
13.2 Extended example: Grouping cancers
13.3 Randomizing functions
13.4 Simulating variability
13.5 Random networks
13.6 Key principles of simulation
13.7 Further resources
13.8 Exercises
13.9 Supplementary exercises

Part III: Topics in Data Science

14 Dynamic and customized data graphics
14.1 Rich Web content using D3.js and htmlwidgets
14.2 Animation
14.3 Flexdashboard
14.4 Interactive web apps with Shiny
14.5 Customization of ggplot2 graphics
14.6 Extended example: Hot dog eating
14.7 Further resources
14.8 Exercises
14.9 Supplementary exercises

15 Database querying using SQL
15.1 From dplyr to SQL
15.2 Flat-file databases
15.3 The SQL universe
15.4 The SQL data manipulation language
15.5 Extended example: FiveThirtyEight flights
15.6 SQL vs. R
15.7 Further resources
15.8 Exercises
15.9 Supplementary exercises

16 Database administration
16.1 Constructing efficient SQL databases
16.2 Changing SQL data
16.3 Extended example: Building a database
16.4 Scalability
16.5 Further resources
16.6 Exercises
16.7 Supplementary exercises

17 Working with geospatial data
17.1 Motivation: What's so great about geospatial data?
17.2 Spatial data structures
17.3 Making maps
17.4 Extended example: Congressional districts
17.5 Effective maps: How (not) to lie
17.6 Projecting polygons
17.7 Playing well with others
17.8 Further resources
17.9 Exercises
17.10 Supplementary exercises

18 Geospatial computations
18.1 Geospatial operations
18.2 Geospatial aggregation
18.3 Geospatial joins
18.4 Extended example: Trail elevations at MacLeish
18.5 Further resources
18.6 Exercises
18.7 Supplementary exercises

19 Text as data
19.1 Regular expressions using Macbeth
19.2 Extended example: Analyzing textual data from arXiv.org
19.3 Ingesting text
19.4 Further resources
19.5 Exercises
19.6 Supplementary exercises

20 Network science
20.1 Introduction to network science
20.2 Extended example: Six degrees of Kristen Stewart
20.3 PageRank
20.4 Extended example: 1996 men's college basketball
20.5 Further resources
20.6 Exercises
20.7 Supplementary exercises

21 Epilogue: Towards "big data"
21.1 Notions of big data
21.2 Tools for bigger data
21.3 Alternatives to R
21.4 Closing thoughts
21.5 Further resources

Part IV: Appendices

A Packages used in this book
A.1 The mdsr package
A.2 Other packages
A.3 Further resources

B Introduction to R and RStudio
B.1 Installation
B.2 Learning R
B.3 Fundamental structures and objects
B.4 Add-ons: Packages
B.5 Further resources
B.6 Exercises
B.7 Supplementary exercises

C Algorithmic thinking
C.1 Introduction
C.2 Simple example
C.3 Extended example: Law of large numbers
C.4 Non-standard evaluation
C.5 Debugging and defensive coding
C.6 Further resources
C.7 Exercises
C.8 Supplementary exercises

D Reproducible analysis and workflow
D.1 Scriptable statistical computing
D.2 Reproducible analysis with R Markdown
D.3 Projects and version control
D.4 Further resources
D.5 Exercises
D.6 Supplementary exercises

E Regression modeling
E.1 Simple linear regression
E.2 Multiple regression
E.3 Inference for regression
E.4 Assumptions underlying regression
E.5 Logistic regression
E.6 Further resources
E.7 Exercises
E.8 Supplementary exercises

F Setting up a database server
F.1 SQLite
F.2 MySQL
F.3 PostgreSQL
F.4 Connecting to SQL

Bibliography
Indices
Subject index
R index

Benjamin S. Baumer is an associate professor in the Statistical & Data Sciences program at Smith College. He has been a practicing data scientist since 2004, when he became the first full-time statistical analyst for the New York Mets. Ben is a co-author of The Sabermetric Revolution and Analyzing Baseball Data with R. He received the 2019 Waller Education Award and the 2016 Significant Contributor Award from the Society for American Baseball Research.

Daniel T. Kaplan is the DeWitt Wallace emeritus professor of mathematics and computer science at Macalester College. He is the author of several textbooks on statistical modeling and statistical computing. Danny received the 2006 Macalester Excellence in Teaching award and the 2017 CAUSE Lifetime Achievement Award.

Nicholas J. Horton is Beitzel Professor of Technology and Society (Statistics and Data Science) at Amherst College. He is a Fellow of the ASA and the AAAS, co-chair of the National Academies Committee on Applied and Theoretical Statistics, recipient of a number of national teaching awards, author of a series of books on statistical computing, and actively involved in data science curriculum efforts to help students "think with data".
