
Algorithms for Data Science, 1st ed. 2016 [Hardcover]

  • Format: Hardback, XXIII + 430 pages, height x width: 235x155 mm, weight: 8041 g, 48 illustrations (30 in color, 18 black and white)
  • Publication date: 27-Dec-2016
  • Publisher: Springer International Publishing AG
  • ISBN-10: 3319457950
  • ISBN-13: 9783319457956
  • Hardcover
  • Price: 85,76 €*
  • * the price is final, i.e., no further discounts apply
  • Regular price: 100,89 €
  • You save 15%
  • Delivery from the publisher takes approximately 2-4 weeks
  • Free shipping
  • Order lead time: 2-4 weeks
This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations: problems and data are enormously variable, and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, and so the reader is immersed in Python, R, and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses.

This book has three parts. (a) Data Reduction begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter. (b) Extracting Information from Data: linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of Healthcare Analytics as an extended example of practical data analytics. The algorithms and analytics will be of much interest to practitioners interested in utilizing the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System. (c) Predictive Analytics: two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials.

This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.
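The blurb's claim that associative statistics are the foundation of scalable algorithms can be illustrated with a short sketch. The Python example below is not taken from the book; it is a minimal illustration, assuming a hypothetical (count, sum, sum of squares) summary and illustrative helper names (summarize, combine, mean_and_variance), of how a statistic that can be computed on separate data partitions and then merged yields the same mean and variance as processing all of the data at once.

    # A minimal sketch (not from the book) of an "associative statistic":
    # a small summary computed per partition and merged by addition.
    from functools import reduce

    def summarize(partition):
        # Reduce one partition to (count, sum, sum of squares).
        return (len(partition),
                sum(partition),
                sum(x * x for x in partition))

    def combine(s1, s2):
        # Merge two partition summaries by element-wise addition.
        return tuple(a + b for a, b in zip(s1, s2))

    def mean_and_variance(summary):
        # Recover the mean and (population) variance from the merged summary.
        n, total, total_sq = summary
        mean = total / n
        return mean, total_sq / n - mean ** 2

    # The partitions could live on different machines; each is summarized locally.
    partitions = [[2.0, 4.0, 6.0], [8.0, 10.0], [12.0]]
    overall = reduce(combine, (summarize(p) for p in partitions))
    print(mean_and_variance(overall))  # same result as a single pass over all data

Because the summaries merge by simple addition, the order and grouping of the partitions do not matter, which is the property that the blurb connects to distributed computing and the Hadoop and MapReduce chapter.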

Introduction.- Data Mapping and Data Dictionaries.- Scalable Algorithms and Associative Statistics.- Hadoop and MapReduce.- Data Visualization.- Linear Regression Methods.- Healthcare Analytics.- Cluster Analysis.- k-Nearest Neighbor Prediction Functions.- The Multinomial Naive Bayes Prediction Function.- Forecasting.- Real-time Analytics.

Reviews

This 430-page book contains an excellent collection of information on the subject of practical algorithms used in data science. The discussion of each algorithm starts with some basic concepts, followed by a tutorial with real datasets and detailed code examples in Python or R. Each chapter has a set of exercise problems so readers can practice the concepts learned in the chapter. … a good reference for practitioners, or a good textbook for graduate or upper-class undergraduate students. (Xiannong Meng, Computing Reviews, September, 2017)

This textbook on practical data analytics unites fundamental principles, algorithms, and data. … This book is devoted to upper-division undergraduate and graduate students in mathematics, statistics, and computer science. It is intended for a one- or two-semester course in data analytics and reflects the authors' research experience in data science concepts and their teaching skills in various areas. The text is eminently suitable for self-study and an exceptional resource for practitioners. (Krzysztof J. Szajowski, zbMATH 1367.62005, 2017)

1 Introduction 1(18)
1.1 What Is Data Science? 1(2)
1.2 Diabetes in America 3(2)
1.3 Authors of the Federalist Papers 5(1)
1.4 Forecasting NASDAQ Stock Prices 6(2)
1.5 Remarks 8(1)
1.6 The Book 8(3)
1.7 Algorithms 11(1)
1.8 Python 12(1)
1.9 R 13(1)
1.10 Terminology and Notation 14(2)
1.10.1 Matrices and Vectors 14(2)
1.11 Book Website 16(3)
Part I Data Reduction
2 Data Mapping and Data Dictionaries 19(32)
2.1 Data Reduction 19(1)
2.2 Political Contributions 20(2)
2.3 Dictionaries 22(1)
2.4 Tutorial: Big Contributors 22(5)
2.5 Data Reduction 27(4)
2.5.1 Notation and Terminology 28(1)
2.5.2 The Political Contributions Example 29(1)
2.5.3 Mappings 30(1)
2.6 Tutorial: Election Cycle Contributions 31(7)
2.7 Similarity Measures 38(5)
2.7.1 Computation 41(2)
2.8 Tutorial: Computing Similarity 43(4)
2.9 Concluding Remarks About Dictionaries 47(1)
2.10 Exercises 48(3)
2.10.1 Conceptual 48(1)
2.10.2 Computational 49(2)
3 Scalable Algorithms and Associative Statistics 51(54)
3.1 Introduction 51(2)
3.2 Example: Obesity in the United States 53(1)
3.3 Associative Statistics 54(1)
3.4 Univariate Observations 55(5)
3.4.1 Histograms 57(1)
3.4.2 Histogram Construction 58(2)
3.5 Functions 60(1)
3.6 Tutorial: Histogram Construction 61(13)
3.6.1 Synopsis 74(1)
3.7 Multivariate Data 74(6)
3.7.1 Notation and Terminology 75(1)
3.7.2 Estimators 76(3)
3.7.3 The Augmented Moment Matrix 79(1)
3.7.4 Synopsis 80(1)
3.8 Tutorial: Computing the Correlation Matrix 80(8)
3.8.1 Conclusion 87(1)
3.9 Introduction to Linear Regression 88(7)
3.9.1 The Linear Regression Model 89(1)
3.9.2 The Estimator of β 90(3)
3.9.3 Accuracy Assessment 93(1)
3.9.4 Computing Adjusted R² 94(1)
3.10 Tutorial: Computing β 95(7)
3.10.1 Conclusion 101(1)
3.11 Exercises 102(3)
3.11.1 Conceptual 102(1)
3.11.2 Computational 103(2)
4 Hadoop and MapReduce 105(28)
4.1 Introduction 105(1)
4.2 The Hadoop Ecosystem 106(5)
4.2.1 The Hadoop Distributed File System 106(2)
4.2.2 MapReduce 108(1)
4.2.3 Mapping 108(2)
4.2.4 Reduction 110(1)
4.3 Developing a Hadoop Application 111(1)
4.4 Medicare Payments 111(2)
4.5 The Command Line Environment 113(1)
4.6 Tutorial: Programming a MapReduce Algorithm 113(11)
4.6.1 The Mapper 116(4)
4.6.2 The Reducer 120(3)
4.6.3 Synopsis 123(1)
4.7 Tutorial: Using Amazon Web Services 124(4)
4.7.1 Closing Remarks 128(1)
4.8 Exercises 128(5)
4.8.1 Conceptual 128(1)
4.8.2 Computational 128(5)
Part II Extracting Information from Data
5 Data Visualization 133(28)
5.1 Introduction 133(2)
5.2 Principles of Data Visualization 135(3)
5.3 Making Good Choices 138(10)
5.3.1 Univariate Data 139(3)
5.3.2 Bivariate and Multivariate Data 142(6)
5.4 Harnessing the Machine 148(10)
5.4.1 Building Fig. 5.2 151(1)
5.4.2 Building Fig. 5.3 152(1)
5.4.3 Building Fig. 5.4 153(1)
5.4.4 Building Fig. 5.5 154(1)
5.4.5 Building Fig. 5.8 155(1)
5.4.6 Building Fig. 5.10 156(1)
5.4.7 Building Fig. 5.11 157(1)
5.5 Exercises 158(3)
6 Linear Regression Methods 161(56)
6.1 Introduction 161(1)
6.2 The Linear Regression Model 162(14)
6.2.1 Example: Depression, Fatalism, and Simplicity 164(2)
6.2.2 Least Squares 166(2)
6.2.3 Confidence Intervals 168(2)
6.2.4 Distributional Conditions 170(1)
6.2.5 Hypothesis Testing 171(4)
6.2.6 Cautionary Remarks 175(1)
6.3 Introduction to R 176(1)
6.4 Tutorial: R 177(4)
6.4.1 Remark 181(1)
6.5 Tutorial: Large Data Sets and R 181(6)
6.6 Factors 187(8)
6.6.1 Interaction 189(3)
6.6.2 The Extra Sums-of-Squares F-test 192(3)
6.7 Tutorial: Bike Share 195(5)
6.7.1 An Incongruous Result 200(1)
6.8 Analysis of Residuals 200(8)
6.8.1 Linearity 201(1)
6.8.2 Example: The Bike Share Problem 202(2)
6.8.3 Independence 204(4)
6.9 Tutorial: Residual Analysis 208(3)
6.9.1 Final Remarks 210(1)
6.10 Exercises 211(6)
6.10.1 Conceptual 211(1)
6.10.2 Computational 212(5)
7 Healthcare Analytics 217(36)
7.1 Introduction 217(2)
7.2 The Behavioral Risk Factor Surveillance System 219(3)
7.2.1 Estimation of Prevalence 220(1)
7.2.2 Estimation of Incidence 221(1)
7.3 Tutorial: Diabetes Prevalence and Incidence 222(9)
7.4 Predicting At-Risk Individuals 231(5)
7.4.1 Sensitivity and Specificity 234(2)
7.5 Tutorial: Identifying At-Risk Individuals 236(7)
7.6 Unusual Demographic Attribute Vectors 243(2)
7.7 Tutorial: Building Neighborhood Sets 245(4)
7.7.1 Synopsis 247(2)
7.8 Exercises 249(4)
7.8.1 Conceptual 249(1)
7.8.2 Computational 250(3)
8 Cluster Analysis 253(26)
8.1 Introduction 253(1)
8.2 Hierarchical Agglomerative Clustering 254(1)
8.3 Comparison of States 255(3)
8.4 Tutorial: Hierarchical Clustering of States 258(8)
8.4.1 Synopsis 264(2)
8.5 The k-Means Algorithm 266(2)
8.6 Tutorial: The k-Means Algorithm 268(6)
8.6.1 Synopsis 273(1)
8.7 Exercises 274(5)
8.7.1 Conceptual 274(1)
8.7.2 Computational 274(5)
Part III Predictive Analytics
9 k-Nearest Neighbor Prediction Functions 279(34)
9.1 Introduction 279(3)
9.1.1 The Prediction Task 280(2)
9.2 Notation and Terminology 282(1)
9.3 Distance Metrics 283(1)
9.4 The k-Nearest Neighbor Prediction Function 284(2)
9.5 Exponentially Weighted k-Nearest Neighbors 286(1)
9.6 Tutorial: Digit Recognition 287(8)
9.6.1 Remarks 294(1)
9.7 Accuracy Assessment 295(3)
9.7.1 Confusion Matrices 297(1)
9.8 k-Nearest Neighbor Regression 298(1)
9.9 Forecasting the S&P 500 299(1)
9.10 Tutorial: Forecasting by Pattern Recognition 300(8)
9.10.1 Remark 307(1)
9.11 Cross-Validation 308(2)
9.12 Exercises 310(3)
9.12.1 Conceptual 310(1)
9.12.2 Computational 310(3)
10 The Multinomial Naive Bayes Prediction Function 313(30)
10.1 Introduction 313(1)
10.2 The Federalist Papers 314(1)
10.3 The Multinomial Naive Bayes Prediction Function 315(4)
10.3.1 Posterior Probabilities 317(2)
10.4 Tutorial: Reducing the Federalist Papers 319(6)
10.4.1 Summary 325(1)
10.5 Tutorial: Predicting Authorship of the Disputed Federalist Papers 325(4)
10.5.1 Remark 329(1)
10.6 Tutorial: Customer Segmentation 329(9)
10.6.1 Additive Smoothing 330(2)
10.6.2 The Data 332(5)
10.6.3 Remarks 337(1)
10.7 Exercises 338(5)
10.7.1 Conceptual 338(1)
10.7.2 Computational 339(4)
11 Forecasting 343(38)
11.1 Introduction 343(2)
11.2 Tutorial: Working with Time 345(5)
11.3 Analytical Methods 350(4)
11.3.1 Notation 350(1)
11.3.2 Estimation of the Mean and Variance 350(2)
11.3.3 Exponential Forecasting 352(1)
11.3.4 Autocorrelation 353(1)
11.4 Tutorial: Computing ρτ 354(5)
11.4.1 Remarks 359(1)
11.5 Drift and Forecasting 359(1)
11.6 Holt-Winters Exponential Forecasting 360(3)
11.6.1 Forecasting Error 362(1)
11.7 Tutorial: Holt-Winters Forecasting 363(4)
11.8 Regression-Based Forecasting of Stock Prices 367(1)
11.9 Tutorial: Regression-Based Forecasting 368(6)
11.9.1 Remarks 373(1)
11.10 Time-Varying Regression Estimators 374(1)
11.11 Tutorial: Time-Varying Regression Estimators 375(2)
11.11.1 Remarks 377(1)
11.12 Exercises 377(4)
11.12.1 Conceptual 377(1)
11.12.2 Computational 378(3)
12 Real-time Analytics 381(22)
12.1 Introduction 381(1)
12.2 Forecasting with a NASDAQ Quotation Stream 382(2)
12.2.1 Forecasting Algorithms 383(1)
12.3 Tutorial: Forecasting the Apple Inc. Stream 384(6)
12.3.1 Remarks 389(1)
12.4 The Twitter Streaming API 390(1)
12.5 Tutorial: Tapping the Twitter Stream 391(5)
12.5.1 Remarks 395(1)
12.6 Sentiment Analysis 396(2)
12.7 Tutorial: Sentiment Analysis of Hashtag Groups 398(2)
12.8 Exercises 400(3)
A Solutions to Exercises 403(14)
B Accessing the Twitter API 417(2)
References 419(4)
Index 423
Brian Steele is a full professor of Mathematics at the University of Montana and a Senior Data Scientist for SoftMath Consultants, LLC. Dr. Steele has published on the EM algorithm, exact bagging, the bootstrap, and numerous statistical applications. He teaches data analytics and statistics and consults on a wide variety of subjects related to data science and statistics.

John Chandler has worked at the forefront of marketing and data analysis since 1999. He has worked with Fortune 100 advertisers and scores of agencies, measuring the effectiveness of advertising and improving performance. Dr. Chandler joined the faculty at the University of Montana School of Business Administration as a Clinical Professor of Marketing in 2015 and teaches classes in advanced marketing analytics and data science. He is one of the founders and Chief Data Scientist for Ars Quanta, a Seattle-based data science consultancy.

Dr. Swarna Reddy is the founder, CEO, and a Senior Data Scientist for SoftMath Consultants, LLC and serves as a faculty affiliate with the Department of Mathematical Sciences at the University of Montana. Her area of expertise is computational mathematics and operations research. She is a published researcher and has developed computational solutions across a wide variety of areas spanning bioinformatics, cybersecurity, and business analytics.