
Algorithms for Data Science, 1st ed. 2016 [Hardcover]

  • Format: Hardback, XXIII + 430 pages, height x width: 235x155 mm, weight: 8041 g, 48 illustrations (30 in color, 18 black and white)
  • Publication date: 27-Dec-2016
  • Publisher: Springer International Publishing AG
  • ISBN-10: 3319457950
  • ISBN-13: 9783319457956
  • Hardcover
  • Price: 85,76 €*
  • * the price is final, i.e., no further discounts apply
  • Regular price: 100,89 €
  • You save 15%
  • Delivery from the publisher takes approximately 2-4 weeks
  • Free shipping
  • Order lead time: 2-4 weeks
This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations: problems and data are enormously variable, and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, and so the reader is immersed in Python, R, and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses.

This book has three parts. (a) Data Reduction begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter. (b) Extracting Information from Data: linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of Healthcare Analytics as an extended example of practical data analytics. The algorithms and analytics will be of much interest to practitioners interested in utilizing the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System. (c) Predictive Analytics: two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials.

This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.
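The blurb's claim that associative statistics are the foundation of scalable algorithms can be illustrated with a short sketch. The Python example below is not taken from the book; it is a minimal illustration, assuming a hypothetical (count, sum, sum of squares) summary and illustrative helper names (summarize, combine, mean_and_variance), of how a statistic that can be computed on separate data partitions and then merged yields the same mean and variance as processing all of the data at once.

    # A minimal sketch (not from the book) of an "associative statistic":
    # a small summary computed per partition and merged by addition.
    from functools import reduce

    def summarize(partition):
        # Reduce one partition to (count, sum, sum of squares).
        return (len(partition),
                sum(partition),
                sum(x * x for x in partition))

    def combine(s1, s2):
        # Merge two partition summaries by element-wise addition.
        return tuple(a + b for a, b in zip(s1, s2))

    def mean_and_variance(summary):
        # Recover the mean and (population) variance from the merged summary.
        n, total, total_sq = summary
        mean = total / n
        return mean, total_sq / n - mean ** 2

    # The partitions could live on different machines; each is summarized locally.
    partitions = [[2.0, 4.0, 6.0], [8.0, 10.0], [12.0]]
    overall = reduce(combine, (summarize(p) for p in partitions))
    print(mean_and_variance(overall))  # same result as a single pass over all data

Because the summaries merge by simple addition, the order and grouping of the partitions do not matter, which is the property that the blurb connects to distributed computing and the Hadoop and MapReduce chapter.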

Introduction.- Data Mapping and Data Dictionaries.- Scalable Algorithms and Associative Statistics.- Hadoop and MapReduce.- Data Visualization.- Linear Regression Methods.- Healthcare Analytics.- Cluster Analysis.- k-Nearest Neighbor Prediction Functions.- The Multinomial Naive Bayes Prediction Function.- Forecasting.- Real-time Analytics.

Reviews

This 430-page book contains an excellent collection of information on the subject of practical algorithms used in data science. The discussion of each algorithm starts with some basic concepts, followed by a tutorial with real datasets and detailed code examples in Python or R. Each chapter has a set of exercise problems so readers can practice the concepts learned in the chapter. … a good reference for practitioners, or a good textbook for graduate or upper-class undergraduate students. (Xiannong Meng, Computing Reviews, September, 2017)

This textbook on practical data analytics unites fundamental principles, algorithms, and data. … This book is devoted to upper-division undergraduate and graduate students in mathematics, statistics, and computer science. It is intended for a one- or two-semester course in data analytics and reflects the authors' research experience in data science concepts and their teaching skills in various areas. The text is eminently suitable for self-study and an exceptional resource for practitioners. (Krzysztof J. Szajowski, zbMATH 1367.62005, 2017)

1 Introduction 1(18)
1.1 What Is Data Science? 1(2)
1.2 Diabetes in America 3(2)
1.3 Authors of the Federalist Papers 5(1)
1.4 Forecasting NASDAQ Stock Prices 6(2)
1.5 Remarks 8(1)
1.6 The Book 8(3)
1.7 Algorithms 11(1)
1.8 Python 12(1)
1.9 R 13(1)
1.10 Terminology and Notation 14(2)
1.10.1 Matrices and Vectors 14(2)
1.11 Book Website 16(3)
Part I Data Reduction
2 Data Mapping and Data Dictionaries 19(32)
2.1 Data Reduction 19(1)
2.2 Political Contributions 20(2)
2.3 Dictionaries 22(1)
2.4 Tutorial: Big Contributors 22(5)
2.5 Data Reduction 27(4)
2.5.1 Notation and Terminology 28(1)
2.5.2 The Political Contributions Example 29(1)
2.5.3 Mappings 30(1)
2.6 Tutorial: Election Cycle Contributions 31(7)
2.7 Similarity Measures 38(5)
2.7.1 Computation 41(2)
2.8 Tutorial: Computing Similarity 43(4)
2.9 Concluding Remarks About Dictionaries 47(1)
2.10 Exercises 48(3)
2.10.1 Conceptual 48(1)
2.10.2 Computational 49(2)
3 Scalable Algorithms and Associative Statistics 51(54)
3.1 Introduction 51(2)
3.2 Example: Obesity in the United States 53(1)
3.3 Associative Statistics 54(1)
3.4 Univariate Observations 55(5)
3.4.1 Histograms 57(1)
3.4.2 Histogram Construction 58(2)
3.5 Functions 60(1)
3.6 Tutorial: Histogram Construction 61(13)
3.6.1 Synopsis 74(1)
3.7 Multivariate Data 74(6)
3.7.1 Notation and Terminology 75(1)
3.7.2 Estimators 76(3)
3.7.3 The Augmented Moment Matrix 79(1)
3.7.4 Synopsis 80(1)
3.8 Tutorial: Computing the Correlation Matrix 80(8)
3.8.1 Conclusion 87(1)
3.9 Introduction to Linear Regression 88(7)
3.9.1 The Linear Regression Model 89(1)
3.9.2 The Estimator of β 90(3)
3.9.3 Accuracy Assessment 93(1)
3.9.4 Computing Adjusted R² 94(1)
3.10 Tutorial: Computing β 95(7)
3.10.1 Conclusion 101(1)
3.11 Exercises 102(3)
3.11.1 Conceptual 102(1)
3.11.2 Computational 103(2)
4 Hadoop and MapReduce 105(28)
4.1 Introduction 105(1)
4.2 The Hadoop Ecosystem 106(5)
4.2.1 The Hadoop Distributed File System 106(2)
4.2.2 MapReduce 108(1)
4.2.3 Mapping 108(2)
4.2.4 Reduction 110(1)
4.3 Developing a Hadoop Application 111(1)
4.4 Medicare Payments 111(2)
4.5 The Command Line Environment 113(1)
4.6 Tutorial: Programming a MapReduce Algorithm 113(11)
4.6.1 The Mapper 116(4)
4.6.2 The Reducer 120(3)
4.6.3 Synopsis 123(1)
4.7 Tutorial: Using Amazon Web Services 124(4)
4.7.1 Closing Remarks 128(1)
4.8 Exercises 128(5)
4.8.1 Conceptual 128(1)
4.8.2 Computational 128(5)
Part II Extracting Information from Data
5 Data Visualization 133(28)
5.1 Introduction 133(2)
5.2 Principles of Data Visualization 135(3)
5.3 Making Good Choices 138(10)
5.3.1 Univariate Data 139(3)
5.3.2 Bivariate and Multivariate Data 142(6)
5.4 Harnessing the Machine 148(10)
5.4.1 Building Fig. 5.2 151(1)
5.4.2 Building Fig. 5.3 152(1)
5.4.3 Building Fig. 5.4 153(1)
5.4.4 Building Fig. 5.5 154(1)
5.4.5 Building Fig. 5.8 155(1)
5.4.6 Building Fig. 5.10 156(1)
5.4.7 Building Fig. 5.11 157(1)
5.5 Exercises 158(3)
6 Linear Regression Methods 161(56)
6.1 Introduction 161(1)
6.2 The Linear Regression Model 162(14)
6.2.1 Example: Depression, Fatalism, and Simplicity 164(2)
6.2.2 Least Squares 166(2)
6.2.3 Confidence Intervals 168(2)
6.2.4 Distributional Conditions 170(1)
6.2.5 Hypothesis Testing 171(4)
6.2.6 Cautionary Remarks 175(1)
6.3 Introduction to R 176(1)
6.4 Tutorial: R 177(4)
6.4.1 Remark 181(1)
6.5 Tutorial: Large Data Sets and R 181(6)
6.6 Factors 187(8)
6.6.1 Interaction 189(3)
6.6.2 The Extra Sums-of-Squares F-test 192(3)
6.7 Tutorial: Bike Share 195(5)
6.7.1 An Incongruous Result 200(1)
6.8 Analysis of Residuals 200(8)
6.8.1 Linearity 201(1)
6.8.2 Example: The Bike Share Problem 202(2)
6.8.3 Independence 204(4)
6.9 Tutorial: Residual Analysis 208(3)
6.9.1 Final Remarks 210(1)
6.10 Exercises 211(6)
6.10.1 Conceptual 211(1)
6.10.2 Computational 212(5)
7 Healthcare Analytics 217(36)
7.1 Introduction 217(2)
7.2 The Behavioral Risk Factor Surveillance System 219(3)
7.2.1 Estimation of Prevalence 220(1)
7.2.2 Estimation of Incidence 221(1)
7.3 Tutorial: Diabetes Prevalence and Incidence 222(9)
7.4 Predicting At-Risk Individuals 231(5)
7.4.1 Sensitivity and Specificity 234(2)
7.5 Tutorial: Identifying At-Risk Individuals 236(7)
7.6 Unusual Demographic Attribute Vectors 243(2)
7.7 Tutorial: Building Neighborhood Sets 245(4)
7.7.1 Synopsis 247(2)
7.8 Exercises 249(4)
7.8.1 Conceptual 249(1)
7.8.2 Computational 250(3)
8 Cluster Analysis 253(26)
8.1 Introduction 253(1)
8.2 Hierarchical Agglomerative Clustering 254(1)
8.3 Comparison of States 255(3)
8.4 Tutorial: Hierarchical Clustering of States 258(8)
8.4.1 Synopsis 264(2)
8.5 The k-Means Algorithm 266(2)
8.6 Tutorial: The k-Means Algorithm 268(6)
8.6.1 Synopsis 273(1)
8.7 Exercises 274(5)
8.7.1 Conceptual 274(1)
8.7.2 Computational 274(5)
Part III Predictive Analytics
9 k-Nearest Neighbor Prediction Functions 279(34)
9.1 Introduction 279(3)
9.1.1 The Prediction Task 280(2)
9.2 Notation and Terminology 282(1)
9.3 Distance Metrics 283(1)
9.4 The k-Nearest Neighbor Prediction Function 284(2)
9.5 Exponentially Weighted k-Nearest Neighbors 286(1)
9.6 Tutorial: Digit Recognition 287(8)
9.6.1 Remarks 294(1)
9.7 Accuracy Assessment 295(3)
9.7.1 Confusion Matrices 297(1)
9.8 k-Nearest Neighbor Regression 298(1)
9.9 Forecasting the S&P 500 299(1)
9.10 Tutorial: Forecasting by Pattern Recognition 300(8)
9.10.1 Remark 307(1)
9.11 Cross-Validation 308(2)
9.12 Exercises 310(3)
9.12.1 Conceptual 310(1)
9.12.2 Computational 310(3)
10 The Multinomial Naive Bayes Prediction Function 313(30)
10.1 Introduction 313(1)
10.2 The Federalist Papers 314(1)
10.3 The Multinomial Naive Bayes Prediction Function 315(4)
10.3.1 Posterior Probabilities 317(2)
10.4 Tutorial: Reducing the Federalist Papers 319(6)
10.4.1 Summary 325(1)
10.5 Tutorial: Predicting Authorship of the Disputed Federalist Papers 325(4)
10.5.1 Remark 329(1)
10.6 Tutorial: Customer Segmentation 329(9)
10.6.1 Additive Smoothing 330(2)
10.6.2 The Data 332(5)
10.6.3 Remarks 337(1)
10.7 Exercises 338(5)
10.7.1 Conceptual 338(1)
10.7.2 Computational 339(4)
11 Forecasting 343(38)
11.1 Introduction 343(2)
11.2 Tutorial: Working with Time 345(5)
11.3 Analytical Methods 350(4)
11.3.1 Notation 350(1)
11.3.2 Estimation of the Mean and Variance 350(2)
11.3.3 Exponential Forecasting 352(1)
11.3.4 Autocorrelation 353(1)
11.4 Tutorial: Computing ρτ 354(5)
11.4.1 Remarks 359(1)
11.5 Drift and Forecasting 359(1)
11.6 Holt-Winters Exponential Forecasting 360(3)
11.6.1 Forecasting Error 362(1)
11.7 Tutorial: Holt-Winters Forecasting 363(4)
11.8 Regression-Based Forecasting of Stock Prices 367(1)
11.9 Tutorial: Regression-Based Forecasting 368(6)
11.9.1 Remarks 373(1)
11.10 Time-Varying Regression Estimators 374(1)
11.11 Tutorial: Time-Varying Regression Estimators 375(2)
11.11.1 Remarks 377(1)
11.12 Exercises 377(4)
11.12.1 Conceptual 377(1)
11.12.2 Computational 378(3)
12 Real-time Analytics 381(22)
12.1 Introduction 381(1)
12.2 Forecasting with a NASDAQ Quotation Stream 382(2)
12.2.1 Forecasting Algorithms 383(1)
12.3 Tutorial: Forecasting the Apple Inc. Stream 384(6)
12.3.1 Remarks 389(1)
12.4 The Twitter Streaming API 390(1)
12.5 Tutorial: Tapping the Twitter Stream 391(5)
12.5.1 Remarks 395(1)
12.6 Sentiment Analysis 396(2)
12.7 Tutorial: Sentiment Analysis of Hashtag Groups 398(2)
12.8 Exercises 400(3)
A Solutions to Exercises 403(14)
B Accessing the Twitter API 417(2)
References 419(4)
Index 423
Brian Steele is a full professor of Mathematics at the University of Montana and a Senior Data Scientist for SoftMath Consultants, LLC. Dr. Steele has published on the EM algorithm, exact bagging, the bootstrap, and numerous statistical applications. He teaches data analytics and statistics and consults on a wide variety of subjects related to data science and statistics.

John Chandler has worked at the forefront of marketing and data analysis since 1999. He has worked with Fortune 100 advertisers and scores of agencies, measuring the effectiveness of advertising and improving performance. Dr. Chandler joined the faculty at the University of Montana School of Business Administration as a Clinical Professor of Marketing in 2015 and teaches classes in advanced marketing analytics and data science. He is one of the founders and Chief Data Scientist for Ars Quanta, a Seattle-based data science consultancy.

Dr. Swarna Reddy is the founder, CEO, and a Senior Data Scientist for SoftMath Consultants, LLC and serves as a faculty affiliate with the Department of Mathematical Sciences at the University of Montana. Her area of expertise is computational mathematics and operations research. She is a published researcher and has developed computational solutions across a wide variety of areas spanning bioinformatics, cybersecurity, and business analytics.