
Mathematical Foundations for Data Analysis, 2021 ed. [Hardback]

  • Format: Hardback, XVII + 287 pages, height x width: 235x155 mm, weight: 685 g, 109 illustrations (108 in color, 1 black and white)
  • Series: Springer Series in the Data Sciences
  • Publication date: 30-Mar-2021
  • Publisher: Springer Nature Switzerland AG
  • ISBN-10: 3030623408
  • ISBN-13: 9783030623401
  • Hardback
  • Price: 53,33 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 62,74 €
  • You save 15%
  • Delivery from the publisher takes approximately 2-4 weeks

This textbook, suitable for courses from early undergraduate to graduate level, provides an overview of many basic principles and techniques needed for modern data analysis. In particular, it was designed and written as preparation for students planning to take rigorous Machine Learning and Data Mining courses. It introduces key conceptual tools necessary for data analysis, including concentration of measure and PAC bounds, cross-validation, gradient descent, and principal component analysis. It also surveys basic techniques in supervised learning (regression and classification) and unsupervised learning (dimensionality reduction and clustering) through an accessible, simplified presentation. Students are recommended to have some background in calculus, probability, and linear algebra. Some familiarity with programming and algorithms is useful for understanding the more advanced topics on computational techniques.

Reviews

"This is certainly a timely book with large potential impact and appeal. … the book is therewith accessible to a broad scientific audience including undergraduate students. Mathematical Foundations for Data Analysis provides a comprehensive exploration of the mathematics relevant to modern data science topics, with a target audience that is looking for an intuitive and accessible presentation rather than a deep dive into mathematical intricacies." (Aretha L. Teckentrup, SIAM Review, Vol. 65 (1), March, 2023)

"The book is fairly compact, but a lot of information is presented in those pages. … the book is pretty much self-contained, but prior knowledge of linear algebra and Python programming would benefit anyone. The clear writing is backed in many instances by helpful illustrations. Color is used judiciously throughout the text to help differentiate between objects and highlight items of interest. … Phillips' book is much more concise, but still discusses many different mathematical aspects of data science." (David R. Gurney, MAA Reviews, September 5, 2021)

1 Probability Review  1(22)
1.1 Sample Spaces  1(3)
1.2 Conditional Probability and Independence  4(1)
1.3 Density Functions  4(2)
1.4 Expected Value  6(1)
1.5 Variance  7(1)
1.6 Joint, Marginal, and Conditional Distributions  8(2)
1.7 Bayes' Rule  10(4)
1.7.1 Model Given Data  11(3)
1.8 Bayesian Inference  14(9)
Exercises  19(4)
2 Convergence and Sampling  23(20)
2.1 Sampling and Estimation  23(3)
2.2 Probably Approximately Correct (PAC)  26(1)
2.3 Concentration of Measure  26(8)
2.3.1 Markov Inequality  27(1)
2.3.2 Chebyshev Inequality  28(1)
2.3.3 Chernoff-Hoeffding Inequality  29(2)
2.3.4 Union Bound and Examples  31(3)
2.4 Importance Sampling  34(9)
2.4.1 Sampling Without Replacement with Priority Sampling  39(2)
Exercises  41(2)
3 Linear Algebra Review  43(16)
3.1 Vectors and Matrices  43(3)
3.2 Addition and Multiplication  46(3)
3.3 Norms  49(2)
3.4 Linear Independence  51(1)
3.5 Rank  52(1)
3.6 Square Matrices and Properties  53(2)
3.7 Orthogonality  55(4)
Exercises  57(2)
4 Distances and Nearest Neighbors  59(36)
4.1 Metrics  59(1)
4.2 Lp Distances and their Relatives  60(6)
4.2.1 Lp Distances  60(3)
4.2.2 Mahalanobis Distance  63(1)
4.2.3 Cosine and Angular Distance  64(1)
4.2.4 KL Divergence  65(1)
4.3 Distances for Sets and Strings  66(4)
4.3.1 Jaccard Distance  67(2)
4.3.2 Edit Distance  69(1)
4.4 Modeling Text with Distances  70(6)
4.4.1 Bag-of-Words Vectors  70(3)
4.4.2 k-Grams  73(3)
4.5 Similarities  76(4)
4.5.1 Set Similarities  76(1)
4.5.2 Normed Similarities  77(1)
4.5.3 Normed Similarities between Sets  78(2)
4.6 Locality Sensitive Hashing  80(15)
4.6.1 Properties of Locality Sensitive Hashing  82(1)
4.6.2 Prototypical Tasks for LSH  83(1)
4.6.3 Banding to Amplify LSH  84(3)
4.6.4 LSH for Angular Distance  87(2)
4.6.5 LSH for Euclidean Distance  89(1)
4.6.6 Min Hashing as LSH for Jaccard Distance  90(3)
Exercises  93(2)
5 Linear Regression  95(30)
5.1 Simple Linear Regression  95(4)
5.2 Linear Regression with Multiple Explanatory Variables  99(3)
5.3 Polynomial Regression  102(2)
5.4 Cross-Validation  104(5)
5.4.1 Other Ways to Evaluate Linear Regression Models  108(1)
5.5 Regularized Regression  109(16)
5.5.1 Tikhonov Regularization for Ridge Regression  110(2)
5.5.2 Lasso  112(1)
5.5.3 Dual Constrained Formulation  113(2)
5.5.4 Matching Pursuit  115(7)
Exercises  122(3)
6 Gradient Descent  125(18)
6.1 Functions  125(3)
6.2 Gradients  128(1)
6.3 Gradient Descent  129(6)
6.3.1 Learning Rate  129(6)
6.4 Fitting a Model to Data  135(8)
6.4.1 Least Mean Squares Updates for Regression  136(1)
6.4.2 Decomposable Functions  137(4)
Exercises  141(2)
7 Dimensionality Reduction  143(34)
7.1 Data Matrices  143(4)
7.1.1 Projections  145(1)
7.1.2 Sum of Squared Errors Goal  146(1)
7.2 Singular Value Decomposition  147(8)
7.2.1 Best Rank-k Approximation of a Matrix  152(3)
7.3 Eigenvalues and Eigenvectors  155(2)
7.4 The Power Method  157(3)
7.5 Principal Component Analysis  160(1)
7.6 Multidimensional Scaling  161(5)
7.6.1 Why does Classical MDS work?  163(3)
7.7 Linear Discriminant Analysis  166(1)
7.8 Distance Metric Learning  167(2)
7.9 Matrix Completion  169(2)
7.10 Random Projections  171(6)
Exercises  173(4)
8 Clustering  177(30)
8.1 Voronoi Diagrams  177(6)
8.1.1 Delaunay Triangulation  180(2)
8.1.2 Connection to Assignment-Based Clustering  182(1)
8.2 Gonzalez's Algorithm for k-Center Clustering  183(2)
8.3 Lloyd's Algorithm for k-Means Clustering  185(9)
8.3.1 Lloyd's Algorithm  186(5)
8.3.2 k-Means++  191(1)
8.3.3 k-Medoid Clustering  192(1)
8.3.4 Soft Clustering  193(1)
8.4 Mixture of Gaussians  194(2)
8.4.1 Expectation-Maximization  196(1)
8.5 Hierarchical Clustering  196(3)
8.6 Density-Based Clustering and Outliers  199(2)
8.6.1 Outliers  200(1)
8.7 Mean Shift Clustering  201(6)
Exercises  203(4)
9 Classification  207(30)
9.1 Linear Classifiers  207(6)
9.1.1 Loss Functions  210(2)
9.1.2 Cross-Validation and Regularization  212(1)
9.2 Perceptron Algorithm  213(4)
9.3 Support Vector Machines and Kernels  217(5)
9.3.1 The Dual: Mistake Counter  218(1)
9.3.2 Feature Expansion  219(2)
9.3.3 Support Vector Machines  221(1)
9.4 Learnability and VC Dimension  222(3)
9.5 kNN Classifiers  225(1)
9.6 Decision Trees  225(3)
9.7 Neural Networks  228(9)
9.7.1 Training with Back-propagation  230(3)
Exercises  233(4)
10 Graph Structured Data  237(24)
10.1 Markov Chains  239(7)
10.1.1 Ergodic Markov Chains  242(3)
10.1.2 Metropolis Algorithm  245(1)
10.2 PageRank  246(3)
10.3 Spectral Clustering on Graphs  249(5)
10.3.1 Laplacians and their Eigen-Structures  250(4)
10.4 Communities in Graphs  254(7)
10.4.1 Preferential Attachment  256(1)
10.4.2 Betweenness  256(1)
10.4.3 Modularity  257(2)
Exercises  259(2)
11 Big Data and Sketching  261(22)
11.1 The Streaming Model  262(3)
11.1.1 Mean and Variance  264(1)
11.1.2 Reservoir Sampling  264(1)
11.2 Frequent Items  265(8)
11.2.1 Warm-Up: Majority  268(1)
11.2.2 Misra-Gries Algorithm  269(1)
11.2.3 Count-Min Sketch  270(2)
11.2.4 Count Sketch  272(1)
11.3 Matrix Sketching  273(10)
11.3.1 Covariance Matrix Summation  274(1)
11.3.2 Frequent Directions  275(2)
11.3.3 Row Sampling  277(1)
11.3.4 Random Projections and Count Sketch Hashing  278(2)
Exercises  280(3)
Index 283
Jeff M. Phillips is an Associate Professor in the School of Computing at the University of Utah. He directs the Utah Center for Data Science as well as the Data Science curriculum within the School of Computing. His research is on algorithms for big data analytics, a domain which spans machine learning, computational geometry, data mining, algorithms, and databases, and his work regularly appears in top venues in each of these fields. He focuses on geometric interpretations of problems, striving for simple, geometric, and intuitive techniques with provable guarantees that solve important challenges in data science. His research is supported by numerous NSF awards, including an NSF CAREER Award.