|
|
1 Probability Review | 1
1.1 Sample Spaces | 1
1.2 Conditional Probability and Independence | 4
1.3 Density Functions | 4
1.4 Expected Value | 6
1.5 Variance | 7
1.6 Joint, Marginal, and Conditional Distributions | 8
1.7 Bayes' Rule | 10
1.7.1 Model Given Data | 11
1.8 Bayesian Inference | 14
Exercises | 19
2 Convergence and Sampling | 23
2.1 Sampling and Estimation | 23
2.2 Probably Approximately Correct (PAC) | 26
2.3 Concentration of Measure | 26
2.3.1 Markov Inequality | 27
2.3.2 Chebyshev Inequality | 28
2.3.3 Chernoff-Hoeffding Inequality | 29
2.3.4 Union Bound and Examples | 31
2.4 Importance Sampling | 34
2.4.1 Sampling Without Replacement with Priority Sampling | 39
Exercises | 41
3 Linear Algebra Review | 43
3.1 Vectors and Matrices | 43
3.2 Addition and Multiplication | 46
3.3 Norms | 49
3.4 Linear Independence | 51
3.5 Rank | 52
3.6 Square Matrices and Properties | 53
3.7 Orthogonality | 55
Exercises | 57
4 Distances and Nearest Neighbors | 59
4.1 Metrics | 59
4.2 Lp Distances and their Relatives | 60
4.2.1 Lp Distances | 60
4.2.2 Mahalanobis Distance | 63
4.2.3 Cosine and Angular Distance | 64
4.2.4 KL Divergence | 65
4.3 Distances for Sets and Strings | 66
4.3.1 Jaccard Distance | 67
4.3.2 Edit Distance | 69
4.4 Modeling Text with Distances | 70
4.4.1 Bag-of-Words Vectors | 70
4.4.2 k-Grams | 73
4.5 Similarities | 76
4.5.1 Set Similarities | 76
4.5.2 Normed Similarities | 77
4.5.3 Normed Similarities between Sets | 78
4.6 Locality Sensitive Hashing | 80
4.6.1 Properties of Locality Sensitive Hashing | 82
4.6.2 Prototypical Tasks for LSH | 83
4.6.3 Banding to Amplify LSH | 84
4.6.4 LSH for Angular Distance | 87
4.6.5 LSH for Euclidean Distance | 89
4.6.6 Min Hashing as LSH for Jaccard Distance | 90
Exercises | 93
5 Linear Regression | 95
5.1 Simple Linear Regression | 95
5.2 Linear Regression with Multiple Explanatory Variables | 99
5.3 Polynomial Regression | 102
5.4 Cross Validation | 104
5.4.1 Other ways to Evaluate Linear Regression Models | 108
5.5 Regularized Regression | 109
5.5.1 Tikhonov Regularization for Ridge Regression | 110
5.5.2 Lasso | 112
5.5.3 Dual Constrained Formulation | 113
5.5.4 Matching Pursuit | 115
Exercises | 122
6 Gradient Descent | 125
6.1 Functions | 125
6.2 Gradients | 128
6.3 Gradient Descent | 129
6.3.1 Learning Rate | 129
6.4 Fitting a Model to Data | 135
6.4.1 Least Mean Squares Updates for Regression | 136
6.4.2 Decomposable Functions | 137
Exercises | 141
7 Dimensionality Reduction | 143
7.1 Data Matrices | 143
7.1.1 Projections | 145
7.1.2 Sum of Squared Errors Goal | 146
7.2 Singular Value Decomposition | 147
7.2.1 Best Rank-k Approximation of a Matrix | 152
7.3 Eigenvalues and Eigenvectors | 155
7.4 The Power Method | 157
7.5 Principal Component Analysis | 160
7.6 Multidimensional Scaling | 161
7.6.1 Why does Classical MDS work? | 163
7.7 Linear Discriminant Analysis | 166
7.8 Distance Metric Learning | 167
7.9 Matrix Completion | 169
7.10 Random Projections | 171
Exercises | 173
8 Clustering | 177
8.1 Voronoi Diagrams | 177
8.1.1 Delaunay Triangulation | 180
8.1.2 Connection to Assignment-Based Clustering | 182
8.2 Gonzalez's Algorithm for k-Center Clustering | 183
8.3 Lloyd's Algorithm for k-Means Clustering | 185
8.3.1 Lloyd's Algorithm | 186
8.3.2 k-Means++ | 191
8.3.3 k-Medoid Clustering | 192
8.3.4 Soft Clustering | 193
8.4 Mixture of Gaussians | 194
8.4.1 Expectation-Maximization | 196
8.5 Hierarchical Clustering | 196
8.6 Density-Based Clustering and Outliers | 199
8.6.1 Outliers | 200
8.7 Mean Shift Clustering | 201
Exercises | 203
9 Classification | 207
9.1 Linear Classifiers | 207
9.1.1 Loss Functions | 210
9.1.2 Cross-Validation and Regularization | 212
9.2 Perceptron Algorithm | 213
9.3 Support Vector Machines and Kernels | 217
9.3.1 The Dual: Mistake Counter | 218
9.3.2 Feature Expansion | 219
9.3.3 Support Vector Machines | 221
9.4 Learnability and VC dimension | 222
9.5 kNN Classifiers | 225
9.6 Decision Trees | 225
9.7 Neural Networks | 228
9.7.1 Training with Back-propagation | 230
Exercises | 233
10 Graph Structured Data | 237
10.1 Markov Chains | 239
10.1.1 Ergodic Markov Chains | 242
10.1.2 Metropolis Algorithm | 245
10.2 PageRank | 246
10.3 Spectral Clustering on Graphs | 249
10.3.1 Laplacians and their Eigen-Structures | 250
10.4 Communities in Graphs | 254
10.4.1 Preferential Attachment | 256
10.4.2 Betweenness | 256
10.4.3 Modularity | 257
Exercises | 259
11 Big Data and Sketching | 261
11.1.2 Reservoir Sampling | 264
11.2 Frequent Items | 265
11.2.2 Misra-Gries Algorithm | 269
11.2.3 Count-Min Sketch | 270
11.2.4 Count Sketch | 272
11.3 Matrix Sketching | 273
11.3.1 Covariance Matrix Summation | 274
11.3.2 Frequent Directions | 275
11.3.3 Row Sampling | 277
11.3.4 Random Projections and Count Sketch Hashing | 278
Exercises | 280
Index | 283