|
I Introduction to Big Data |
|
|
1 | (14) |
|
|
5 | (10) |
|
|
5 | (3) |
|
|
8 | (2) |
|
|
10 | (1) |
|
|
10 | (1) |
|
|
11 | (1) |
|
|
11 | (1) |
|
|
12 | (1) |
|
|
13 | (1) |
|
|
13 | (2) |
|
II Statistical Inference for Big Data |
|
|
15 | (260) |
|
2 Basic Concepts in Probability |
|
|
17 | (20) |
|
2.1 Pearson System of Distributions |
|
|
21 | (6) |
|
|
27 | (6) |
|
2.3 Multivariate Central Limit Theorem |
|
|
33 | (1) |
|
|
34 | (3) |
|
3 Basic Concepts in Statistics |
|
|
37 | (26) |
|
3.1 Parametric Estimation |
|
|
37 | (9) |
|
|
46 | (11) |
|
3.3 Classical Bayesian Statistics |
|
|
57 | (6) |
|
|
63 | (32) |
|
|
63 | (1) |
|
4.2 Multivariate Analysis as a Generalization of Univariate Analysis |
|
|
64 | (7) |
|
4.2.1 The General Linear Model |
|
|
67 | (1) |
|
|
68 | (1) |
|
|
69 | (2) |
|
4.3 Structure in Multivariate Data Analysis |
|
|
71 | (24) |
|
4.3.1 Principal Component Analysis |
|
|
71 | (3) |
|
|
74 | (2) |
|
4.3.3 Canonical Correlation |
|
|
76 | (3) |
|
4.3.4 Linear Discriminant Analysis |
|
|
79 | (1) |
|
4.3.5 Multidimensional Scaling |
|
|
80 | (7) |
|
|
87 | (8) |
|
5 Nonparametric Statistics |
|
|
95 | (76) |
|
5.1 Goodness-of-Fit Tests |
|
|
96 | (2) |
|
5.2 Linear Rank Statistics |
|
|
98 | (14) |
|
|
112 | (2) |
|
5.4 Hoeffding's Combinatorial Central Limit Theorem |
|
|
114 | (2) |
|
|
116 | (7) |
|
5.5.1 One-Sample Tests of Location |
|
|
116 | (3) |
|
5.5.2 Confidence Interval for the Median |
|
|
119 | (1) |
|
5.5.3 Wilcoxon Signed Rank Test |
|
|
120 | (3) |
|
|
123 | (4) |
|
5.6.1 Two-Sample Tests for Location |
|
|
124 | (1) |
|
5.6.2 Multi-Sample Test for Location |
|
|
125 | (1) |
|
5.6.3 Tests for Dispersion |
|
|
126 | (1) |
|
|
127 | (1) |
|
5.8 Tests for Ordered Alternatives |
|
|
128 | (4) |
|
5.9 A Unified Theory of Hypothesis Testing |
|
|
132 | (10) |
|
5.9.1 Umbrella Alternatives |
|
|
132 | (4) |
|
5.9.2 Tests for Trend in Proportions |
|
|
136 | (6) |
|
5.10 Randomized Block Designs |
|
|
142 | (2) |
|
|
144 | (10) |
|
5.11.1 Univariate Kernel Density Estimation |
|
|
145 | (4) |
|
5.11.2 The Rank Transform |
|
|
149 | (1) |
|
5.11.3 Multivariate Kernel Density Estimation |
|
|
149 | (5) |
|
5.12 Spatial Data Analysis |
|
|
154 | (8) |
|
5.12.1 Spatial Prediction |
|
|
156 | (4) |
|
5.12.2 Point Poisson Kriging of Areal Data |
|
|
160 | (2) |
|
|
162 | (7) |
|
|
162 | (6) |
|
5.13.2 Application of Le Cam's Lemmas |
|
|
168 | (1) |
|
|
169 | (2) |
|
6 Exponential Tilting and Its Applications |
|
|
171 | (24) |
|
|
171 | (4) |
|
6.2 Smooth Models for Discrete Distributions |
|
|
175 | (4) |
|
|
179 | (5) |
|
6.4 Tweedie's Formula: Univariate Case |
|
|
184 | (4) |
|
6.5 Tweedie's Formula: Multivariate Case |
|
|
188 | (1) |
|
6.6 The Saddlepoint Approximation and Notions of Information |
|
|
189 | (6) |
|
|
195 | (20) |
|
7.1 Inference for Generalized Linear Models |
|
|
198 | (2) |
|
7.2 Inference for Contingency Tables |
|
|
200 | (4) |
|
7.3 Two-Way Ordered Classifications |
|
|
204 | (5) |
|
|
209 | (6) |
|
7.4.1 Kaplan-Meier Estimator |
|
|
211 | (3) |
|
7.4.2 Modeling Survival Data |
|
|
214 | (1) |
|
|
215 | (14) |
|
8.1 Classical Methods of Analysis |
|
|
215 | (9) |
|
|
224 | (5) |
|
|
229 | (18) |
|
|
234 | (2) |
|
|
236 | (11) |
|
9.2.1 Application to One-Sample Ranking Problems |
|
|
239 | (4) |
|
9.2.2 Application to Two-Sample Ranking Problems |
|
|
243 | (4) |
|
10 Symbolic Data Analysis |
|
|
247 | (28) |
|
|
247 | (1) |
|
|
247 | (1) |
|
|
248 | (5) |
|
|
248 | (3) |
|
10.3.2 Sample Mean and Sample Variance |
|
|
251 | (2) |
|
10.3.3 Realization in SODAS |
|
|
253 | (1) |
|
|
253 | (3) |
|
|
253 | (3) |
|
|
256 | (2) |
|
10.5.1 Symbolic Regression for Interval Data |
|
|
256 | (1) |
|
10.5.2 Symbolic Regression for Modal Data |
|
|
257 | (1) |
|
10.5.3 Symbolic Regression in SODAS |
|
|
257 | (1) |
|
|
258 | (1) |
|
|
259 | (1) |
|
10.8 Factorial Discriminant Analysis |
|
|
260 | (1) |
|
10.9 Application to Parkinson's Disease |
|
|
260 | (7) |
|
|
261 | (1) |
|
|
262 | (1) |
|
|
262 | (1) |
|
10.9.2.2 Descriptive Statistics |
|
|
262 | (1) |
|
10.9.2.3 Symbolic Regression Analysis |
|
|
263 | (1) |
|
10.9.2.4 Symbolic Clustering |
|
|
263 | (1) |
|
10.9.2.5 Principal Component Analysis |
|
|
264 | (3) |
|
10.9.3 Comparison with Classical Method |
|
|
267 | (1) |
|
10.10 Application to Cardiovascular Disease Analysis |
|
|
267 | (8) |
|
10.10.1 Results of the Analysis |
|
|
269 | (4) |
|
10.10.2 Comparison with the Classical Method |
|
|
273 | (2) |
|
III Machine Learning for Big Data |
|
|
275 | (108) |
|
11 Tools for Machine Learning |
|
|
277 | (52) |
|
|
277 | (2) |
|
11.2 Simple Linear Regression |
|
|
279 | (10) |
|
11.2.1 Least Squares Method |
|
|
280 | (2) |
|
11.2.2 Statistical Inference on Regression Coefficients |
|
|
282 | (2) |
|
11.2.3 Verifying the Assumptions on the Error Terms |
|
|
284 | (5) |
|
11.3 Multiple Linear Regression |
|
|
289 | (7) |
|
11.3.1 Multiple Linear Regression Model |
|
|
289 | (1) |
|
|
290 | (1) |
|
11.3.3 Statistical Inference on Regression Coefficients |
|
|
291 | (1) |
|
11.3.4 Model Fit Evaluation |
|
|
292 | (4) |
|
11.4 Regression in Machine Learning |
|
|
296 | (10) |
|
11.4.1 Optimization for Linear Regression in Machine Learning |
|
|
298 | (2) |
|
11.4.1.1 Gradient Descent |
|
|
300 | (1) |
|
11.4.1.2 Feature Standardization |
|
|
301 | (2) |
|
11.4.1.3 Computing Cost on a Test Set |
|
|
303 | (3) |
|
11.5 Classification Models |
|
|
306 | (23) |
|
11.5.1 Logistic Regression |
|
|
307 | (1) |
|
11.5.1.1 Optimization with Maximal Likelihood for Logistic Regression |
|
|
308 | (2) |
|
11.5.1.2 Statistical Inference |
|
|
310 | (1) |
|
11.5.2 Logistic Regression for Binary Classification |
|
|
311 | (1) |
|
11.5.2.1 Kullback-Leibler Divergence |
|
|
312 | (4) |
|
11.5.3 Logistic Regression with Multiple Response Classes |
|
|
316 | (1) |
|
11.5.4 Regularization for Regression Models in Machine Learning |
|
|
317 | (2) |
|
11.5.4.1 Ridge Regression |
|
|
319 | (1) |
|
11.5.4.2 Lasso Regression |
|
|
320 | (1) |
|
11.5.4.3 The Choice of Regularization Method |
|
|
321 | (1) |
|
11.5.5 Support Vector Machines (SVM) |
|
|
321 | (1) |
|
|
321 | (1) |
|
11.5.5.2 Finding the Optimal Hyperplane |
|
|
322 | (3) |
|
11.5.5.3 SVM for Nonlinearly Separable Data Sets |
|
|
325 | (1) |
|
11.5.5.4 Illustrating SVM |
|
|
325 | (4) |
|
|
329 | (54) |
|
12.1 Feed-Forward Networks |
|
|
329 | (21) |
|
|
330 | (3) |
|
12.1.2 Introduction to Neural Networks |
|
|
333 | (1) |
|
12.1.3 Building a Deep Feed-Forward Network |
|
|
334 | (6) |
|
12.1.4 Learning in Deep Networks |
|
|
340 | (1) |
|
12.1.4.1 Quantitative Model |
|
|
341 | (1) |
|
12.1.4.2 Binary Classification Model |
|
|
342 | (1) |
|
|
342 | (3) |
|
12.1.5.1 A Machine Learning Approach to Generalization |
|
|
345 | (5) |
|
12.2 Recurrent Neural Networks |
|
|
350 | (16) |
|
12.2.1 Building a Recurrent Neural Network |
|
|
350 | (2) |
|
12.2.2 Learning in Recurrent Networks |
|
|
352 | (2) |
|
12.2.3 Most Common Design Structures of RNNs |
|
|
354 | (3) |
|
|
357 | (2) |
|
|
359 | (2) |
|
12.2.6 Long-Term Dependencies and LSTM RNN |
|
|
361 | (3) |
|
12.2.7 Reduction for Exploding Gradients |
|
|
364 | (2) |
|
12.3 Convolution Neural Networks |
|
|
366 | (10) |
|
12.3.1 Convolution Operator for Arrays |
|
|
368 | (1) |
|
12.3.1.1 Properties of the Convolution Operator |
|
|
369 | (3) |
|
12.3.2 Convolution Layers |
|
|
372 | (3) |
|
|
375 | (1) |
|
|
376 | (7) |
|
|
376 | (2) |
|
12.4.2 General Architecture |
|
|
378 | (5) |
|
IV Computational Methods for Statistical Inference |
|
|
383 | (44) |
|
13 Bayesian Computation Methods |
|
|
385 | (42) |
|
13.1 Data Augmentation Methods |
|
|
385 | (2) |
|
13.2 Metropolis-Hastings Algorithm |
|
|
387 | (2) |
|
|
389 | (1) |
|
|
390 | (10) |
|
13.4.1 Application to Ranking |
|
|
391 | (7) |
|
13.4.2 Extension to Several Populations |
|
|
398 | (2) |
|
13.5 Variational Bayesian Methods |
|
|
400 | (4) |
|
13.5.1 Optimization of the Variational Distribution |
|
|
402 | (2) |
|
13.6 Bayesian Nonparametric Methods |
|
|
404 | (23) |
|
|
404 | (4) |
|
13.6.2 The Poisson-Dirichlet Prior |
|
|
408 | (1) |
|
13.6.3 Simulation of Bayesian Posterior Distributions |
|
|
408 | (2) |
|
13.6.4 Other Applications |
|
|
410 | (17) |
Index |
|
427 | |