Why I Wrote This Book | xv
1 Discrete random variables | 1 | (42)
| 1 | (1)
1.2 Bernoulli random variable | 2 | (2)
1.3 General discrete random variable | 4 | (2)
| 6 | (9)
1.4.1 Mechanical interpretation of the mean | 7 | (5)
| 12 | (3)
| 15 | (11)
| 16 | (1)
| 17 | (1)
| 18 | (1)
| 18 | (1)
1.5.5 Vectorized computations | 19 | (4)
| 23 | (2)
1.5.7 Coding and help in R | 25 | (1)
1.6 Binomial distribution | 26 | (6)
| 32 | (6)
1.8 Random number generation using sample | 38 | (5)
1.8.1 Generation of a discrete random variable | 38 | (1)
| 39 | (4)
2 Continuous random variables | 43 | (106)
2.1 Distribution and density functions | 43 | (5)
2.1.1 Cumulative distribution function | 43 | (2)
| 45 | (1)
| 46 | (2)
2.2 Mean, variance, and other moments | 48 | (11)
2.2.1 Quantiles, quartiles, and the median | 54 | (1)
2.2.2 The tight confidence range | 55 | (4)
| 59 | (4)
2.4 Exponential distribution | 63 | (6)
2.4.1 Laplace or double-exponential distribution | 67 | (1)
| 67 | (2)
2.5 Moment generating function | 69 | (6)
2.5.1 Fourier transform and characteristic function | 72 | (3)
| 75 | (7)
2.6.1 Relationship to Poisson distribution | 77 | (2)
2.6.2 Computing the gamma distribution in R | 79 | (1)
2.6.3 The tight confidence range | 79 | (3)
| 82 | (9)
2.8 Chebyshev's inequality | 91 | (2)
2.9 The law of large numbers | 93 | (11)
2.9.1 Four types of stochastic convergence | 94 | (5)
2.9.2 Integral approximation using simulations | 99 | (5)
2.10 The central limit theorem | 104 | (12)
2.10.1 Why the normal distribution is the most natural symmetric distribution | 112 | (1)
2.10.2 CLT on the relative scale | 113 | (3)
2.11 Lognormal distribution | 116 | (4)
2.11.1 Computation of the tight confidence range | 118 | (2)
2.12 Transformations and the delta method | 120 | (6)
| 124 | (2)
2.13 Random number generation | 126 | (6)
2.13.1 Cauchy distribution | 130 | (2)
| 132 | (2)
| 134 | (4)
2.16 Benford's law: the distribution of the first digit | 138 | (7)
2.16.1 Distributions that almost obey Benford's law | 142 | (3)
2.17 The Pearson family of distributions | 145 | (2)
2.18 Major univariate continuous distributions | 147 | (2)
3 Multivariate random variables | 149 | (106)
3.1 Joint cdf and density | 149 | (7)
| 154 | (1)
3.1.2 Bivariate discrete distribution | 154 | (2)
| 156 | (12)
| 159 | (9)
| 168 | (21)
3.3.1 Conditional mean and variance | 171 | (8)
3.3.2 Mixture distribution and Bayesian statistics | 179 | (3)
| 182 | (2)
3.3.4 Cancer tumors grow exponentially | 184 | (5)
3.4 Correlation and linear regression | 189 | (9)
3.5 Bivariate normal distribution | 198 | (20)
3.5.1 Regression as conditional mean | 206 | (2)
3.5.2 Variance decomposition and coefficient of determination | 208 | (1)
3.5.3 Generation of dependent normal observations | 209 | (5)
| 214 | (4)
3.6 Joint density upon transformation | 218 | (5)
3.7 Geometric probability | 223 | (7)
| 224 | (1)
3.7.2 Random objects on the square | 225 | (5)
3.8 Optimal portfolio allocation | 230 | (6)
3.8.1 Stocks do not correlate | 231 | (1)
| 232 | (1)
| 233 | (1)
| 234 | (2)
3.9 Distribution of order statistics | 236 | (3)
3.10 Multidimensional random vectors | 239 | (16)
3.10.1 Multivariate conditional distribution | 245 | (2)
| 247 | (1)
3.10.3 Multivariate delta method | 248 | (3)
3.10.4 Multinomial distribution | 251 | (4)
4 Four important distributions in statistics | 255 | (36)
4.1 Multivariate normal distribution | 255 | (15)
4.1.1 Generation of multivariate normal variables | 259 | (2)
4.1.2 Conditional distribution | 261 | (7)
| 268 | (2)
4.2 Chi-square distribution | 270 | (10)
4.2.1 Noncentral chi-square distribution | 276 | (1)
4.2.2 Expectations and variances of quadratic forms | 277 | (1)
4.2.3 Kronecker product and covariance matrix | 277 | (3)
| 280 | (6)
4.3.1 Noncentral t-distribution | 284 | (2)
| 286 | (5)
5 Preliminary data analysis and visualization | 291 | (56)
5.1 Comparison of random variables using the cdf | 291 | (21)
| 294 | (11)
5.1.2 Survival probability | 305 | (7)
| 312 | (3)
| 315 | (9)
5.3.1 The q-q confidence bands | 319 | (5)
| 324 | (1)
5.5 Kernel density estimation | 325 | (10)
| 331 | (2)
| 333 | (2)
5.6 Bivariate normal kernel density | 335 | (12)
5.6.1 Bivariate kernel smoother for images | 339 | (2)
5.6.2 Smoothed scatterplot | 341 | (1)
5.6.3 Spatial statistics for disease mapping | 342 | (5)
6 Parameter estimation | 347 | (176)
6.1 Statistics as inverse probability | 349 | (1)
| 350 | (7)
6.2.1 Generalized method of moments | 353 | (4)
| 357 | (1)
6.4 Statistical properties of an estimator | 358 | (20)
| 359 | (6)
| 365 | (6)
6.4.3 Multidimensional MSE | 371 | (2)
6.4.4 Consistency of estimators | 373 | (5)
| 378 | (7)
6.5.1 Estimation of the mean using linear estimator | 379 | (4)
6.5.2 Vector representation | 383 | (2)
6.6 Estimation of variance and correlation coefficient | 385 | (13)
6.6.1 Quadratic estimation of the variance | 386 | (3)
6.6.2 Estimation of the covariance and correlation coefficient | 389 | (9)
6.7 Least squares for simple linear regression | 398 | (17)
6.7.1 Gauss-Markov theorem | 402 | (2)
6.7.2 Statistical properties of the OLS estimator under the normal assumption | 404 | (2)
6.7.3 The lm function and prediction by linear regression | 406 | (4)
6.7.4 Misinterpretation of the coefficient of determination | 410 | (5)
6.8 Sufficient statistics and the exponential family of distributions | 415 | (18)
6.8.1 Uniformly minimum-variance unbiased estimator | 419 | (3)
6.8.2 Exponential family of distributions | 422 | (11)
6.9 Fisher information and the Cramér-Rao bound | 433 | (20)
| 434 | (6)
6.9.2 Multiple parameters | 440 | (13)
| 453 | (57)
6.10.1 Basic definitions and examples | 453 | (18)
6.10.2 Circular statistics and the von Mises distribution | 471 | (4)
6.10.3 Maximum likelihood, sufficient statistics and the exponential family | 475 | (2)
6.10.4 Asymptotic properties of ML | 477 | (8)
6.10.5 When maximum likelihood breaks down | 485 | (13)
6.10.6 Algorithms for log-likelihood function maximization | 498 | (12)
6.11 Estimating equations and the M-estimator | 510 | (13)
| 516 | (7)
7 Hypothesis testing and confidence intervals | 523 | (104)
7.1 Fundamentals of statistical testing | 523 | (8)
7.1.1 The p-value and its interpretation | 525 | (3)
7.1.2 Ad hoc statistical testing | 528 | (3)
| 531 | (5)
7.3 The power function of the Z-test | 536 | (13)
7.3.1 Type II error and the power function | 536 | (6)
7.3.2 Optimal significance level and the ROC curve | 542 | (3)
7.3.3 One-sided hypothesis | 545 | (4)
7.4 The t-test for the means | 549 | (13)
| 549 | (3)
| 552 | (5)
| 557 | (1)
7.4.4 Paired versus unpaired t-test | 558 | (2)
7.4.5 Parametric versus nonparametric tests | 560 | (2)
| 562 | (4)
7.5.1 Two-sided variance test | 562 | (3)
7.5.2 One-sided variance test | 565 | (1)
| 566 | (14)
7.6.1 General formulation | 567 | (2)
7.6.2 The F-test for variances | 569 | (4)
7.6.3 Binomial proportion | 573 | (4)
| 577 | (3)
7.7 Testing for correlation coefficient | 580 | (3)
| 583 | (14)
7.8.1 Unbiased CI and its connection to hypothesis testing | 588 | (1)
| 589 | (2)
7.8.3 CI for the normal variance and SD | 591 | (1)
7.8.4 CI for other major statistical parameters | 592 | (2)
| 594 | (3)
7.9 Three asymptotic tests and confidence intervals | 597 | (15)
7.9.1 Pearson chi-square test | 605 | (3)
7.9.2 Handwritten digit recognition | 608 | (4)
7.10 Limitations of classical hypothesis testing and the d-value | 612 | (15)
7.10.1 What the p-value means? | 613 | (1)
| 614 | (2)
7.10.3 The null hypothesis is always rejected with a large enough sample size | 616 | (2)
7.10.4 Parameter-based inference | 618 | (1)
7.10.5 The d-value for individual inference | 619 | (8)
8 Linear model and its extensions | 627 | (114)
8.1 Basic definitions and linear least squares | 627 | (12)
8.1.1 Linear model with the intercept term | 632 | (1)
8.1.2 The vector-space geometry of least squares | 633 | (3)
8.1.3 Coefficient of determination | 636 | (3)
8.2 The Gauss-Markov theorem | 639 | (4)
8.2.1 Estimation of regression variance | 641 | (2)
8.3 Properties of OLS estimators under the normal assumption | 643 | (7)
8.3.1 The sensitivity of statistical inference to violation of the normal assumption | 646 | (4)
8.4 Statistical inference with linear models | 650 | (21)
8.4.1 Confidence interval and region | 650 | (3)
8.4.2 Linear hypothesis testing and the F-test | 653 | (8)
8.4.3 Prediction by linear regression and simultaneous confidence band | 661 | (3)
8.4.4 Testing the null hypothesis and the coefficient of determination | 664 | (1)
8.4.5 Is X fixed or random? | 665 | (6)
8.5 The one-sided p- and d-value for regression coefficients | 671 | (5)
8.5.1 The one-sided p-value for interpretation on the population level | 672 | (1)
8.5.2 The d-value for interpretation on the individual level | 673 | (3)
8.6 Examples and pitfalls | 676 | (20)
8.6.1 Kids drinking and alcohol movie watching | 676 | (4)
8.6.2 My first false discovery | 680 | (1)
8.6.3 Height, foot, and nose regression | 681 | (3)
8.6.4 A geometric interpretation of adding a new predictor | 684 | (3)
8.6.5 Contrast coefficient of determination against spurious regression | 687 | (9)
8.7 Dummy variable approach and ANOVA | 696 | (27)
8.7.1 Dummy variables for categories | 696 | (9)
8.7.2 Unpaired and paired t-test | 705 | (3)
8.7.3 Modeling longitudinal data | 708 | (4)
8.7.4 One-way ANOVA model | 712 | (8)
| 720 | (3)
8.8 Generalized linear model | 723 | (18)
8.8.1 MLE estimation of GLM | 727 | (1)
8.8.2 Logistic and probit regressions for binary outcome | 728 | (8)
| 736 | (5)
9 Nonlinear regression | 741 | (70)
9.1 Definition and motivating examples | 741 | (9)
9.2 Nonlinear least squares | 750 | (3)
9.3 Gauss-Newton algorithm | 753 | (4)
9.4 Statistical properties of the NLS estimator | 757 | (13)
9.4.1 Large sample properties | 757 | (5)
9.4.2 Small sample properties | 762 | (1)
9.4.3 Asymptotic confidence intervals and hypothesis testing | 763 | (5)
9.4.4 Three methods of statistical inference in large sample | 768 | (2)
9.5 The nls function and examples | 770 | (16)
| 782 | (4)
9.6 Studying small sample properties through simulations | 786 | (8)
9.6.1 Normal distribution approximation | 787 | (2)
| 789 | (2)
| 791 | (1)
9.6.4 Confidence intervals | 792 | (2)
9.7 Numerical complications of the nonlinear least squares | 794 | (5)
9.7.1 Criteria for existence | 795 | (1)
9.7.2 Criteria for uniqueness | 796 | (3)
9.8 Optimal design of experiments with nonlinear regression | 799 | (6)
9.8.1 Motivating examples | 799 | (3)
9.8.2 Optimal designs with nonlinear regression | 802 | (3)
9.9 The Michaelis-Menten model | 805 | (6)
| 806 | (1)
| 807 | (4)
10 Appendix | 811 | (32)
| 811 | (1)
10.2 Basics of matrix algebra | 811 | (7)
10.2.1 Preliminaries and matrix inverse | 812 | (3)
| 815 | (1)
10.2.3 Partition matrices | 816 | (2)
10.3 Eigenvalues and eigenvectors | 818 | (4)
10.3.1 Jordan spectral matrix decomposition | 819 | (1)
10.3.2 SVD: Singular value decomposition of a rectangular matrix | 820 | (2)
10.4 Quadratic forms and positive definite matrices | 822 | (4)
| 822 | (1)
10.4.2 Positive and nonnegative definite matrices | 823 | (3)
10.5 Vector and matrix calculus | 826 | (3)
10.5.1 Differentiation of a scalar-valued function with respect to a vector | 826 | (1)
10.5.2 Differentiation of a vector-valued function with respect to a vector | 827 | (1)
| 828 | (1)
| 828 | (1)
| 829 | (14)
10.6.1 Convex and concave functions | 830 | (1)
10.6.2 Criteria for unconstrained minimization | 831 | (4)
10.6.3 Gradient algorithms | 835 | (3)
10.6.4 Constrained optimization: Lagrange multiplier technique | 838 | (5)
Bibliography | 843 | (8)
Index | 851