Preface to the Third Edition | xvii | |
Preface to the Second Edition | xix | |
Preface to the First Edition | xxiii | |
Part I Introduction to Exploratory Data Analysis |
Chapter 1 Introduction to Exploratory Data Analysis |
1.1 What is Exploratory Data Analysis | 3 | (3) |
| 6 | (2) |
1.3 A Few Words about Notation | 8 | (1) |
1.4 Data Sets Used in the Book | 9 | (11) |
1.4.1 Unstructured Text Documents | 9 | (3) |
1.4.2 Gene Expression Data | 12 | (6) |
| 18 | (1) |
1.4.4 Software Inspection | 19 | (1) |
| 20 | (5) |
1.5.1 Power Transformations | 21 | (1) |
| 22 | (2) |
| 24 | (1) |
| 25 | (2) |
| 27 | (4) |
Part II EDA as Pattern Discovery |
Chapter 2 Dimensionality Reduction - Linear Methods |
| 31 | (2) |
2.2 Principal Component Analysis - PCA | 33 | (9) |
2.2.1 PCA Using the Sample Covariance Matrix | 34 | (3) |
2.2.2 PCA Using the Sample Correlation Matrix | 37 | (1) |
2.2.3 How Many Dimensions Should We Keep? | 38 | (4) |
2.3 Singular Value Decomposition - SVD | 42 | (5) |
2.4 Nonnegative Matrix Factorization | 47 | (4) |
| 51 | (5) |
2.6 Fisher's Linear Discriminant | 56 | (5) |
| 61 | (4) |
2.8 Intrinsic Dimensionality | 65 | (14) |
2.8.1 Nearest Neighbor Approach | 67 | (4) |
2.8.2 Correlation Dimension | 71 | (1) |
2.8.3 Maximum Likelihood Approach | 72 | (2) |
2.8.4 Estimation Using Packing Numbers | 74 | (2) |
2.8.5 Estimation of Local Dimension | 76 | (3) |
2.9 Summary and Further Reading | 79 | (2) |
| 81 | (4) |
Chapter 3 Dimensionality Reduction - Nonlinear Methods |
3.1 Multidimensional Scaling - MDS | 85 | (20) |
| 87 | (10) |
| 97 | (8) |
| 105 | (9) |
3.2.1 Locally Linear Embedding | 105 | (2) |
3.2.2 Isometric Feature Mapping - ISOMAP | 107 | (2) |
| 109 | (5) |
3.3 Artificial Neural Network Approaches | 114 | (17) |
3.3.1 Self-Organizing Maps | 114 | (3) |
3.3.2 Generative Topographic Maps | 117 | (5) |
3.3.3 Curvilinear Component Analysis | 122 | (5) |
| 127 | (4) |
3.4 Stochastic Neighbor Embedding | 131 | (4) |
3.5 Summary and Further Reading | 135 | (1) |
| 136 | (4) |
|
| 140 | (6) |
4.1.1 Torus Winding Method | 141 | (2) |
| 143 | (3) |
| 146 | (2) |
| 148 | (8) |
4.4 Projection Pursuit Indexes | 156 | (5) |
4.4.1 Posse Chi-Square Index | 156 | (3) |
| 159 | (2) |
4.5 Independent Component Analysis | 161 | (4) |
4.6 Summary and Further Reading | 165 | (1) |
| 166 | (3) |
Chapter 5 Finding Clusters |
| 169 | (2) |
| 171 | (6) |
5.3 Optimization Methods - k-Means | 177 | (4) |
| 181 | (4) |
| 185 | (11) |
5.5.1 Nonnegative Matrix Factorization - Revisited | 187 | (4) |
5.5.2 Probabilistic Latent Semantic Analysis | 191 | (5) |
5.6 Minimum Spanning Trees and Clustering | 196 | (8) |
| 196 | (3) |
5.6.2 Minimum Spanning Tree Clustering | 199 | (5) |
5.7 Evaluating the Clusters | 204 | (26) |
| 205 | (2) |
5.7.2 Cophenetic Correlation | 207 | (1) |
| 208 | (3) |
| 211 | (2) |
| 213 | (6) |
5.7.6 Cluster Validity Indices | 219 | (11) |
5.8 Summary and Further Reading | 230 | (2) |
| 232 | (5) |
Chapter 6 Model-Based Clustering |
6.1 Overview of Model-Based Clustering | 237 | (3) |
| 240 | (9) |
6.2.1 Multivariate Finite Mixtures | 242 | (1) |
6.2.2 Component Models - Constraining the Covariances | 243 | (6) |
6.3 Expectation-Maximization Algorithm | 249 | (5) |
6.4 Hierarchical Agglomerative Model-Based Clustering | 254 | (2) |
6.5 Model-Based Clustering | 256 | (7) |
6.6 MBC for Density Estimation and Discriminant Analysis | 263 | (8) |
6.6.1 Introduction to Pattern Recognition | 263 | (1) |
6.6.2 Bayes Decision Theory | 264 | (3) |
6.6.3 Estimating Probability Densities with MBC | 267 | (4) |
6.7 Generating Random Variables from a Mixture Model | 271 | (2) |
6.8 Summary and Further Reading | 273 | (3) |
| 276 | (3) |
Chapter 7 Smoothing Scatterplots |
| 279 | (1) |
| 280 | (11) |
| 291 | (2) |
7.4 Residuals and Diagnostics with Loess | 293 | (8) |
| 293 | (4) |
| 297 | (3) |
7.4.3 Loess Envelopes - Upper and Lower Smooths | 300 | (1) |
| 301 | (12) |
7.5.1 Regression with Splines | 302 | (2) |
| 304 | (6) |
7.5.3 Smoothing Splines for Uniformly Spaced Data | 310 | (3) |
7.6 Choosing the Smoothing Parameter | 313 | (4) |
7.7 Bivariate Distribution Smooths | 317 | (6) |
7.7.1 Pairs of Middle Smoothings | 317 | (2) |
| 319 | (4) |
7.8 Curve Fitting Toolbox | 323 | (2) |
7.9 Summary and Further Reading | 325 | (1) |
| 326 | (7) |
Part III Graphical Methods for EDA |
Chapter 8 Visualizing Clusters |
| 333 | (2) |
| 335 | (3) |
| 338 | (6) |
| 344 | (5) |
| 349 | (6) |
8.6 Summary and Further Reading | 355 | (1) |
| 356 | (3) |
Chapter 9 Distribution Shapes |
| 359 | (9) |
9.1.1 Univariate Histograms | 359 | (7) |
9.1.2 Bivariate Histograms | 366 | (2) |
| 368 | (6) |
9.2.1 Univariate Kernel Density Estimation | 369 | (2) |
9.2.2 Multivariate Kernel Density Estimation | 371 | (3) |
| 374 | (16) |
| 374 | (6) |
9.3.2 Variations of the Basic Boxplot | 380 | (3) |
| 383 | (2) |
| 385 | (3) |
| 388 | (2) |
| 390 | (9) |
| 392 | (1) |
9.4.2 Quantile-Quantile Plot | 393 | (4) |
| 397 | (2) |
| 399 | (1) |
| 400 | (5) |
9.7 Summary and Further Reading | 405 | (1) |
| 405 | (4) |
Chapter 10 Multivariate Visualization |
| 409 | (1) |
| 410 | (8) |
10.2.1 2-D and 3-D Scatterplots | 412 | (3) |
10.2.2 Scatterplot Matrices | 415 | (1) |
10.2.3 Scatterplots with Hexagonal Binning | 416 | (2) |
| 418 | (10) |
10.3.1 Identification of Data | 420 | (2) |
| 422 | (3) |
| 425 | (3) |
| 428 | (3) |
| 431 | (5) |
| 431 | (1) |
10.5.2 Multiway Dot Chart | 432 | (4) |
10.6 Plotting Points as Curves | 436 | (11) |
10.6.1 Parallel Coordinate Plots | 437 | (2) |
| 439 | (4) |
| 443 | (1) |
10.6.4 More Plot Matrices | 444 | (3) |
10.7 Data Tours Revisited | 447 | (5) |
| 448 | (1) |
| 449 | (3) |
| 452 | (3) |
10.9 Summary and Further Reading | 455 | (2) |
| 457 | (5) |
Chapter 11 Visualizing Categorical Data |
11.1 Discrete Distributions | 462 | (5) |
11.1.1 Binomial Distribution | 462 | (2) |
11.1.2 Poisson Distribution | 464 | (3) |
11.2 Exploring Distribution Shapes | 467 | (12) |
| 467 | (2) |
| 469 | (2) |
11.2.3 Extensions of the Poissonness Plot | 471 | (5) |
| 476 | (3) |
| 479 | (19) |
| 481 | (2) |
| 483 | (3) |
| 486 | (3) |
| 489 | (1) |
| 490 | (3) |
| 493 | (5) |
11.4 Summary and Further Reading | 498 | (2) |
| 500 | (3) |
Appendix A Proximity Measures |
| 503 | (5) |
| 504 | (2) |
A.1.2 Similarity Measures | 506 | (1) |
A.1.3 Similarity Measures for Binary Data | 506 | (1) |
A.1.4 Dissimilarities for Probability Density Functions | 507 | (1) |
| 508 | (1) |
| 509 | (2) |
Appendix B Software Resources for EDA |
| 511 | (4) |
B.2 Other Programs for EDA | 515 | (1) |
| 516 | (1) |
Appendix C Description of Data Sets | 517 | (6) |
Appendix D MATLAB® Basics |
| 523 | (2) |
D.2 Getting Help and Other Documentation | 525 | (1) |
D.3 Data Import and Export | 526 | (3) |
D.3.1 Data Import and Export in Base MATLAB® | 526 | (2) |
D.3.2 Data Import and Export with the Statistics Toolbox | 528 | (1) |
| 529 | (6) |
D.4.1 Data Objects in Base MATLAB® | 529 | (3) |
D.4.2 Accessing Data Elements | 532 | (3) |
D.4.3 Object-Oriented Programming | 535 | (1) |
| 535 | (5) |
D.5.1 File and Workspace Management | 536 | (1) |
| 537 | (2) |
D.5.3 Functions in MATLAB® | 539 | (1) |
| 540 | (7) |
| 540 | (3) |
| 543 | (1) |
| 544 | (1) |
| 545 | (1) |
| 545 | (2) |
D.7 Summary and Further Reading | 547 | (4) |
References | 551 | (24) |
Author Index | 575 | (8) |
Subject Index | 583 | |