
E-book: Exploratory Data Analysis with MATLAB

  • Format: PDF + DRM
  • Price: €59.79*
  • * the price is final, i.e. no further discounts apply
  • This e-book is for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste): not allowed

  • Printing: not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install dedicated software to read it. You must also create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, install Adobe Digital Editions (this is a free application built specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Praise for the Second Edition:

"The authors present an intuitive and easy-to-read book, accompanied by many examples, proposed exercises, good references, and comprehensive appendices that initiate the reader unfamiliar with MATLAB."

Adolfo Alvarez Pinto, International Statistical Review

"Practitioners of EDA who use MATLAB will want a copy of this book. The authors have done a great service by bringing together so many EDA routines, but their main accomplishment in this dynamic text is providing the understanding and tools to do EDA."

David A. Huckaby, MAA Reviews

Exploratory Data Analysis (EDA) is an important part of the data analysis process. The methods presented in this text are ones that should be in the toolkit of every data scientist. As computational sophistication has increased and data sets have grown in size and complexity, EDA has become an even more important process for visualizing and summarizing data before making assumptions to generate hypotheses and models.

Exploratory Data Analysis with MATLAB, Third Edition presents EDA methods from a computational perspective and uses numerous examples and applications to show how the methods are used in practice. The authors use MATLAB code, pseudo-code, and algorithm descriptions to illustrate the concepts. The MATLAB code for examples, data sets, and the EDA Toolbox are available for download on the book's website.

New to the Third Edition

  • Random projections and estimating local intrinsic dimensionality
  • Deep learning autoencoders and stochastic neighbor embedding
  • Minimum spanning tree and additional cluster validity indices
  • Kernel density estimation
  • Plots for visualizing data distributions, such as beanplots and violin plots
  • A chapter on visualizing categorical data
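As a taste of the kernel density estimation material that is new in this edition, here is a minimal sketch of a univariate Gaussian KDE. It is written in Python rather than MATLAB and is not taken from the book; the grid, bandwidth `h`, and simulated data are illustrative choices only.

```python
import numpy as np

def kde(grid, data, h):
    """Gaussian kernel density estimate of 1-D data, evaluated on grid."""
    # One Gaussian bump per observation, averaged and scaled by the
    # bandwidth h so the estimate integrates to (approximately) one.
    z = (grid[:, None] - data[None, :]) / h
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return bumps.sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=500)  # simulated data
grid = np.linspace(-4, 4, 201)
density = kde(grid, sample, h=0.4)

# Riemann-sum check that the estimate behaves like a density.
area = density.sum() * (grid[1] - grid[0])
print(f"area under estimate: {area:.2f}")
```

The book covers bandwidth selection and multivariate extensions (Sections 9.2.1 and 9.2.2); this sketch fixes the bandwidth by hand.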
Preface to the Third Edition xvii
Preface to the Second Edition xix
Preface to the First Edition xxiii
Part I Introduction to Exploratory Data Analysis
Chapter 1 Introduction to Exploratory Data Analysis
1.1 What is Exploratory Data Analysis 3(3)
1.2 Overview of the Text 6(2)
1.3 A Few Words about Notation 8(1)
1.4 Data Sets Used in the Book 9(11)
1.4.1 Unstructured Text Documents 9(3)
1.4.2 Gene Expression Data 12(6)
1.4.3 Oronsay Data Set 18(1)
1.4.4 Software Inspection 19(1)
1.5 Transforming Data 20(5)
1.5.1 Power Transformations 21(1)
1.5.2 Standardization 22(2)
1.5.3 Sphering the Data 24(1)
1.6 Further Reading 25(2)
Exercises 27(4)
Part II EDA as Pattern Discovery
Chapter 2 Dimensionality Reduction - Linear Methods
2.1 Introduction 31(2)
2.2 Principal Component Analysis - PCA 33(9)
2.2.1 PCA Using the Sample Covariance Matrix 34(3)
2.2.2 PCA Using the Sample Correlation Matrix 37(1)
2.2.3 How Many Dimensions Should We Keep? 38(4)
2.3 Singular Value Decomposition - SVD 42(5)
2.4 Nonnegative Matrix Factorization 47(4)
2.5 Factor Analysis 51(5)
2.6 Fisher's Linear Discriminant 56(5)
2.7 Random Projections 61(4)
2.8 Intrinsic Dimensionality 65(14)
2.8.1 Nearest Neighbor Approach 67(4)
2.8.2 Correlation Dimension 71(1)
2.8.3 Maximum Likelihood Approach 72(2)
2.8.4 Estimation Using Packing Numbers 74(2)
2.8.5 Estimation of Local Dimension 76(3)
2.9 Summary and Further Reading 79(2)
Exercises 81(4)
Chapter 3 Dimensionality Reduction - Nonlinear Methods
3.1 Multidimensional Scaling - MDS 85(20)
3.1.1 Metric MDS 87(10)
3.1.2 Nonmetric MDS 97(8)
3.2 Manifold Learning 105(9)
3.2.1 Locally Linear Embedding 105(2)
3.2.2 Isometric Feature Mapping - ISOMAP 107(2)
3.2.3 Hessian Eigenmaps 109(5)
3.3 Artificial Neural Network Approaches 114(17)
3.3.1 Self-Organizing Maps 114(3)
3.3.2 Generative Topographic Maps 117(5)
3.3.3 Curvilinear Component Analysis 122(5)
3.3.4 Autoencoders 127(4)
3.4 Stochastic Neighbor Embedding 131(4)
3.5 Summary and Further Reading 135(1)
Exercises 136(4)
Chapter 4 Data Tours
4.1 Grand Tour 140(6)
4.1.1 Torus Winding Method 141(2)
4.1.2 Pseudo Grand Tour 143(3)
4.2 Interpolation Tours 146(2)
4.3 Projection Pursuit 148(8)
4.4 Projection Pursuit Indexes 156(5)
4.4.1 Posse Chi-Square Index 156(3)
4.4.2 Moment Index 159(2)
4.5 Independent Component Analysis 161(4)
4.6 Summary and Further Reading 165(1)
Exercises 166(3)
Chapter 5 Finding Clusters
5.1 Introduction 169(2)
5.2 Hierarchical Methods 171(6)
5.3 Optimization Methods - k-Means 177(4)
5.4 Spectral Clustering 181(4)
5.5 Document Clustering 185(11)
5.5.1 Nonnegative Matrix Factorization - Revisited 187(4)
5.5.2 Probabilistic Latent Semantic Analysis 191(5)
5.6 Minimum Spanning Trees and Clustering 196(8)
5.6.1 Definitions 196(3)
5.6.2 Minimum Spanning Tree Clustering 199(5)
5.7 Evaluating the Clusters 204(26)
5.7.1 Rand Index 205(2)
5.7.2 Cophenetic Correlation 207(1)
5.7.3 Upper Tail Rule 208(3)
5.7.4 Silhouette Plot 211(2)
5.7.5 Gap Statistic 213(6)
5.7.6 Cluster Validity Indices 219(11)
5.8 Summary and Further Reading 230(2)
Exercises 232(5)
Chapter 6 Model-Based Clustering
6.1 Overview of Model-Based Clustering 237(3)
6.2 Finite Mixtures 240(9)
6.2.1 Multivariate Finite Mixtures 242(1)
6.2.2 Component Models - Constraining the Covariances 243(6)
6.3 Expectation-Maximization Algorithm 249(5)
6.4 Hierarchical Agglomerative Model-Based Clustering 254(2)
6.5 Model-Based Clustering 256(7)
6.6 MBC for Density Estimation and Discriminant Analysis 263(8)
6.6.1 Introduction to Pattern Recognition 263(1)
6.6.2 Bayes Decision Theory 264(3)
6.6.3 Estimating Probability Densities with MBC 267(4)
6.7 Generating Random Variables from a Mixture Model 271(2)
6.8 Summary and Further Reading 273(3)
Exercises 276(3)
Chapter 7 Smoothing Scatterplots
7.1 Introduction 279(1)
7.2 Loess 280(11)
7.3 Robust Loess 291(2)
7.4 Residuals and Diagnostics with Loess 293(8)
7.4.1 Residual Plots 293(4)
7.4.2 Spread Smooth 297(3)
7.4.3 Loess Envelopes - Upper and Lower Smooths 300(1)
7.5 Smoothing Splines 301(12)
7.5.1 Regression with Splines 302(2)
7.5.2 Smoothing Splines 304(6)
7.5.3 Smoothing Splines for Uniformly Spaced Data 310(3)
7.6 Choosing the Smoothing Parameter 313(4)
7.7 Bivariate Distribution Smooths 317(6)
7.7.1 Pairs of Middle Smoothings 317(2)
7.7.2 Polar Smoothing 319(4)
7.8 Curve Fitting Toolbox 323(2)
7.9 Summary and Further Reading 325(1)
Exercises 326(7)
Part III Graphical Methods for EDA
Chapter 8 Visualizing Clusters
8.1 Dendrogram 333(2)
8.2 Treemaps 335(3)
8.3 Rectangle Plots 338(6)
8.4 ReClus Plots 344(5)
8.5 Data Image 349(6)
8.6 Summary and Further Reading 355(1)
Exercises 356(3)
Chapter 9 Distribution Shapes
9.1 Histograms 359(9)
9.1.1 Univariate Histograms 359(7)
9.1.2 Bivariate Histograms 366(2)
9.2 Kernel Density 368(6)
9.2.1 Univariate Kernel Density Estimation 369(2)
9.2.2 Multivariate Kernel Density Estimation 371(3)
9.3 Boxplots 374(16)
9.3.1 The Basic Boxplot 374(6)
9.3.2 Variations of the Basic Boxplot 380(3)
9.3.3 Violin Plots 383(2)
9.3.4 Beeswarm Plot 385(3)
9.3.5 Beanplot 388(2)
9.4 Quantile Plots 390(9)
9.4.1 Probability Plots 392(1)
9.4.2 Quantile-Quantile Plot 393(4)
9.4.3 Quantile Plot 397(2)
9.5 Bagplots 399(1)
9.6 Rangefinder Boxplot 400(5)
9.7 Summary and Further Reading 405(1)
Exercises 405(4)
Chapter 10 Multivariate Visualization
10.1 Glyph Plots 409(1)
10.2 Scatterplots 410(8)
10.2.1 2-D and 3-D Scatterplots 412(3)
10.2.2 Scatterplot Matrices 415(1)
10.2.3 Scatterplots with Hexagonal Binning 416(2)
10.3 Dynamic Graphics 418(10)
10.3.1 Identification of Data 420(2)
10.3.2 Linking 422(3)
10.3.3 Brushing 425(3)
10.4 Coplots 428(3)
10.5 Dot Charts 431(5)
10.5.1 Basic Dot Chart 431(1)
10.5.2 Multiway Dot Chart 432(4)
10.6 Plotting Points as Curves 436(11)
10.6.1 Parallel Coordinate Plots 437(2)
10.6.2 Andrews' Curves 439(4)
10.6.3 Andrews' Images 443(1)
10.6.4 More Plot Matrices 444(3)
10.7 Data Tours Revisited 447(5)
10.7.1 Grand Tour 448(1)
10.7.2 Permutation Tour 449(3)
10.8 Biplots 452(3)
10.9 Summary and Further Reading 455(2)
Exercises 457(5)
Chapter 11 Visualizing Categorical Data
11.1 Discrete Distributions 462(5)
11.1.1 Binomial Distribution 462(2)
11.1.2 Poisson Distribution 464(3)
11.2 Exploring Distribution Shapes 467(12)
11.2.1 Poissonness Plot 467(2)
11.2.2 Binomialness Plot 469(2)
11.2.3 Extensions of the Poissonness Plot 471(5)
11.2.4 Hanging Rootogram 476(3)
11.3 Contingency Tables 479(19)
11.3.1 Background 481(2)
11.3.2 Bar Plots 483(3)
11.3.3 Spine Plots 486(3)
11.3.4 Mosaic Plots 489(1)
11.3.5 Sieve Plots 490(3)
11.3.6 Log Odds Plot 493(5)
11.4 Summary and Further Reading 498(2)
Exercises 500(3)
Appendix A Proximity Measures
A.1 Definitions 503(5)
A.1.1 Dissimilarities 504(2)
A.1.2 Similarity Measures 506(1)
A.1.3 Similarity Measures for Binary Data 506(1)
A.1.4 Dissimilarities for Probability Density Functions 507(1)
A.2 Transformations 508(1)
A.3 Further Reading 509(2)
Appendix B Software Resources for EDA
B.1 MATLAB Programs 511(4)
B.2 Other Programs for EDA 515(1)
B.3 EDA Toolbox 516(1)
Appendix C Description of Data Sets 517(6)
Appendix D MATLAB® Basics
D.1 Desktop Environment 523(2)
D.2 Getting Help and Other Documentation 525(1)
D.3 Data Import and Export 526(3)
D.3.1 Data Import and Export in Base MATLAB® 526(2)
D.3.2 Data Import and Export with the Statistics Toolbox 528(1)
D.4 Data in MATLAB® 529(6)
D.4.1 Data Objects in Base MATLAB® 529(3)
D.4.2 Accessing Data Elements 532(3)
D.4.3 Object-Oriented Programming 535(1)
D.5 Workspace and Syntax 535(5)
D.5.1 File and Workspace Management 536(1)
D.5.2 Syntax in MATLAB® 537(2)
D.5.3 Functions in MATLAB® 539(1)
D.6 Basic Plot Functions 540(7)
D.6.1 Plotting 2D Data 540(3)
D.6.2 Plotting 3D Data 543(1)
D.6.3 Scatterplots 544(1)
D.6.4 Scatterplot Matrix 545(1)
D.6.5 GUIs for Graphics 545(2)
D.7 Summary and Further Reading 547(4)
References 551(24)
Author Index 575(8)
Subject Index 583
Wendy L. Martinez is a mathematical statistician with the U.S. Bureau of Labor Statistics. She is a fellow of the American Statistical Association, a co-author of several popular Chapman & Hall/CRC books, and a MATLAB® user for more than 20 years. Her research interests include text data mining, probability density estimation, signal processing, scientific visualization, and statistical pattern recognition. She earned an M.S. in aerospace engineering from George Washington University and a Ph.D. in computational sciences and informatics from George Mason University.



Angel R. Martinez is fully retired after a long career with the U.S. federal government and as an adjunct professor at Strayer University, where he taught undergraduate and graduate courses in statistics and mathematics. Before retiring from government service, he worked for the U.S. Navy as an operations research analyst and a computer scientist. He earned an M.S. in systems engineering from the Virginia Polytechnic Institute and State University and a Ph.D. in computational sciences and informatics from George Mason University.



Since 1984, Jeffrey L. Solka has been working in statistical pattern recognition for the Department of the Navy. He has published over 120 journal, conference, and technical papers; has won numerous awards; and holds 4 patents. He earned an M.S. in mathematics from James Madison University, an M.S. in physics from Virginia Polytechnic Institute and State University, and a Ph.D. in computational sciences and informatics from George Mason University.