E-book: Exploratory Data Analysis with MATLAB

3.86/5 (7 ratings from Goodreads)

Jeffrey Solka, Angel R. Martinez (U.S. Bureau of Labor Statistics, Washington, DC, USA), Wendy L. Martinez (U.S. Bureau of Labor Statistics, Washington, DC, USA)

Format: 616 pages
Series: Chapman & Hall/CRC Computer Science & Data Analysis
Publication date: 07-Aug-2017
Publisher: Chapman & Hall/CRC
Language: English
ISBN-13: 9781498776073

Format: PDF+DRM
Price: 59.79 €*
* The price is final, i.e. no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed

Printing: not allowed

Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Praise for the Second Edition: "The authors present an intuitive and easy-to-read book ... accompanied by many examples, proposed exercises, good references, and comprehensive appendices that initiate the reader unfamiliar with MATLAB."

Adolfo Alvarez Pinto, International Statistical Review

"Practitioners of EDA who use MATLAB will want a copy of this book. The authors have done a great service by bringing together so many EDA routines, but their main accomplishment in this dynamic text is providing the understanding and tools to do EDA."

David A Huckaby, MAA Reviews

Exploratory Data Analysis (EDA) is an important part of the data analysis process. The methods presented in this text are ones that should be in the toolkit of every data scientist. As computational sophistication has increased and data sets have grown in size and complexity, EDA has become an even more important process for visualizing and summarizing data before making assumptions to generate hypotheses and models.

Exploratory Data Analysis with MATLAB, Third Edition presents EDA methods from a computational perspective and uses numerous examples and applications to show how the methods are used in practice. The authors use MATLAB code, pseudo-code, and algorithm descriptions to illustrate the concepts. The MATLAB code for examples, data sets, and the EDA Toolbox are available for download on the book's website.

New to the Third Edition

Random projections and estimating local intrinsic dimensionality

Deep learning autoencoders and stochastic neighbor embedding

Minimum spanning tree and additional cluster validity indices

Kernel density estimation

Plots for visualizing data distributions, such as beanplots and violin plots

A chapter on visualizing categorical data

Preface to the Third Edition

Preface to the Second Edition

Preface to the First Edition

Part I Introduction to Exploratory Data Analysis

Chapter 1 Introduction to Exploratory Data Analysis

1.1 What is Exploratory Data Analysis

1.2 Overview of the Text

1.3 A Few Words about Notation

1.4 Data Sets Used in the Book

1.4.1 Unstructured Text Documents

1.4.2 Gene Expression Data

1.4.3 Oronsay Data Set

1.4.4 Software Inspection

1.5 Transforming Data

1.5.1 Power Transformations

1.5.2 Standardization

1.5.3 Sphering the Data

1.6 Further Reading

Exercises

Part II EDA as Pattern Discovery

Chapter 2 Dimensionality Reduction - Linear Methods

2.1 Introduction

2.2 Principal Component Analysis - PCA

2.2.1 PCA Using the Sample Covariance Matrix

2.2.2 PCA Using the Sample Correlation Matrix

2.2.3 How Many Dimensions Should We Keep?

2.3 Singular Value Decomposition - SVD

2.4 Nonnegative Matrix Factorization

2.5 Factor Analysis

2.6 Fisher's Linear Discriminant

2.7 Random Projections

2.8 Intrinsic Dimensionality

2.8.1 Nearest Neighbor Approach

2.8.2 Correlation Dimension

2.8.3 Maximum Likelihood Approach

2.8.4 Estimation Using Packing Numbers

2.8.5 Estimation of Local Dimension

2.9 Summary and Further Reading

Exercises

Chapter 3 Dimensionality Reduction - Nonlinear Methods

3.1 Multidimensional Scaling - MDS

3.1.1 Metric MDS

3.1.2 Nonmetric MDS

3.2 Manifold Learning

3.2.1 Locally Linear Embedding

3.2.2 Isometric Feature Mapping - ISOMAP

3.2.3 Hessian Eigenmaps

3.3 Artificial Neural Network Approaches

3.3.1 Self-Organizing Maps

3.3.2 Generative Topographic Maps

3.3.3 Curvilinear Component Analysis

3.3.4 Autoencoders

3.4 Stochastic Neighbor Embedding

3.5 Summary and Further Reading

Exercises

Chapter 4 Data Tours

4.1 Grand Tour

4.1.1 Torus Winding Method

4.1.2 Pseudo Grand Tour

4.2 Interpolation Tours

4.3 Projection Pursuit

4.4 Projection Pursuit Indexes

4.4.1 Posse Chi-Square Index

4.4.2 Moment Index

4.5 Independent Component Analysis

4.6 Summary and Further Reading

Exercises

Chapter 5 Finding Clusters

5.1 Introduction

5.2 Hierarchical Methods

5.3 Optimization Methods - k-Means

5.4 Spectral Clustering

5.5 Document Clustering

5.5.1 Nonnegative Matrix Factorization - Revisited

5.5.2 Probabilistic Latent Semantic Analysis

5.6 Minimum Spanning Trees and Clustering

5.6.1 Definitions

5.6.2 Minimum Spanning Tree Clustering

5.7 Evaluating the Clusters

5.7.1 Rand Index

5.7.2 Cophenetic Correlation

5.7.3 Upper Tail Rule

5.7.4 Silhouette Plot

5.7.5 Gap Statistic

5.7.6 Cluster Validity Indices

5.8 Summary and Further Reading

Exercises

Chapter 6 Model-Based Clustering

6.1 Overview of Model-Based Clustering

6.2 Finite Mixtures

6.2.1 Multivariate Finite Mixtures

6.2.2 Component Models - Constraining the Covariances

6.3 Expectation-Maximization Algorithm

6.4 Hierarchical Agglomerative Model-Based Clustering

6.5 Model-Based Clustering

6.6 MBC for Density Estimation and Discriminant Analysis

6.6.1 Introduction to Pattern Recognition

6.6.2 Bayes Decision Theory

6.6.3 Estimating Probability Densities with MBC

6.7 Generating Random Variables from a Mixture Model

6.8 Summary and Further Reading

Exercises

Chapter 7 Smoothing Scatterplots

7.1 Introduction

7.2 Loess

7.3 Robust Loess

7.4 Residuals and Diagnostics with Loess

7.4.1 Residual Plots

7.4.2 Spread Smooth

7.4.3 Loess Envelopes - Upper and Lower Smooths

7.5 Smoothing Splines

7.5.1 Regression with Splines

7.5.2 Smoothing Splines

7.5.3 Smoothing Splines for Uniformly Spaced Data

7.6 Choosing the Smoothing Parameter

7.7 Bivariate Distribution Smooths

7.7.1 Pairs of Middle Smoothings

7.7.2 Polar Smoothing

7.8 Curve Fitting Toolbox

7.9 Summary and Further Reading

Exercises

Part III Graphical Methods for EDA

Chapter 8 Visualizing Clusters

8.1 Dendrogram

8.2 Treemaps

8.3 Rectangle Plots

8.4 ReClus Plots

8.5 Data Image

8.6 Summary and Further Reading

Exercises

Chapter 9 Distribution Shapes

9.1 Histograms

9.1.1 Univariate Histograms

9.1.2 Bivariate Histograms

9.2 Kernel Density

9.2.1 Univariate Kernel Density Estimation

9.2.2 Multivariate Kernel Density Estimation

9.3 Boxplots

9.3.1 The Basic Boxplot

9.3.2 Variations of the Basic Boxplot

9.3.3 Violin Plots

9.3.4 Beeswarm Plot

9.3.5 Beanplot

9.4 Quantile Plots

9.4.1 Probability Plots

9.4.2 Quantile-Quantile Plot

9.4.3 Quantile Plot

9.5 Bagplots

9.6 Rangefinder Boxplot

9.7 Summary and Further Reading

Exercises

Chapter 10 Multivariate Visualization

10.1 Glyph Plots

10.2 Scatterplots

10.2.1 2-D and 3-D Scatterplots

10.2.2 Scatterplot Matrices

10.2.3 Scatterplots with Hexagonal Binning

10.3 Dynamic Graphics

10.3.1 Identification of Data

10.3.2 Linking

10.3.3 Brushing

10.4 Coplots

10.5 Dot Charts

10.5.1 Basic Dot Chart

10.5.2 Multiway Dot Chart

10.6 Plotting Points as Curves

10.6.1 Parallel Coordinate Plots

10.6.2 Andrews' Curves

10.6.3 Andrews' Images

10.6.4 More Plot Matrices

10.7 Data Tours Revisited

10.7.1 Grand Tour

10.7.2 Permutation Tour

10.8 Biplots

10.9 Summary and Further Reading

Exercises

Chapter 11 Visualizing Categorical Data

11.1 Discrete Distributions

11.1.1 Binomial Distribution

11.1.2 Poisson Distribution

11.2 Exploring Distribution Shapes

11.2.1 Poissonness Plot

11.2.2 Binomialness Plot

11.2.3 Extensions of the Poissonness Plot

11.2.4 Hanging Rootogram

11.3 Contingency Tables

11.3.1 Background

11.3.2 Bar Plots

11.3.3 Spine Plots

11.3.4 Mosaic Plots

11.3.5 Sieve Plots

11.3.6 Log Odds Plot

11.4 Summary and Further Reading

Exercises

Appendix A Proximity Measures

A.1 Definitions

A.1.1 Dissimilarities

A.1.2 Similarity Measures

A.1.3 Similarity Measures for Binary Data

A.1.4 Dissimilarities for Probability Density Functions

A.2 Transformations

A.3 Further Reading

Appendix B Software Resources for EDA

B.1 MATLAB Programs

B.2 Other Programs for EDA

B.3 EDA Toolbox

Appendix C Description of Data Sets

Appendix D MATLAB® Basics

D.1 Desktop Environment

D.2 Getting Help and Other Documentation

D.3 Data Import and Export

D.3.1 Data Import and Export in Base MATLAB®

D.3.2 Data Import and Export with the Statistics Toolbox

D.4 Data in MATLAB®

D.4.1 Data Objects in Base MATLAB®

D.4.2 Accessing Data Elements

D.4.3 Object-Oriented Programming

D.5 Workspace and Syntax

D.5.1 File and Workspace Management

D.5.2 Syntax in MATLAB®

D.5.3 Functions in MATLAB®

D.6 Basic Plot Functions

D.6.1 Plotting 2D Data

D.6.2 Plotting 3D Data

D.6.3 Scatterplots

D.6.4 Scatterplot Matrix

D.6.5 GUIs for Graphics

D.7 Summary and Further Reading

References

Author Index

Subject Index

Wendy L. Martinez is a mathematical statistician with the U.S. Bureau of Labor Statistics. She is a fellow of the American Statistical Association, a co-author of several popular Chapman & Hall/CRC books, and a MATLAB® user for more than 20 years. Her research interests include text data mining, probability density estimation, signal processing, scientific visualization, and statistical pattern recognition. She earned an M.S. in aerospace engineering from George Washington University and a Ph.D. in computational sciences and informatics from George Mason University.

Angel R. Martinez is fully retired after a long career with the U.S. federal government and as an adjunct professor at Strayer University, where he taught undergraduate and graduate courses in statistics and mathematics. Before retiring from government service, he worked for the U.S. Navy as an operations research analyst and a computer scientist. He earned an M.S. in systems engineering from the Virginia Polytechnic Institute and State University and a Ph.D. in computational sciences and informatics from George Mason University.

Since 1984, Jeffrey L. Solka has been working in statistical pattern recognition for the Department of the Navy. He has published over 120 journal, conference, and technical papers; has won numerous awards; and holds 4 patents. He earned an M.S. in mathematics from James Madison University, an M.S. in physics from Virginia Polytechnic Institute and State University, and a Ph.D. in computational sciences and informatics from George Mason University.

Permalink: https://www.kriso.ee/db/97814987760732e.html