
Object Oriented Data Analysis [Hardcover]

"Object Oriented Data Analysis (OODA) provides a useful general framework for the consideration of many types of Complex Data. It is deliberately intended to be particularly useful in the analysis of data in complicated situations which are typically not easily represented as an unconstrained matrix of numbers."

Object Oriented Data Analysis is a framework that facilitates interdisciplinary research by providing new terminology for discussing the many possible approaches to the analysis of complex data. Such data arise naturally in a wide variety of areas. This book aims to provide ways of thinking that enable sensible choices among those approaches.

The main points are illustrated with many real data examples, based on the authors' personal experiences, which have motivated the invention of a wide array of analytic methods.

While the mathematics goes far beyond what is usual in statistics (including differential geometry and even topology), the book aims to be accessible to graduate students. There is a deliberate focus on ideas over mathematical formulas.

J. S. Marron is the Amos Hawley Distinguished Professor of Statistics, Professor of Biostatistics, Adjunct Professor of Computer Science, Faculty Member of the Bioinformatics and Computational Biology Curriculum and Research Member of the Lineberger Cancer Center and the Computational Medicine Program, at the University of North Carolina, Chapel Hill.

Ian L. Dryden is Professor of Statistics in the School of Mathematical Sciences at the University of Nottingham, has served as Head of School, and is joint author of the acclaimed book Statistical Shape Analysis.

"Lots of common-sense advice, a lot of informative graphs, and not too much theory…A breath of fresh air in an area where there can be a tendency to make the material overly technical." (John Kent, University of Leeds)

"We need this book badly in statistics." (Jim Ramsay, McGill University)

"The particular strength of this book is that it connects classical statistics, "classical" machine learning and statistics on non-Euclidean spaces with one another… This is a book I have been waiting for." (Stephan F. Huckemann, Georgia-Augusta-University Goettingen)

"An exciting and timely project highlighting the importance of non-Euclidean data across different scientific applications…Covers important and fast-developing topic areas that are important in many applications." (Anuj Srivastava, Florida State University)




Reviews

"I wrote a report last year on an earlier draft of the book. I was enthusiastic then and I remain so. I can see the book being very popular both with graduate students and with more advanced researchers in disciplines such as Statistics, Biology, Computer Science and Engineering. There is lots of common sense advice, a lot of informative graphs, and not too much theory… I regard the manuscript as a breath of fresh air in an area where there can be a tendency to make the material overly technical." -John Kent, University of Leeds

"We need this book badly in statistics. This field, like most in the mathematical sciences, tends to generate a literature that offers more and more exposition of research on less and less. The authors say this well in their first two pages, where they point out how much of statistical practice, literature and software assumes data in the form of a two-dimensional table. As in functional data analysis, the theoreticians will sniff and grouch about the lack of theory, but this quickly follows as they see the opportunities for more intense mathematical effort in the area. Both authors are high-level theoreticians in their own right." -Jim Ramsay, McGill University

"The chapters I have read are very informative and convey the deep insight of two senior specialists in the field. This is very valuable, especially for younger scientists that want to quickly grasp the fundamentals of the field… The particular strength of this book is that it connects classical statistics, "classical" machine learning and statistics on non-Euclidean spaces with one another… This is a book I have been waiting for and I love to read it in detail and profit from it for research and teaching, once it is finished." -Stephan F. Huckemann, Georgia-Augusta-University Goettingen

"This textbook represents an exciting and timely project highlighting the importance of non-Euclidean data across different scientific applications. It uses numerous examples to motivate this scientific approach, to engage researchers from different backgrounds and experiences. The textbook starts by introducing OODA: overview, contexts, and preliminaries. The main focus of the textbook is on statistical case studies driven by non-Euclidean data. The principal component analysis is frequently used as a primary data analysis tool, extracting interesting patterns from the data. Then the textbook moves into specific methods: distance-based methods, visualizations, shapes, curves, and trees. In the last part, the manuscript focuses on pattern analysis techniques: clustering, classification, smoothing, and asymptotics. I strongly recommend the publication of this textbook. This manuscript covers important and fast-developing topic areas that are important in many applications." -Anuj Srivastava, Florida State University

"(...) this monograph is destined, without doubt, to become the classic introductory text on OODA. Graduate students contemplating research in statistics and/or data science who have studied introductory courses in multivariate analysis and smoothing and robust methods will find it a gentle introduction, and also an inspiring one because it identifies many areas of the field that are in their infancy and which need to be more fully developed. It also provides great insight into how classical methods perform in not so standard setups, particularly for high dimensional data, and describes alternatives to them having enhanced performance. Its extensive bibliography constitutes an essential resource for those embarking on research into statistical and data science methodology for contexts involving complex data." -Arthur Pewsey, in Journal of the Royal Statistical Society, Series A, March 2022

"In this research monograph, two leading researchers, Marron and Dryden, provide a comprehensive overview of a field they have helped to build, which they term object oriented data analysis (OODA). [...] This work is a culmination of the authors' last three decades of work and represents a welcome addition to the literature." -Debashis Ghosh, in International Statistical Review, September 2024

Preface
1 What Is OODA?
1.1 Case Study: Curves as Data Objects
1.2 Case Study: Shapes as Data Objects
1.2.1 The Segmentation Challenge
1.2.2 General Shape Representations
1.2.3 Skeletal Shape Representations
1.2.4 Bayes Segmentation via Principal Geodesic Analysis
2 Breadth of OODA
2.1 Amplitude and Phase Data Objects
2.2 Tree-Structured Data Objects
2.3 Sounds as Data Objects
2.4 Images as Data Objects
3 Data Object Definition
3.1 OODA Foundations
3.1.1 OODA Terminology
3.1.2 Object and Feature Space Example
3.1.3 Scree Plots
3.1.4 Formalization of Modes of Variation
3.2 Mathematical Notation
3.3 Overview of Object and Feature Spaces
3.3.1 Example: Probability Distributions as Data Objects
4 Exploratory and Confirmatory Analyses
4.1 Exploratory Analysis - Discover Structure in Data
4.1.1 Example: Tilted Parabolas FDA
4.1.2 Example: Twin Arches FDA
4.1.3 Case Study: Lung Cancer Data
4.1.4 Case Study: Pan-Cancer Data
4.2 Confirmatory Analysis - Is It Really There?
4.3 Further Major Statistical Tasks
5 OODA Preprocessing
5.1 Visualization of Marginal Distributions
5.1.1 Case Study: Spanish Mortality Data
5.1.2 Case Study: Drug Discovery Data
5.2 Standardization - Appropriate Linear Scaling
5.2.1 Example: Two Scale Curve Data
5.2.2 Overview of Standardization
5.3 Transformation - Appropriate Nonlinear Scaling
5.4 Registration - Appropriate Alignment
6 Data Visualization
6.1 Heat-Map Views of Data Matrices
6.2 Curve Views of Matrices and Modes of Variation
6.3 Data Centering and Combined Views
6.4 Scatterplot Matrix Views of Scores
6.5 Alternatives to PCA Directions
7 Distance Based Methods
7.1 Fréchet Centers In Metric Spaces
7.2 Multi-Dimensional Scaling For Object Representation
7.3 Important Distance Examples
7.3.1 Conventional Norms
7.3.2 Wasserstein Distances
7.3.3 Procrustes Distances
7.3.4 Generalized Procrustes Analysis
7.3.5 Covariance Matrix Distances
8 Manifold Data Analysis
8.1 Directional Data
8.2 Introduction to Shape Manifolds
8.3 Statistical Analysis of Shapes
8.4 Landmark Shapes
8.4.1 Shape Tangent Space
8.4.2 Case Study: Digit 3 Data
8.4.3 Case Study: DNA Molecule Data
8.4.4 Principal Nested Shape Spaces
8.4.5 Size-and-shape space
8.4.6 Further Methodology
8.5 Central Limit Theory on Manifolds
8.6 Backwards PCA
8.7 Covariance Matrices as Data Objects
9 FDA Curve Registration
9.1 Fisher-Rao Curve Registration
9.1.1 Example: Shifted Betas Data
9.1.2 Introduction to Warping Functions
9.1.3 Fisher-Rao Mathematics
9.2 Principal Nested Spheres Decomposition
10 Graph Structured Data Objects
10.1 Arterial Trees as Data Objects
10.1.1 Combinatoric Approaches
10.1.2 Phylogenetics
10.1.3 Dyck Path
10.1.4 Persistent Homology
10.1.5 Comparison of Tree Analysis Methods
10.2 Networks as Data Objects
10.2.1 Graph Laplacians
10.2.2 Example: A Tale of Two Cities
10.2.3 Extrinsic and Intrinsic Analysis
10.2.4 Case Study: Corpus Linguistics
10.2.5 Labeled versus Unlabeled Nodes
11 Classification - Supervised Learning
11.1 Classical Methods
11.2 Kernel Methods
11.3 Support Vector Machines
11.4 Distance Weighted Discrimination
11.5 Other Classification Approaches
12 Clustering - Unsupervised Learning
12.1 K-Means Clustering
12.2 Hierarchical Clustering
12.3 Visualization Based Methods
12.3.1 Hybrid Clustering Methods
13 High-Dimensional Inference
13.1 DiProPerm - Two Sample Testing
13.2 Statistical Significance in Clustering
13.2.1 High Dimensional SigClust
14 High Dimensional Asymptotics
14.1 Random Matrix Theory
14.2 High Dimension Low Sample Size
14.3 High Dimension Medium Sample Size
15 Smoothing and SiZer
15.1 Why Not Histograms? - Hidalgo Stamps Data
15.2 Smoothing Basics - Bralower Fossils Data
15.3 Smoothing Parameter Selection
15.4 Statistical Inference and SiZer
15.4.1 Case Study: British Family Incomes Data
15.4.2 Case Study: Bralower Fossils Data
15.4.3 Case Study: Mass Flux Data
15.4.4 Case Study: Kidney Cancer Data
15.4.5 Additional SiZer Applications and Variants
16 Robust Methods
16.1 Robustness Controversies
16.2 Robust Methods for OODA
16.2.1 Case Study: Cornea Curvature Data
16.2.2 Case Study: Genome-Wide Association Data
16.3 Other Robustness Areas
17 PCA Details and Variants
17.1 Viewpoints of PCA
17.1.1 Data Centering
17.1.2 Singular Value Decomposition
17.1.3 Gaussian Likelihood View
17.1.4 PCA Computational Issues
17.2 Two Block Decompositions
17.2.1 Partial Least Squares
17.2.2 Canonical Correlations
17.2.3 Joint and Individual Variation Explained
18 OODA Context and Related Areas
18.1 History and Terminology
18.2 OODA Analogy with Object-Oriented Programming
18.3 Compositional Data Analysis
18.4 Symbolic Data Analysis
18.5 Other Research Areas
Bibliography
Index
J. S. Marron is the Amos Hawley Distinguished Professor of Statistics, Professor of Biostatistics, Adjunct Professor of Computer Science, Faculty Member of the Bioinformatics and Computational Biology Curriculum and Research Member of the Lineberger Cancer Center and the Computational Medicine Program, at the University of North Carolina, Chapel Hill.



Ian L. Dryden is a Professor in the Department of Mathematics and Statistics at Florida International University in Miami, has served as Head of School of Mathematical Sciences at the University of Nottingham, and is joint author of the acclaimed book Statistical Shape Analysis.