
E-book: The Energy of Data and Distance Correlation

Gábor J. Székely (National Science Foundation, Arlington, Virginia, USA), Maria L. Rizzo (Bowling Green State University, Ohio, USA)
  • Format: EPUB+DRM
  • Price: 64,99 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Energy distance is a statistical distance between the distributions of random vectors, which characterizes equality of distributions. The name energy derives from Newton's gravitational potential energy, and there is an elegant relation to the notion of potential energy between statistical observations. Energy statistics are functions of distances between statistical observations in metric spaces. The authors hope this book will spark the interest of most statisticians who so far have not explored E-statistics and would like to apply these new methods using R. The Energy of Data and Distance Correlation is intended for teachers and students looking for dedicated material on energy statistics, but can serve as a supplement to a wide range of courses and areas, such as Monte Carlo methods, U-statistics or V-statistics, measures of multivariate dependence, goodness-of-fit tests, nonparametric methods and distance-based methods.
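
As a rough illustration of the idea that energy statistics are functions of distances between observations, the sketch below computes the usual sample form of energy distance, 2·mean|x − y| − mean|x − x′| − mean|y − y′|, for two samples. It is a minimal pure-Python sketch and not the book's code or the R energy package; the function names `mean_pairwise_distance` and `energy_distance` and the simulated samples are assumptions made for this example.

```python
import math
import random


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (x, y), x in a and y in b."""
    total = sum(math.dist(x, y) for x in a for y in b)
    return total / (len(a) * len(b))


def energy_distance(x, y):
    """Sample energy distance (V-statistic form) between two samples.

    Each sample is a list of equal-length coordinate tuples.
    """
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))


# Hypothetical data: two samples from the same bivariate normal
# distribution and one sample with a shifted mean.
rng = random.Random(0)
same_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
same_b = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
shifted = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(200)]

print(energy_distance(same_a, same_b))   # close to zero
print(energy_distance(same_a, shifted))  # clearly larger
```

The statistic is zero when a sample is compared with itself and grows as the two samples separate, which is the sense in which energy distance characterizes equality of distributions. In practice a permutation test, as covered in the book, would be used to judge whether an observed value is significant.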

• E-statistics provide powerful methods for problems in multivariate inference and analysis.

• Methods are implemented in R, and readers can immediately apply them using the freely available energy package for R.

• The book provides an overview of the current state of the art in energy statistics and of its applications.

• The background and literature review are valuable for anyone considering further research or application in energy statistics.




Reviews

"Many dozens of theorems are proved, various R codes with numerical examples are provided, and multiple exercises are given in each chapter... The book and corresponding software can be useful for instructors and students in advanced statistical courses, and for researchers and practitioners in data analysis."

Stan Lipovetsky, Ipsos, Technometrics, 22nd August 2023.

Preface
Authors
Notation
I The Energy of Data
1 Introduction
1.1 Distances of Data
1.2 Energy of Data: Distance Science of Data
2 Preliminaries
2.1 Notation
2.2 V-statistics and U-statistics
2.2.1 Examples
2.2.2 Representation as a V-statistic
2.2.3 Asymptotic Distribution
2.2.4 E-statistics as V-statistics vs U-statistics
2.3 A Key Lemma
2.4 Invariance
2.5 Exercises
3 Energy Distance
3.1 Introduction: The Energy of Data
3.2 The Population Value of Statistical Energy
3.3 A Simple Proof of the Inequality
3.4 Energy Distance and Cramér's Distance
3.5 Multivariate Case
3.6 Why is Energy Distance Special?
3.7 Infinite Divisibility and Energy Distance
3.8 Freeing Energy via Uniting Sets in Partitions
3.9 Applications of Energy Statistics
3.10 Exercises
4 Introduction to Energy Inference
4.1 Introduction
4.2 Testing for Equal Distributions
4.3 Permutation Distribution and Test
4.4 Goodness-of-Fit
4.5 Energy Test of Univariate Normality
4.6 Multivariate Normality and other Energy Tests
4.7 Exercises
5 Goodness-of-Fit
5.1 Energy Goodness-of-Fit Tests
5.2 Continuous Uniform Distribution
5.3 Exponential and Two-Parameter Exponential
5.4 Energy Test of Normality
5.5 Bernoulli Distribution
5.6 Geometric Distribution
5.7 Beta Distribution
5.8 Poisson Distribution
5.8.1 The Poisson E-test
5.8.2 Probabilities in Terms of Mean Distances
5.8.3 The Poisson M-test
5.8.4 Implementation of Poisson Tests
5.9 Energy Test for Location-Scale Families
5.10 Asymmetric Laplace Distribution
5.10.1 Expected Distances
5.10.2 Test Statistic and Empirical Results
5.11 The Standard Half-Normal Distribution
5.12 The Inverse Gaussian Distribution
5.13 Testing Spherical Symmetry; Stolarsky Invariance
5.14 Proofs
5.15 Exercises
6 Testing Multivariate Normality
6.1 Energy Test of Multivariate Normality
6.1.1 Simple Hypothesis: Known Parameters
6.1.2 Composite Hypothesis: Estimated Parameters
6.1.3 On the Asymptotic Behavior of the Test
6.1.4 Simulations
6.2 Energy Projection-Pursuit Test of Fit
6.2.1 Methodology
6.2.2 Projection Pursuit Results
6.3 Proofs
6.3.1 Hypergeometric Series Formula
6.3.2 Original Formula
6.4 Exercises
7 Eigenvalues for One-Sample E-Statistics
7.1 Introduction
7.2 Kinetic Energy: The Schrödinger Equation
7.3 CF Version of the Hilbert-Schmidt Equation
7.4 Implementation
7.5 Computation of Eigenvalues
7.6 Computational and Empirical Results
7.6.1 Results for Univariate Normality
7.6.2 Testing Multivariate Normality
7.6.3 Computational Efficiency
7.7 Proofs
7.8 Exercises
8 Generalized Goodness-of-Fit
8.1 Introduction
8.2 Pareto Distributions
8.2.1 Energy Tests for Pareto Distribution
8.2.2 Test of Transformed Pareto Sample
8.2.3 Statistics for the Exponential Model
8.2.4 Pareto Statistics
8.2.5 Minimum Distance Estimation
8.3 Cauchy Distribution
8.4 Stable Family of Distributions
8.5 Symmetric Stable Family
8.6 Exercises
9 Multi-sample Energy Statistics
9.1 Energy Distance of a Set of Random Variables
9.2 Multi-sample Energy Statistics
9.3 Distance Components: A Nonparametric Extension of ANOVA
9.3.1 The DISCO Decomposition
9.3.2 Application: Decomposition of Residuals
9.4 Hierarchical Clustering
9.5 Case Study: Hierarchical Clustering
9.6 K-groups Clustering
9.6.1 K-groups Objective Function
9.6.2 K-groups Clustering Algorithm
9.6.3 K-means as a Special Case of K-groups
9.7 Case Study: Hierarchical and K-groups Cluster Analysis
9.8 Further Reading
9.8.1 Bayesian Applications
9.9 Proofs
9.9.1 Proof of Theorem 9.1
9.9.2 Proof of Proposition 9.1
9.10 Exercises
10 Energy in Metric Spaces and Other Distances
10.1 Metric Spaces
10.1.1 Review of Metric Spaces
10.1.2 Examples of Metrics
10.2 Energy Distance in a Metric Space
10.3 Banach Spaces
10.4 Earth Mover's Distance
10.4.1 Wasserstein Distance
10.4.2 Energy vs. Earth Mover's Distance
10.5 Minimum Energy Distance (MED) Estimators
10.6 Energy in Hyperbolic Spaces and in Spheres
10.7 The Space of Positive Definite Symmetric Matrices
10.8 Energy and Machine Learning
10.9 Minkowski Kernel and Gaussian Kernel
10.10 On Some Non-Energy Distances
10.11 Topological Data Analysis
10.12 Exercises
II Distance Correlation and Dependence
11 On Correlation and Other Measures of Association
11.1 The First Measure of Dependence: Correlation
11.2 Distance Correlation
11.3 Other Dependence Measures
11.4 Representations by Uncorrelated Random Variables
12 Distance Correlation
12.1 Introduction
12.2 Characteristic Function Based Covariance
12.3 Dependence Coefficients
12.3.1 Definitions
12.4 Sample Distance Covariance and Correlation
12.4.1 Derivation of
12.4.2 Equivalent Definitions for
12.4.3 Theorem on dCov Statistic Formula
12.5 Properties
12.6 Distance Correlation for Gaussian Variables
12.7 Proofs
12.7.1 Finiteness of $\|f_{X,Y}(t,s) - f_X(t) f_Y(s)\|^2$
12.7.2 Proof of Theorem 12.1
12.7.3 Proof of Theorem 12.2
12.7.4 Proof of Theorem 12.4
12.8 Exercises
13 Testing Independence
13.1 The Sampling Distribution of $n\mathcal{V}_n^2$
13.1.1 Expected Value and Bias of Distance Covariance
13.1.2 Convergence
13.1.3 Asymptotic Properties of $n\mathcal{V}_n^2$
13.2 Testing Independence
13.2.1 Implementation as a Permutation Test
13.2.2 Rank Test
13.2.3 Categorical Data
13.2.4 Examples
13.2.5 Power Comparisons
13.3 Mutual Independence
13.4 Proofs
13.4.1 Proof of Proposition 13.1
13.4.2 Proof of Theorem 13.1
13.4.3 Proof of Corollary 13.3
13.4.4 Proof of Theorem 13.2
13.5 Exercises
14 Applications and Extensions
14.1 Applications
14.1.1 Nonlinear and Non-monotone Dependence
14.1.2 Identify and Test for Nonlinearity
14.1.3 Exploratory Data Analysis
14.1.4 Identify Influential Observations
14.2 Some Extensions
14.2.1 Affine and Monotone Invariant Versions
14.2.2 Generalization: Powers of Distances
14.2.3 Distance Correlation for Dissimilarities
14.2.4 An Unbiased Distance Covariance Statistic
14.3 Distance Correlation in Metric Spaces
14.3.1 Hilbert Spaces and General Metric Spaces
14.3.2 Testing Independence in Separable Metric Spaces
14.3.3 Measuring Associations in Banach Spaces
14.4 Distance Correlation with General Kernels
14.5 Further Reading
14.5.1 Variable Selection, DCA and ICA
14.5.2 Nonparametric MANOVA Based on dCor
14.5.3 Tests of Independence with Ranks
14.5.4 Projection Correlation
14.5.5 Detection of Periodicity via Distance Correlation
14.5.6 dCov Goodness-of-fit Test of Dirichlet Distribution
14.6 Exercises
15 Brownian Distance Covariance
15.1 Introduction
15.2 Weighted $L_2$ Norm
15.3 Brownian Covariance
15.3.1 Definition of Brownian Covariance
15.3.2 Existence of Brownian Covariance Coefficient
15.3.3 The Surprising Coincidence: BCov(X,Y) = dCov(X,Y)
15.4 Fractional Powers of Distances
15.5 Proofs of Statements
15.5.1 Proof of Theorem 15.1
15.6 Exercises
16 U-statistics and Unbiased dCov$^2$
16.1 An Unbiased Estimator of Squared dCov
16.2 The Hilbert Space of U-centered Distance Matrices
16.3 U-statistics and V-statistics
16.3.1 Definitions
16.3.2 Examples
16.4 Jackknife Invariance and U-statistics
16.5 The Inner Product Estimator is a U-statistic
16.6 Asymptotic Theory
16.7 Relation between dCov U-statistic and V-statistic
16.7.1 Deriving the Kernel of dCov V-statistic
16.7.2 Combining Kernel Functions for $V_n$
16.8 Implementation in R
16.9 Proofs
16.10 Exercises
17 Partial Distance Correlation
17.1 Introduction
17.2 Hilbert Space of U-centered Distance Matrices
17.2.1 U-centered Distance Matrices
17.2.2 Properties of Centered Distance Matrices
17.2.3 Additive Constant Invariance
17.3 Partial Distance Covariance and Correlation
17.4 Representation in Euclidean Space
17.5 Methods for Dissimilarities
17.6 Population Coefficients
17.6.1 Distance Correlation in Hilbert Spaces
17.6.2 Population pdCov and pdCor Coefficients
17.6.3 On Conditional Independence
17.7 Empirical Results and Applications
17.8 Proofs
17.9 Exercises
18 The Numerical Value of dCor
18.1 Cor and dCor: How Much Can They Differ?
18.2 Relation Between Pearson and Distance Correlation
18.3 Conjecture
19 The dCor t-test of Independence in High Dimension
19.1 Introduction
19.1.1 Population dCov and dCor Coefficients
19.1.2 Sample dCov and dCor
19.2 On the Bias of the Statistics
19.3 Modified Distance Covariance Statistics
19.4 The t-test for Independence in High Dimension
19.5 Theory and Properties
19.6 Application to Time Series
19.7 Dependence Metrics in High Dimension
19.8 Proofs
19.8.1 On the Bias of Distance Covariance
19.8.2 Proofs of Lemmas
19.8.3 Proof of Propositions
19.8.4 Proof of Theorem
19.9 Exercises
20 Computational Algorithms
20.1 Linearize Energy Distance of Univariate Samples
20.1.1 L-statistics Identities
20.1.2 One-sample Energy Statistics
20.1.3 Energy Test for Equality of Two or More Distributions
20.2 Distance Covariance and Correlation
20.3 Bivariate Distance Covariance
20.3.1 An $O(n \log n)$ Algorithm for Bivariate Data
20.3.2 Bias-Corrected Distance Correlation
20.4 Alternate Bias-Corrected Formula
20.5 Randomized Computational Methods
20.5.1 Random Projections
20.5.2 Algorithm for Squared dCov
20.5.3 Estimating Distance Correlation
20.6 Appendix: Binary Search Algorithm
20.6.1 Computation of the Partial Sums
20.6.2 The Binary Tree
20.6.3 Informal Description of the Algorithm
20.6.4 Algorithm
21 Time Series and Distance Correlation
21.1 Yule's "nonsense correlation" is Contagious
21.2 Auto dCor and Testing for iid
21.3 Cross and Auto-dCor for Stationary Time Series
21.4 Martingale Difference dCor
21.5 Distance Covariance for Discretized Stochastic Processes
21.6 Energy Distance with Dependent Data: Time Shift Invariance
22 Axioms of Dependence Measures
22.1 Rényi's Axioms and Maximal Correlation
22.2 Axioms for Dependence Measures
22.3 Important Dependence Measures
22.4 Invariances of Dependence Measures
22.5 The Erlangen Program of Statistics
22.6 Multivariate Dependence Measures
22.7 Maximal Distance Correlation
22.8 Proofs
22.9 Exercises
23 Earth Mover's Correlation
23.1 Earth Mover's Covariance
23.2 Earth Mover's Correlation
23.3 Population eCor for Mutual Dependence
23.4 Metric Spaces
23.5 Empirical Earth Mover's Correlation
23.6 Dependence, Similarity, and Angles
A Historical Background
B Prehistory
B.1 Introductory Remark
B.2 Thales and the Ten Commandments
Bibliography
Index
Gábor J. Székely graduated from Eötvös Loránd University, Budapest, Hungary (ELTE) with an MS in 1970 and a Ph.D. in 1971. He joined the Department of Probability Theory of ELTE in 1970. In 1989 he became the founding chair of the Department of Stochastics of the Budapest Institute of Technology (Technical University of Budapest). In 1995 Székely moved to the US. Before that, in 1990-91, he was the first distinguished Lukacs Professor at Bowling Green State University, Ohio. Székely held several visiting positions, e.g., at the University of Amsterdam in 1976 and at Yale University in 1989. Between 1985 and 1995 he was the first Hungarian director of Budapest Semesters in Mathematics. Between 2006 and 2022, until his retirement, he was program director of statistics of the National Science Foundation (USA). Székely has almost 250 publications, including six books in several languages. In 1988 he received the Rollo Davidson Prize from Cambridge University, jointly with Imre Z. Ruzsa, for their work on algebraic probability theory. In 2010 Székely became an elected fellow of the Institute of Mathematical Statistics for his seminal work on physics concepts in statistics such as energy statistics and distance correlation. Székely was an invited speaker at several Joint Statistical Meetings and an organizer of invited sessions on energy statistics and distance correlation. Székely has two children, Szilvia and Tamás, and six grandchildren: Elisa, Anna, Michaël and Lea, Eszter, Avi, who live in Brussels, Belgium and Basel, Switzerland. Székely and his wife, Judit, live in McLean, Virginia and Budapest, Hungary.

Maria L. Rizzo is Professor in the Department of Mathematics and Statistics at Bowling Green State University in Bowling Green, Ohio, where she teaches statistics, actuarial science, computational statistics, statistical programming and data science. Prior to joining the faculty at BGSU in 2006, she was a faculty member of the Department of Mathematics at Ohio University in Athens, Ohio. Her main research area is energy statistics and distance correlation. She is the software developer and maintainer of the energy package for R, and the author of textbooks on statistical computing: "Statistical Computing with R" (1st and 2nd editions), "R by Example" (2nd edition in progress) with Jim Albert, and a forthcoming textbook on data science. Dr. Rizzo has had eight PhD students and has one current student, almost all with dissertations on energy statistics. Outside of work she enjoys spending time with her family, including her husband, daughters, grandchildren and a large extended family.