E-book: Robust Methods for Data Reduction

Alessio Farcomeni (Sapienza -- University of Rome, Rome, Italy), Luca Greco (University of Sannio, Benevento, Italy)
  • Format: 297 pages
  • Publication date: 13-Jan-2016
  • Publisher: CRC Press Inc
  • Language: English
  • ISBN-13: 9781466590632
  • Format: PDF+DRM
  • Price: 59,79 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on mobile devices (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer)

    This e-book cannot be read on an Amazon Kindle.

Robust Methods for Data Reduction gives a non-technical overview of robust data reduction techniques, encouraging the use of these important and useful methods in practical applications. The main areas covered include principal component analysis, sparse principal component analysis, canonical correlation analysis, factor analysis, clustering, double clustering, and discriminant analysis.

The first part of the book illustrates how dimension reduction techniques synthesize available information by reducing the dimensionality of the data. The second part focuses on cluster and discriminant analysis. The authors explain how to perform sample reduction by finding groups in the data.

Despite considerable theoretical achievements, robust methods are not often used in practice. This book fills the gap between theoretical robust techniques and the analysis of real data sets in the area of data reduction. Using real examples, the authors show how to implement the procedures in R. The code and data for the examples are available on the book’s CRC Press web page.

Reviews

"... this book tries to avoid technicalities and focuses on illustrating the power of robust techniques in action. Additionally, it covers some novel techniques, involving data reduction ... An important concept addressed in Part 2 of the book is independent cell-wise contamination. A large number of variables and a relatively small number of cases are commonplace in modern statistical applications. The proposed snipping methodology is tailored to be applied in the presence of cell-wise contamination, and from my point of view, is one of the principal methodological contributions of the book. In summary, this book is interesting and useful. The book is not an attempt to systematically review all the literature in robust data reduction. However, it proposes a selection of techniques that are simple to understand or to use in practice." Luis Angel García Escudero, Dpto. de Estadística e I. O., Universidad de Valladolid, in Biometrics, June 2017

"'Robust Methods for Data Reduction' makes it easy for practitioners of big-data analytics to conduct robust and efficient data reduction. It is a timely topic in which recently prescribed algorithms and methodological research findings are properly assimilated and presented in a lucid fashion. The book serves as a good introductory book that motivates and teaches the art of developing robust frameworks for synthesis and reduction of large, complex datasets ... The most appealing aspect of this book is that all of the concepts and algorithms described are inspired by real-data examples. All of the methods presented in this book are accompanied by extensive codes and exhaustive documentation on how to implement them in the R computing environment. Readers can download the data and the computer code used in the book from the publishers webpage ... The collection of data examples and the pedagogical writing style make it an ideal text for instructors aiming to quickly train students on proper data-reduction techniques ... This book will be particularly useful for courses with R labs. It is bound to find a wide and enduring readership and will be a valuable addition to the library of any data scientist." Gourab Mukherjee, University of Southern California, in Journal of the American Statistical Association, Volume 111, 2016

Preface
Authors
List of Figures
List of Tables
List of Examples and R illustrations
Symbol Description
1 Introduction and Overview
1.1 What is contamination?
1.2 Evaluating robustness
1.2.1 Consistency
1.2.2 Local robustness: the influence function
1.2.3 Global robustness: the breakdown point
1.2.4 Global robustness: the maximum bias
1.3 What is data reduction?
1.3.1 Dimension reduction
1.3.2 Sample reduction
1.4 An overview of robust dimension reduction
1.5 An overview of robust sample reduction
1.6 Example datasets
1.6.1 G8 macroeconomic data
1.6.2 Handwritten digits data
1.6.3 Automobile data
1.6.4 Metallic oxide data
1.6.5 Spam detection data
1.6.6 Video surveillance data
1.6.7 Water treatment plant data
2 Multivariate Estimation Methods
2.1 Robust univariate methods
2.1.1 M estimators
2.1.2 Huber estimator
2.1.3 Redescending M estimators
2.1.4 Scale estimators
2.1.5 Measuring outlyingness
2.2 Classical multivariate estimation
2.3 Robust multivariate estimation
2.3.1 Multivariate M estimators
2.3.2 Multivariate S estimators
2.3.3 Multivariate MM estimators
2.3.4 Minimum Covariance Determinant
2.3.5 Reweighted MCD
2.3.6 Other multivariate estimators
2.4 Identification of multivariate outliers
2.4.1 Multiple testing strategy
2.5 Examples
2.5.1 Italian demographics data
2.5.2 Star cluster CYG OB1 data
2.5.3 Butterfly data
Part I Dimension Reduction
Introduction to Dimension Reduction
3 Principal Component Analysis
3.1 Classical PCA
3.2 PCA based on robust covariance estimation
3.3 PCA based on projection pursuit
3.4 Spherical PCA
3.5 PCA in high dimensions
3.6 Outlier identification using principal components
3.7 Examples
3.7.1 Automobile data
3.7.2 Octane data
3.7.3 Video surveillance data
4 Sparse Robust PCA
4.1 Basic concepts and sPCA
4.2 Robust sPCA
4.3 Choice of the degree of sparsity
4.4 Sparse projection pursuit
4.5 Examples
4.5.1 Automobile data
4.5.2 Octane data
5 Canonical Correlation Analysis
5.1 Classical canonical correlation analysis
5.1.1 Interpretation of the results
5.1.2 Selection of the number of canonical variables
5.2 CCA based on robust covariance estimation
5.3 Other methods
5.4 Examples
5.4.1 Linnerud data
5.4.2 Butterfly data
6 Factor Analysis
6.1 The FA model
6.1.1 Fitting the FA model
6.2 Robust factor analysis
6.3 Examples
6.3.1 Automobile data
6.3.2 Butterfly data
Part II Sample Reduction
Introduction to Sample Reduction
7 k-means and Model-Based Clustering
7.1 A brief overview of applications of cluster analysis
7.2 Basic concepts
7.3 k-means
7.4 Model-based clustering
7.4.1 Likelihood inference
7.4.2 Distribution of component densities
7.4.3 Examples of model-based clustering
7.5 Choosing the number of clusters
8 Robust Clustering
8.1 Partitioning Around Medoids
8.2 Trimmed k-means
8.2.1 The double minimization problem involved with trimmed k-means
8.3 Snipped k-means
8.3.1 Snipping and the component-wise contamination model
8.3.2 Minimization of the loss function for snipped k-means
8.4 Choosing the trimming and snipping levels
8.5 Examples
8.5.1 Metallic oxide data
8.5.2 Handwritten digits data
9 Robust Model-Based Clustering
9.1 Robust heterogeneous clustering based on trimming
9.1.1 A robust CEM for model estimation: the tclust algorithm
9.1.2 Properties
9.2 Robust heterogeneous clustering based on snipping
9.2.1 A robust CEM for model estimation: the sclust algorithm
9.2.2 Properties
9.3 Examples
9.3.1 Metallic oxide data
9.3.2 Water treatment plant data
10 Double Clustering
10.1 Double k-means
10.2 Trimmed double k-means
10.3 Snipped double k-means
10.4 Robustness properties
11 Discriminant Analysis
11.1 Classical discriminant analysis
11.2 Robust discriminant analysis
A Use of the Software R for Data Reduction
A.1 Multivariate estimation methods
A.2 Robust PCA
A.3 Sparse robust PCA
A.4 Canonical correlation analysis
A.5 Factor analysis
A.6 Classical k-means and model based clustering
A.7 Robust clustering
A.8 Robust double clustering
A.9 Discriminant analysis
Bibliography
Index

Alessio Farcomeni is an assistant professor in the Department of Public Health and Infectious Diseases at the University of Rome Sapienza. His work focuses on robust statistics, longitudinal models, categorical data analysis, cluster analysis, and multiple testing. He also is involved in clinical, ecological, and econometric research.

Luca Greco is an assistant professor in the Department of Law, Economics, Management and Quantitative Methods at the University of Sannio. His research interests include robust statistics, likelihood asymptotics, pseudolikelihood functions, and skew elliptical distributions.