Understanding Complex Datasets: Data Mining with Matrix Decompositions [Hardcover]

(Queen's University, Canada)
Making obscure knowledge about matrix decompositions widely available, Understanding Complex Datasets: Data Mining with Matrix Decompositions discusses the most common matrix decompositions and shows how they can be used to analyze large datasets in a broad range of application areas. Without requiring you to understand every mathematical detail, the book helps you determine which matrix decomposition is appropriate for your dataset and what the results mean.

Explaining the effectiveness of matrices as data analysis tools, the book illustrates the ability of matrix decompositions to provide more powerful analyses and to produce cleaner data than more mainstream techniques. The author explores the deep connections between matrix decompositions and structures within graphs, relating the PageRank algorithm of Google's search engine to singular value decomposition. He also covers dimensionality reduction, collaborative filtering, clustering, and spectral analysis. With numerous figures and examples, the book shows how matrix decompositions can be used to find documents on the Internet, look for deeply buried mineral deposits without drilling, explore the structure of proteins, detect suspicious emails or cell phone calls, and more.

Concentrating on data mining mechanics and applications, this resource helps you model large, complex datasets and investigate connections between standard data mining techniques and matrix decompositions.

Reviews

One of this book's attractive features is that every chapter contains a discussion of the algorithmic issues. One scenario is used as a running illustrative example throughout the book. Several other examples are discussed in different chapters. These examples should help the reader understand the advantages as well as the practical problems associated with any of the proposed matrix-based data mining techniques covered in the book. I recommend this book for anyone interested in using matrix methods for data mining. Technometrics, February 2009, Vol. 51, No. 1

This could be a nice companion book for courses in data mining or applied linear algebra. Producing a clear taxonomy of the use and intentions of matrix decompositions in data analysis is very useful to both students and researchers. Those working with large-scale complex datasets will definitely find this work useful. I would definitely use it in my own course in data mining. Michael W. Berry, University of Tennessee, Knoxville, USA

[This book] is suffused with insightful suggestions for analytical methods and interpretations, drawn from the author's own research and his reading of the literature. The book has two great strengths. The first is its attempt to provide a unifying framework from which to view a host of important analytical methodologies based on matrix methods. Second, the book is extremely strong on interpreting the results of matrix methods. [It] assembles and explains a diverse set of insights that are otherwise widely scattered in the literature. This alone makes the book an important contribution to the community. Bruce Hendrickson, Sandia National Laboratories, Albuquerque, New Mexico, USA

Preface
1 Data Mining
1.1 What is data like?
1.2 Data-mining techniques
1.2.1 Prediction
1.2.2 Clustering
1.2.3 Finding outliers
1.2.4 Finding local patterns
1.3 Why use matrix decompositions?
1.3.1 Data that comes from multiple processes
1.3.2 Data that has multiple causes
1.3.3 What are matrix decompositions used for?
2 Matrix decompositions
2.1 Definition
2.2 Interpreting decompositions
2.2.1 Factor interpretation - hidden sources
2.2.2 Geometric interpretation - hidden clusters
2.2.3 Component interpretation - underlying processes
2.2.4 Graph interpretation - hidden connections
2.2.5 Summary
2.2.6 Example
2.3 Applying decompositions
2.3.1 Selecting factors, dimensions, components, or waystations
2.3.2 Similarity and clustering
2.3.3 Finding local relationships
2.3.4 Sparse representations
2.3.5 Oversampling
2.4 Algorithm issues
2.4.1 Algorithms and complexity
2.4.2 Data preparation issues
2.4.3 Updating a decomposition
3 Singular Value Decomposition (SVD)
3.1 Definition
3.2 Interpreting an SVD
3.2.1 Factor interpretation
3.2.2 Geometric interpretation
3.2.3 Component interpretation
3.2.4 Graph interpretation
3.3 Applying SVD
3.3.1 Selecting factors, dimensions, components, and waystations
3.3.2 Similarity and clustering
3.3.3 Finding local relationships
3.3.4 Sampling and sparsifying by removing values
3.3.5 Using domain knowledge or priors
3.4 Algorithm issues
3.4.1 Algorithms and complexity
3.4.2 Updating an SVD
3.5 Applications of SVD
3.5.1 The workhorse of noise removal
3.5.2 Information retrieval - Latent Semantic Indexing (LSI)
3.5.3 Ranking objects and attributes by interestingness
3.5.4 Collaborative filtering
3.5.5 Winnowing microarray data
3.6 Extensions
3.6.1 PDDP
3.6.2 The CUR decomposition
4 Graph Analysis
4.1 Graphs versus datasets
4.2 Adjacency matrix
4.3 Eigenvalues and eigenvectors
4.4 Connections to SVD
4.5 Google's PageRank
4.6 Overview of the embedding process
4.7 Datasets versus graphs
4.7.1 Mapping Euclidean space to an affinity matrix
4.7.2 Mapping an affinity matrix to a representation matrix
4.8 Eigendecompositions
4.9 Clustering
4.10 Edge prediction
4.11 Graph substructures
4.12 The ATHENS system for novel-knowledge discovery
4.13 Bipartite graphs
5 SemiDiscrete Decomposition (SDD)
5.1 Definition
5.2 Interpreting an SDD
5.2.1 Factor interpretation
5.2.2 Geometric interpretation
5.2.3 Component interpretation
5.2.4 Graph interpretation
5.3 Applying an SDD
5.3.1 Truncation
5.3.2 Similarity and clustering
5.4 Algorithm issues
5.5 Extensions
5.5.1 Binary nonorthogonal matrix decomposition
6 Using SVD and SDD together
6.1 SVD then SDD
6.1.1 Applying SDD to Ak
6.1.2 Applying SDD to the truncated correlation matrices
6.2 Applications of SVD and SDD together
6.2.1 Classifying galaxies
6.2.2 Mineral exploration
6.2.3 Protein conformation
7 Independent Component Analysis (ICA)
7.1 Definition
7.2 Interpreting an ICA
7.2.1 Factor interpretation
7.2.2 Geometric interpretation
7.2.3 Component interpretation
7.2.4 Graph interpretation
7.3 Applying an ICA
7.3.1 Selecting dimensions
7.3.2 Similarity and clustering
7.4 Algorithm issues
7.5 Applications of ICA
7.5.1 Determining suspicious messages
7.5.2 Removing spatial artifacts from microarrays
7.5.3 Finding al Qaeda groups
8 Non-Negative Matrix Factorization (NNMF)
8.1 Definition
8.2 Interpreting an NNMF
8.2.1 Factor interpretation
8.2.2 Geometric interpretation
8.2.3 Component interpretation
8.2.4 Graph interpretation
8.3 Applying an NNMF
8.3.1 Selecting factors
8.3.2 Denoising
8.3.3 Similarity and clustering
8.4 Algorithm issues
8.4.1 Algorithms and complexity
8.4.2 Updating
8.5 Applications of NNMF
8.5.1 Topic detection
8.5.2 Microarray analysis
8.5.3 Mineral exploration revisited
9 Tensors
9.1 The Tucker3 tensor decomposition
9.2 The CP decomposition
9.3 Applications of tensor decompositions
9.3.1 Citation data
9.3.2 Words, documents, and links
9.3.3 Users, keywords, and time in chat rooms
9.4 Algorithmic issues
10 Conclusion
Appendix A Matlab scripts
Bibliography
Index


Queen's University, Kingston, Ontario, Canada