Preface  xi

List of Symbols  xiii

1 Introduction  1
  1.1 Introduction to Multi-Label Learning  1
  1.2 Applications of Multi-Label Learning  2
    1.2.1 Scene Classification  2
    1.2.2 Text Categorization  3
    1.2.3 Functional Genomics Analysis  4
    1.2.4 Gene Expression Pattern Image Annotation  6
  1.3 Challenges of Multi-Label Learning  8
  1.4 …  9
    1.4.1 Problem Transformation  9
      1.4.1.1 Copy Transformation  9
      1.4.1.2 …  10
      1.4.1.3 …  10
      1.4.1.4 Single-Label Classification after Transformation  12
    1.4.2 Algorithm Adaptation  12
      1.4.2.1 …  12
      1.4.2.2 Algorithms Based on Probabilistic Framework  12
      1.4.2.3 Support Vector Machines  13
      1.4.2.4 Artificial Neural Networks  14
      1.4.2.5 k-Nearest Neighbor  15
      1.4.2.6 Ensemble Learning  15
    …  16
  1.5 Dimensionality Reduction for Multi-Label Learning  18
    1.5.1 Introduction to Dimensionality Reduction  18
    1.5.2 Linear and Nonlinear Dimensionality Reduction  20
    1.5.3 Multi-Label Dimensionality Reduction  20
    …  22
  1.6 …  23
    1.6.1 Design and Analysis of Algorithms  24
    1.6.2 Scalable Implementations  25
  …  26
  …  26
  …  27
2 Partial Least Squares  29
  2.1 Basic Models of Partial Least Squares  29
    2.1.1 The NIPALS Algorithm  30
  2.2 Partial Least Squares Variants  31
    2.2.1 …  32
    2.2.2 …  32
    2.2.3 …  33
    2.2.4 …  34
    2.2.5 …  34
    2.2.6 Orthonormalized PLS  34
    2.2.7 Relationship between OPLS and Other PLS Models  36
  2.3 …  37
    2.3.1 Basics of PLS Regression  37
    2.3.2 Shrinkage in Regression  38
    2.3.3 Principal Component Regression  41
    2.3.4 …  41
    2.3.5 Shrinkage Properties of PLS Regression  43
  2.4 Partial Least Squares Classification  44
3 Canonical Correlation Analysis  49
  3.1 Classical Canonical Correlation Analysis  49
    3.1.1 Linear Correlation Coefficient  49
    3.1.2 The Maximum Correlation Formulation of CCA  50
    3.1.3 The Distance Minimization Formulation of CCA  54
    3.1.4 …  55
    3.1.5 CCA for Multiple Sets of Variables  55
  3.2 …  56
    3.2.1 Sparse CCA via Linear Regression  57
    3.2.2 Sparse CCA via Iterative Greedy Algorithms  57
    3.2.3 Sparse CCA via Bayesian Learning  58
  3.3 Relationship between CCA and Partial Least Squares  59
    3.3.1 A Unified Framework for PLS and CCA  59
    3.3.2 The Equivalence without Regularization  60
    3.3.3 The Equivalence with Regularization  61
      3.3.3.1 Regularization on X  62
      3.3.3.2 Regularization on Y  62
    3.3.4 Analysis of Regularization on CCA  63
  3.4 The Generalized Eigenvalue Problem  64
    3.4.1 The Generalized Rayleigh Quotient Cost Function  64
    3.4.2 Properties of the Generalized Eigenvalue Problem  65
    3.4.3 Algorithms for the Generalized Eigenvalue Problem  66
4 Hypergraph Spectral Learning  69
  4.1 …  69
    4.1.1 …  71
    4.1.2 …  72
    4.1.3 Hypergraph Laplacian  73
  4.2 Multi-Label Learning with a Hypergraph  74
  4.3 A Class of Generalized Eigenvalue Problems  75
    4.3.1 Canonical Correlation Analysis  76
    4.3.2 Orthonormalized Partial Least Squares  76
    4.3.3 Hypergraph Spectral Learning  77
    4.3.4 Linear Discriminant Analysis  77
  4.4 The Generalized Eigenvalue Problem versus the Least Squares Problem  78
    4.4.1 Multivariate Linear Regression and Least Squares  78
    4.4.2 Matrix Orthonormality Property  79
    4.4.3 The Equivalence Relationship  81
    4.4.4 Regularized Least Squares  82
    4.4.5 Efficient Implementation via LSQR  83
  4.5 …  84
    4.5.1 Empirical Evaluation Setup  84
    4.5.2 Performance of Hypergraph Spectral Learning  85
    4.5.3 Evaluation of the Equivalence Relationship  86
    4.5.4 Evaluation of Scalability  88
5 A Scalable Two-Stage Approach for Dimensionality Reduction  91
  5.1 The Two-Stage Approach without Regularization  91
    5.1.1 …  92
    5.1.2 Time Complexity Analysis  92
    5.1.3 The Equivalence Relationship  93
  5.2 The Two-Stage Approach with Regularization  95
    5.2.1 …  96
    5.2.2 Time Complexity Analysis  96
    5.2.3 The Equivalence Relationship  96
  5.3 …  99
    5.3.1 Empirical Evaluation Setup  99
    5.3.2 Performance Comparison  100
    5.3.3 Scalability Comparison  101
6 A Shared-Subspace Learning Framework  105
  6.1 …  105
    6.1.1 Problem Formulation  105
    6.1.2 A Trace Ratio Formulation  107
  6.2 An Efficient Implementation  109
    6.2.1 …  109
    6.2.2 …  110
  …  111
  …  111
  6.4 Connections with Existing Formulations  112
  6.5 A Feature Space Formulation  113
  6.6 …  114
    6.6.1 Empirical Evaluation Setup  115
    6.6.2 Web Page Categorization  116
      6.6.2.1 Performance Evaluation  116
      6.6.2.2 Scalability Evaluation  118
      6.6.2.3 Sensitivity Analysis  121
    …  121
7 Joint Dimensionality Reduction and Classification  123
  7.1 …  123
    7.1.1 …  124
    7.1.2 …  124
  7.2 Joint Dimensionality Reduction and Multi-Label Classification  125
    7.2.1 Joint Learning with Squared Loss  125
    7.2.2 Joint Learning with Hinge Loss  126
      7.2.2.1 A Convex-Concave Formulation  127
      7.2.2.2 Solving the Min-Max Problem  128
      7.2.2.3 Learning Orthonormal Features  128
      7.2.2.4 Joint Learning with Squared Hinge Loss  128
    …  129
  7.3 Dimensionality Reduction with Different Input Data  129
  7.4 …  130
    7.4.1 Evaluation on Multi-Label Data Sets  130
    7.4.2 Evaluation on Data with Different Inputs  131
8 Nonlinear Dimensionality Reduction: Algorithms and Applications  133
  8.1 Background on Kernel Methods  133
  8.2 Kernel Centering and Projection  134
    8.2.1 …  135
    8.2.2 …  135
  8.3 Kernel Canonical Correlation Analysis  136
  8.4 Kernel Hypergraph Spectral Learning  138
  8.5 The Generalized Eigenvalue Problem in the Kernel-Induced Feature Space  139
  8.6 Kernel Least Squares Regression  140
  8.7 Dimensionality Reduction and Least Squares Regression in the Feature Space  140
    8.7.1 Matrix Orthonormality Property  140
    8.7.2 The Equivalence Relationship  142
  8.8 Gene Expression Pattern Image Annotation  143
    8.8.1 Problem Description  143
    8.8.2 Feature Generation and Kernel Construction  145
    8.8.3 Multi-Label Multiple Kernel Learning  147
    8.8.4 Empirical Evaluation Setup  149
    8.8.5 …  150
Appendix Proofs  155
  …  155
  …  159
  …  161
  …  162
  …  164

References  167

Index  191