Preface | ix
Nomenclature | xii
Part I | 1 | (60)
1 | 3 | (36)
1.1 | 3 | (7)
1.2 | 10 | (29)
2 Variational Bayesian Learning | 39 | (22)
2.1 | 39 | (12)
2.2 Other Approximation Methods | 51 | (10)
Part II | 61 | (86)
3 VB Algorithm for Multilinear Models | 63 | (40)
3.1 | 63 | (11)
3.2 Matrix Factorization with Missing Entries | 74 | (6)
3.3 | 80 | (7)
3.4 Low-Rank Subspace Clustering | 87 | (6)
3.5 Sparse Additive Matrix Factorization | 93 | (10)
4 VB Algorithm for Latent Variable Models | 103 | (29)
4.1 Finite Mixture Models | 103 | (12)
4.2 Other Latent Variable Models | 115 | (17)
5 VB Algorithm under No Conjugacy | 132 | (15)
5.1 | 132 | (3)
5.2 Sparsity-Inducing Prior | 135 | (2)
5.3 Unified Approach by Local VB Bounds | 137 | (10)
Part III Nonasymptotic Theory | 147 | (192)
6 Global VB Solution of Fully Observed Matrix Factorization | 149 | (35)
6.1 | 150 | (2)
6.2 Conditions for VB Solutions | 152 | (1)
6.3 Irrelevant Degrees of Freedom | 153 | (4)
6.4 | 157 | (3)
6.5 Problem Decomposition | 160 | (2)
6.6 Analytic Form of Global VB Solution | 162 | (1)
6.7 Proofs of Theorem 6.7 and Corollary 6.8 | 163 | (8)
6.8 Analytic Form of Global Empirical VB Solution | 171 | (2)
6.9 Proof of Theorem 6.13 | 173 | (7)
6.10 Summary of Intermediate Results | 180 | (4)
7 Model-Induced Regularization and Sparsity Inducing Mechanism | 184 | (21)
7.1 VB Solutions for Special Cases | 184 | (3)
7.2 Posteriors and Estimators in a One-Dimensional Case | 187 | (8)
7.3 Model-Induced Regularization | 195 | (7)
7.4 Phase Transition in VB Learning | 202 | (2)
7.5 Factorization as ARD Model | 204 | (1)
8 Performance Analysis of VB Matrix Factorization | 205 | (31)
8.1 Objective Function for Noise Variance Estimation | 205 | (2)
8.2 Bounds of Noise Variance Estimator | 207 | (2)
8.3 Proofs of Theorem 8.2 and Corollary 8.3 | 209 | (5)
8.4 | 214 | (14)
8.5 Numerical Verification | 228 | (2)
8.6 Comparison with Laplace Approximation | 230 | (2)
8.7 Optimality in Large-Scale Limit | 232 | (4)
9 Global Solver for Matrix Factorization | 236 | (19)
9.1 Global VB Solver for Fully Observed MF | 236 | (2)
9.2 Global EVB Solver for Fully Observed MF | 238 | (4)
9.3 Empirical Comparison with the Standard VB Algorithm | 242 | (5)
9.4 Extension to Nonconjugate MF with Missing Entries | 247 | (8)
10 Global Solver for Low-Rank Subspace Clustering | 255 | (24)
10.1 | 255 | (3)
10.2 Conditions for VB Solutions | 258 | (1)
10.3 Irrelevant Degrees of Freedom | 259 | (1)
10.4 Proof of Theorem 10.2 | 259 | (5)
10.5 Exact Global VB Solver (EGVBS) | 264 | (3)
10.6 Approximate Global VB Solver (AGVBS) | 267 | (3)
10.7 Proof of Theorem 10.7 | 270 | (4)
10.8 Empirical Evaluation | 274 | (5)
11 Efficient Solver for Sparse Additive Matrix Factorization | 279 | (15)
11.1 | 279 | (3)
11.2 Efficient Algorithm for SAMF | 282 | (2)
11.3 Experimental Results | 284 | (10)
12 MAP and Partially Bayesian Learning | 294 | (45)
12.1 Theoretical Analysis in Fully Observed MF | 295 | (34)
12.2 | 329 | (3)
12.3 Experimental Results | 332 | (7)
Part IV Asymptotic Theory | 339 | (177)
13 Asymptotic Learning Theory | 341 | (44)
13.1 Statistical Learning Machines | 341 | (3)
13.2 Basic Tools for Asymptotic Analysis | 344 | (2)
13.3 | 346 | (5)
13.4 Asymptotic Learning Theory for Regular Models | 351 | (15)
13.5 Asymptotic Learning Theory for Singular Models | 366 | (16)
13.6 Asymptotic Learning Theory for VB Learning | 382 | (3)
14 Asymptotic VB Theory of Reduced Rank Regression | 385 | (44)
14.1 Reduced Rank Regression | 385 | (11)
14.2 Generalization Properties | 396 | (30)
14.3 Insights into VB Learning | 426 | (3)
15 Asymptotic VB Theory of Mixture Models | 429 | (26)
15.1 | 429 | (5)
15.2 Mixture of Gaussians | 434 | (9)
15.3 Mixture of Exponential Family Distributions | 443 | (8)
15.4 Mixture of Bernoulli with Deterministic Components | 451 | (4)
16 Asymptotic VB Theory of Other Latent Variable Models | 455 | (45)
16.1 | 455 | (6)
16.2 Hidden Markov Models | 461 | (5)
16.3 Probabilistic Context-Free Grammar | 466 | (4)
16.4 Latent Dirichlet Allocation | 470 | (30)
17 Unified Theory for Latent Variable Models | 500 | (16)
17.1 Local Latent Variable Model | 500 | (4)
17.2 Asymptotic Upper-Bound for VB Free Energy | 504 | (3)
17.3 Example: Average VB Free Energy of Gaussian Mixture Model | 507 | (4)
17.4 Free Energy and Generalization Error | 511 | (2)
17.5 Relation to Other Analyses | 513 | (3)
Appendix A James–Stein Estimator | 516 | (4)
Appendix B Metric in Parameter Space | 520 | (5)
Appendix C Detailed Description of Overlap Method | 525 | (2)
Appendix D Optimality of Bayesian Learning | 527 | (2)
Bibliography | 529 | (11)
Subject Index | 540