|
1 Machine Learning Models |
|
|
1 | (20) |
|
|
1 | (3) |
|
|
4 | (3) |
|
1.3 Generalized Linear Models |
|
|
7 | (4) |
|
|
7 | (1) |
|
|
8 | (3) |
|
1.4 Support Vector Machines |
|
|
11 | (4) |
|
1.5 Regularization, Lasso, and Ridge Regression |
|
|
15 | (1) |
|
1.6 Population Risk Minimization |
|
|
16 | (1) |
|
|
17 | (3) |
|
|
20 | (1) |
|
2 Convex Optimization Theory |
|
|
21 | (32) |
|
|
21 | (8) |
|
2.1.1 Definition and Examples |
|
|
21 | (2) |
|
2.1.2 Projection onto Convex Sets |
|
|
23 | (2) |
|
|
25 | (4) |
|
|
29 | (8) |
|
2.2.1 Definition and Examples |
|
|
29 | (1) |
|
2.2.2 Differentiable Convex Functions |
|
|
30 | (1) |
|
2.2.3 Non-differentiable Convex Functions |
|
|
31 | (2) |
|
2.2.4 Lipschitz Continuity of Convex Functions |
|
|
33 | (2) |
|
2.2.5 Optimality Conditions for Convex Optimization |
|
|
35 | (1) |
|
2.2.6 Representer Theorem and Kernel |
|
|
36 | (1) |
|
|
37 | (8) |
|
2.3.1 Lagrange Function and Duality |
|
|
38 | (1) |
|
2.3.2 Proof of Strong Duality |
|
|
39 | (2) |
|
|
41 | (1) |
|
2.3.4 Karush--Kuhn--Tucker Conditions |
|
|
42 | (2) |
|
2.3.5 Dual Support Vector Machine |
|
|
44 | (1) |
|
2.4 Legendre--Fenchel Conjugate Duality |
|
|
45 | (5) |
|
2.4.1 Closure of Convex Functions |
|
|
45 | (3) |
|
2.4.2 Conjugate Functions |
|
|
48 | (2) |
|
|
50 | (3) |
|
3 Deterministic Convex Optimization |
|
|
53 | (60) |
|
|
53 | (7) |
|
3.1.1 General Nonsmooth Convex Problems |
|
|
54 | (3) |
|
3.1.2 Nonsmooth Strongly Convex Problems |
|
|
57 | (1) |
|
3.1.3 Smooth Convex Problems |
|
|
58 | (2) |
|
3.1.4 Smooth and Strongly Convex Problems |
|
|
60 | (1) |
|
|
60 | (4) |
|
3.3 Accelerated Gradient Descent |
|
|
64 | (5) |
|
3.4 Game Interpretation for Accelerated Gradient Descent |
|
|
69 | (3) |
|
3.5 Smoothing Scheme for Nonsmooth Problems |
|
|
72 | (2) |
|
3.6 Primal-Dual Method for Saddle-Point Optimization |
|
|
74 | (10) |
|
3.6.1 General Bilinear Saddle Point Problems |
|
|
78 | (1) |
|
3.6.2 Smooth Bilinear Saddle Point Problems |
|
|
79 | (1) |
|
3.6.3 Smooth and Strongly Convex Bilinear Saddle Point Problems |
|
|
80 | (1) |
|
3.6.4 Linearly Constrained Problems |
|
|
81 | (3) |
|
3.7 Alternating Direction Method of Multipliers |
|
|
84 | (2) |
|
3.8 Mirror-Prox Method for Variational Inequalities |
|
|
86 | (6) |
|
3.8.1 Monotone Variational Inequalities |
|
|
87 | (2) |
|
3.8.2 Generalized Monotone Variational Inequalities |
|
|
89 | (3) |
|
3.9 Accelerated Level Method |
|
|
92 | (16) |
|
3.9.1 Nonsmooth, Smooth, and Weakly Smooth Problems |
|
|
93 | (9) |
|
3.9.2 Saddle Point Problems |
|
|
102 | (6) |
|
|
108 | (5) |
|
4 Stochastic Convex Optimization |
|
|
113 | (108) |
|
4.1 Stochastic Mirror Descent |
|
|
113 | (15) |
|
4.1.1 General Nonsmooth Convex Functions |
|
|
114 | (4) |
|
4.1.2 Smooth Convex Problems |
|
|
118 | (4) |
|
4.1.3 Accuracy Certificates |
|
|
122 | (6) |
|
4.2 Stochastic Accelerated Gradient Descent |
|
|
128 | (20) |
|
4.2.1 Problems Without Strong Convexity |
|
|
134 | (3) |
|
4.2.2 Nonsmooth Strongly Convex Problems |
|
|
137 | (2) |
|
4.2.3 Smooth and Strongly Convex Problems |
|
|
139 | (5) |
|
4.2.4 Accuracy Certificates |
|
|
144 | (4) |
|
4.3 Stochastic Convex-Concave Saddle Point Problems |
|
|
148 | (10) |
|
4.3.1 General Algorithmic Framework |
|
|
149 | (4) |
|
4.3.2 Minimax Stochastic Problems |
|
|
153 | (2) |
|
4.3.3 Bilinear Matrix Games |
|
|
155 | (3) |
|
4.4 Stochastic Accelerated Primal-Dual Method |
|
|
158 | (23) |
|
4.4.1 Accelerated Primal-Dual Method |
|
|
160 | (10) |
|
4.4.2 Stochastic Bilinear Saddle Point Problems |
|
|
170 | (11) |
|
4.5 Stochastic Accelerated Mirror-Prox Method |
|
|
181 | (18) |
|
4.5.1 Algorithmic Framework |
|
|
182 | (2) |
|
4.5.2 Convergence Analysis |
|
|
184 | (15) |
|
4.6 Stochastic Block Mirror Descent Method |
|
|
199 | (19) |
|
4.6.1 Nonsmooth Convex Optimization |
|
|
201 | (10) |
|
4.6.2 Convex Composite Optimization |
|
|
211 | (7) |
|
|
218 | (3) |
|
5 Convex Finite-Sum and Distributed Optimization |
|
|
221 | (84) |
|
5.1 Random Primal-Dual Gradient Method |
|
|
221 | (30) |
|
5.1.1 Multi-Dual-Player Game Reformulation |
|
|
225 | (2) |
|
5.1.2 Randomization on Gradient Computation |
|
|
227 | (3) |
|
5.1.3 Convergence for Strongly Convex Problems |
|
|
230 | (11) |
|
5.1.4 Lower Complexity Bound for Randomized Methods |
|
|
241 | (5) |
|
5.1.5 Generalization to Problems Without Strong Convexity |
|
|
246 | (5) |
|
5.2 Random Gradient Extrapolation Method |
|
|
251 | (27) |
|
5.2.1 Gradient Extrapolation Method |
|
|
253 | (7) |
|
5.2.2 Deterministic Finite-Sum Problems |
|
|
260 | (11) |
|
5.2.3 Stochastic Finite-Sum Problems |
|
|
271 | (5) |
|
5.2.4 Distributed Implementation |
|
|
276 | (2) |
|
5.3 Variance-Reduced Mirror Descent |
|
|
278 | (9) |
|
5.3.1 Smooth Problems Without Strong Convexity |
|
|
282 | (2) |
|
5.3.2 Smooth and Strongly Convex Problems |
|
|
284 | (3) |
|
5.4 Variance-Reduced Accelerated Gradient Descent |
|
|
287 | (15) |
|
5.4.1 Smooth Problems Without Strong Convexity |
|
|
290 | (4) |
|
5.4.2 Smooth and Strongly Convex Problems |
|
|
294 | (6) |
|
5.4.3 Problems Satisfying an Error-Bound Condition |
|
|
300 | (2) |
|
|
302 | (3) |
|
|
305 | (116) |
|
6.1 Unconstrained Nonconvex Stochastic Optimization |
|
|
305 | (23) |
|
6.1.1 Stochastic First-Order Methods |
|
|
308 | (10) |
|
6.1.2 Stochastic Zeroth-Order Methods |
|
|
318 | (10) |
|
6.2 Nonconvex Stochastic Composite Optimization |
|
|
328 | (24) |
|
6.2.1 Some Properties of Prox-Mapping |
|
|
330 | (2) |
|
6.2.2 Nonconvex Mirror Descent Methods |
|
|
332 | (2) |
|
6.2.3 Nonconvex Stochastic Mirror Descent Methods |
|
|
334 | (13) |
|
6.2.4 Stochastic Zeroth-Order Methods for Composite Problems |
|
|
347 | (5) |
|
6.3 Nonconvex Stochastic Block Mirror Descent |
|
|
352 | (7) |
|
6.4 Nonconvex Stochastic Accelerated Gradient Descent |
|
|
359 | (29) |
|
6.4.1 Nonconvex Accelerated Gradient Descent |
|
|
361 | (13) |
|
6.4.2 Stochastic Accelerated Gradient Descent Method |
|
|
374 | (14) |
|
6.5 Nonconvex Variance-Reduced Mirror Descent |
|
|
388 | (7) |
|
6.5.1 Basic Scheme for Deterministic Problems |
|
|
389 | (3) |
|
6.5.2 Generalization for Stochastic Optimization Problems |
|
|
392 | (3) |
|
6.6 Randomized Accelerated Proximal-Point Methods |
|
|
395 | (24) |
|
6.6.1 Nonconvex Finite-Sum Problems |
|
|
396 | (12) |
|
6.6.2 Nonconvex Multi-Block Problems |
|
|
408 | (11) |
|
|
419 | (2) |
|
7 Projection-Free Methods |
|
|
421 | (62) |
|
7.1 Conditional Gradient Method |
|
|
421 | (23) |
|
7.1.1 Classic Conditional Gradient |
|
|
423 | (10) |
|
7.1.2 New Variants of Conditional Gradient |
|
|
433 | (6) |
|
7.1.3 Lower Complexity Bound |
|
|
439 | (5) |
|
7.2 Conditional Gradient Sliding Method |
|
|
444 | (24) |
|
7.2.1 Deterministic Conditional Gradient Sliding |
|
|
446 | (10) |
|
7.2.2 Stochastic Conditional Gradient Sliding Method |
|
|
456 | (8) |
|
7.2.3 Generalization to Saddle Point Problems |
|
|
464 | (4) |
|
7.3 Nonconvex Conditional Gradient Method |
|
|
468 | (1) |
|
7.4 Stochastic Nonconvex Conditional Gradient |
|
|
469 | (8) |
|
7.4.1 Basic Scheme for Finite-Sum Problems |
|
|
470 | (4) |
|
7.4.2 Generalization for Stochastic Optimization Problems |
|
|
474 | (3) |
|
7.5 Stochastic Nonconvex Conditional Gradient Sliding |
|
|
477 | (5) |
|
7.5.1 Wolfe Gap vs Projected Gradient |
|
|
477 | (1) |
|
7.5.2 Projection-Free Method to Drive Projected Gradient Small |
|
|
478 | (4) |
|
|
482 | (1) |
|
8 Operator Sliding and Decentralized Optimization |
|
|
483 | (84) |
|
8.1 Gradient Sliding for Composite Optimization |
|
|
483 | (26) |
|
8.1.1 Deterministic Gradient Sliding |
|
|
486 | (11) |
|
8.1.2 Stochastic Gradient Sliding |
|
|
497 | (7) |
|
8.1.3 Strongly Convex and Structured Nonsmooth Problems |
|
|
504 | (5) |
|
8.2 Accelerated Gradient Sliding |
|
|
509 | (23) |
|
8.2.1 Composite Smooth Optimization |
|
|
512 | (15) |
|
8.2.2 Composite Bilinear Saddle Point Problems |
|
|
527 | (5) |
|
8.3 Communication Sliding and Decentralized Optimization |
|
|
532 | (33) |
|
8.3.1 Problem Formulation |
|
|
535 | (3) |
|
8.3.2 Decentralized Communication Sliding |
|
|
538 | (11) |
|
8.3.3 Stochastic Decentralized Communication Sliding |
|
|
549 | (6) |
|
8.3.4 High Probability Results |
|
|
555 | (2) |
|
8.3.5 Convergence Analysis |
|
|
557 | (8) |
|
|
565 | (2) |
References |
|
567 | (10) |
Index |
|
577 | |