
Optimization for Data Analysis [Hardback]

Benjamin Recht (University of California, Berkeley), Stephen J. Wright (University of Wisconsin, Madison)
  • Format: Hardback, 238 pages, height x width x thickness: 235x156x16 mm, weight: 450 g, worked examples or exercises
  • Publication date: 21-Apr-2022
  • Publisher: Cambridge University Press
  • ISBN-10: 1316518981
  • ISBN-13: 9781316518984
Optimization techniques are at the core of data science, including data analysis and machine learning. An understanding of basic optimization techniques and their fundamental properties provides important grounding for students, researchers, and practitioners in these areas. This text covers the fundamentals of optimization algorithms in a compact, self-contained way, focusing on the techniques most relevant to data science. An introductory chapter demonstrates that many standard problems in data science can be formulated as optimization problems. Next, many fundamental methods in optimization are described and analyzed, including: gradient and accelerated gradient methods for unconstrained optimization of smooth (especially convex) functions; the stochastic gradient method, a workhorse algorithm in machine learning; the coordinate descent approach; several key algorithms for constrained optimization problems; algorithms for minimizing nonsmooth functions arising in data science; foundations of the analysis of nonsmooth functions and optimization duality; and the back-propagation approach, relevant to neural networks.
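To give a flavor of the connection between data analysis and optimization developed in the opening chapters, here is a minimal sketch (not taken from the book) of gradient descent applied to a least-squares problem, using the constant steplength 1/L, where L is the Lipschitz constant of the gradient; the function and variable names below are illustrative only.

import numpy as np

def gradient_descent_least_squares(A, b, num_iters=500):
    """Minimize f(x) = 0.5 * ||A x - b||^2 by gradient descent.

    Uses the constant steplength 1/L, where L = ||A||_2^2 is the
    Lipschitz constant of the gradient A^T (A x - b).
    """
    m, n = A.shape
    L = np.linalg.norm(A, 2) ** 2      # largest eigenvalue of A^T A
    x = np.zeros(n)
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)       # gradient of the least-squares objective
        x -= grad / L                  # steepest-descent step with steplength 1/L
    return x

# Usage on a small synthetic overdetermined system.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
x_true = rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = gradient_descent_least_squares(A, b)
print(np.linalg.norm(x_hat - np.linalg.lstsq(A, b, rcond=None)[0]))

The same pattern of computing a gradient and taking a step underlies the accelerated, stochastic, and coordinate-descent variants covered in the later chapters.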

Reviews

'This delightful compact tome gives the reader all the results they should have in their pocket to contribute to optimization and statistical learning. With the clean, elegant derivations of many of the foundational optimization methods underlying modern large-scale data analysis, everyone from students just getting started to researchers knowing this book inside and out will be well-positioned for both using the algorithms and developing new ones for machine learning, optimization, and statistics.' John C. Duchi, Stanford University

'Optimization algorithms play a vital role in the rapidly evolving field of machine learning, as well as in signal processing, statistics and control. Numerical optimization is a vast field, however, and a student wishing to learn the methods required in the world of data science could easily get lost in the literature. This book does a superb job of presenting the most important algorithms, providing both their mathematical foundations and lucid motivations for their development. Written by two of the foremost experts in the field, this book gently guides a reader without prior knowledge of optimization towards the methods and concepts that are central in modern data science applications.' Jorge Nocedal, Northwestern University

'This timely introductory book gives a rigorous view of continuous optimization techniques which are being used in machine learning. It is an excellent resource for those who are interested in understanding the mathematical concepts behind commonly used machine learning techniques.' Shai Shalev-Shwartz, Hebrew University of Jerusalem

'This textbook is a much-needed exposition of optimization techniques, presented with conciseness and precision, with emphasis on topics most relevant for data science and machine learning applications. I imagine that this book will be immensely popular in university courses across the globe, and become a standard reference used by researchers in the area.' Amitabh Basu, Johns Hopkins University

Additional information

A concise text that presents and analyzes the fundamental techniques and methods in optimization that are useful in data science.
Preface
1 Introduction
1.1 Data Analysis and Optimization
1.2 Least Squares
1.3 Matrix Factorization Problems
1.4 Support Vector Machines
1.5 Logistic Regression
1.6 Deep Learning
1.7 Emphasis
2 Foundations of Smooth Optimization
2.1 A Taxonomy of Solutions to Optimization Problems
2.2 Taylor's Theorem
2.3 Characterizing Minima of Smooth Functions
2.4 Convex Sets and Functions
2.5 Strongly Convex Functions
3 Descent Methods
3.1 Descent Directions
3.2 Steepest-Descent Method
3.2.1 General Case
3.2.2 Convex Case
3.2.3 Strongly Convex Case
3.2.4 Comparison between Rates
3.3 Descent Methods: Convergence
3.4 Line-Search Methods: Choosing the Direction
3.5 Line-Search Methods: Choosing the Steplength
3.6 Convergence to Approximate Second-Order Necessary Points
3.7 Mirror Descent
3.8 The KL and PL Properties
4 Gradient Methods Using Momentum
4.1 Motivation from Differential Equations
4.2 Nesterov's Method: Convex Quadratics
4.3 Convergence for Strongly Convex Functions
4.4 Convergence for Weakly Convex Functions
4.5 Conjugate Gradient Methods
4.6 Lower Bounds on Convergence Rates
5 Stochastic Gradient
5.1 Examples and Motivation
5.1.1 Noisy Gradients
5.1.2 Incremental Gradient Method
5.1.3 Classification and the Perceptron
5.1.4 Empirical Risk Minimization
5.2 Randomness and Steplength: Insights
5.2.1 Example: Computing a Mean
5.2.2 The Randomized Kaczmarz Method
5.3 Key Assumptions for Convergence Analysis
5.3.1 Case 1: Bounded Gradients: Lg = 0
5.3.2 Case 2: Randomized Kaczmarz: B = 0, Lg > 0
5.3.3 Case 3: Additive Gaussian Noise
5.3.4 Case 4: Incremental Gradient
5.4 Convergence Analysis
5.4.1 Case 1: Lg = 0
5.4.2 Case 2: B = 0
5.4.3 Case 3: B and Lg Both Nonzero
5.5 Implementation Aspects
5.5.1 Epochs
5.5.2 Minibatching
5.5.3 Acceleration Using Momentum
6 Coordinate Descent
6.1 Coordinate Descent in Machine Learning
6.2 Coordinate Descent for Smooth Convex Functions
6.2.1 Lipschitz Constants
6.2.2 Randomized CD: Sampling with Replacement
6.2.3 Cyclic CD
6.2.4 Random Permutations CD: Sampling without Replacement
6.3 Block-Coordinate Descent
7 First-Order Methods for Constrained Optimization
7.1 Optimality Conditions
7.2 Euclidean Projection
7.3 The Projected Gradient Algorithm
7.3.1 General Case: A Short-Step Approach
7.3.2 General Case: Backtracking
7.3.3 Smooth Strongly Convex Case
7.3.4 Momentum Variants
7.3.5 Alternative Search Directions
7.4 The Conditional Gradient (Frank-Wolfe) Method
8 Nonsmooth Functions and Subgradients
8.1 Subgradients and Subdifferentials
8.2 The Subdifferential and Directional Derivatives
8.3 Calculus of Subdifferentials
8.4 Convex Sets and Convex Constrained Optimization
8.5 Optimality Conditions for Composite Nonsmooth Functions
8.6 Proximal Operators and the Moreau Envelope
9 Nonsmooth Optimization Methods
9.1 Subgradient Descent
9.2 The Subgradient Method
9.2.1 Steplengths
9.3 Proximal-Gradient Algorithms for Regularized Optimization
9.3.1 Convergence Rate for Convex
9.4 Proximal Coordinate Descent for Structured Nonsmooth Functions
9.5 Proximal Point Method
10 Duality and Algorithms
10.1 Quadratic Penalty Function
10.2 Lagrangians and Duality
10.3 First-Order Optimality Conditions
10.4 Strong Duality
10.5 Dual Algorithms
10.5.1 Dual Subgradient
10.5.2 Augmented Lagrangian Method
10.5.3 Alternating Direction Method of Multipliers
10.6 Some Applications of Dual Algorithms
10.6.1 Consensus Optimization
10.6.2 Utility Maximization
10.6.3 Linear and Quadratic Programming
11 Differentiation and Adjoints
11.1 The Chain Rule for a Nested Composition of Vector Functions
11.2 The Method of Adjoints
11.3 Adjoints in Deep Learning
11.4 Automatic Differentiation
11.5 Derivations via the Lagrangian and Implicit Function Theorem
11.5.1 A Constrained Optimization Formulation of the Progressive Function
11.5.2 A General Perspective on Unconstrained and Constrained Formulations
11.5.3 Extension: Control
Appendix
A.1 Definitions and Basic Concepts
A.2 Convergence Rates and Iteration Complexity
A.3 Algorithm 3.1 Is an Effective Line-Search Technique
A.4 Linear Programming Duality, Theorems of the Alternative
A.5 Limiting Feasible Directions
A.6 Separation Results
A.7 Bounds for Degenerate Quadratic Functions
Bibliography
Index
Stephen J. Wright holds the George B. Dantzig Professorship, the Sheldon Lubar Chair, and the Amar and Balinder Sohi Professorship of Computer Sciences at the University of Wisconsin-Madison and is a Discovery Fellow in the Wisconsin Institute for Discovery. He works in computational optimization and its applications to data science and many other areas of science and engineering. He is a Fellow of SIAM and a recipient of the 2014 W. R. G. Baker Award from IEEE for the most outstanding paper, the 2020 Khachiyan Prize of the INFORMS Optimization Society for lifetime achievements in optimization, and the 2020 NeurIPS Test of Time Award. Professor Wright is the author or co-author of widely used textbooks and reference books in optimization, including Primal-Dual Interior-Point Methods (1997) and Numerical Optimization (2006).

Benjamin Recht is an Associate Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. His research group studies how to make machine learning systems more robust to interactions with a dynamic and uncertain world by using mathematical tools from optimization, statistics, and dynamical systems. Professor Recht is the recipient of a Presidential Early Career Award for Scientists and Engineers, an Alfred P. Sloan Research Fellowship, the 2012 SIAM/MOS Lagrange Prize in Continuous Optimization, the 2014 Jamon Prize, the 2015 William O. Baker Award for Initiatives in Research, and the 2017 and 2020 NeurIPS Test of Time Awards.