
Foundations of Machine Learning [Hardback]

Mehryar Mohri (New York University), Afshin Rostamizadeh (Google, Inc.), Ameet Talwalkar (University of California, Berkeley)
  • Format: Hardback, 432 pages, height x width x thickness: 229x178x21 mm, weight: 1111 g, 55 color illus., 40 b&w illus.
  • Series: Adaptive Computation and Machine Learning series
  • Publication date: 17-Aug-2012
  • Publisher: MIT Press
  • ISBN-10: 026201825X
  • ISBN-13: 9780262018258

This graduate-level textbook introduces fundamental concepts and methods in machine learning. It describes several important modern algorithms, provides the theoretical underpinnings of these algorithms, and illustrates key aspects of their application. The authors aim to present novel theoretical tools and concepts while giving concise proofs even for relatively advanced topics. Foundations of Machine Learning fills the need for a general textbook that also offers theoretical details and an emphasis on proofs. Certain topics that are often treated with insufficient attention are discussed in more detail here; for example, entire chapters are devoted to regression, multi-class classification, and ranking. The first three chapters lay the theoretical foundation for what follows, but each remaining chapter is mostly self-contained. The appendix offers a concise probability review, a short introduction to convex optimization, tools for concentration bounds, and several basic properties of matrices and norms used in the book.

The book is intended for graduate students and researchers in machine learning, statistics, and related areas; it can be used either as a textbook or as a reference text for a research seminar.

Preface x
1 Introduction
1(10)
1.1 Applications and problems
1(2)
1.2 Definitions and terminology
3(2)
1.3 Cross-validation
5(2)
1.4 Learning scenarios
7(1)
1.5 Outline
8(3)
2 The PAC Learning Framework
11(22)
2.1 The PAC learning model
11(6)
2.2 Guarantees for finite hypothesis sets --- consistent case
17(4)
2.3 Guarantees for finite hypothesis sets --- inconsistent case
21(3)
2.4 Generalities
24(4)
2.4.1 Deterministic versus stochastic scenarios
24(1)
2.4.2 Bayes error and noise
25(1)
2.4.3 Estimation and approximation errors
26(1)
2.4.4 Model selection
27(1)
2.5 Chapter notes
28(1)
2.6 Exercises
29(4)
3 Rademacher Complexity and VC-Dimension
33(30)
3.1 Rademacher complexity
34(4)
3.2 Growth function
38(3)
3.3 VC-dimension
41(7)
3.4 Lower bounds
48(6)
3.5 Chapter notes
54(1)
3.6 Exercises
55(8)
4 Support Vector Machines
63(26)
4.1 Linear classification
63(1)
4.2 SVMs --- separable case
64(7)
4.2.1 Primal optimization problem
64(2)
4.2.2 Support vectors
66(1)
4.2.3 Dual optimization problem
67(2)
4.2.4 Leave-one-out analysis
69(2)
4.3 SVMs --- non-separable case
71(4)
4.3.1 Primal optimization problem
72(1)
4.3.2 Support vectors
73(1)
4.3.3 Dual optimization problem
74(1)
4.4 Margin theory
75(8)
4.5 Chapter notes
83(1)
4.6 Exercises
84(5)
5 Kernel Methods
89(32)
5.1 Introduction
89(3)
5.2 Positive definite symmetric kernels
92(8)
5.2.1 Definitions
92(2)
5.2.2 Reproducing kernel Hilbert space
94(2)
5.2.3 Properties
96(4)
5.3 Kernel-based algorithms
100(3)
5.3.1 SVMs with PDS kernels
100(1)
5.3.2 Representer theorem
101(1)
5.3.3 Learning guarantees
102(1)
5.4 Negative definite symmetric kernels
103(3)
5.5 Sequence kernels
106(9)
5.5.1 Weighted transducers
106(5)
5.5.2 Rational kernels
111(4)
5.6 Chapter notes
115(1)
5.7 Exercises
116(5)
6 Boosting
121(26)
6.1 Introduction
121(1)
6.2 AdaBoost
122(8)
6.2.1 Bound on the empirical error
124(2)
6.2.2 Relationship with coordinate descent
126(3)
6.2.3 Relationship with logistic regression
129(1)
6.2.4 Standard use in practice
129(1)
6.3 Theoretical results
130(10)
6.3.1 VC-dimension-based analysis
131(1)
6.3.2 Margin-based analysis
131(5)
6.3.3 Margin maximization
136(1)
6.3.4 Game-theoretic interpretation
137(3)
6.4 Discussion
140(1)
6.5 Chapter notes
141(1)
6.6 Exercises
142(5)
7 On-Line Learning
147(36)
7.1 Introduction
147(1)
7.2 Prediction with expert advice
148(11)
7.2.1 Mistake bounds and Halving algorithm
148(2)
7.2.2 Weighted majority algorithm
150(2)
7.2.3 Randomized weighted majority algorithm
152(4)
7.2.4 Exponential weighted average algorithm
156(3)
7.3 Linear classification
159(12)
7.3.1 Perceptron algorithm
160(8)
7.3.2 Winnow algorithm
168(3)
7.4 On-line to batch conversion
171(3)
7.5 Game-theoretic connection
174(1)
7.6 Chapter notes
175(1)
7.7 Exercises
176(7)
8 Multi-Class Classification
183(26)
8.1 Multi-class classification problem
183(2)
8.2 Generalization bounds
185(6)
8.3 Uncombined multi-class algorithms
191(7)
8.3.1 Multi-class SVMs
191(1)
8.3.2 Multi-class boosting algorithms
192(2)
8.3.3 Decision trees
194(4)
8.4 Aggregated multi-class algorithms
198(5)
8.4.1 One-versus-all
198(1)
8.4.2 One-versus-one
199(2)
8.4.3 Error-correction codes
201(2)
8.5 Structured prediction algorithms
203(3)
8.6 Chapter notes
206(1)
8.7 Exercises
207(2)
9 Ranking
209(28)
9.1 The problem of ranking
209(2)
9.2 Generalization bound
211(2)
9.3 Ranking with SVMs
213(1)
9.4 RankBoost
214(7)
9.4.1 Bound on the empirical error
216(2)
9.4.2 Relationship with coordinate descent
218(2)
9.4.3 Margin bound for ensemble methods in ranking
220(1)
9.5 Bipartite ranking
221(5)
9.5.1 Boosting in bipartite ranking
222(2)
9.5.2 Area under the ROC curve
224(2)
9.6 Preference-based setting
226(6)
9.6.1 Second-stage ranking problem
227(2)
9.6.2 Deterministic algorithm
229(1)
9.6.3 Randomized algorithm
230(1)
9.6.4 Extension to other loss functions
231(1)
9.7 Discussion
232(1)
9.8 Chapter notes
233(1)
9.9 Exercises
234(3)
10 Regression
237(30)
10.1 The problem of regression
237(1)
10.2 Generalization bounds
238(7)
10.2.1 Finite hypothesis sets
238(1)
10.2.2 Rademacher complexity bounds
239(2)
10.2.3 Pseudo-dimension bounds
241(4)
10.3 Regression algorithms
245(17)
10.3.1 Linear regression
245(2)
10.3.2 Kernel ridge regression
247(5)
10.3.3 Support vector regression
252(5)
10.3.4 Lasso
257(3)
10.3.5 Group norm regression algorithms
260(1)
10.3.6 On-line regression algorithms
261(1)
10.4 Chapter notes
262(1)
10.5 Exercises
263(4)
11 Algorithmic Stability
267(14)
11.1 Definitions
267(1)
11.2 Stability-based generalization guarantee
268(2)
11.3 Stability of kernel-based regularization algorithms
270(7)
11.3.1 Application to regression algorithms: SVR and KRR
274(2)
11.3.2 Application to classification algorithms: SVMs
276(1)
11.3.3 Discussion
276(1)
11.4 Chapter notes
277(1)
11.5 Exercises
277(4)
12 Dimensionality Reduction
281(12)
12.1 Principal Component Analysis
282(1)
12.2 Kernel Principal Component Analysis (KPCA)
283(2)
12.3 KPCA and manifold learning
285(3)
12.3.1 Isomap
285(1)
12.3.2 Laplacian eigenmaps
285(2)
12.3.3 Locally linear embedding (LLE)
287(1)
12.4 Johnson-Lindenstrauss lemma
288(2)
12.5 Chapter notes
290(1)
12.6 Exercises
290(3)
13 Learning Automata and Languages
293(20)
13.1 Introduction
293(1)
13.2 Finite automata
294(1)
13.3 Efficient exact learning
295(8)
13.3.1 Passive learning
296(1)
13.3.2 Learning with queries
297(1)
13.3.3 Learning automata with queries
298(5)
13.4 Identification in the limit
303(6)
13.4.1 Learning reversible automata
304(5)
13.5 Chapter notes
309(1)
13.6 Exercises
310(3)
14 Reinforcement Learning
313(26)
14.1 Learning scenario
313(1)
14.2 Markov decision process model
314(1)
14.3 Policy
315(4)
14.3.1 Definition
315(1)
14.3.2 Policy value
316(1)
14.3.3 Policy evaluation
316(2)
14.3.4 Optimal policy
318(1)
14.4 Planning algorithms
319(6)
14.4.1 Value iteration
319(3)
14.4.2 Policy iteration
322(2)
14.4.3 Linear programming
324(1)
14.5 Learning algorithms
325(12)
14.5.1 Stochastic approximation
326(4)
14.5.2 TD(0) algorithm
330(1)
14.5.3 Q-learning algorithm
331(3)
14.5.4 SARSA
334(1)
14.5.5 TD(λ) algorithm
335(1)
14.5.6 Large state space
336(1)
14.6 Chapter notes
337(2)
Conclusion
339(2)
A Linear Algebra Review
341(8)
A.1 Vectors and norms
341(3)
A.1.1 Norms
341(1)
A.1.2 Dual norms
342(2)
A.2 Matrices
344(5)
A.2.1 Matrix norms
344(1)
A.2.2 Singular value decomposition
345(1)
A.2.3 Symmetric positive semidefinite (SPSD) matrices
346(3)
B Convex Optimization
349(10)
B.1 Differentiation and unconstrained optimization
349(1)
B.2 Convexity
350(3)
B.3 Constrained optimization
353(4)
B.4 Chapter notes
357(2)
C Probability Review
359(10)
C.1 Probability
359(1)
C.2 Random variables
359(2)
C.3 Conditional probability and independence
361(2)
C.4 Expectation, Markov's inequality, and moment-generating function
363(2)
C.5 Variance and Chebyshev's inequality
365(4)
D Concentration Inequalities
369(10)
D.1 Hoeffding's inequality
369(2)
D.2 McDiarmid's inequality
371(2)
D.3 Other inequalities
373(3)
D.3.1 Binomial distribution: Slud's inequality
374(1)
D.3.2 Normal distribution: tail bound
374(1)
D.3.3 Khintchine-Kahane inequality
374(2)
D.4 Chapter notes
376(1)
D.5 Exercises
377(2)
E Notation
379(2)
References 381(16)
Index 397