E-book: Foundations of Machine Learning, second edition

(University of California, Berkeley), (Google, Inc.), (New York University)
  • Format: EPUB+DRM
  • Price: 82.08 €*
  • * The price is final, i.e. no further discounts apply.
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste): not allowed

  • Printing: not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID (more information here). The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android).

    To read on a PC or Mac, you need to install Adobe Digital Editions (this is a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

A new edition of a graduate-level machine learning textbook that focuses on the analysis and theory of algorithms.

This book is a general introduction to machine learning that can serve as a textbook for graduate students and a reference for researchers. It covers fundamental modern topics in machine learning while providing the theoretical basis and conceptual tools needed for the discussion and justification of algorithms. It also describes several key aspects of the application of these algorithms. The authors aim to present novel theoretical tools and concepts while giving concise proofs even for relatively advanced topics.

Foundations of Machine Learning is unique in its focus on the analysis and theory of algorithms. The first four chapters lay the theoretical foundation for what follows; subsequent chapters are mostly self-contained. Topics covered include the Probably Approximately Correct (PAC) learning framework; generalization bounds based on Rademacher complexity and VC-dimension; Support Vector Machines (SVMs); kernel methods; boosting; on-line learning; multi-class classification; ranking; regression; algorithmic stability; dimensionality reduction; learning automata and languages; and reinforcement learning. Each chapter ends with a set of exercises. Appendixes provide additional material, including a concise probability review.
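
As a flavor of the guarantees the book analyzes, the following is the standard PAC learning bound for a finite hypothesis set H in the consistent case (stated here as a generic textbook result, not quoted from this edition): with probability at least 1 - δ over an i.i.d. sample of size m, every hypothesis h in H that is consistent with the sample satisfies

    R(h) ≤ (1/m) (ln|H| + ln(1/δ)),

so a sample of size m ≥ (1/ε) (ln|H| + ln(1/δ)) suffices to guarantee generalization error at most ε.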

This second edition offers three new chapters, on model selection, maximum entropy models, and conditional maximum entropy models. New material in the appendixes includes a major section on Fenchel duality, expanded coverage of concentration inequalities, and an entirely new entry on information theory. More than half of the exercises are new to this edition.



Table of contents
Preface xiii
1 Introduction 1
1.1 What is machine learning? 1
1.2 What kind of problems can be tackled using machine learning? 2
1.3 Some standard learning tasks 3
1.4 Learning stages 4
1.5 Learning scenarios 6
1.6 Generalization 7
2 The PAC Learning Framework 9
2.1 The PAC learning model 9
2.2 Guarantees for finite hypothesis sets --- consistent case 15
2.3 Guarantees for finite hypothesis sets --- inconsistent case 19
2.4 Generalities 21
2.4.1 Deterministic versus stochastic scenarios 21
2.4.2 Bayes error and noise 22
2.5 Chapter notes 23
2.6 Exercises 23
3 Rademacher Complexity and VC-Dimension 29
3.1 Rademacher complexity 30
3.2 Growth function 34
3.3 VC-dimension 36
3.4 Lower bounds 43
3.5 Chapter notes 48
3.6 Exercises 50
4 Model Selection 61
4.1 Estimation and approximation errors 61
4.2 Empirical risk minimization (ERM) 62
4.3 Structural risk minimization (SRM) 64
4.4 Cross-validation 68
4.5 n-fold cross-validation 71
4.6 Regularization-based algorithms 72
4.7 Convex surrogate losses 73
4.8 Chapter notes 77
4.9 Exercises 78
5 Support Vector Machines 79
5.1 Linear classification 79
5.2 Separable case 80
5.2.1 Primal optimization problem 81
5.2.2 Support vectors 83
5.2.3 Dual optimization problem 83
5.2.4 Leave-one-out analysis 85
5.3 Non-separable case 87
5.3.1 Primal optimization problem 88
5.3.2 Support vectors 89
5.3.3 Dual optimization problem 90
5.4 Margin theory 91
5.5 Chapter notes 100
5.6 Exercises 100
6 Kernel Methods 105
6.1 Introduction 105
6.2 Positive definite symmetric kernels 108
6.2.1 Definitions 108
6.2.2 Reproducing kernel Hilbert space 110
6.2.3 Properties 112
6.3 Kernel-based algorithms 116
6.3.1 SVMs with PDS kernels 116
6.3.2 Representer theorem 117
6.3.3 Learning guarantees 117
6.4 Negative definite symmetric kernels 119
6.5 Sequence kernels 121
6.5.1 Weighted transducers 122
6.5.2 Rational kernels 126
6.6 Approximate kernel feature maps 130
6.7 Chapter notes 135
6.8 Exercises 137
7 Boosting 145
7.1 Introduction 145
7.2 AdaBoost 146
7.2.1 Bound on the empirical error 149
7.2.2 Relationship with coordinate descent 150
7.2.3 Practical use 154
7.3 Theoretical results 154
7.3.1 VC-dimension-based analysis 154
7.3.2 L1-geometric margin 155
7.3.3 Margin-based analysis 157
7.3.4 Margin maximization 161
7.3.5 Game-theoretic interpretation 162
7.4 L1-regularization 165
7.5 Discussion 167
7.6 Chapter notes 168
7.7 Exercises 170
8 On-Line Learning 177
8.1 Introduction 178
8.2 Prediction with expert advice 178
8.2.1 Mistake bounds and Halving algorithm 179
8.2.2 Weighted majority algorithm 181
8.2.3 Randomized weighted majority algorithm 183
8.2.4 Exponential weighted average algorithm 186
8.3 Linear classification 190
8.3.1 Perceptron algorithm 190
8.3.2 Winnow algorithm 198
8.4 On-line to batch conversion 201
8.5 Game-theoretic connection 204
8.6 Chapter notes 205
8.7 Exercises 206
9 Multi-Class Classification 213
9.1 Multi-class classification problem 213
9.2 Generalization bounds 215
9.3 Uncombined multi-class algorithms 221
9.3.1 Multi-class SVMs 221
9.3.2 Multi-class boosting algorithms 222
9.3.3 Decision trees 224
9.4 Aggregated multi-class algorithms 228
9.4.1 One-versus-all 229
9.4.2 One-versus-one 229
9.4.3 Error-correcting output codes 231
9.5 Structured prediction algorithms 233
9.6 Chapter notes 235
9.7 Exercises 237
10 Ranking 239
10.1 The problem of ranking 240
10.2 Generalization bound 241
10.3 Ranking with SVMs 243
10.4 RankBoost 244
10.4.1 Bound on the empirical error 246
10.4.2 Relationship with coordinate descent 248
10.4.3 Margin bound for ensemble methods in ranking 250
10.5 Bipartite ranking 251
10.5.1 Boosting in bipartite ranking 252
10.5.2 Area under the ROC curve 255
10.6 Preference-based setting 257
10.6.1 Second-stage ranking problem 257
10.6.2 Deterministic algorithm 259
10.6.3 Randomized algorithm 260
10.6.4 Extension to other loss functions 262
10.7 Other ranking criteria 262
10.8 Chapter notes 263
10.9 Exercises 264
11 Regression 267
11.1 The problem of regression 267
11.2 Generalization bounds 268
11.2.1 Finite hypothesis sets 268
11.2.2 Rademacher complexity bounds 269
11.2.3 Pseudo-dimension bounds 271
11.3 Regression algorithms 275
11.3.1 Linear regression 275
11.3.2 Kernel ridge regression 276
11.3.3 Support vector regression 281
11.3.4 Lasso 285
11.3.5 Group norm regression algorithms 289
11.3.6 On-line regression algorithms 289
11.4 Chapter notes 290
11.5 Exercises 292
12 Maximum Entropy Models 295
12.1 Density estimation problem 295
12.1.1 Maximum Likelihood (ML) solution 296
12.1.2 Maximum a Posteriori (MAP) solution 297
12.2 Density estimation problem augmented with features 297
12.3 Maxent principle 298
12.4 Maxent models 299
12.5 Dual problem 299
12.6 Generalization bound 303
12.7 Coordinate descent algorithm 304
12.8 Extensions 306
12.9 L2-regularization 308
12.10 Chapter notes 312
12.11 Exercises 313
13 Conditional Maximum Entropy Models 315
13.1 Learning problem 315
13.2 Conditional Maxent principle 316
13.3 Conditional Maxent models 316
13.4 Dual problem 317
13.5 Properties 319
13.5.1 Optimization problem 320
13.5.2 Feature vectors 320
13.5.3 Prediction 321
13.6 Generalization bounds 321
13.7 Logistic regression 325
13.7.1 Optimization problem 325
13.7.2 Logistic model 325
13.8 L2-regularization 326
13.9 Proof of the duality theorem 328
13.10 Chapter notes 330
13.11 Exercises 331
14 Algorithmic Stability 333
14.1 Definitions 333
14.2 Stability-based generalization guarantee 334
14.3 Stability of kernel-based regularization algorithms 336
14.3.1 Application to regression algorithms: SVR and KRR 339
14.3.2 Application to classification algorithms: SVMs 341
14.3.3 Discussion 342
14.4 Chapter notes 342
14.5 Exercises 343
15 Dimensionality Reduction 347
15.1 Principal component analysis 348
15.2 Kernel principal component analysis (KPCA) 349
15.3 KPCA and manifold learning 351
15.3.1 Isomap 351
15.3.2 Laplacian eigenmaps 352
15.3.3 Locally linear embedding (LLE) 353
15.4 Johnson-Lindenstrauss lemma 354
15.5 Chapter notes 356
15.6 Exercises 356
16 Learning Automata and Languages 359
16.1 Introduction 359
16.2 Finite automata 360
16.3 Efficient exact learning 361
16.3.1 Passive learning 362
16.3.2 Learning with queries 363
16.3.3 Learning automata with queries 364
16.4 Identification in the limit 369
16.4.1 Learning reversible automata 370
16.5 Chapter notes 375
16.6 Exercises 376
17 Reinforcement Learning 379
17.1 Learning scenario 379
17.2 Markov decision process model 380
17.3 Policy 381
17.3.1 Definition 381
17.3.2 Policy value 382
17.3.3 Optimal policies 382
17.3.4 Policy evaluation 385
17.4 Planning algorithms 387
17.4.1 Value iteration 387
17.4.2 Policy iteration 390
17.4.3 Linear programming 392
17.5 Learning algorithms 393
17.5.1 Stochastic approximation 394
17.5.2 TD(0) algorithm 397
17.5.3 Q-learning algorithm 398
17.5.4 SARSA 402
17.5.5 TD(λ) algorithm 402
17.5.6 Large state space 403
17.6 Chapter notes 405
Conclusion 407
A Linear Algebra Review 409
A.1 Vectors and norms 409
A.1.1 Norms 409
A.1.2 Dual norms 410
A.1.3 Relationship between norms 411
A.2 Matrices 411
A.2.1 Matrix norms 411
A.2.2 Singular value decomposition 412
A.2.3 Symmetric positive semidefinite (SPSD) matrices 412
B Convex Optimization 415
B.1 Differentiation and unconstrained optimization 415
B.2 Convexity 415
B.3 Constrained optimization 419
B.4 Fenchel duality 422
B.4.1 Subgradients 422
B.4.2 Core 423
B.4.3 Conjugate functions 423
B.5 Chapter notes 426
B.6 Exercises 427
C Probability Review 429
C.1 Probability 429
C.2 Random variables 429
C.3 Conditional probability and independence 431
C.4 Expectation and Markov's inequality 431
C.5 Variance and Chebyshev's inequality 432
C.6 Moment-generating functions 434
C.7 Exercises 435
D Concentration Inequalities 437
D.1 Hoeffding's inequality 437
D.2 Sanov's theorem 438
D.3 Multiplicative Chernoff bounds 439
D.4 Binomial distribution tails: Upper bounds 440
D.5 Binomial distribution tails: Lower bound 440
D.6 Azuma's inequality 441
D.7 McDiarmid's inequality 442
D.8 Normal distribution tails: Lower bound 443
D.9 Khintchine-Kahane inequality 443
D.10 Maximal inequality 444
D.11 Chapter notes 445
D.12 Exercises 445
E Notions of Information Theory 449
E.1 Entropy 449
E.2 Relative entropy 450
E.3 Mutual information 453
E.4 Bregman divergences 453
E.5 Chapter notes 456
E.6 Exercises 457
F Notation 459
Bibliography 461
Index 475