
Introduction to Machine Learning, Third Edition [Hardcover]

Ethem Alpaydın (Boğaziçi University)

The goal of machine learning is to program computers to use example data or past experience to solve a given problem. Many successful applications of machine learning exist already, including systems that analyze past sales data to predict customer behavior, optimize robot behavior so that a task can be completed using minimum resources, and extract knowledge from bioinformatics data.

Introduction to Machine Learning is a comprehensive textbook on the subject, covering a broad array of topics not usually included in introductory machine learning texts. Subjects include supervised learning; Bayesian decision theory; parametric, semi-parametric, and nonparametric methods; multivariate analysis; hidden Markov models; reinforcement learning; kernel machines; graphical models; Bayesian estimation; and statistical testing.

Machine learning is rapidly becoming a skill that computer science students must master before graduation. The third edition of Introduction to Machine Learning reflects this shift, with added support for beginners, including selected solutions for exercises and additional example data sets (with code available online). Other substantial changes include discussions of outlier detection; ranking algorithms for perceptrons and support vector machines; matrix decomposition and spectral methods; distance estimation; new kernel algorithms; deep learning in multilayered perceptrons; and the nonparametric approach to Bayesian methods. All learning algorithms are explained so that students can easily move from the equations in the book to a computer program. The book can be used by both advanced undergraduates and graduate students. It will also be of interest to professionals who are concerned with the application of machine learning methods.

Preface  xvii
Notations  xxi
1 Introduction  1(20)
1.1 What Is Machine Learning?  1(3)
1.2 Examples of Machine Learning Applications  4(10)
1.2.1 Learning Associations  4(1)
1.2.2 Classification  5(4)
1.2.3 Regression  9(2)
1.2.4 Unsupervised Learning  11(2)
1.2.5 Reinforcement Learning  13(1)
1.3 Notes  14(3)
1.4 Relevant Resources  17(1)
1.5 Exercises  18(2)
1.6 References  20(1)
2 Supervised Learning  21(28)
2.1 Learning a Class from Examples  21(6)
2.2 Vapnik-Chervonenkis Dimension  27(2)
2.3 Probably Approximately Correct Learning  29(1)
2.4 Noise  30(2)
2.5 Learning Multiple Classes  32(2)
2.6 Regression  34(3)
2.7 Model Selection and Generalization  37(4)
2.8 Dimensions of a Supervised Machine Learning Algorithm  41(1)
2.9 Notes  42(1)
2.10 Exercises  43(4)
2.11 References  47(2)
3 Bayesian Decision Theory  49(16)
3.1 Introduction  49(2)
3.2 Classification  51(2)
3.3 Losses and Risks  53(2)
3.4 Discriminant Functions  55(1)
3.5 Association Rules  56(3)
3.6 Notes  59(1)
3.7 Exercises  60(4)
3.8 References  64(1)
4 Parametric Methods  65(28)
4.1 Introduction  65(1)
4.2 Maximum Likelihood Estimation  66(3)
4.2.1 Bernoulli Density  67(1)
4.2.2 Multinomial Density  68(1)
4.2.3 Gaussian (Normal) Density  68(1)
4.3 Evaluating an Estimator: Bias and Variance  69(1)
4.4 The Bayes' Estimator  70(3)
4.5 Parametric Classification  73(4)
4.6 Regression  77(3)
4.7 Tuning Model Complexity: Bias/Variance Dilemma  80(3)
4.8 Model Selection Procedures  83(4)
4.9 Notes  87(1)
4.10 Exercises  88(2)
4.11 References  90(3)
5 Multivariate Methods  93(22)
5.1 Multivariate Data  93(1)
5.2 Parameter Estimation  94(1)
5.3 Estimation of Missing Values  95(1)
5.4 Multivariate Normal Distribution  96(4)
5.5 Multivariate Classification  100(6)
5.6 Tuning Complexity  106(2)
5.7 Discrete Features  108(1)
5.8 Multivariate Regression  109(2)
5.9 Notes  111(1)
5.10 Exercises  112(1)
5.11 References  113(2)
6 Dimensionality Reduction  115(46)
6.1 Introduction  115(1)
6.2 Subset Selection  116(4)
6.3 Principal Component Analysis  120(7)
6.4 Feature Embedding  127(3)
6.5 Factor Analysis  130(5)
6.6 Singular Value Decomposition and Matrix Factorization  135(1)
6.7 Multidimensional Scaling  136(4)
6.8 Linear Discriminant Analysis  140(5)
6.9 Canonical Correlation Analysis  145(3)
6.10 Isomap  148(2)
6.11 Locally Linear Embedding  150(3)
6.12 Laplacian Eigenmaps  153(2)
6.13 Notes  155(2)
6.14 Exercises  157(1)
6.15 References  158(3)
7 Clustering  161(24)
7.1 Introduction  161(1)
7.2 Mixture Densities  162(1)
7.3 k-Means Clustering  163(4)
7.4 Expectation-Maximization Algorithm  167(5)
7.5 Mixtures of Latent Variable Models  172(1)
7.6 Supervised Learning after Clustering  173(2)
7.7 Spectral Clustering  175(1)
7.8 Hierarchical Clustering  176(2)
7.9 Choosing the Number of Clusters  178(1)
7.10 Notes  179(1)
7.11 Exercises  180(2)
7.12 References  182(3)
8 Nonparametric Methods  185(28)
8.1 Introduction  185(1)
8.2 Nonparametric Density Estimation  186(6)
8.2.1 Histogram Estimator  187(1)
8.2.2 Kernel Estimator  188(2)
8.2.3 k-Nearest Neighbor Estimator  190(2)
8.3 Generalization to Multivariate Data  192(1)
8.4 Nonparametric Classification  193(1)
8.5 Condensed Nearest Neighbor  194(2)
8.6 Distance-Based Classification  196(3)
8.7 Outlier Detection  199(2)
8.8 Nonparametric Regression: Smoothing Models  201(3)
8.8.1 Running Mean Smoother  201(2)
8.8.2 Kernel Smoother  203(1)
8.8.3 Running Line Smoother  204(1)
8.9 How to Choose the Smoothing Parameter  204(1)
8.10 Notes  205(3)
8.11 Exercises  208(2)
8.12 References  210(3)
9 Decision Trees  213(26)
9.1 Introduction  213(2)
9.2 Univariate Trees  215(7)
9.2.1 Classification Trees  216(4)
9.2.2 Regression Trees  220(2)
9.3 Pruning  222(3)
9.4 Rule Extraction from Trees  225(1)
9.5 Learning Rules from Data  226(4)
9.6 Multivariate Trees  230(2)
9.7 Notes  232(3)
9.8 Exercises  235(2)
9.9 References  237(2)
10 Linear Discrimination  239(28)
10.1 Introduction  239(2)
10.2 Generalizing the Linear Model  241(1)
10.3 Geometry of the Linear Discriminant  242(4)
10.3.1 Two Classes  242(2)
10.3.2 Multiple Classes  244(2)
10.4 Pairwise Separation  246(1)
10.5 Parametric Discrimination Revisited  247(1)
10.6 Gradient Descent  248(2)
10.7 Logistic Discrimination  250(7)
10.7.1 Two Classes  250(4)
10.7.2 Multiple Classes  254(3)
10.8 Discrimination by Regression  257(3)
10.9 Learning to Rank  260(3)
10.10 Notes  263(1)
10.11 Exercises  263(3)
10.12 References  266(1)
11 Multilayer Perceptrons  267(50)
11.1 Introduction  267(4)
11.1.1 Understanding the Brain  268(1)
11.1.2 Neural Networks as a Paradigm for Parallel Processing  269(2)
11.2 The Perceptron  271(3)
11.3 Training a Perceptron  274(3)
11.4 Learning Boolean Functions  277(2)
11.5 Multilayer Perceptrons  279(2)
11.6 MLP as a Universal Approximator  281(2)
11.7 Backpropagation Algorithm  283(7)
11.7.1 Nonlinear Regression  284(2)
11.7.2 Two-Class Discrimination  286(2)
11.7.3 Multiclass Discrimination  288(2)
11.7.4 Multiple Hidden Layers  290(1)
11.8 Training Procedures  290(7)
11.8.1 Improving Convergence  290(1)
11.8.2 Overtraining  291(1)
11.8.3 Structuring the Network  292(3)
11.8.4 Hints  295(2)
11.9 Tuning the Network Size  297(3)
11.10 Bayesian View of Learning  300(1)
11.11 Dimensionality Reduction  301(3)
11.12 Learning Time  304(2)
11.12.1 Time Delay Neural Networks  304(1)
11.12.2 Recurrent Networks  305(1)
11.13 Deep Learning  306(3)
11.14 Notes  309(2)
11.15 Exercises  311(2)
11.16 References  313(4)
12 Local Models  317(32)
12.1 Introduction  317(1)
12.2 Competitive Learning  318(8)
12.2.1 Online k-Means  318(5)
12.2.2 Adaptive Resonance Theory  323(1)
12.2.3 Self-Organizing Maps  324(2)
12.3 Radial Basis Functions  326(6)
12.4 Incorporating Rule-Based Knowledge  332(1)
12.5 Normalized Basis Functions  333(2)
12.6 Competitive Basis Functions  335(3)
12.7 Learning Vector Quantization  338(1)
12.8 The Mixture of Experts  338(4)
12.8.1 Cooperative Experts  341(1)
12.8.2 Competitive Experts  342(1)
12.9 Hierarchical Mixture of Experts  342(1)
12.10 Notes  343(1)
12.11 Exercises  344(3)
12.12 References  347(2)
13 Kernel Machines  349(38)
13.1 Introduction  349(2)
13.2 Optimal Separating Hyperplane  351(4)
13.3 The Nonseparable Case: Soft Margin Hyperplane  355(3)
13.4 ν-SVM  358(1)
13.5 Kernel Trick  359(2)
13.6 Vectorial Kernels  361(3)
13.7 Defining Kernels  364(1)
13.8 Multiple Kernel Learning  365(2)
13.9 Multiclass Kernel Machines  367(1)
13.10 Kernel Machines for Regression  368(5)
13.11 Kernel Machines for Ranking  373(1)
13.12 One-Class Kernel Machines  374(3)
13.13 Large Margin Nearest Neighbor Classifier  377(2)
13.14 Kernel Dimensionality Reduction  379(1)
13.15 Notes  380(2)
13.16 Exercises  382(1)
13.17 References  383(4)
14 Graphical Models  387(30)
14.1 Introduction  387(2)
14.2 Canonical Cases for Conditional Independence  389(7)
14.3 Generative Models  396(3)
14.4 d-Separation  399(1)
14.5 Belief Propagation  399(8)
14.5.1 Chains  400(2)
14.5.2 Trees  402(2)
14.5.3 Polytrees  404(2)
14.5.4 Junction Trees  406(1)
14.6 Undirected Graphs: Markov Random Fields  407(3)
14.7 Learning the Structure of a Graphical Model  410(1)
14.8 Influence Diagrams  411(1)
14.9 Notes  412(1)
14.10 Exercises  413(2)
14.11 References  415(2)
15 Hidden Markov Models  417(28)
15.1 Introduction  417(1)
15.2 Discrete Markov Processes  418(3)
15.3 Hidden Markov Models  421(2)
15.4 Three Basic Problems of HMMs  423(1)
15.5 Evaluation Problem  423(4)
15.6 Finding the State Sequence  427(2)
15.7 Learning Model Parameters  429(3)
15.8 Continuous Observations  432(1)
15.9 The HMM as a Graphical Model  433(3)
15.10 Model Selection in HMMs  436(2)
15.11 Notes  438(2)
15.12 Exercises  440(3)
15.13 References  443(2)
16 Bayesian Estimation  445(42)
16.1 Introduction  445(4)
16.2 Bayesian Estimation of the Parameters of a Discrete Distribution  449(2)
16.2.1 K > 2 States: Dirichlet Distribution  449(1)
16.2.2 K = 2 States: Beta Distribution  450(1)
16.3 Bayesian Estimation of the Parameters of a Gaussian Distribution  451(5)
16.3.1 Univariate Case: Unknown Mean, Known Variance  451(2)
16.3.2 Univariate Case: Unknown Mean, Unknown Variance  453(2)
16.3.3 Multivariate Case: Unknown Mean, Unknown Covariance  455(1)
16.4 Bayesian Estimation of the Parameters of a Function  456(10)
16.4.1 Regression  456(4)
16.4.2 Regression with Prior on Noise Precision  460(1)
16.4.3 The Use of Basis/Kernel Functions  461(2)
16.4.4 Bayesian Classification  463(3)
16.5 Choosing a Prior  466(1)
16.6 Bayesian Model Comparison  467(3)
16.7 Bayesian Estimation of a Mixture Model  470(3)
16.8 Nonparametric Bayesian Modeling  473(1)
16.9 Gaussian Processes  474(4)
16.10 Dirichlet Processes and Chinese Restaurants  478(2)
16.11 Latent Dirichlet Allocation  480(2)
16.12 Beta Processes and Indian Buffets  482(1)
16.13 Notes  483(1)
16.14 Exercises  484(1)
16.15 References  485(2)
17 Combining Multiple Learners  487(30)
17.1 Rationale  487(1)
17.2 Generating Diverse Learners  488(3)
17.3 Model Combination Schemes  491(1)
17.4 Voting  492(4)
17.5 Error-Correcting Output Codes  496(2)
17.6 Bagging  498(1)
17.7 Boosting  499(3)
17.8 The Mixture of Experts Revisited  502(2)
17.9 Stacked Generalization  504(1)
17.10 Fine-Tuning an Ensemble  505(2)
17.10.1 Choosing a Subset of the Ensemble  506(1)
17.10.2 Constructing Metalearners  506(1)
17.11 Cascading  507(2)
17.12 Notes  509(2)
17.13 Exercises  511(2)
17.14 References  513(4)
18 Reinforcement Learning  517(30)
18.1 Introduction  517(2)
18.2 Single State Case: K-Armed Bandit  519(1)
18.3 Elements of Reinforcement Learning  520(3)
18.4 Model-Based Learning  523(2)
18.4.1 Value Iteration  523(1)
18.4.2 Policy Iteration  524(1)
18.5 Temporal Difference Learning  525(6)
18.5.1 Exploration Strategies  525(1)
18.5.2 Deterministic Rewards and Actions  526(1)
18.5.3 Nondeterministic Rewards and Actions  527(3)
18.5.4 Eligibility Traces  530(1)
18.6 Generalization  531(3)
18.7 Partially Observable States  534(7)
18.7.1 The Setting  534(2)
18.7.2 Example: The Tiger Problem  536(5)
18.8 Notes  541(1)
18.9 Exercises  542(2)
18.10 References  544(3)
19 Design and Analysis of Machine Learning Experiments  547(46)
19.1 Introduction  547(3)
19.2 Factors, Response, and Strategy of Experimentation  550(3)
19.3 Response Surface Design  553(1)
19.4 Randomization, Replication, and Blocking  554(1)
19.5 Guidelines for Machine Learning Experiments  555(3)
19.6 Cross-Validation and Resampling Methods  558(3)
19.6.1 K-Fold Cross-Validation  559(1)
19.6.2 5×2 Cross-Validation  560(1)
19.6.3 Bootstrapping  561(1)
19.7 Measuring Classifier Performance  561(3)
19.8 Interval Estimation  564(4)
19.9 Hypothesis Testing  568(2)
19.10 Assessing a Classification Algorithm's Performance  570(3)
19.10.1 Binomial Test  571(1)
19.10.2 Approximate Normal Test  572(1)
19.10.3 t Test  572(1)
19.11 Comparing Two Classification Algorithms  573(3)
19.11.1 McNemar's Test  573(1)
19.11.2 K-Fold Cross-Validated Paired t Test  573(1)
19.11.3 5×2 cv Paired t Test  574(1)
19.11.4 5×2 cv Paired F Test  575(1)
19.12 Comparing Multiple Algorithms: Analysis of Variance  576(4)
19.13 Comparison over Multiple Datasets  580(4)
19.13.1 Comparing Two Algorithms  581(2)
19.13.2 Multiple Algorithms  583(1)
19.14 Multivariate Tests  584(3)
19.14.1 Comparing Two Algorithms  585(1)
19.14.2 Comparing Multiple Algorithms  586(1)
19.15 Notes  587(1)
19.16 Exercises  588(2)
19.17 References  590(3)
A Probability  593(12)
A.1 Elements of Probability  593(2)
A.1.1 Axioms of Probability  594(1)
A.1.2 Conditional Probability  594(1)
A.2 Random Variables  595(4)
A.2.1 Probability Distribution and Density Functions  595(1)
A.2.2 Joint Distribution and Density Functions  596(1)
A.2.3 Conditional Distributions  596(1)
A.2.4 Bayes' Rule  597(1)
A.2.5 Expectation  597(1)
A.2.6 Variance  598(1)
A.2.7 Weak Law of Large Numbers  599(1)
A.3 Special Random Variables  599(4)
A.3.1 Bernoulli Distribution  599(1)
A.3.2 Binomial Distribution  600(1)
A.3.3 Multinomial Distribution  600(1)
A.3.4 Uniform Distribution  600(1)
A.3.5 Normal (Gaussian) Distribution  601(1)
A.3.6 Chi-Square Distribution  602(1)
A.3.7 t Distribution  603(1)
A.3.8 F Distribution  603(1)
A.4 References  603(2)
Index  605