| Preface |
|
xi | |
| About the Authors |
|
xiii | |
|
1 Introduction to Artificial Intelligence |
|
|
1 | (8) |
|
1.1 History of Artificial Intelligence |
|
|
2 | (5) |
|
1.1.1 What Is Artificial Intelligence? |
|
|
2 | (2) |
|
|
|
4 | (1) |
|
1.1.3 Cognitive Science and AI |
|
|
4 | (1) |
|
1.1.4 Logical Approach to AI |
|
|
4 | (1) |
|
1.1.5 Knowledge-Based Systems |
|
|
5 | (1) |
|
1.1.6 Probabilistic Approach to AI |
|
|
6 | (1) |
|
1.1.7 Evolutionary Computation and Swarm Intelligence |
|
|
6 | (1) |
|
1.1.8 Neural Networks & Deep Learning |
|
|
7 | (1) |
|
1.1.9 A Return to Creating HAL |
|
|
7 | (1) |
|
|
|
7 | (2) |
| I Logical Intelligence |
|
9 | (104) |
|
|
|
11 | (42) |
|
2.1 Basics of Propositional Logic |
|
|
12 | (12) |
|
|
|
12 | (1) |
|
|
|
13 | (4) |
|
2.1.3 Tautologies and Logical Implication |
|
|
17 | (1) |
|
|
|
18 | (3) |
|
|
|
21 | (3) |
|
|
|
24 | (6) |
|
|
|
25 | (1) |
|
2.2.2 Derivations Using Resolution |
|
|
26 | (4) |
|
2.2.3 Resolution Algorithm |
|
|
30 | (1) |
|
2.3 Artificial Intelligence Applications |
|
|
30 | (18) |
|
2.3.1 Knowledge-Based Systems |
|
|
30 | (11) |
|
|
|
41 | (7) |
|
2.4 Discussion and Further Reading |
|
|
48 | (5) |
|
|
|
53 | (24) |
|
3.1 Basics of First-Order Logic |
|
|
53 | (15) |
|
|
|
54 | (2) |
|
|
|
56 | (4) |
|
3.1.3 Validity and Logical Implication |
|
|
60 | (2) |
|
|
|
62 | (3) |
|
3.1.5 Modus Ponens for First-Order Logic |
|
|
65 | (3) |
|
3.2 Artificial Intelligence Applications |
|
|
68 | (5) |
|
3.2.1 Wumpus World Revisited |
|
|
69 | (1) |
|
|
|
69 | (4) |
|
3.3 Discussion and Further Reading |
|
|
73 | (4) |
|
4 Certain Knowledge Representation |
|
|
77 | (12) |
|
|
|
78 | (2) |
|
|
|
78 | (1) |
|
4.1.2 Model of Human Organization of Knowledge |
|
|
79 | (1) |
|
|
|
80 | (4) |
|
4.2.1 Frame Data Structure |
|
|
80 | (1) |
|
4.2.2 Planning a Trip Using Frames |
|
|
81 | (3) |
|
|
|
84 | (2) |
|
|
|
84 | (1) |
|
|
|
85 | (1) |
|
|
|
86 | (1) |
|
4.4 Discussion and Further Reading |
|
|
86 | (3) |
|
5 Learning Deterministic Models |
|
|
89 | (24) |
|
|
|
89 | (1) |
|
|
|
90 | (6) |
|
5.2.1 Simple Linear Regression |
|
|
91 | (2) |
|
5.2.2 Multiple Linear Regression |
|
|
93 | (1) |
|
5.2.3 Overfitting and Cross Validation |
|
|
94 | (2) |
|
|
|
96 | (6) |
|
5.3.1 Estimating the Parameters for Simple Linear Regression |
|
|
96 | (2) |
|
|
|
98 | (2) |
|
5.3.3 Logistic Regression and Gradient Descent |
|
|
100 | (1) |
|
5.3.4 Stochastic Gradient Descent |
|
|
101 | (1) |
|
5.4 Learning a Decision Tree |
|
|
102 | (13) |
|
|
|
102 | (4) |
|
5.4.2 Information Gain and the ID3 Algorithm |
|
|
106 | (2) |
|
|
|
108 | (5) |
| II Probabilistic Intelligence |
|
113 | (236) |
|
|
|
115 | (30) |
|
|
|
117 | (6) |
|
|
|
117 | (3) |
|
6.1.2 Conditional Probability and Independence |
|
|
120 | (2) |
|
|
|
122 | (1) |
|
|
|
123 | (8) |
|
6.2.1 Probability Distributions of Random Variables |
|
|
123 | (5) |
|
6.2.2 Independence of Random Variables |
|
|
128 | (3) |
|
6.3 Meaning of Probability |
|
|
131 | (4) |
|
6.3.1 Relative Frequency Approach to Probability |
|
|
132 | (2) |
|
6.3.2 Subjective Approach to Probability |
|
|
134 | (1) |
|
6.4 Random Variables in Applications |
|
|
135 | (4) |
|
6.5 Probability in the Wumpus World |
|
|
139 | (6) |
|
7 Uncertain Knowledge Representation |
|
|
145 | (36) |
|
7.1 Intuitive Introduction to Bayesian Networks |
|
|
147 | (2) |
|
7.2 Properties of Bayesian Networks |
|
|
149 | (5) |
|
7.2.1 Definition of a Bayesian Network |
|
|
149 | (3) |
|
7.2.2 Representation of a Bayesian Network |
|
|
152 | (2) |
|
7.3 Causal Networks as Bayesian Networks |
|
|
154 | (6) |
|
|
|
154 | (1) |
|
7.3.2 Causality and the Markov Condition |
|
|
155 | (4) |
|
7.3.3 Markov Condition without Causality |
|
|
159 | (1) |
|
7.4 Inference in Bayesian Networks |
|
|
160 | (5) |
|
7.4.1 Examples of Inference |
|
|
160 | (2) |
|
7.4.2 Inference Algorithms and Packages |
|
|
162 | (1) |
|
7.4.3 Inference Using Netica |
|
|
163 | (2) |
|
7.5 Networks with Continuous Variables |
|
|
165 | (5) |
|
7.5.1 Gaussian Bayesian Networks |
|
|
165 | (3) |
|
|
|
168 | (2) |
|
7.6 Obtaining the Probabilities |
|
|
170 | (4) |
|
7.6.1 Difficulty Inherent in Multiple Parents |
|
|
170 | (1) |
|
7.6.2 Basic Noisy OR-Gate Model |
|
|
170 | (2) |
|
7.6.3 Leaky Noisy OR-Gate Model |
|
|
172 | (2) |
|
|
|
174 | (1) |
|
7.7 Large-Scale Application: Promedas |
|
|
174 | (7) |
|
8 Advanced Properties of Bayesian Networks |
|
|
181 | (20) |
|
8.1 Entailed Conditional Independencies |
|
|
182 | (6) |
|
8.1.1 Examples of Entailed Conditional Independencies |
|
|
182 | (3) |
|
|
|
185 | (3) |
|
|
|
188 | (3) |
|
8.2.1 Unfaithful Probability Distributions |
|
|
188 | (2) |
|
8.2.2 Faithfulness Condition |
|
|
190 | (1) |
|
|
|
191 | (1) |
|
8.4 Markov Blankets and Boundaries |
|
|
192 | (9) |
|
|
|
201 | (56) |
|
|
|
202 | (14) |
|
|
|
202 | (3) |
|
9.1.2 Solving More Complex Decision Trees |
|
|
205 | (11) |
|
|
|
216 | (15) |
|
9.2.1 Representing Decision Problems with Influence Diagrams |
|
|
216 | (6) |
|
9.2.2 Solving Influence Diagrams |
|
|
222 | (1) |
|
9.2.3 Techniques for Solving Influence Diagrams |
|
|
222 | (4) |
|
9.2.4 Solving Influence Diagrams Using Netica |
|
|
226 | (5) |
|
9.3 Modeling Risk Preferences |
|
|
231 | (2) |
|
9.3.1 Exponential Utility Function |
|
|
231 | (1) |
|
|
|
232 | (1) |
|
9.4 Analyzing Risk Directly |
|
|
233 | (6) |
|
9.4.1 Using the Variance to Measure Risk |
|
|
233 | (2) |
|
|
|
235 | (1) |
|
|
|
236 | (3) |
|
9.5 Good Decision versus Good Outcome |
|
|
239 | (1) |
|
|
|
239 | (2) |
|
|
|
241 | (4) |
|
9.7.1 Expected Value of Perfect Information |
|
|
242 | (2) |
|
9.7.2 Expected Value of Imperfect Information |
|
|
244 | (1) |
|
9.8 Discussion and Further Reading |
|
|
245 | (12) |
|
|
|
246 | (1) |
|
9.8.2 Business and Finance |
|
|
247 | (1) |
|
|
|
247 | (1) |
|
|
|
247 | (1) |
|
|
|
247 | (1) |
|
|
|
247 | (1) |
|
|
|
248 | (1) |
|
9.8.8 Natural Language Processing |
|
|
248 | (1) |
|
|
|
248 | (1) |
|
|
|
248 | (1) |
|
9.8.11 Reliability Analysis |
|
|
248 | (1) |
|
|
|
249 | (1) |
|
9.8.13 Speech Recognition |
|
|
249 | (1) |
|
9.8.14 Vehicle Control and Malfunction Diagnosis |
|
|
249 | (8) |
|
10 Learning Probabilistic Model Parameters |
|
|
257 | (18) |
|
10.1 Learning a Single Parameter |
|
|
257 | (4) |
|
10.1.1 Binomial Random Variables |
|
|
258 | (2) |
|
10.1.2 Multinomial Random Variables |
|
|
260 | (1) |
|
10.2 Learning Parameters in a Bayesian Network |
|
|
261 | (5) |
|
10.2.1 Procedure for Learning Parameters |
|
|
262 | (1) |
|
10.2.2 Equivalent Sample Size |
|
|
263 | (3) |
|
10.3 Learning Parameters with Missing Data |
|
|
266 | (9) |
|
11 Learning Probabilistic Model Structure |
|
|
275 | (56) |
|
11.1 Structure Learning Problem |
|
|
276 | (1) |
|
11.2 Score-Based Structure Learning |
|
|
276 | (27) |
|
|
|
276 | (7) |
|
|
|
283 | (1) |
|
11.2.3 Consistent Scoring Criteria |
|
|
284 | (1) |
|
11.2.4 How Many DAGs Must We Score? |
|
|
285 | (1) |
|
11.2.5 Using the Learned Network to Do Inference |
|
|
285 | (1) |
|
11.2.6 Learning Structure with Missing Data |
|
|
286 | (7) |
|
11.2.7 Approximate Structure Learning |
|
|
293 | (4) |
|
|
|
297 | (3) |
|
11.2.9 Approximate Model Averaging |
|
|
300 | (3) |
|
11.3 Constraint-Based Structure Learning |
|
|
303 | (5) |
|
11.3.1 Learning a DAG Faithful to P |
|
|
303 | (4) |
|
11.3.2 Learning a DAG in which P Is Embedded Faithfully |
|
|
307 | (1) |
|
|
|
308 | (3) |
|
11.4.1 Developing the Network |
|
|
308 | (2) |
|
|
|
310 | (1) |
|
11.5 Software Packages for Learning |
|
|
311 | (1) |
|
|
|
312 | (8) |
|
11.6.1 Causal Faithfulness Assumption |
|
|
312 | (2) |
|
11.6.2 Causal Embedded Faithfulness Assumption |
|
|
314 | (3) |
|
11.6.3 Application: College Student Retention Rate |
|
|
317 | (3) |
|
11.7 Class Probability Trees |
|
|
320 | (5) |
|
11.7.1 Theory of Class Probability Trees |
|
|
320 | (2) |
|
11.7.2 Application to Targeted Advertising |
|
|
322 | (3) |
|
11.8 Discussion and Further Reading |
|
|
325 | (6) |
|
|
|
325 | (1) |
|
11.8.2 Business and Finance |
|
|
326 | (1) |
|
|
|
326 | (1) |
|
|
|
326 | (1) |
|
|
|
326 | (1) |
|
11.8.6 Weather Forecasting |
|
|
326 | (5) |
|
12 Unsupervised Learning and Reinforcement Learning |
|
|
331 | (18) |
|
12.1 Unsupervised Learning |
|
|
331 | (2) |
|
|
|
331 | (2) |
|
12.1.2 Automated Discovery |
|
|
333 | (1) |
|
12.2 Reinforcement Learning |
|
|
333 | (12) |
|
12.2.1 Multi-Armed Bandit Algorithms |
|
|
333 | (3) |
|
|
|
336 | (9) |
|
12.3 Discussion and Further Reading |
|
|
345 | (4) |
| III Emergent Intelligence |
|
349 | (38) |
|
13 Evolutionary Computation |
|
|
351 | (26) |
|
|
|
352 | (2) |
|
|
|
354 | (10) |
|
|
|
354 | (1) |
|
13.2.2 Illustrative Example |
|
|
355 | (2) |
|
13.2.3 Traveling Salesperson Problem |
|
|
357 | (7) |
|
|
|
364 | (9) |
|
13.3.1 Illustrative Example |
|
|
365 | (2) |
|
|
|
367 | (3) |
|
13.3.3 Application to Financial Trading |
|
|
370 | (3) |
|
13.4 Discussion and Further Reading |
|
|
373 | (4) |
|
|
|
377 | (10) |
|
|
|
377 | (4) |
|
|
|
378 | (1) |
|
14.1.2 Artificial Ants for Solving the TSP |
|
|
378 | (3) |
|
|
|
381 | (2) |
|
14.3 Discussion and Further Reading |
|
|
383 | (4) |
| IV Neural Intelligence |
|
387 | (26) |
|
15 Neural Networks and Deep Learning |
|
|
389 | (24) |
|
|
|
389 | (6) |
|
15.1.1 Learning the Weights for a Perceptron |
|
|
391 | (3) |
|
15.1.2 The Perceptron and Logistic Regression |
|
|
394 | (1) |
|
15.2 Feedforward Neural Networks |
|
|
395 | (8) |
|
|
|
395 | (3) |
|
15.2.2 Example with Two Hidden Layers |
|
|
398 | (3) |
|
15.2.3 Structure of a Feedforward Neural Network |
|
|
401 | (2) |
|
15.3 Activation Functions |
|
|
403 | (4) |
|
|
|
403 | (2) |
|
|
|
405 | (2) |
|
15.4 Application to Image Recognition |
|
|
407 | (1) |
|
15.5 Discussion and Further Reading |
|
|
407 | (6) |
| V Language Understanding |
|
413 | (24) |
|
16 Natural Language Understanding |
|
|
415 | (22) |
|
|
|
417 | (13) |
|
|
|
418 | (2) |
|
|
|
420 | (2) |
|
16.1.3 Dynamic Programming Parser |
|
|
422 | (4) |
|
16.1.4 Probabilistic Parser |
|
|
426 | (2) |
|
16.1.5 Obtaining Probabilities for a PCFG |
|
|
428 | (1) |
|
|
|
428 | (2) |
|
16.2 Semantic Interpretation |
|
|
430 | (1) |
|
16.3 Concept/Knowledge Interpretation |
|
|
431 | (1) |
|
16.4 Information Extraction |
|
|
432 | (3) |
|
16.4.1 Applications of Information Extraction |
|
|
432 | (1) |
|
16.4.2 Architecture for an Information Extraction System |
|
|
433 | (2) |
|
16.5 Discussion and Further Reading |
|
|
435 | (2) |
| References |
|
437 | (22) |
| Index |
|
459 | |