Acknowledgments |
|
xxiii | |
|
|
xxv | |
|
|
xxxi | |
|
|
xxxiii | |
|
|
1 | (14) |
|
|
1 | (1) |
|
Structured Probabilistic Models |
|
|
2 | (4) |
|
Probabilistic Graphical Models |
|
|
3 | (2) |
|
Representation, Inference, Learning |
|
|
5 | (1) |
|
|
6 | (6) |
|
|
6 | (3) |
|
|
9 | (2) |
|
Connection to Other Disciplines |
|
|
11 | (1) |
|
|
12 | (3) |
|
|
15 | (28) |
|
|
15 | (19) |
|
Probability Distributions |
|
|
15 | (3) |
|
Basic Concepts in Probability |
|
|
18 | (1) |
|
Random Variables and Joint Distributions |
|
|
19 | (4) |
|
Independence and Conditional Independence |
|
|
23 | (2) |
|
|
25 | (2) |
|
|
27 | (4) |
|
|
31 | (3) |
|
|
34 | (5) |
|
|
34 | (1) |
|
|
35 | (1) |
|
|
36 | (1) |
|
|
36 | (3) |
|
|
39 | (1) |
|
|
39 | (4) |
|
|
43 | (242) |
|
The Bayesian Network Representation |
|
|
45 | (58) |
|
Exploiting Independence Properties |
|
|
45 | (6) |
|
Independent Random Variables |
|
|
45 | (1) |
|
The Conditional Parameterization |
|
|
46 | (2) |
|
|
48 | (3) |
|
|
51 | (17) |
|
The Student Example Revisited |
|
|
52 | (4) |
|
Basic Independencies in Bayesian Networks |
|
|
56 | (4) |
|
|
60 | (8) |
|
|
68 | (10) |
|
|
69 | (3) |
|
Soundness and Completeness |
|
|
72 | (2) |
|
An Algorithm for d-Separation |
|
|
74 | (2) |
|
|
76 | (2) |
|
From Distributions to Graphs |
|
|
78 | (14) |
|
|
78 | (3) |
|
|
81 | (2) |
|
|
83 | (9) |
|
|
92 | (1) |
|
|
93 | (3) |
|
|
96 | (7) |
|
Undirected Graphical Models |
|
|
103 | (54) |
|
The Misconception Example |
|
|
103 | (3) |
|
|
106 | (8) |
|
|
106 | (2) |
|
Gibbs Distributions and Markov Networks |
|
|
108 | (2) |
|
|
110 | (4) |
|
Markov Network Independencies |
|
|
114 | (8) |
|
|
114 | (3) |
|
|
117 | (3) |
|
From Distributions to Graphs |
|
|
120 | (2) |
|
Parameterization Revisited |
|
|
122 | (12) |
|
Finer-Grained Parameterization |
|
|
123 | (5) |
|
|
128 | (6) |
|
Bayesian Networks and Markov Networks |
|
|
134 | (8) |
|
From Bayesian Networks to Markov Networks |
|
|
134 | (4) |
|
From Markov Networks to Bayesian Networks |
|
|
138 | (1) |
|
|
139 | (3) |
|
Partially Directed Models |
|
|
142 | (9) |
|
Conditional Random Fields |
|
|
142 | (6) |
|
|
148 | (3) |
|
|
151 | (1) |
|
|
152 | (1) |
|
|
153 | (4) |
|
Local Probabilistic Models |
|
|
157 | (42) |
|
|
157 | (1) |
|
|
158 | (4) |
|
|
158 | (1) |
|
|
159 | (3) |
|
|
162 | (13) |
|
|
162 | (9) |
|
|
171 | (4) |
|
Independence of Causal Influence |
|
|
175 | (10) |
|
|
175 | (3) |
|
Generalized Linear Models |
|
|
178 | (4) |
|
|
182 | (2) |
|
|
184 | (1) |
|
|
185 | (6) |
|
|
189 | (2) |
|
Conditional Bayesian Networks |
|
|
191 | (2) |
|
|
193 | (1) |
|
|
194 | (1) |
|
|
195 | (4) |
|
Template-Based Representations |
|
|
199 | (48) |
|
|
199 | (1) |
|
|
200 | (12) |
|
|
201 | (1) |
|
Dynamic Bayesian Networks |
|
|
202 | (5) |
|
|
207 | (5) |
|
Template Variables and Template Factors |
|
|
212 | (4) |
|
Directed Probabilistic Models for Object-Relational Domains |
|
|
216 | (12) |
|
|
216 | (6) |
|
Probabilistic Relational Models |
|
|
222 | (6) |
|
Undirected Representation |
|
|
228 | (4) |
|
|
232 | (8) |
|
|
233 | (2) |
|
|
235 | (5) |
|
|
240 | (2) |
|
|
242 | (1) |
|
|
243 | (4) |
|
|
247 | (14) |
|
|
247 | (4) |
|
|
247 | (2) |
|
|
249 | (1) |
|
Independencies in Gaussians |
|
|
250 | (1) |
|
Gaussian Bayesian Networks |
|
|
251 | (3) |
|
Gaussian Markov Random Fields |
|
|
254 | (3) |
|
|
257 | (1) |
|
|
258 | (1) |
|
|
258 | (3) |
|
|
261 | (24) |
|
|
261 | (1) |
|
|
261 | (5) |
|
Linear Exponential Families |
|
|
263 | (3) |
|
Factored Exponential Families |
|
|
266 | (3) |
|
|
266 | (1) |
|
|
267 | (2) |
|
Entropy and Relative Entropy |
|
|
269 | (4) |
|
|
269 | (3) |
|
|
272 | (1) |
|
|
273 | (9) |
|
|
274 | (3) |
|
|
277 | (5) |
|
|
282 | (1) |
|
|
282 | (1) |
|
|
283 | (1) |
|
|
283 | (2) |
|
|
285 | (410) |
|
Exact Inference: Variable Elimination |
|
|
287 | (58) |
|
|
288 | (4) |
|
Analysis of Exact Inference |
|
|
288 | (2) |
|
Analysis of Approximate Inference |
|
|
290 | (2) |
|
Variable Elimination: The Basic Ideas |
|
|
292 | (4) |
|
|
296 | (10) |
|
|
297 | (6) |
|
|
303 | (3) |
|
Complexity and Graph Structure: Variable Elimination |
|
|
306 | (9) |
|
|
306 | (1) |
|
|
306 | (4) |
|
Finding Elimination Orderings |
|
|
310 | (5) |
|
|
315 | (10) |
|
The Conditioning Algorithm |
|
|
315 | (3) |
|
Conditioning and Variable Elimination |
|
|
318 | (4) |
|
|
322 | (1) |
|
|
323 | (2) |
|
Inference with Structured CPDs |
|
|
325 | (11) |
|
Independence of Causal Influence |
|
|
325 | (4) |
|
Context-Specific Independence |
|
|
329 | (6) |
|
|
335 | (1) |
|
|
336 | (1) |
|
|
337 | (1) |
|
|
338 | (7) |
|
Exact Inference: Clique Trees |
|
|
345 | (36) |
|
Variable Elimination and Clique Trees |
|
|
345 | (3) |
|
|
346 | (1) |
|
|
346 | (2) |
|
Message Passing: Sum Product |
|
|
348 | (16) |
|
Variable Elimination in a Clique Tree |
|
|
349 | (6) |
|
|
355 | (6) |
|
A Calibrated Clique Tree as a Distribution |
|
|
361 | (3) |
|
Message Passing: Belief Update |
|
|
364 | (8) |
|
Message Passing with Division |
|
|
364 | (4) |
|
Equivalence of Sum-Product and Belief Update Messages |
|
|
368 | (1) |
|
|
369 | (3) |
|
Constructing a Clique Tree |
|
|
372 | (4) |
|
Clique Trees from Variable Elimination |
|
|
372 | (2) |
|
Clique Trees from Chordal Graphs |
|
|
374 | (2) |
|
|
376 | (1) |
|
|
377 | (1) |
|
|
378 | (3) |
|
Inference as Optimization |
|
|
381 | (106) |
|
|
381 | (5) |
|
Exact Inference Revisited |
|
|
382 | (2) |
|
|
384 | (2) |
|
Optimizing the Energy Functional |
|
|
386 | (1) |
|
Exact Inference as Optimization |
|
|
386 | (5) |
|
Fixed-Point Characterization |
|
|
388 | (2) |
|
Inference as Optimization |
|
|
390 | (1) |
|
Propagation-Based Approximation |
|
|
391 | (39) |
|
|
391 | (5) |
|
Cluster-Graph Belief Propagation |
|
|
396 | (3) |
|
Properties of Cluster-Graph Belief Propagation |
|
|
399 | (2) |
|
|
401 | (3) |
|
Constructing Cluster Graphs |
|
|
404 | (7) |
|
|
411 | (3) |
|
Other Entropy Approximations |
|
|
414 | (14) |
|
|
428 | (2) |
|
Propagation with Approximate Messages |
|
|
430 | (18) |
|
|
431 | (2) |
|
Approximate Message Computation |
|
|
433 | (3) |
|
Inference with Approximate Messages |
|
|
436 | (6) |
|
|
442 | (3) |
|
|
445 | (3) |
|
|
448 | (1) |
|
Structured Variational Approximations |
|
|
448 | (25) |
|
The Mean Field Approximation |
|
|
449 | (7) |
|
Structured Approximations |
|
|
456 | (13) |
|
Local Variational Methods |
|
|
469 | (4) |
|
|
473 | (2) |
|
|
475 | (2) |
|
|
477 | (10) |
|
Particle-Based Approximate Inference |
|
|
487 | (64) |
|
|
488 | (4) |
|
Sampling from a Bayesian Network |
|
|
488 | (2) |
|
|
490 | (1) |
|
Conditional Probability Queries |
|
|
491 | (1) |
|
Likelihood Weighting and Importance Sampling |
|
|
492 | (13) |
|
Likelihood Weighting: Intuition |
|
|
492 | (2) |
|
|
494 | (4) |
|
Importance Sampling for Bayesian Networks |
|
|
498 | (6) |
|
Importance Sampling Revisited |
|
|
504 | (1) |
|
Markov Chain Monte Carlo Methods |
|
|
505 | (21) |
|
|
505 | (2) |
|
|
507 | (5) |
|
|
512 | (3) |
|
A Broader Class of Markov Chains |
|
|
515 | (3) |
|
|
518 | (8) |
|
|
526 | (10) |
|
Collapsed Likelihood Weighting |
|
|
527 | (4) |
|
|
531 | (5) |
|
Deterministic Search Methods |
|
|
536 | (4) |
|
|
540 | (1) |
|
|
541 | (3) |
|
|
544 | (7) |
|
|
551 | (54) |
|
|
551 | (3) |
|
|
551 | (1) |
|
Overview of Solution Methods |
|
|
552 | (2) |
|
Variable Elimination for (Marginal) MAP |
|
|
554 | (8) |
|
Max-Product Variable Elimination |
|
|
554 | (2) |
|
Finding the Most Probable Assignment |
|
|
556 | (3) |
|
Variable Elimination for Marginal MAP |
|
|
559 | (3) |
|
Max-Product in Clique Trees |
|
|
562 | (5) |
|
|
562 | (2) |
|
Message Passing as Reparameterization |
|
|
564 | (1) |
|
|
565 | (2) |
|
Max-Product Belief Propagation in Loopy Cluster Graphs |
|
|
567 | (10) |
|
Standard Max-Product Message Passing |
|
|
567 | (5) |
|
Max-Product BP with Counting Numbers |
|
|
572 | (3) |
|
|
575 | (2) |
|
MAP as a Linear Optimization Problem |
|
|
577 | (11) |
|
The Integer Program Formulation |
|
|
577 | (2) |
|
Linear Programming Relaxation |
|
|
579 | (2) |
|
|
581 | (7) |
|
|
588 | (7) |
|
Inference Using Graph Cuts |
|
|
588 | (4) |
|
|
592 | (3) |
|
|
595 | (2) |
|
|
597 | (1) |
|
|
598 | (3) |
|
|
601 | (4) |
|
Inference in Hybrid Networks |
|
|
605 | (46) |
|
|
605 | (3) |
|
|
605 | (1) |
|
|
606 | (1) |
|
|
607 | (1) |
|
Variable Elimination in Gaussian Networks |
|
|
608 | (7) |
|
|
609 | (2) |
|
|
611 | (1) |
|
Gaussian Belief Propagation |
|
|
612 | (3) |
|
|
615 | (15) |
|
|
615 | (3) |
|
Factor Operations for Hybrid Gaussian Networks |
|
|
618 | (3) |
|
|
621 | (5) |
|
An ``Exact'' CLG Algorithm |
|
|
626 | (4) |
|
|
630 | (12) |
|
|
631 | (6) |
|
Expectation Propagation with Gaussian Approximation |
|
|
637 | (5) |
|
Particle-Based Approximation Methods |
|
|
642 | (4) |
|
Sampling in Continuous Spaces |
|
|
642 | (1) |
|
Forward Sampling in Bayesian Networks |
|
|
643 | (1) |
|
|
644 | (1) |
|
|
645 | (1) |
|
Nonparametric Message Passing |
|
|
646 | (1) |
|
|
646 | (1) |
|
|
647 | (2) |
|
|
649 | (2) |
|
Inference in Temporal Models |
|
|
651 | (44) |
|
|
652 | (1) |
|
|
653 | (7) |
|
Filtering in State-Observation Models |
|
|
653 | (1) |
|
Filtering as Clique Tree Propagation |
|
|
654 | (1) |
|
Clique Tree Inference in DBNs |
|
|
655 | (1) |
|
|
656 | (4) |
|
|
660 | (15) |
|
|
661 | (1) |
|
Factored Belief State Methods |
|
|
662 | (3) |
|
|
665 | (10) |
|
Deterministic Search Techniques |
|
|
675 | (1) |
|
|
675 | (13) |
|
|
676 | (8) |
|
|
684 | (4) |
|
|
688 | (2) |
|
|
690 | (2) |
|
|
692 | (3) |
|
|
695 | (312) |
|
Learning Graphical Models: Overview |
|
|
697 | (20) |
|
|
697 | (1) |
|
|
698 | (4) |
|
|
698 | (2) |
|
Specific Prediction Tasks |
|
|
700 | (1) |
|
|
701 | (1) |
|
|
702 | (9) |
|
Empirical Risk and Overfitting |
|
|
703 | (6) |
|
Discriminative versus Generative Training |
|
|
709 | (2) |
|
|
711 | (4) |
|
|
712 | (1) |
|
|
712 | (2) |
|
Taxonomy of Learning Tasks |
|
|
714 | (1) |
|
|
715 | (2) |
|
|
717 | (66) |
|
Maximum Likelihood Estimation |
|
|
717 | (5) |
|
|
717 | (3) |
|
The Maximum Likelihood Principle |
|
|
720 | (2) |
|
MLE for Bayesian Networks |
|
|
722 | (11) |
|
|
723 | (1) |
|
Global Likelihood Decomposition |
|
|
724 | (1) |
|
|
725 | (3) |
|
Gaussian Bayesian Networks |
|
|
728 | (3) |
|
Maximum Likelihood Estimation as M-Projection |
|
|
731 | (2) |
|
Bayesian Parameter Estimation |
|
|
733 | (8) |
|
The Thumbtack Example Revisited |
|
|
733 | (4) |
|
|
737 | (4) |
|
Bayesian Parameter Estimation in Bayesian Networks |
|
|
741 | (13) |
|
Parameter Independence and Global Decomposition |
|
|
742 | (4) |
|
|
746 | (2) |
|
Priors for Bayesian Network Learning |
|
|
748 | (3) |
|
|
751 | (3) |
|
Learning Models with Shared Parameters |
|
|
754 | (15) |
|
|
755 | (5) |
|
|
760 | (2) |
|
Bayesian Inference with Shared Parameters |
|
|
762 | (1) |
|
|
763 | (6) |
|
|
769 | (7) |
|
|
769 | (1) |
|
|
770 | (6) |
|
|
776 | (1) |
|
|
777 | (1) |
|
|
778 | (5) |
|
Structure Learning in Bayesian Networks |
|
|
783 | (66) |
|
|
783 | (3) |
|
|
783 | (2) |
|
|
785 | (1) |
|
Constraint-Based Approaches |
|
|
786 | (4) |
|
|
786 | (1) |
|
|
787 | (3) |
|
|
790 | (17) |
|
|
791 | (3) |
|
|
794 | (3) |
|
Marginal Likelihood for a Single Variable |
|
|
797 | (2) |
|
Bayesian Score for Bayesian Networks |
|
|
799 | (2) |
|
Understanding the Bayesian Score |
|
|
801 | (3) |
|
|
804 | (3) |
|
|
807 | (1) |
|
|
807 | (17) |
|
Learning Tree-Structured Networks |
|
|
808 | (1) |
|
|
809 | (2) |
|
|
811 | (10) |
|
Learning with Equivalence Classes |
|
|
821 | (3) |
|
|
824 | (8) |
|
|
824 | (2) |
|
Model Averaging Given an Order |
|
|
826 | (2) |
|
|
828 | (4) |
|
Learning Models with Additional Structure |
|
|
832 | (6) |
|
Learning with Local Structure |
|
|
833 | (4) |
|
|
837 | (1) |
|
|
838 | (2) |
|
|
840 | (3) |
|
|
843 | (6) |
|
|
849 | (94) |
|
|
849 | (13) |
|
Likelihood of Data and Observation Models |
|
|
849 | (4) |
|
Decoupling of Observation Mechanism |
|
|
853 | (3) |
|
|
856 | (4) |
|
|
860 | (2) |
|
|
862 | (35) |
|
|
863 | (5) |
|
Expectation Maximization (EM) |
|
|
868 | (19) |
|
Comparison: Gradient Ascent versus EM |
|
|
887 | (6) |
|
|
893 | (4) |
|
Bayesian Learning with Incomplete Data |
|
|
897 | (11) |
|
|
897 | (2) |
|
|
899 | (5) |
|
Variational Bayesian Learning |
|
|
904 | (4) |
|
|
908 | (17) |
|
|
909 | (8) |
|
|
917 | (3) |
|
|
920 | (5) |
|
Learning Models with Hidden Variables |
|
|
925 | (8) |
|
Information Content of Hidden Variables |
|
|
926 | (2) |
|
Determining the Cardinality |
|
|
928 | (2) |
|
Introducing Hidden Variables |
|
|
930 | (3) |
|
|
933 | (1) |
|
|
934 | (1) |
|
|
935 | (8) |
|
Learning Undirected Models |
|
|
943 | (64) |
|
|
943 | (1) |
|
|
944 | (5) |
|
|
944 | (2) |
|
Form of the Likelihood Function |
|
|
946 | (1) |
|
Properties of the Likelihood Function |
|
|
947 | (2) |
|
Maximum (Conditional) Likelihood Parameter Estimation |
|
|
949 | (9) |
|
Maximum Likelihood Estimation |
|
|
949 | (1) |
|
Conditionally Trained Models |
|
|
950 | (4) |
|
Learning with Missing Data |
|
|
954 | (2) |
|
Maximum Entropy and Maximum Likelihood |
|
|
956 | (2) |
|
Parameter Priors and Regularization |
|
|
958 | (3) |
|
|
958 | (3) |
|
|
961 | (1) |
|
Learning with Approximate Inference |
|
|
961 | (8) |
|
|
962 | (5) |
|
|
967 | (2) |
|
|
969 | (9) |
|
Pseudolikelihood and Its Generalizations |
|
|
970 | (4) |
|
Contrastive Optimization Criteria |
|
|
974 | (4) |
|
|
978 | (18) |
|
Structure Learning Using Independence Tests |
|
|
979 | (2) |
|
Score-Based Learning: Hypothesis Spaces |
|
|
981 | (1) |
|
|
982 | (3) |
|
|
985 | (7) |
|
Evaluating Changes to the Model |
|
|
992 | (4) |
|
|
996 | (2) |
|
|
998 | (3) |
|
|
1001 | (6) |
|
|
1007 | (128) |
|
|
1009 | (48) |
|
|
1009 | (5) |
|
Conditioning and Intervention |
|
|
1009 | (3) |
|
Correlation and Causation |
|
|
1012 | (2) |
|
|
1014 | (3) |
|
Structural Causal Identifiability |
|
|
1017 | (9) |
|
Query Simplification Rules |
|
|
1017 | (3) |
|
Iterated Query Simplification |
|
|
1020 | (6) |
|
Mechanisms and Response Variables |
|
|
1026 | (5) |
|
Partial Identifiability in Functional Causal Models |
|
|
1031 | (3) |
|
|
1034 | (5) |
|
|
1034 | (3) |
|
Bounds on Counterfactual Queries |
|
|
1037 | (2) |
|
|
1039 | (13) |
|
Learning Causal Models without Confounding Factors |
|
|
1040 | (3) |
|
Learning from Interventional Data |
|
|
1043 | (4) |
|
Dealing with Latent Variables |
|
|
1047 | (3) |
|
Learning Functional Causal Models |
|
|
1050 | (2) |
|
|
1052 | (1) |
|
|
1053 | (1) |
|
|
1054 | (3) |
|
|
1057 | (26) |
|
Foundations: Maximizing Expected Utility |
|
|
1057 | (5) |
|
Decision Making Under Uncertainty |
|
|
1057 | (3) |
|
Theoretical Justification |
|
|
1060 | (2) |
|
|
1062 | (4) |
|
|
1063 | (1) |
|
|
1064 | (1) |
|
|
1065 | (1) |
|
|
1066 | (3) |
|
Utility Elicitation Procedures |
|
|
1066 | (1) |
|
|
1067 | (2) |
|
Utilities of Complex Outcomes |
|
|
1069 | (10) |
|
Preference and Utility Independence |
|
|
1069 | (3) |
|
Additive Independence Properties |
|
|
1072 | (7) |
|
|
1079 | (1) |
|
|
1080 | (2) |
|
|
1082 | (1) |
|
Structured Decision Problems |
|
|
1083 | (48) |
|
|
1083 | (3) |
|
|
1083 | (2) |
|
Backward Induction Algorithm |
|
|
1085 | (1) |
|
|
1086 | (7) |
|
|
1087 | (1) |
|
|
1088 | (2) |
|
|
1090 | (1) |
|
Semantics and Optimality Criterion |
|
|
1091 | (2) |
|
Backward Induction in Influence Diagrams |
|
|
1093 | (5) |
|
Decision Trees for Influence Diagrams |
|
|
1094 | (2) |
|
|
1096 | (2) |
|
Computing Expected Utilities |
|
|
1098 | (7) |
|
Simple Variable Elimination |
|
|
1098 | (2) |
|
Multiple Utility Variables: Simple Approaches |
|
|
1100 | (1) |
|
Generalized Variable Elimination |
|
|
1101 | (4) |
|
Optimization in Influence Diagrams |
|
|
1105 | (12) |
|
Optimizing a Single Decision Rule |
|
|
1105 | (1) |
|
Iterated Optimization Algorithm |
|
|
1106 | (2) |
|
Strategic Relevance and Global Optimality |
|
|
1108 | (9) |
|
Ignoring Irrelevant Information |
|
|
1117 | (2) |
|
|
1119 | (5) |
|
|
1120 | (2) |
|
|
1122 | (2) |
|
|
1124 | (1) |
|
|
1125 | (3) |
|
|
1128 | (3) |
|
|
1131 | (4) |
|
|
1135 | (36) |
|
|
1135 | (6) |
|
|
1135 | (2) |
|
Conditional Entropy and Information |
|
|
1137 | (1) |
|
Relative Entropy and Distances Between Distributions |
|
|
1138 | (3) |
|
|
1141 | (3) |
|
|
1142 | (1) |
|
|
1143 | (1) |
|
Algorithms and Algorithmic Complexity |
|
|
1144 | (8) |
|
|
1144 | (1) |
|
Analysis of Algorithmic Complexity |
|
|
1145 | (2) |
|
|
1147 | (1) |
|
|
1148 | (4) |
|
Combinatorial Optimization and Search |
|
|
1152 | (7) |
|
|
1152 | (1) |
|
|
1152 | (6) |
|
|
1158 | (1) |
|
|
1159 | (12) |
|
Characterizing Optima of a Continuous Function |
|
|
1159 | (2) |
|
|
1161 | (4) |
|
|
1165 | (4) |
|
|
1169 | (2) |
Bibliography |
|
1171 | (38) |
Notation Index |
|
1209 | (4) |
Subject Index |
|
1213 | |