1 Introduction  1
  1.1 An End-to-End Approach  5

2 A Few Words on Topic Modeling  7
  2.1 Dirichlet Distribution  7
    2.1.1 Exponential Distributions  7
    2.1.2 Multinomial Distribution  8
    2.1.3 Dirichlet Distribution  9
    2.1.4 Example on the Dirichlet Distribution  10
  2.2 Latent Dirichlet Allocation  12
  2.3 Hidden Topic Markov Model  16

3 Sequential Decision Making in Spoken Dialog Management  21
  3.1 Sequential Decision Making  21
    3.1.1 Markov Decision Processes  23
    3.1.2 Partially Observable Markov Decision Processes  24
    3.1.3 Reinforcement Learning  27
    3.1.4 Solving MDPs/POMDPs  27
  3.2 Spoken Dialog Management  35
    3.2.1 MDP-Based Dialog Policy Learning  37
    3.2.2 POMDP-Based Dialog Policy Learning  38
    3.2.3 User Modeling in Dialog POMDPs  40

4 Learning the Dialog POMDP Model Components  45
  4.1 Introduction  45
  4.2 Learning Intents as States  46
    4.2.1 Hidden Topic Markov Model for Dialogs  46
    4.2.2 Learning Intents from SACTI-1 Dialogs  50
  4.3 Learning the Transition Model  52
  4.4 Learning Observations and Observation Model  54
    4.4.1 Keyword Observation Model  55
    4.4.2 Intent Observation Model  56
  4.5 Example on SACTI Dialogs  57
    4.5.1 …  60
    4.5.2 Learned POMDP Evaluation  62
  4.6 …  64

5 Learning the Reward Function  67
  5.1 Introduction  67
  5.2 IRL in the MDP Framework  69
  5.3 IRL in the POMDP Framework  75
    5.3.1 POMDP-IRL-BT  75
    5.3.2 PB-POMDP-IRL  80
    5.3.3 PB-POMDP-IRL Evaluation  83
  5.4 …  84
  5.5 …  86
  5.6 POMDP-IRL-BT and PB-POMDP-IRL Performance  87
  5.7 …  88

6 Application on Healthcare Dialog Management  89
  6.1 Introduction  89
  6.2 Dialog POMDP Model Learning for SmartWheeler  91
    6.2.1 Observation Model Learning  94
    6.2.2 Comparison of the Intent POMDP to the Keyword POMDP  96
  6.3 Reward Function Learning for SmartWheeler  97
    6.3.1 …  98
    6.3.2 MDP-IRL Learned Rewards  99
    6.3.3 POMDP-IRL-BT Evaluation  101
    6.3.4 Comparison of POMDP-IRL-BT to POMDP-IRL-MC  102
  6.4 …  106

7 Conclusions and Future Work  109
  7.1 Conclusions  109
  7.2 Future Work  111

References  113