Introduction |
|
xiii | |
|
|
|
Part 1 Information Retrieval |
|
|
1 | (58) |
|
Chapter 1 Probabilistic Models for Information Retrieval |
|
|
3 | (30) |
|
|
|
|
3 | (5) |
|
1.1.1 Heuristic retrieval constraints |
|
|
6 | (2) |
|
|
8 | (2) |
|
1.3 Probability ranking principle (PRP) |
|
|
10 | (5) |
|
|
12 | (1) |
|
|
13 | (2) |
|
|
15 | (6) |
|
|
16 | (3) |
|
1.4.2 The Kullback-Leibler model |
|
|
19 | (1) |
|
1.4.3 Noisy channel model |
|
|
20 | (1) |
|
|
20 | (1) |
|
1.5 Informational approaches |
|
|
21 | (6) |
|
|
22 | (3) |
|
1.5.2 Information-based models |
|
|
25 | (2) |
|
1.6 Experimental comparison |
|
|
27 | (1) |
|
1.7 Tools for information retrieval |
|
|
28 | (1) |
|
|
28 | (1) |
|
|
29 | (4) |
|
Chapter 2 Learnable Ranking Models for Automatic Text Summarization and Information Retrieval |
|
|
33 | (26) |
|
|
|
|
|
|
|
33 | (12) |
|
2.1.1 Ranking of instances |
|
|
34 | (8) |
|
2.1.2 Ranking of alternatives |
|
|
42 | (2) |
|
2.1.3 Relation to existing frameworks |
|
|
44 | (1) |
|
2.2 Application to automatic text summarization |
|
|
45 | (4) |
|
2.2.1 Presentation of the application |
|
|
45 | (3) |
|
2.2.2 Automatic summary and learning |
|
|
48 | (1) |
|
2.3 Application to information retrieval |
|
|
49 | (5) |
|
2.3.1 Application presentation |
|
|
49 | (1) |
|
2.3.2 Search engines and learning |
|
|
50 | (3) |
|
2.3.3 Experimental results |
|
|
53 | (1) |
|
|
54 | (1) |
|
|
54 | (5) |
|
Part 2 Classification and Clustering |
|
|
59 | (162) |
|
Chapter 3 Logistic Regression and Text Classification |
|
|
61 | (24) |
|
|
|
|
|
|
|
61 | (1) |
|
3.2 Generalized linear model |
|
|
62 | (3) |
|
|
65 | (3) |
|
|
68 | (2) |
|
3.4.1 Multinomial logistic regression |
|
|
69 | (1) |
|
|
70 | (4) |
|
3.5.1 Ridge regularization |
|
|
71 | (1) |
|
3.5.2 LASSO regularization |
|
|
71 | (1) |
|
3.5.3 Selected Ridge regularization |
|
|
72 | (2) |
|
3.6 Logistic regression applied to text classification |
|
|
74 | (7) |
|
|
74 | (1) |
|
3.6.2 Data pre-processing |
|
|
75 | (1) |
|
3.6.3 Experimental results |
|
|
76 | (5) |
|
|
81 | (1) |
|
|
82 | (3) |
|
Chapter 4 Kernel Methods for Textual Information Access |
|
|
85 | (44) |
|
|
4.1 Kernel methods: context and intuitions |
|
|
85 | (3) |
|
4.2 General principles of kernel methods |
|
|
88 | (7) |
|
4.3 General problems with kernel choices (kernel engineering) |
|
|
95 | (2) |
|
4.4 Kernel versions of standard algorithms: examples of solvers |
|
|
97 | (6) |
|
4.4.1 Kernel logistic regression |
|
|
98 | (1) |
|
4.4.2 Support vector machines |
|
|
99 | (2) |
|
4.4.3 Principal component analysis |
|
|
101 | (1) |
|
|
102 | (1) |
|
4.5 Kernels for text entities |
|
|
103 | (20) |
|
4.5.1 "Bag-of-words" kernels |
|
|
104 | (1) |
|
|
105 | (2) |
|
|
107 | (2) |
|
|
109 | (3) |
|
|
112 | (4) |
|
|
116 | (3) |
|
4.5.7 Kernels derived from generative models |
|
|
119 | (4) |
|
|
123 | (1) |
|
|
124 | (5) |
|
Chapter 5 Topic-Based Generative Models for Text Information Access |
|
|
129 | (50) |
|
|
|
129 | (6) |
|
5.1.1 Generative versus discriminative models |
|
|
129 | (2) |
|
|
131 | (2) |
|
5.1.3 Estimation, prediction and smoothing |
|
|
133 | (1) |
|
5.1.4 Terminology and notations |
|
|
134 | (1) |
|
|
135 | (7) |
|
5.2.1 Fundamental principles |
|
|
135 | (1) |
|
|
136 | (2) |
|
|
138 | (1) |
|
5.2.4 Geometric interpretation |
|
|
139 | (2) |
|
5.2.5 Application to text categorization |
|
|
141 | (1) |
|
|
142 | (19) |
|
5.3.1 Probabilistic Latent Semantic Indexing |
|
|
143 | (3) |
|
5.3.2 Latent Dirichlet Allocation |
|
|
146 | (14) |
|
|
160 | (1) |
|
|
161 | (3) |
|
5.4.1 Limitations of the multinomial |
|
|
161 | (1) |
|
5.4.2 Dirichlet compound multinomial |
|
|
162 | (1) |
|
|
163 | (1) |
|
5.5 Similarity measures between documents |
|
|
164 | (4) |
|
|
165 | (1) |
|
5.5.2 Similarity between topic distributions |
|
|
165 | (1) |
|
|
166 | (2) |
|
|
168 | (1) |
|
5.7 Appendix: topic model software |
|
|
169 | (1) |
|
|
170 | (9) |
|
Chapter 6 Conditional Random Fields for Information Extraction |
|
|
179 | (42) |
|
|
|
|
179 | (1) |
|
6.2 Information extraction |
|
|
180 | (4) |
|
|
180 | (2) |
|
|
182 | (1) |
|
|
182 | (1) |
|
6.2.4 Approaches not based on machine learning |
|
|
183 | (1) |
|
6.3 Machine learning for information extraction |
|
|
184 | (3) |
|
6.3.1 Usage and limitations |
|
|
184 | (1) |
|
6.3.2 Some applicable machine learning methods |
|
|
185 | (1) |
|
6.3.3 Annotating to extract |
|
|
186 | (1) |
|
6.4 Introduction to conditional random fields |
|
|
187 | (6) |
|
6.4.1 Formalization of a labelling problem |
|
|
187 | (1) |
|
6.4.2 Maximum entropy model approach |
|
|
188 | (2) |
|
6.4.3 Hidden Markov model approach |
|
|
190 | (1) |
|
|
191 | (2) |
|
6.5 Conditional random fields |
|
|
193 | (10) |
|
|
193 | (2) |
|
6.5.2 Factorization and graphical models |
|
|
195 | (1) |
|
|
196 | (2) |
|
|
198 | (2) |
|
6.5.5 Inference algorithms |
|
|
200 | (1) |
|
|
201 | (2) |
|
6.6 Conditional random fields and their applications |
|
|
203 | (11) |
|
6.6.1 Linear conditional random fields |
|
|
204 | (1) |
|
6.6.2 Links between linear CRFs and hidden Markov models |
|
|
205 | (3) |
|
6.6.3 Interests and applications of CRFs |
|
|
208 | (2) |
|
|
210 | (1) |
|
|
211 | (3) |
|
|
214 | (1) |
|
|
215 | (6) |
|
|
221 | (84) |
|
Chapter 7 Statistical Methods for Machine Translation |
|
|
223 | (82) |
|
|
|
|
223 | (4) |
|
7.1.1 Machine translation in the age of the Internet |
|
|
223 | (3) |
|
7.1.2 Organization of the chapter |
|
|
226 | (1) |
|
7.1.3 Terminological remarks |
|
|
227 | (1) |
|
7.2 Probabilistic machine translation: an overview |
|
|
227 | (8) |
|
7.2.1 Statistical machine translation: the standard model |
|
|
228 | (2) |
|
7.2.2 Word-based models and their limitations |
|
|
230 | (4) |
|
7.2.3 Phrase-based models |
|
|
234 | (1) |
|
|
235 | (15) |
|
7.3.1 Building word alignments |
|
|
237 | (8) |
|
7.3.2 Word alignment models: a summary |
|
|
245 | (1) |
|
7.3.3 Extracting bisegments |
|
|
246 | (4) |
|
|
250 | (9) |
|
7.4.1 The space of possible reorderings |
|
|
250 | (5) |
|
7.4.2 Evaluating permutations |
|
|
255 | (4) |
|
7.5 Translation: a search problem |
|
|
259 | (13) |
|
|
259 | (2) |
|
7.5.2 The decoding problem |
|
|
261 | (1) |
|
7.5.3 Exact search algorithms |
|
|
262 | (5) |
|
7.5.4 Heuristic search algorithms |
|
|
267 | (5) |
|
7.5.5 Decoding: a solved problem? |
|
|
272 | (1) |
|
7.6 Evaluating machine translation |
|
|
272 | (7) |
|
7.6.1 Subjective evaluations |
|
|
273 | (2) |
|
|
275 | (2) |
|
7.6.3 Alternatives to BLEU |
|
|
277 | (2) |
|
7.6.4 Evaluating machine translation: an open problem |
|
|
279 | (1) |
|
7.7 State-of-the-art and recent developments |
|
|
279 | (8) |
|
7.7.1 Using source context |
|
|
279 | (2) |
|
7.7.2 Hierarchical models |
|
|
281 | (2) |
|
7.7.3 Translating with linguistic resources |
|
|
283 | (4) |
|
|
287 | (2) |
|
7.8.1 Bibliographic data and online resources |
|
|
288 | (1) |
|
|
288 | (1) |
|
7.8.3 Tools for statistical machine translation |
|
|
288 | (1) |
|
|
289 | (2) |
|
|
291 | (1) |
|
|
291 | (14) |
|
Part 4 Emerging Applications |
|
|
305 | (64) |
|
Chapter 8 Information Mining: Methods and Interfaces for Accessing Complex Information |
|
|
307 | (30) |
|
|
|
|
|
307 | (2) |
|
8.2 The multidimensional visualization of information |
|
|
309 | (11) |
|
8.2.1 Accessing information based on the knowledge of the structured domain |
|
|
309 | (4) |
|
8.2.2 Visualization of a set of documents via their content |
|
|
313 | (4) |
|
8.2.3 OLAP principles applied to document sets |
|
|
317 | (3) |
|
8.3 Domain mapping via social networks |
|
|
320 | (3) |
|
8.4 Analyzing the variability of searches and data merging |
|
|
323 | (4) |
|
8.4.1 Analysis of IR engine results |
|
|
323 | (2) |
|
8.4.2 Use of data unification |
|
|
325 | (2) |
|
8.5 The seven types of evaluation measures used in IR |
|
|
327 | (4) |
|
|
331 | (1) |
|
|
332 | (1) |
|
|
332 | (5) |
|
Chapter 9 Opinion Detection as a Topic Classification Problem |
|
|
337 | (32) |
|
Juan-Manuel Torres-Moreno |
|
|
|
|
|
|
337 | (2) |
|
9.2 The TREC and TAC evaluation campaigns |
|
|
339 | (8) |
|
9.2.1 Opinion detection by question-answering |
|
|
340 | (2) |
|
9.2.2 Automatic summarization of opinions |
|
|
342 | (1) |
|
9.2.3 The text mining challenge of opinion classification (DEFT (DEfi Fouille de Textes)) |
|
|
343 | (4) |
|
9.3 Cosine weights - a second glance |
|
|
347 | (1) |
|
9.4 Which components for opinion vectors? |
|
|
348 | (4) |
|
9.4.1 How to pass from words to terms? |
|
|
349 | (3) |
|
|
352 | (5) |
|
9.5.1 Performance, analysis, and visualization of the results on the IMDB corpus |
|
|
354 | (3) |
|
9.6 Extracting opinions from speech: automatic analysis of phone polls |
|
|
357 | (8) |
|
9.6.1 France Telecom opinion investigation corpus |
|
|
358 | (2) |
|
9.6.2 Automatic recognition of spontaneous speech in opinion corpora |
|
|
360 | (3) |
|
|
363 | (2) |
|
|
365 | (1) |
|
|
366 | (3) |
|
Appendix A Probabilistic Models: An Introduction |
|
|
369 | (54) |
|
|
|
369 | (1) |
|
A.2 Supervised categorization |
|
|
370 | (14) |
|
A.2.1 Filtering documents |
|
|
370 | (2) |
|
A.2.2 The Bernoulli model |
|
|
372 | (4) |
|
A.2.3 The multinomial model |
|
|
376 | (3) |
|
A.2.4 Evaluating categorization systems |
|
|
379 | (1) |
|
|
380 | (3) |
|
|
383 | (1) |
|
A.3 Unsupervised learning: the multinomial mixture model |
|
|
384 | (7) |
|
|
384 | (2) |
|
A.3.2 Parameter estimation |
|
|
386 | (4) |
|
|
390 | (1) |
|
A.4 Markov models: statistical models for sequences |
|
|
391 | (6) |
|
|
391 | (3) |
|
A.4.2 Estimating a Markov model |
|
|
394 | (1) |
|
|
395 | (2) |
|
|
397 | (13) |
|
|
398 | (1) |
|
A.5.2 Algorithms for hidden Markov models |
|
|
399 | (11) |
|
|
410 | (1) |
|
A.7 A primer of probability theory |
|
|
411 | (9) |
|
A.7.1 Probability space, event |
|
|
411 | (1) |
|
A.7.2 Conditional independence and probability |
|
|
412 | (1) |
|
A.7.3 Random variables, moments |
|
|
413 | (5) |
|
A.7.4 Some useful distributions |
|
|
418 | (2) |
|
|
420 | (3) |
List of Authors |
|
423 | (2) |
Index |
|
425 | |