Preface |
|
xv | |
Acknowledgments |
|
xviii | |
|
PART I OVERVIEW AND BACKGROUND |
|
|
1 | (70) |
|
|
3 | (18) |
|
1.1 Functions of Text Information Systems |
|
|
7 | (3) |
|
1.2 Conceptual Framework for Text Information Systems |
|
|
10 | (3) |
|
1.3 Organization of the Book |
|
|
13 | (2) |
|
|
15 | (6) |
|
Bibliographic Notes and Further Reading |
|
|
18 | (3) |
|
|
21 | (18) |
|
2.1 Basics of Probability and Statistics |
|
|
21 | (10) |
|
|
31 | (3) |
|
|
34 | (5) |
|
Bibliographic Notes and Further Reading |
|
|
36 | (1) |
|
|
37 | (2) |
|
Chapter 3 Text Data Understanding |
|
|
39 | (18) |
|
3.1 History and State of the Art in NLP |
|
|
42 | (1) |
|
3.2 NLP and Text Information Systems |
|
|
43 | (3) |
|
|
46 | (4) |
|
3.4 Statistical Language Models |
|
|
50 | (7) |
|
Bibliographic Notes and Further Reading |
|
|
54 | (1) |
|
|
55 | (2) |
|
Chapter 4 MeTA: A Unified Toolkit for Text Data Management and Analysis |
|
|
57 | (14) |
|
|
58 | (1) |
|
|
59 | (1) |
|
|
60 | (1) |
|
4.4 Tokenization with MeTA |
|
|
61 | (3) |
|
|
64 | (7) |
|
|
65 | (6) |
|
|
71 | (168) |
|
Chapter 5 Overview of Text Data Access |
|
|
73 | (14) |
|
5.1 Access Mode: Pull vs. Push |
|
|
73 | (3) |
|
5.2 Multimode Interactive Access |
|
|
76 | (2) |
|
|
78 | (2) |
|
5.4 Text Retrieval vs. Database Retrieval |
|
|
80 | (2) |
|
5.5 Document Selection vs. Document Ranking |
|
|
82 | (5) |
|
Bibliographic Notes and Further Reading |
|
|
84 | (1) |
|
|
85 | (2) |
|
Chapter 6 Retrieval Models |
|
|
87 | (46) |
|
|
87 | (1) |
|
6.2 Common Form of a Retrieval Function |
|
|
88 | (2) |
|
6.3 Vector Space Retrieval Models |
|
|
90 | (20) |
|
6.4 Probabilistic Retrieval Models |
|
|
110 | (23) |
|
Bibliographic Notes and Further Reading |
|
|
128 | (1) |
|
|
129 | (4) |
|
|
133 | (14) |
|
7.1 Feedback in the Vector Space Model |
|
|
135 | (3) |
|
7.2 Feedback in Language Models |
|
|
138 | (9) |
|
Bibliographic Notes and Further Reading |
|
|
144 | (1) |
|
|
144 | (3) |
|
Chapter 8 Search Engine Implementation |
|
|
147 | (20) |
|
|
148 | (2) |
|
|
150 | (3) |
|
|
153 | (4) |
|
8.4 Feedback Implementation |
|
|
157 | (1) |
|
|
158 | (4) |
|
|
162 | (5) |
|
Bibliographic Notes and Further Reading |
|
|
165 | (1) |
|
|
165 | (2) |
|
Chapter 9 Search Engine Evaluation |
|
|
167 | (24) |
|
|
167 | (3) |
|
9.2 Evaluation of Set Retrieval |
|
|
170 | (4) |
|
9.3 Evaluation of a Ranked List |
|
|
174 | (6) |
|
9.4 Evaluation with Multi-level Judgements |
|
|
180 | (3) |
|
9.5 Practical Issues in Evaluation |
|
|
183 | (8) |
|
Bibliographic Notes and Further Reading |
|
|
187 | (1) |
|
|
188 | (3) |
|
|
191 | (30) |
|
|
192 | (2) |
|
|
194 | (6) |
|
|
200 | (8) |
|
|
208 | (4) |
|
10.5 The Future of Web Search |
|
|
212 | (9) |
|
Bibliographic Notes and Further Reading |
|
|
216 | (1) |
|
|
216 | (5) |
|
Chapter 11 Recommender Systems |
|
|
221 | (18) |
|
11.1 Content-based Recommendation |
|
|
222 | (7) |
|
11.2 Collaborative Filtering |
|
|
229 | (4) |
|
11.3 Evaluation of Recommender Systems |
|
|
233 | (6) |
|
Bibliographic Notes and Further Reading |
|
|
235 | (1) |
|
|
235 | (4) |
|
PART III TEXT DATA ANALYSIS |
|
|
239 | (204) |
|
Chapter 12 Overview of Text Data Analysis |
|
|
241 | (10) |
|
12.1 Motivation: Applications of Text Data Analysis |
|
|
242 | (2) |
|
12.2 Text vs. Non-text Data: Humans as Subjective Sensors |
|
|
244 | (2) |
|
12.3 Landscape of Text Mining Tasks |
|
|
246 | (5) |
|
Chapter 13 Word Association Mining |
|
|
251 | (24) |
|
13.1 General Idea of Word Association Mining |
|
|
252 | (3) |
|
13.2 Discovery of Paradigmatic Relations |
|
|
255 | (5) |
|
13.3 Discovery of Syntagmatic Relations |
|
|
260 | (11) |
|
13.4 Evaluation of Word Association Mining |
|
|
271 | (4) |
|
Bibliographic Notes and Further Reading |
|
|
273 | (1) |
|
|
273 | (2) |
|
Chapter 14 Text Clustering |
|
|
275 | (24) |
|
14.1 Overview of Clustering Techniques |
|
|
277 | (2) |
|
|
279 | (5) |
|
|
284 | (10) |
|
14.4 Evaluation of Text Clustering |
|
|
294 | (5) |
|
Bibliographic Notes and Further Reading |
|
|
296 | (1) |
|
|
296 | (3) |
|
Chapter 15 Text Categorization |
|
|
299 | (18) |
|
|
299 | (1) |
|
15.2 Overview of Text Categorization Methods |
|
|
300 | (2) |
|
15.3 Text Categorization Problem |
|
|
302 | (2) |
|
15.4 Features for Text Categorization |
|
|
304 | (3) |
|
15.5 Classification Algorithms |
|
|
307 | (6) |
|
15.6 Evaluation of Text Categorization |
|
|
313 | (4) |
|
Bibliographic Notes and Further Reading |
|
|
315 | (1) |
|
|
315 | (2) |
|
Chapter 16 Text Summarization |
|
|
317 | (12) |
|
16.1 Overview of Text Summarization Techniques |
|
|
318 | (1) |
|
16.2 Extractive Text Summarization |
|
|
319 | (2) |
|
16.3 Abstractive Text Summarization |
|
|
321 | (3) |
|
16.4 Evaluation of Text Summarization |
|
|
324 | (1) |
|
16.5 Applications of Text Summarization |
|
|
325 | (4) |
|
Bibliographic Notes and Further Reading |
|
|
327 | (1) |
|
|
327 | (2) |
|
Chapter 17 Topic Analysis |
|
|
329 | (60) |
|
|
332 | (3) |
|
17.2 Topics as Word Distributions |
|
|
335 | (5) |
|
17.3 Mining One Topic from Text |
|
|
340 | (28) |
|
17.4 Probabilistic Latent Semantic Analysis |
|
|
368 | (9) |
|
17.5 Extension of PLSA and Latent Dirichlet Allocation |
|
|
377 | (6) |
|
17.6 Evaluating Topic Analysis |
|
|
383 | (1) |
|
17.7 Summary of Topic Models |
|
|
384 | (5) |
|
Bibliographic Notes and Further Reading |
|
|
385 | (1) |
|
|
386 | (3) |
|
Chapter 18 Opinion Mining and Sentiment Analysis |
|
|
389 | (24) |
|
18.1 Sentiment Classification |
|
|
393 | (3) |
|
|
396 | (4) |
|
18.3 Latent Aspect Rating Analysis |
|
|
400 | (9) |
|
18.4 Evaluation of Opinion Mining and Sentiment Analysis |
|
|
409 | (4) |
|
Bibliographic Notes and Further Reading |
|
|
410 | (1) |
|
|
410 | (3) |
|
Chapter 19 Joint Analysis of Text and Structured Data |
|
|
413 | (30) |
|
|
413 | (4) |
|
19.2 Contextual Text Mining |
|
|
417 | (2) |
|
19.3 Contextual Probabilistic Latent Semantic Analysis |
|
|
419 | (9) |
|
19.4 Topic Analysis with Social Networks as Context |
|
|
428 | (5) |
|
19.5 Topic Analysis with Time Series Context |
|
|
433 | (6) |
|
|
439 | (4) |
|
Bibliographic Notes and Further Reading |
|
|
440 | (1) |
|
|
440 | (3) |
|
PART IV UNIFIED TEXT DATA MANAGEMENT ANALYSIS SYSTEM |
|
|
443 | (14) |
|
Chapter 20 Toward a Unified System for Text Management and Analysis |
|
|
445 | (12) |
|
20.1 Text Analysis Operators |
|
|
448 | (4) |
|
|
452 | (1) |
|
20.3 MeTA as a Unified System |
|
|
453 | (4) |
|
Appendix A Bayesian Statistics |
|
|
457 | (8) |
|
A.1 Binomial Estimation and the Beta Distribution |
|
|
457 | (2) |
|
A.2 Pseudo Counts, Smoothing, and Setting Hyperparameters |
|
|
459 | (1) |
|
A.3 Generalizing to a Multinomial Distribution |
|
|
460 | (1) |
|
A.4 The Dirichlet Distribution |
|
|
461 | (2) |
|
A.5 Bayesian Estimate of Multinomial Parameters |
|
|
463 | (1) |
|
|
464 | (1) |
|
Appendix B Expectation-Maximization |
|
|
465 | (8) |
|
B.1 A Simple Mixture Unigram Language Model |
|
|
466 | (1) |
|
B.2 Maximum Likelihood Estimation |
|
|
466 | (1) |
|
B.3 Incomplete vs. Complete Data |
|
|
467 | (1) |
|
B.4 A Lower Bound of Likelihood |
|
|
468 | (1) |
|
B.5 The General Procedure of EM |
|
|
469 | (4) |
|
Appendix C KL-divergence and Dirichlet Prior Smoothing |
|
|
473 | (4) |
|
C.1 Using KL-divergence for Retrieval |
|
|
473 | (2) |
|
C.2 Using Dirichlet Prior Smoothing |
|
|
475 | (1) |
|
C.3 Computing the Query Model p(w | θQ) |
|
|
475 | (2) |
References |
|
477 | (12) |
Index |
|
489 | (20) |
Authors' Biographies |
|
509 | |