| xi |
Preface | xv |
|
| 1 | (17) |
An example information retrieval problem | 3 | (3) |
A first take at building an inverted index | 6 | (3) |
Processing Boolean queries | 9 | (4) |
The extended Boolean model versus ranked retrieval | 13 | (3) |
References and further reading | 16 | (2) |
|
The term vocabulary and posting lists | 18 | (27) |
Document delineation and character sequence decoding | 18 | (3) |
Determining the vocabulary of terms | 21 | (12) |
Faster posting list intersection via skip pointers | 33 | (3) |
Positional postings and phrase queries | 36 | (7) |
References and further reading | 43 | (2) |
|
Dictionaries and tolerant retrieval | 45 | (16) |
Search structures for dictionaries | 45 | (3) |
| 48 | (4) |
| 52 | (6) |
| 58 | (1) |
References and further reading | 59 | (2) |
|
| 61 | (17) |
| 62 | (1) |
Blocked sort-based indexing | 63 | (3) |
Single-pass in-memory indexing | 66 | (2) |
| 68 | (3) |
| 71 | (2) |
| 73 | (3) |
References and further reading | 76 | (2) |
|
| 78 | (22) |
Statistical properties of terms in information retrieval | 79 | (3) |
| 82 | (5) |
| 87 | (10) |
References and further reading | 97 | (3) |
|
Scoring, term weighting, and the vector space model | 100 | (24) |
Parametric and zone indexes | 101 | (6) |
Term frequency and weighting | 107 | (3) |
The vector space model for scoring | 110 | (6) |
| 116 | (6) |
References and further reading | 122 | (2) |
|
Computing scores in a complete search system | 124 | (15) |
Efficient scoring and ranking | 124 | (8) |
Components of an information retrieval system | 132 | (4) |
Vector space scoring and query operator interaction | 136 | (1) |
References and further reading | 137 | (2) |
|
Evaluation in information retrieval | 139 | (23) |
Information retrieval system evaluation | 140 | (1) |
Standard test collections | 141 | (1) |
Evaluation of unranked retrieval sets | 142 | (3) |
Evaluation of ranked retrieval results | 145 | (6) |
| 151 | (3) |
A broader perspective: System quality and user utility | 154 | (3) |
| 157 | (2) |
References and further reading | 159 | (3) |
|
Relevance feedback and query expansion | 162 | (16) |
Relevance feedback and pseudo relevance feedback | 163 | (10) |
Global methods for query reformulation | 173 | (4) |
References and further reading | 177 | (1) |
|
| 178 | (23) |
| 180 | (3) |
Challenges in XML retrieval | 183 | (5) |
A vector space model for XML retrieval | 188 | (4) |
Evaluation of XML retrieval | 192 | (4) |
Text-centric versus data-centric XML retrieval | 196 | (2) |
References and further reading | 198 | (3) |
|
Probabilistic information retrieval | 201 | (17) |
Review of basic probability theory | 202 | (1) |
The probability ranking principle | 203 | (1) |
The binary independence model | 204 | (8) |
An appraisal and some extensions | 212 | (4) |
References and further reading | 216 | (2) |
|
Language models for information retrieval | 218 | (16) |
| 218 | (5) |
The query likelihood model | 223 | (6) |
Language modeling versus other approaches in information retrieval | 229 | (1) |
Extended language modeling approaches | 230 | (2) |
References and further reading | 232 | (2) |
|
Text classification and Naive Bayes | 234 | (32) |
The text classification problem | 237 | (1) |
Naive Bayes text classification | 238 | (5) |
| 243 | (2) |
Properties of Naive Bayes | 245 | (6) |
| 251 | (7) |
Evaluation of text classification | 258 | (6) |
References and further reading | 264 | (2) |
|
Vector space classification | 266 | (27) |
Document representations and measures of relatedness in vector spaces | 267 | (2) |
| 269 | (4) |
| 273 | (4) |
Linear versus nonlinear classifiers | 277 | (4) |
Classification with more than two classes | 281 | (3) |
The bias-variance tradeoff | 284 | (7) |
References and further reading | 291 | (2) |
|
Support vector machines and machine learning on documents | 293 | (28) |
Support vector machines: The linearly separable case | 294 | (6) |
Extensions to the support vector machine model | 300 | (7) |
Issues in the classification of text documents | 307 | (7) |
Machine-learning methods in ad hoc information retrieval | 314 | (4) |
References and further reading | 318 | (3) |
|
| 321 | (25) |
Clustering in information retrieval | 322 | (4) |
| 326 | (1) |
| 327 | (4) |
| 331 | (7) |
| 338 | (5) |
References and further reading | 343 | (3) |
|
| 346 | (23) |
Hierarchical agglomerative clustering | 347 | (3) |
Single-link and complete-link clustering | 350 | (6) |
Group-average agglomerative clustering | 356 | (2) |
| 358 | (2) |
Optimality of hierarchical agglomerative clustering | 360 | (2) |
| 362 | (1) |
| 363 | (2) |
| 365 | (2) |
References and further reading | 367 | (2) |
|
Matrix decompositions and latent semantic indexing | 369 | (16) |
| 369 | (4) |
Term-document matrices and singular value decompositions | 373 | (3) |
| 376 | (2) |
| 378 | (5) |
References and further reading | 383 | (2) |
|
| 385 | (20) |
| 385 | (2) |
| 387 | (5) |
Advertising as the economic model | 392 | (3) |
The search user experience | 395 | (1) |
Index size and estimation | 396 | (4) |
Near-duplicates and shingling | 400 | (4) |
References and further reading | 404 | (1) |
|
| 405 | (16) |
| 405 | (1) |
| 406 | (9) |
| 415 | (1) |
| 416 | (3) |
References and further reading | 419 | (2) |
|
| 421 | (20) |
| 422 | (2) |
| 424 | (9) |
| 433 | (6) |
References and further reading | 439 | (2) |

Bibliography | 441 | (28) |
Index | 469 |