Klienditugi: 7440010 (E-R 10-18)

Abi | Registreeri | Logi sisse

Information Retrieval: Implementing and Evaluating Search Engines [Pehme köide]

4.22/5 (18 hinnangut Goodreads-ist)

Stefan Büttcher, Gordon V. Cormack (University of Waterloo), Charles L. A. Clarke (University of Waterloo)

Formaat: Paperback / softback, 632 pages, kõrgus x laius x paksus: 229x203x27 mm, 127 b&w illus.; 254 Illustrations
Sari: The MIT Press
Ilmumisaeg: 12-Feb-2016
Kirjastus: MIT Press
ISBN-10: 0262528878
ISBN-13: 9780262528870

Teised raamatud teemal:

Pehme köide
Hind: 60,85 €
Raamatu kohalejõudmiseks kirjastusest kulub orienteeruvalt 2-4 nädalat
Kogus:
- - 1
  - 2
  - 3
  - 4
  - 5
  - 6
  - 7
  - 8
  - 9
  - 10
Lisa ostukorvi
Tasuta tarne
Tellimisaeg 2-4 nädalat
Lisa soovinimekirja

Formaat: Paperback / softback, 632 pages, kõrgus x laius x paksus: 229x203x27 mm, 127 b&w illus.; 254 Illustrations
Sari: The MIT Press
Ilmumisaeg: 12-Feb-2016
Kirjastus: MIT Press
ISBN-10: 0262528878
ISBN-13: 9780262528870

Teised raamatud teemal:

Püsilink: https://www.kriso.ee/db/9780262528870.html

Märksõnad:

Information retrieval is the foundation for modern search engines. This textbook offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpus -- a multiuser open-source information retrieval system developed by one of the authors and available online -- provides model implementations and a basis for student work. The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. In addition to its classroom use,Information Retrieval will be a valuable reference for professionals in computer science, computer engineering, and software engineering.

Muu info

Winner of Honorable Mention, 2010 American Publishers Award for Professional and Scholarly Excellence (PROSE) in the Computing and Information Sciences category 2010.

Foreword

xix

Preface

xxi

Notation

xxv

I Foundations

(102)

1 Introduction

(31)

1.1 What Is Information Retrieval?

(3)

1.1.1 Web Search

(1)

1.1.2 Other Search Applications

(1)

1.1.3 Other IR Applications

(1)

1.2 Information Retrieval Systems

(4)

1.2.1 Basic IR System Architecture

(2)

1.2.2 Documents and Update

(1)

1.2.3 Performance Evaluation

(1)

1.3 Working with Electronic Text

(14)

1.3.1 Text Formats

(4)

1.3.2 A Simple Tokenization of English Text

(2)

1.3.3 Term Distributions

(2)

1.3.4 Language Modeling

(6)

1.4 Test Collections

(4)

1.4.1 TREC Tasks

(3)

1.5 Open-Source IR Systems

(1)

1.5.1 Lucene

(1)

1.5.2 Indri

(1)

1.5.3 Wumpus

(1)

1.6 Further Reading

(2)

1.7 Exercises

(2)

1.8 Bibliography

(1)

2 Basic Techniques

(51)

2.1 Inverted Indices

(18)

2.1.1 Extended Example: Phrase Search

(4)

2.1.2 Implementing Inverted Indices

(6)

2.1.3 Documents and Other Elements

(6)

2.2 Retrieval and Ranking

(15)

2.2.1 The Vector Space Model

(6)

2.2.2 Proximity Ranking

(3)

2.2.3 Boolean Retrieval

(3)

2.3 Evaluation

(10)

2.3.1 Recall and Precision

(1)

2.3.2 Effectiveness Measures for Ranked Retrieval

(5)

2.3.3 Building a Test Collection

(2)

2.3.4 Efficiency Measures

(1)

2.4 Summary

(1)

2.5 Further Reading

(2)

2.6 Exercises

(3)

2.7 Bibliography

(2)

3 Tokens and Terms

(19)

3.1 English

(6)

3.1.1 Punctuation and Capitalization

(1)

3.1.2 Stemming

(3)

3.1.3 Stopping

(2)

3.2 Characters

(1)

3.3 Character N-Grams

(2)

3.4 European Languages

(1)

3.5 CJK Languages

(2)

3.6 Further Reading

(2)

3.7 Exercises

(1)

3.8 Bibliography

100

(3)

II Indexing

103

(154)

4 Static Inverted Indices

104

(33)

4.1 Index Components and Index Life Cycle

104

(2)

4.2 The Dictionary

106

(4)

4.3 Postings Lists

110

(4)

4.4 Interleaving Dictionary and Postings Lists

114

(4)

4.5 Index Construction

118

(13)

4.5.1 In-Memory Index Construction

119

(6)

4.5.2 Sort-Based Index Construction

125

(2)

4.5.3 Merge-Based Index Construction

127

(4)

4.6 Other Types of Indices

131

(1)

4.7 Summary

132

(1)

4.8 Further Reading

132

(1)

4.9 Exercises

133

(2)

4.10 Bibliography

135

(2)

5 Query Processing

137

(37)

5.1 Query Processing for Ranked Retrieval

137

(23)

5.1.1 Document-at-a-Time Query Processing

139

(6)

5.1.2 Term-at-a-Time Query Processing

145

(6)

5.1.3 Precomputing Score Contributions

151

(2)

5.1.4 Impact Ordering

153

(1)

5.1.5 Static Index Pruning

153

(7)

5.2 Lightweight Structure

160

(9)

5.2.1 Generalized Concordance Lists

160

(2)

5.2.2 Operators

162

(1)

5.2.3 Examples

163

(2)

5.2.4 Implementation

165

(4)

5.3 Further Reading

169

(1)

5.4 Exercises

170

(1)

5.5 Bibliography

171

(3)

6 Index Compression

174

(54)

6.1 General-Purpose Data Compression

175

(1)

6.2 Symbolwise Data Compression

176

(15)

6.2.1 Modeling and Coding

177

(4)

6.2.2 Huffman Coding

181

(5)

6.2.3 Arithmetic Coding

186

(3)

6.2.4 Symbolwise Text Compression

189

(2)

6.3 Compressing Postings Lists

191

(25)

6.3.1 Nonparametric Gap Compression

192

(3)

6.3.2 Parametric Gap Compression

195

(6)

6.3.3 Context-Aware Compression Methods

201

(3)

6.3.4 Index Compression for High Query Performance

204

(5)

6.3.5 Compression Effectiveness

209

(3)

6.3.6 Decoding Performance

212

(2)

6.3.7 Document Reordering

214

(2)

6.4 Compressing the Dictionary

216

(6)

6.5 Summary

222

(1)

6.6 Further Reading

223

(1)

6.7 Exercises

224

(1)

6.8 Bibliography

225

(3)

7 Dynamic Inverted Indices

228

(29)

7.1 Batch Updates

229

(2)

7.2 Incremental Index Updates

231

(12)

7.2.1 Contiguous Inverted Lists

233

(6)

7.2.2 Noncontiguous Inverted Lists

239

(4)

7.3 Document Deletions

243

(7)

7.3.1 Invalidation List

243

(2)

7.3.2 Garbage Collection

245

(5)

7.4 Document Modifications

250

(1)

7.5 Discussion and Further Reading

251

(2)

7.6 Exercises

253

(1)

7.7 Bibliography

254

(3)

III Retrieval and Ranking

257

(148)

8 Probabilistic Retrieval

258

(28)

8.1 Modeling Relevance

259

(2)

8.2 The Binary Independence Model

261

(3)

8.3 The Robertson/Sparck Jones Weighting Formula

264

(2)

8.4 Term Frequency

266

(5)

8.4.1 Bookstein's Two-Poisson Model

267

(3)

8.4.2 Approximating the Two-Poisson Model

270

(1)

8.4.3 Query Term Frequency

271

(1)

8.5 Document Length: BM25

271

(2)

8.6 Relevance Feedback

273

(4)

8.6.1 Term Selection

274

(1)

8.6.2 Pseudo-Relevance Feedback

275

(2)

8.7 Field Weights: BM25F

277

(2)

8.8 Experimental Comparison

279

(1)

8.9 Further Reading

280

(1)

8.10 Exercises

281

(1)

8.11 Bibliography

282

(4)

9 Language Modeling and Related Methods

286

(24)

9.1 Generating Queries from Documents

287

(2)

9.2 Language Models and Smoothing

289

(3)

9.3 Ranking with Language Models

292

(4)

9.4 Kullback-Leibler Divergence

296

(2)

9.5 Divergence from Randomness

298

(4)

9.5.1 A Model of Randomness

299

(2)

9.5.2 Eliteness

301

(1)

9.5.3 Document Length Normalization

301

(1)

9.6 Passage Retrieval and Ranking

302

(4)

9.6.1 Passage Scoring

304

(1)

9.6.2 Implementation

304

(2)

9.7 Experimental Comparison

306

(1)

9.8 Further Reading

306

(1)

9.9 Exercises

307

(1)

9.10 Bibliography

307

(3)

10 Categorization and Filtering

310

(66)

10.1 Detailed Examples

313

(18)

10.1.1 Topic-Oriented Batch Filtering

313

(4)

10.1.2 On-Line Filtering

317

(1)

10.1.3 Learning from Historical Examples

318

(2)

10.1.4 Language Categorization

320

(5)

10.1.5 On-Line Adaptive Spam Filtering

325

(4)

10.1.6 Threshold Choice for Binary Categorization

329

(2)

10.2 Classification

331

(8)

10.2.1 Odds and Odds Ratios

333

(1)

10.2.2 Building Classifiers

334

(2)

10.2.3 Learning Modes

336

(2)

10.2.4 Feature Engineering

338

(1)

10.3 Probabilistic Classifiers

339

(10)

10.3.1 Probability Estimates

340

(3)

10.3.2 Combining Probability Estimates

343

(4)

10.3.3 Practical Considerations

347

(2)

10.4 Linear Classifiers

349

(5)

10.4.1 Perceptron Algorithm

352

(1)

10.4.2 Support Vector Machines

353

(1)

10.5 Similarity-Based Classifiers

354

(1)

10.5.1 Rocchio's Method

354

(1)

10.5.2 Memory-Based Methods

355

(1)

10.6 Generalized Linear Models

355

(4)

10.6.1 Kernel Methods

357

(2)

10.7 Information-Theoretic Models

359

(7)

10.7.1 Comparing Models

360

(1)

10.7.2 Sequential Compression Models

361

(3)

10.7.3 Decision Trees and Stumps

364

(2)

10.8 Experimental Comparison

366

(5)

10.8.1 Topic-Oriented On-Line Filtering

367

(2)

10.8.2 On-Line Adaptive Spam Filtering

369

(2)

10.9 Further Reading

371

(1)

10.10 Exercises

372

(1)

10.11 Bibliography

373

(3)

11 Fusion and Metalearning

376

(29)

11.1 Search-Result Fusion

377

(4)

11.1.1 Fixed-Cutoff Aggregation

379

(1)

11.1.2 Rank and Score Aggregation

380

(1)

11.2 Stacking Adaptive Filters

381

(2)

11.3 Stacking Batch Classifiers

383

(2)

11.3.1 Holdout Validation

383

(1)

11.3.2 Cross-Validation

384

(1)

11.4 Bagging

385

(2)

11.5 Boosting

387

(1)

11.6 Multicategory Ranking and Classification

388

(6)

11.6.1 Document Versus Category Scores

389

(1)

11.6.2 Document Versus Category Rank Fusion

390

(1)

11.6.3 Multicategory Methods

391

(3)

11.7 Learning to Rank

394

(6)

11.7.1 What Is Learning to Rank?

395

(1)

11.7.2 Learning-to-Rank Methods

396

(1)

11.7.3 What to Optimize?

396

(1)

11.7.4 Learning to Rank for Categorization

397

(1)

11.7.5 Learning for Ranked IR

398

(1)

11.7.6 The LETOR Data Set

399

(1)

11.8 Further Reading

400

(1)

11.9 Exercises

401

(1)

11.10 Bibliography

401

(4)

IV Evaluation

405

(82)

12 Measuring Effectiveness

406

(62)

12.1 Traditional Effectiveness Measures

407

(3)

12.1.1 Recall and Precision

407

(1)

12.1.2 Precision at k Documents (P@k)

408

(1)

12.1.3 Average Precision (AP)

408

(1)

12.1.4 Reciprocal Rank (RR)

409

(1)

12.1.5 Arithmetic Mean Versus Geometric Mean

409

(1)

12.1.6 User Satisfaction

410

(1)

12.2 The Text REtrieval Conference (TREC)

410

(2)

12.3 Using Statistics in Evaluation

412

(29)

12.3.1 Foundations and Terminology

413

(3)

12.3.2 Confidence Intervals

416

(8)

12.3.3 Comparative Evaluation

424

(3)

12.3.4 Hypothesis Tests Considered Harmful

427

(2)

12.3.5 Paired and Unpaired Differences

429

(1)

12.3.6 Significance Tests

430

(4)

12.3.7 Validity and Power of Statistical Tests

434

(4)

12.3.8 Reporting the Precision of Measurement

438

(1)

12.3.9 Meta-Analysis

439

(2)

12.4 Minimizing Adjudication Effort

441

(10)

12.4.1 Selecting Documents for Adjudication

443

(6)

12.4.2 Sampling the Pool

449

(2)

12.5 Nontraditional Effectiveness Measures

451

(9)

12.5.1 Graded Relevance

451

(2)

12.5.2 Incomplete and Biased Judgments

453

(2)

12.5.3 Novelty and Diversity

455

(5)

12.6 Further Reading

460

(2)

12.7 Exercises

462

(1)

12.8 Bibliography

463

(5)

13 Measuring Efficiency

468

(19)

13.1 Efficiency Criteria

468

(4)

13.1.1 Throughput and Latency

469

(3)

13.1.2 Aggregate Statistics and User Satisfaction

472

(1)

13.2 Queueing Theory

472

(6)

13.2.1 Kendall's Notation

474

(1)

13.2.2 The M/M/1 Queueing Model

475

(2)

13.2.3 Latency Quantiles and Average Utilization

477

(1)

13.3 Query Scheduling

478

(1)

13.4 Caching

479

(5)

13.4.1 Three-Level Caching

480

(2)

13.4.2 Cache Policies

482

(1)

13.4.3 Prefetching Search Results

483

(1)

13.5 Further Reading

484

(1)

13.6 Exercises

484

(1)

13.7 Bibliography

485

(2)

V Applications and Extensions

487

(104)

14 Parallel Information Retrieval

488

(19)

14.1 Parallel Query Processing

488

(10)

14.1.1 Document Partitioning

490

(3)

14.1.2 Term Partitioning

493

(2)

14.1.3 Hybrid Schemes

495

(1)

14.1.4 Redundancy and Fault Tolerance

496

(2)

14.2 MapReduce

498

(5)

14.2.1 The Basic Framework

498

(2)

14.2.2 Combiners

500

(2)

14.2.3 Secondary Keys

502

(1)

14.2.4 Machine Failures

502

(1)

14.3 Further Reading

503

(1)

14.4 Exercises

504

(1)

14.5 Bibliography

505

(2)

15 Web Search

507

(57)

15.1 The Structure of the Web

508

(5)

15.1.1 The Web Graph

508

(2)

15.1.2 Static and Dynamic Pages

510

(1)

15.1.3 The Hidden Web

511

(1)

15.1.4 The Size of the Web

511

(2)

15.2 Queries and Users

513

(4)

15.2.1 User Intent

513

(3)

15.2.2 Clickthrough Curves

516

(1)

15.3 Static Ranking

517

(18)

15.3.1 Basic PageRank

517

(5)

15.3.2 Extended PageRank

522

(6)

15.3.3 Properties of PageRank

528

(4)

15.3.4 Other Link Analysis Methods: HITS and SALSA

532

(3)

15.3.5 Other Static Ranking Methods

535

(1)

15.4 Dynamic Ranking

535

(3)

15.4.1 Anchor Text

536

(1)

15.4.2 Novelty

537

(1)

15.5 Evaluating Web Search

538

(3)

15.5.1 Named Page Finding

538

(2)

15.5.2 Implicit User Feedback

540

(1)

15.6 Web Crawlers

541

(12)

15.6.1 Components of a Crawler

542

(5)

15.6.2 Crawl Order

547

(2)

15.6.3 Duplicates and Near-Duplicates

549

(4)

15.7 Summary

553

(1)

15.8 Further Reading

553

(3)

15.8.1 Link Analysis

554

(1)

15.8.2 Anchor Text

555

(1)

15.8.3 Implicit Feedback

555

(1)

15.8.4 Web Crawlers

556

(1)

15.9 Exercises

556

(2)

15.10 Bibliography

558

(6)

16 XML Retrieval

564

(27)

16.1 The Essence of XML

565

(6)

16.1.1 Document Type Definitions

568

(2)

16.1.2 XML Schema

570

(1)

16.2 Paths, Trees, and FLWORs

571

(5)

16.2.1 XPath

571

(1)

16.2.2 NEXT

572

(2)

16.2.3 XQuery

574

(2)

16.3 Indexing and Query Processing

576

(3)

16.4 Ranked Retrieval

579

(4)

16.4.1 Ranking Elements

580

(2)

16.4.2 Overlapping Elements

582

(1)

16.4.3 Retrievable Elements

583

(1)

16.5 Evaluation

583

(2)

16.5.1 Test Collections

583

(1)

16.5.2 Effectiveness Measures

584

(1)

16.6 Further Reading

585

(2)

16.7 Exercises

587

(1)

16.8 Bibliography

587

(4)

VI Appendix

591

(6)

A Computer Performance

592

(5)

A.1 Sequential Versus Random Access on Disk

592

(1)

A.2 Sequential Versus Random Access in RAM

593

(1)

A.3 Pipelined Execution and Branch Prediction

594

(3)

Index

597

Information Retrieval: Implementing and Evaluating Search Engines [Pehme köide]

Muu info

Konto & seaded

Otsing

Otsingu andmebaas

Filtreeri tulemusi

Teemad Ingliskeelsed raamatud

Vali ostukorv