
E-book: Textual Information Access: Statistical Models [Wiley Online]

Edited by Eric Gaussier and François Yvon

Format: 448 pages
Series: ISTE
Publication date: 13-Apr-2012
Publisher: ISTE Ltd and John Wiley & Sons Inc
ISBN-10: 1118562798
ISBN-13: 9781118562796


Wiley Online
Price: 203,00 €*
* price that provides access for an unlimited number of simultaneous users for an unlimited time



Book homepage: https://onlinelibrary.wiley.com/doi/book/10.1002/9781118562796

This book presents statistical models that have recently been developed within several research communities to access information contained in text collections. The problems considered are linked to applications that aim to facilitate information access: information extraction and retrieval; text classification and clustering; opinion mining; and comprehension aids (automatic summarization, machine translation, visualization). To give the reader as complete a description as possible, the focus is placed on the probability models used in these applications, highlighting the relationship between models and applications and illustrating the behavior of each model on real collections. Textual Information Access is organized around four themes: information retrieval and ranking models; classification and clustering (logistic regression, kernel methods, Markov fields, etc.); multilingualism and machine translation; and emerging applications such as information exploration.

Contents

Part 1: Information Retrieval
1. Probabilistic Models for Information Retrieval, Stéphane Clinchant and Eric Gaussier.
2. Learnable Ranking Models for Automatic Text Summarization and Information Retrieval, Massih-Réza Amini, David Buffoni, Patrick Gallinari, Tuong Vinh Truong and Nicolas Usunier.

Part 2: Classification and Clustering
3. Logistic Regression and Text Classification, Sujeevan Aseervatham, Eric Gaussier, Anestis Antoniadis, Michel Burlet and Yves Denneulin.
4. Kernel Methods for Textual Information Access, Jean-Michel Renders.
5. Topic-Based Generative Models for Text Information Access, Jean-Cédric Chappelier.
6. Conditional Random Fields for Information Extraction, Isabelle Tellier and Marc Tommasi.

Part 3: Multilingualism
7. Statistical Methods for Machine Translation, Alexandre Allauzen and François Yvon.

Part 4: Emerging Applications
8. Information Mining: Methods and Interfaces for Accessing Complex Information, Josiane Mothe, Kurt Englmeier and Fionn Murtagh.
9. Opinion Detection as a Topic Classification Problem, Juan-Manuel Torres-Moreno, Marc El-Bèze, Patrice Bellot and Frédéric Béchet.

Introduction

Eric Gaussier

François Yvon

Part 1 Information Retrieval

Chapter 1 Probabilistic Models for Information Retrieval

Stéphane Clinchant

Eric Gaussier

1.1 Introduction

1.1.1 Heuristic retrieval constraints

1.2 2-Poisson models

1.3 Probability ranking principle (PRP)

1.3.1 Reformulation

1.3.2 BM25

1.4 Language models

1.4.1 Smoothing methods

1.4.2 The Kullback-Leibler model

1.4.3 Noisy channel model

1.4.4 Some remarks

1.5 Informational approaches

1.5.1 DFR models

1.5.2 Information-based models

1.6 Experimental comparison

1.7 Tools for information retrieval

1.8 Conclusion

1.9 Bibliography

Chapter 2 Learnable Ranking Models for Automatic Text Summarization and Information Retrieval

Massih-Réza Amini

David Buffoni

Patrick Gallinari

Tuong Vinh Truong

Nicolas Usunier

2.1 Introduction

2.1.1 Ranking of instances

2.1.2 Ranking of alternatives

2.1.3 Relation to existing frameworks

2.2 Application to automatic text summarization

2.2.1 Presentation of the application

2.2.2 Automatic summary and learning

2.3 Application to information retrieval

2.3.1 Application presentation

2.3.2 Search engines and learning

2.3.3 Experimental results

2.4 Conclusion

2.5 Bibliography

Part 2 Classification and Clustering

Chapter 3 Logistic Regression and Text Classification

Sujeevan Aseervatham

Eric Gaussier

Anestis Antoniadis

Michel Burlet

Yves Denneulin

3.1 Introduction

3.2 Generalized linear model

3.3 Parameter estimation

3.4 Logistic regression

3.4.1 Multinomial logistic regression

3.5 Model selection

3.5.1 Ridge regularization

3.5.2 LASSO regularization

3.5.3 Selected Ridge regularization

3.6 Logistic regression applied to text classification

3.6.1 Problem statement

3.6.2 Data pre-processing

3.6.3 Experimental results

3.7 Conclusion

3.8 Bibliography

Chapter 4 Kernel Methods for Textual Information Access

Jean-Michel Renders

4.1 Kernel methods: context and intuitions

4.2 General principles of kernel methods

4.3 General problems with kernel choices (kernel engineering)

4.4 Kernel versions of standard algorithms: examples of solvers

4.4.1 Kernel logistic regression

4.4.2 Support vector machines

4.4.3 Principal component analysis

4.4.4 Other methods

4.5 Kernels for text entities

4.5.1 "Bag-of-words" kernels

4.5.2 Semantic kernels

4.5.3 Diffusion kernels

4.5.4 Sequence kernels

4.5.5 Tree kernels

4.5.6 Graph kernels

4.5.7 Kernels derived from generative models

4.6 Summary

4.7 Bibliography

Chapter 5 Topic-Based Generative Models for Text Information Access

Jean-Cédric Chappelier

5.1 Introduction

5.1.1 Generative versus discriminative models

5.1.2 Text models

5.1.3 Estimation, prediction and smoothing

5.1.4 Terminology and notations

5.2 Topic-based models

5.2.1 Fundamental principles

5.2.2 Illustration

5.2.3 General framework

5.2.4 Geometric interpretation

5.2.5 Application to text categorization

5.3 Topic models

5.3.1 Probabilistic Latent Semantic Indexing

5.3.2 Latent Dirichlet Allocation

5.3.3 Conclusion

5.4 Term models

5.4.1 Limitations of the multinomial

5.4.2 Dirichlet compound multinomial

5.4.3 DCM-LDA

5.5 Similarity measures between documents

5.5.1 Language models

5.5.2 Similarity between topic distributions

5.5.3 Fisher kernels

5.6 Conclusion

5.7 Appendix: topic model software

5.8 Bibliography

Chapter 6 Conditional Random Fields for Information Extraction

Isabelle Tellier

Marc Tommasi

6.1 Introduction

6.2 Information extraction

6.2.1 The task

6.2.2 Variants

6.2.3 Evaluations

6.2.4 Approaches not based on machine learning

6.3 Machine learning for information extraction

6.3.1 Usage and limitations

6.3.2 Some applicable machine learning methods

6.3.3 Annotating to extract

6.4 Introduction to conditional random fields

6.4.1 Formalization of a labelling problem

6.4.2 Maximum entropy model approach

6.4.3 Hidden Markov model approach

6.4.4 Graphical models

6.5 Conditional random fields

6.5.1 Definition

6.5.2 Factorization and graphical models

6.5.3 Junction tree

6.5.4 Inference in CRFs

6.5.5 Inference algorithms

6.5.6 Training CRFs

6.6 Conditional random fields and their applications

6.6.1 Linear conditional random fields

6.6.2 Links between linear CRFs and hidden Markov models

6.6.3 Interests and applications of CRFs

6.6.4 Beyond linear CRFs

6.6.5 Existing libraries

6.7 Conclusion

6.8 Bibliography

Part 3 Multilingualism

Chapter 7 Statistical Methods for Machine Translation

Alexandre Allauzen

François Yvon

7.1 Introduction

7.1.1 Machine translation in the age of the Internet

7.1.2 Organization of the chapter

7.1.3 Terminological remarks

7.2 Probabilistic machine translation: an overview

7.2.1 Statistical machine translation: the standard model

7.2.2 Word-based models and their limitations

7.2.3 Phrase-based models

7.3 Phrase-based models

7.3.1 Building word alignments

7.3.2 Word alignment models: a summary

7.3.3 Extracting bisegments

7.4 Modeling reorderings

7.4.1 The space of possible reorderings

7.4.2 Evaluating permutations

7.5 Translation: a search problem

7.5.1 Combining models

7.5.2 The decoding problem

7.5.3 Exact search algorithms

7.5.4 Heuristic search algorithms

7.5.5 Decoding: a solved problem?

7.6 Evaluating machine translation

7.6.1 Subjective evaluations

7.6.2 The BLEU metric

7.6.3 Alternatives to BLEU

7.6.4 Evaluating machine translation: an open problem

7.7 State-of-the-art and recent developments

7.7.1 Using source context

7.7.2 Hierarchical models

7.7.3 Translating with linguistic resources

7.8 Useful resources

7.8.1 Bibliographic data and online resources

7.8.2 Parallel corpora

7.8.3 Tools for statistical machine translation

7.9 Conclusion

7.10 Acknowledgments

7.11 Bibliography

Part 4 Emerging Applications

Chapter 8 Information Mining: Methods and Interfaces for Accessing Complex Information

Josiane Mothe

Kurt Englmeier

Fionn Murtagh

8.1 Introduction

8.2 The multidimensional visualization of information

8.2.1 Accessing information based on the knowledge of the structured domain

8.2.2 Visualization of a set of documents via their content

8.2.3 OLAP principles applied to document sets

8.3 Domain mapping via social networks

8.4 Analyzing the variability of searches and data merging

8.4.1 Analysis of IR engine results

8.4.2 Use of data unification

8.5 The seven types of evaluation measures used in IR

8.6 Conclusion

8.7 Acknowledgments

8.8 Bibliography

Chapter 9 Opinion Detection as a Topic Classification Problem

Juan-Manuel Torres-Moreno

Marc El-Bèze

Patrice Bellot

Frédéric Béchet

9.1 Introduction

9.2 The TREC and TAC evaluation campaigns

9.2.1 Opinion detection by question-answering

9.2.2 Automatic summarization of opinions

9.2.3 The text mining challenge of opinion classification (DEFT (DEfi Fouille de Textes))

9.3 Cosine weights - a second glance

9.4 Which components for opinion vectors?

9.4.1 How to pass from words to terms?

9.5 Experiments

9.5.1 Performance, analysis, and visualization of the results on the IMDB corpus

9.6 Extracting opinions from speech: automatic analysis of phone polls

9.6.1 France Telecom opinion investigation corpus

9.6.2 Automatic recognition of spontaneous speech in opinion corpora

9.6.3 Evaluation

9.7 Conclusion

9.8 Bibliography

Appendix A Probabilistic Models: An Introduction

François Yvon

A.1 Introduction

A.2 Supervised categorization

A.2.1 Filtering documents

A.2.2 The Bernoulli model

A.2.3 The multinomial model

A.2.4 Evaluating categorization systems

A.2.5 Extensions

A.2.6 A first summary

A.3 Unsupervised learning: the multinomial mixture model

A.3.1 Mixture models

A.3.2 Parameter estimation

A.3.3 Applications

A.4 Markov models: statistical models for sequences

A.4.1 Modeling sequences

A.4.2 Estimating a Markov model

A.4.3 Language models

A.5 Hidden Markov models

A.5.1 The model

A.5.2 Algorithms for hidden Markov models

A.6 Conclusion

A.7 A primer of probability theory

A.7.1 Probability space, event

A.7.2 Conditional independence and probability

A.7.3 Random variables, moments

A.7.4 Some useful distributions

A.8 Bibliography

List of Authors

Index

Eric Gaussier is deputy director of the Grenoble Informatics Laboratory, one of the largest Computer Science laboratories in France.

François Yvon is a professor of Computer Science at the University of Paris Sud in Orsay and a member of the Spoken Language Processing group of LIMSI/CNRS, Paris, France.

Permalink: https://www.kriso.ee/db/9781118562796_pe.html
