Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining [Paperback]

3.71/5 (10 ratings from Goodreads)

Sean Massung, ChengXiang Zhai

Format: Paperback / softback, 530 pages, height x width x thickness: 234x190x27 mm, weight: 1055 g
Series: ACM Books
Publication date: 30-Jun-2016
Publisher: Morgan & Claypool Publishers
ISBN-10: 197000116X
ISBN-13: 9781970001167

Other books on the subject:

Information retrieval
Databases - (Currently in store: 1 title)
Information technology: general issues - (Currently in store: 1 title)

Paperback
Price: 118.90 €
Delivery from the publisher takes approximately 2-4 weeks

Permalink: https://www.kriso.ee/db/9781970001167.html

Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. This has led to an increasing demand for powerful software tools to help people analyze and manage vast amounts of text data effectively and efficiently. Unlike data generated by a computer system or sensors, text data are usually generated directly by humans and carry semantically rich content. As such, text data are especially valuable for discovering knowledge about human opinions and preferences, in addition to many other kinds of knowledge that we encode in text. In contrast to structured data, which conform to well-defined schemas (and are thus relatively easy for computers to handle), text has less explicit structure, so a computer must process it in order to understand the content it encodes. Current natural language processing technology has not yet reached the point of enabling a computer to understand natural language text precisely, but a wide range of statistical and heuristic approaches to the analysis and management of text data have been developed over the past few decades. They are usually very robust and can be applied to analyze and manage text data in any natural language and about any topic.

This book provides a systematic introduction to all these approaches, with an emphasis on covering the most useful knowledge and skills required to build a variety of practically useful text information systems. The focus is on text mining applications that can help users analyze patterns in text data to extract and reveal useful knowledge. Information retrieval systems, including search engines and recommender systems, are also covered as supporting technology for text mining applications. The book covers the major concepts, techniques, and ideas in text data mining and information retrieval from a practical viewpoint, and includes many hands-on exercises designed with a companion software toolkit (i.e., MeTA) to help readers learn how to apply techniques of text mining and information retrieval to real-world text data and how to experiment with and improve some of the algorithms for interesting application tasks. The book can be used as a textbook for a computer science undergraduate course or a reference book for practitioners working on relevant problems in analyzing and managing text data.

Preface

Acknowledgments

PART I OVERVIEW AND BACKGROUND

Chapter 1 Introduction

1.1 Functions of Text Information Systems

1.2 Conceptual Framework for Text Information Systems

1.3 Organization of the Book

1.4 How to Use this Book

Bibliographic Notes and Further Reading

Chapter 2 Background

2.1 Basics of Probability and Statistics

2.2 Information Theory

2.3 Machine Learning

Bibliographic Notes and Further Reading

Exercises

Chapter 3 Text Data Understanding

3.1 History and State of the Art in NLP

3.2 NLP and Text Information Systems

3.3 Text Representation

3.4 Statistical Language Models

Bibliographic Notes and Further Reading

Exercises

Chapter 4 MeTA: A Unified Toolkit for Text Data Management and Analysis

4.1 Design Philosophy

4.2 Setting up MeTA

4.3 Architecture

4.4 Tokenization with MeTA

4.5 Related Toolkits

Exercises

PART II TEXT DATA ACCESS

Chapter 5 Overview of Text Data Access

5.1 Access Mode: Pull vs. Push

5.2 Multimode Interactive Access

5.3 Text Retrieval

5.4 Text Retrieval vs. Database Retrieval

5.5 Document Selection vs. Document Ranking

Bibliographic Notes and Further Reading

Exercises

Chapter 6 Retrieval Models

6.1 Overview

6.2 Common Form of a Retrieval Function

6.3 Vector Space Retrieval Models

6.4 Probabilistic Retrieval Models

Bibliographic Notes and Further Reading

Exercises

Chapter 7 Feedback

7.1 Feedback in the Vector Space Model

7.2 Feedback in Language Models

Bibliographic Notes and Further Reading

Exercises

Chapter 8 Search Engine Implementation

8.1 Tokenizer

8.2 Indexer

8.3 Scorer

8.4 Feedback Implementation

8.5 Compression

8.6 Caching

Bibliographic Notes and Further Reading

Exercises

Chapter 9 Search Engine Evaluation

9.1 Introduction

9.2 Evaluation of Set Retrieval

9.3 Evaluation of a Ranked List

9.4 Evaluation with Multi-level Judgements

9.5 Practical Issues in Evaluation

Bibliographic Notes and Further Reading

Exercises

Chapter 10 Web Search

10.1 Web Crawling

10.2 Web Indexing

10.3 Link Analysis

10.4 Learning to Rank

10.5 The Future of Web Search

Bibliographic Notes and Further Reading

Exercises

Chapter 11 Recommender Systems

11.1 Content-based Recommendation

11.2 Collaborative Filtering

11.3 Evaluation of Recommender Systems

Bibliographic Notes and Further Reading

Exercises

PART III TEXT DATA ANALYSIS

Chapter 12 Overview of Text Data Analysis

12.1 Motivation: Applications of Text Data Analysis

12.2 Text vs. Non-text Data: Humans as Subjective Sensors

12.3 Landscape of Text Mining Tasks

Chapter 13 Word Association Mining

13.1 General Idea of Word Association Mining

13.2 Discovery of Paradigmatic Relations

13.3 Discovery of Syntagmatic Relations

13.4 Evaluation of Word Association Mining

Bibliographic Notes and Further Reading

Exercises

Chapter 14 Text Clustering

14.1 Overview of Clustering Techniques

14.2 Document Clustering

14.3 Term Clustering

14.4 Evaluation of Text Clustering

Bibliographic Notes and Further Reading

Exercises

Chapter 15 Text Categorization

15.1 Introduction

15.2 Overview of Text Categorization Methods

15.3 Text Categorization Problem

15.4 Features for Text Categorization

15.5 Classification Algorithms

15.6 Evaluation of Text Categorization

Bibliographic Notes and Further Reading

Exercises

Chapter 16 Text Summarization

16.1 Overview of Text Summarization Techniques

16.2 Extractive Text Summarization

16.3 Abstractive Text Summarization

16.4 Evaluation of Text Summarization

16.5 Applications of Text Summarization

Bibliographic Notes and Further Reading

Exercises

Chapter 17 Topic Analysis

17.1 Topics as Terms

17.2 Topics as Word Distributions

17.3 Mining One Topic from Text

17.4 Probabilistic Latent Semantic Analysis

17.5 Extension of PLSA and Latent Dirichlet Allocation

17.6 Evaluating Topic Analysis

17.7 Summary of Topic Models

Bibliographic Notes and Further Reading

Exercises

Chapter 18 Opinion Mining and Sentiment Analysis

18.1 Sentiment Classification

18.2 Ordinal Regression

18.3 Latent Aspect Rating Analysis

18.4 Evaluation of Opinion Mining and Sentiment Analysis

Bibliographic Notes and Further Reading

Exercises

Chapter 19 Joint Analysis of Text and Structured Data

19.1 Introduction

19.2 Contextual Text Mining

19.3 Contextual Probabilistic Latent Semantic Analysis

19.4 Topic Analysis with Social Networks as Context

19.5 Topic Analysis with Time Series Context

19.6 Summary

Bibliographic Notes and Further Reading

Exercises

PART IV UNIFIED TEXT DATA MANAGEMENT ANALYSIS SYSTEM

Chapter 20 Toward A Unified System for Text Management and Analysis

20.1 Text Analysis Operators

20.2 System Architecture

20.3 MeTA as a Unified System

Appendix A Bayesian Statistics

A.1 Binomial Estimation and the Beta Distribution

A.2 Pseudo Counts, Smoothing, and Setting Hyperparameters

A.3 Generalizing to a Multinomial Distribution

A.4 The Dirichlet Distribution

A.5 Bayesian Estimate of Multinomial Parameters

A.6 Conclusion

Appendix B Expectation-Maximization

B.1 A Simple Mixture Unigram Language Model

B.2 Maximum Likelihood Estimation

B.3 Incomplete vs. Complete Data

B.4 A Lower Bound of Likelihood

B.5 The General Procedure of EM

Appendix C KL-divergence and Dirichlet Prior Smoothing

C.1 Using KL-divergence for Retrieval

C.2 Using Dirichlet Prior Smoothing

C.3 Computing the Query Model $p(w \mid \theta_Q)$

References

Index

Authors' Biographies

ChengXiang Zhai is a Professor of Computer Science and Willett Faculty Scholar at the University of Illinois at Urbana-Champaign, where he is also affiliated with the Graduate School of Library and Information Science, Institute for Genomic Biology, and Department of Statistics. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and then Senior Research Scientist from 1997 to 2000. His research interests include information retrieval, text mining, natural language processing, machine learning, biomedical and health informatics, and intelligent education information systems. He has published over 200 research papers in major conferences and journals. He served as an Associate Editor for Information Processing and Management, as an Associate Editor of ACM Transactions on Information Systems, and on the editorial board of Information Retrieval Journal. He was a conference program co-chair of ACM CIKM 2004, NAACL HLT 2007, ACM SIGIR 2009, ECIR 2014, ICTIR 2015, and WWW 2015, and conference general co-chair for ACM CIKM 2016. He is an ACM Distinguished Scientist and a recipient of multiple awards, including the ACM SIGIR 2004 Best Paper Award, the ACM SIGIR 2014 Test of Time Paper Award, Alfred P. Sloan Research Fellowship, IBM Faculty Award, HP Innovation Research Program Award, Microsoft Beyond Search Research Award, and the Presidential Early Career Award for Scientists and Engineers (PECASE).

Sean Massung is a Ph.D. candidate in computer science at the University of Illinois at Urbana-Champaign, where he also received both his B.S. and M.S. degrees. He is a co-founder of MeTA and uses it in all of his research. He has been an instructor for CS 225: Data Structures and Programming Principles, CS 410: Text Information Systems, and CS 591txt: Text Mining Seminar. He is included in the 2014 List of Teachers Ranked as Excellent at the University of Illinois and has received an Outstanding Teaching Assistant Award and a CS@Illinois Outstanding Research Project Award. He has given talks at Jump Labs Champaign and at UIUC for the Data and Information Systems Seminar, Intro to Big Data, and the Teaching Assistant Seminar. His research interests include text mining applications in information retrieval, natural language processing, and education.