
Fundamentals of Predictive Text Mining 2010 [Hardback]

3.17/5 (22 ratings from Goodreads)

Tong Zhang, Nitin Indurkhya, Sholom M. Weiss

Other formats

Hardback (Price: 67,23 €) - 14-Sep-2015
Paperback (Price: 77,49 €) - 05-Sep-2012

Format: Hardback, 226 pages, height x width x thickness: 234x156x14 mm, weight: 1140 g, biography
Series: Texts in Computer Science v. 41
Publication date: 16-Jun-2010
Publisher: Springer London Ltd
ISBN-10: 1849962251
ISBN-13: 9781849962254

Other books on the subject:

Data mining - (Currently in store: 1 title)
Computer science - (Currently in store: 7 titles)
Data warehousing

Hardback
Price: 53,29 €*
* the price is final, i.e. no further discounts apply
Regular price: 62,70 €
You save 15%
Delivery of the book from the publisher takes approximately 2-4 weeks
Free delivery


Permalink: https://www.kriso.ee/db/9781849962254.html


This is an introductory textbook and guide to the rapidly evolving field of predictive text mining. There are chapter summaries, historical and bibliographic remarks, and classroom-tested exercises for each chapter. Descriptive case studies are also included.

One consequence of the pervasive use of computers is that most documents originate in digital form. Widespread use of the Internet makes them readily available. Text mining - the process of analyzing unstructured natural-language text - is concerned with how to extract information from these documents.

Developed from the authors' highly successful Springer reference on text mining, Fundamentals of Predictive Text Mining is an introductory textbook and guide to this rapidly evolving field. Integrating topics spanning the varied disciplines of data mining, machine learning, databases, and computational linguistics, this uniquely useful book also provides practical advice for text mining. In-depth discussions are presented on issues of document classification, information retrieval, clustering and organizing documents, information extraction, web-based data-sourcing, and prediction and evaluation. Background on data mining is beneficial, but not essential. Where advanced concepts are discussed that require mathematical maturity for a proper understanding, intuitive explanations are also provided for less advanced readers.

Topics and features:
- presents a comprehensive, practical and easy-to-read introduction to text mining;
- includes chapter summaries, useful historical and bibliographic remarks, and classroom-tested exercises for each chapter;
- explores the application and utility of each method, as well as the optimum techniques for specific scenarios;
- provides several descriptive case studies that take readers from problem description to systems deployment in the real world;
- includes access to industrial-strength text-mining software that runs on any computer;
- describes methods that rely on basic statistical techniques, thus allowing for relevance to all languages (not just English);
- contains links to free downloadable software and other supplementary instruction material.

Fundamentals of Predictive Text Mining is an essential resource for IT professionals and managers, as well as a key text for advanced undergraduate computer science students and beginning graduate students.

Dr. Sholom M. Weiss is a Research Staff Member with the IBM Predictive Modeling group, in Yorktown Heights, New York, and Professor Emeritus of Computer Science at Rutgers University. Dr. Nitin Indurkhya is Professor at the School of Computer Science and Engineering, University of New South Wales, Australia, as well as founder and president of data-mining consulting company Data-Miner Pty Ltd. Dr. Tong Zhang is Associate Professor at the Department of Statistics and Biostatistics at Rutgers University, New Jersey.

Reviews

From the reviews:

"This is a practical, up-to-date account of the various techniques for dealing intelligently with free text. It would be an invaluable resource to any advanced undergraduate student interested in information retrieval." (Patrick Oladimeji, Times Higher Education, 26 May 2011)

"This is a well-written and interesting text for information technology (IT) professionals and computer science students. It seems to address all of the topics related to the fields that, when integrated, are known as knowledge engineering. ... Without a doubt, the authors' experience in the field makes this book a successful contribution to the literature that targets the interests of the IT community and beyond." (Jolanta Mizera-Pietraszko, ACM Computing Reviews, June, 2011)

"This well-written work, which offers a unifying view of text mining through a systematic introduction to solving real-world problems. ... The uniqueness of this book is the recourse to the prediction problem, which, by providing practical advice, allows for the integration of related topics. ... The book is accompanied by a software implementation of the main algorithmic practices introduced. This is the icing on the cake for both beginners and expert readers ... . This is the book ... I have always wanted to read." (Ernesto D'Avenzo, ACM Computing Reviews, August, 2012)

1 Overview of Text Mining
1.1 What's Special About Text Mining?
1.1.1 Structured or Unstructured Data?
1.1.2 Is Text Different from Numbers?
1.2 What Types of Problems Can Be Solved?
1.3 Document Classification
1.4 Information Retrieval
1.5 Clustering and Organizing Documents
1.6 Information Extraction
1.7 Prediction and Evaluation
1.8 The Next Chapters
1.9 Summary
1.10 Historical and Bibliographical Remarks
1.11 Questions and Exercises

2 From Textual Information to Numerical Vectors
2.1 Collecting Documents
2.2 Document Standardization
2.3 Tokenization
2.4 Lemmatization
2.4.1 Inflectional Stemming
2.4.2 Stemming to a Root
2.5 Vector Generation for Prediction
2.5.1 Multiword Features
2.5.2 Labels for the Right Answers
2.5.3 Feature Selection by Attribute Ranking
2.6 Sentence Boundary Determination
2.7 Part-of-Speech Tagging
2.8 Word Sense Disambiguation
2.9 Phrase Recognition
2.10 Named Entity Recognition
2.11 Parsing
2.12 Feature Generation
2.13 Summary
2.14 Historical and Bibliographical Remarks
2.15 Questions and Exercises

3 Using Text for Prediction
3.1 Recognizing that Documents Fit a Pattern
3.2 How Many Documents Are Enough?
3.3 Document Classification
3.4 Learning to Predict from Text
3.4.1 Similarity and Nearest-Neighbor Methods
3.4.2 Document Similarity
3.4.3 Decision Rules
3.4.4 Decision Trees
3.4.5 Scoring by Probabilities
3.4.6 Linear Scoring Methods
3.5 Evaluation of Performance
3.5.1 Estimating Current and Future Performance
3.5.2 Getting the Most from a Learning Method
3.6 Applications
3.7 Summary
3.8 Historical and Bibliographical Remarks
3.9 Questions and Exercises

4 Information Retrieval and Text Mining
4.1 Is Information Retrieval a Form of Text Mining?
4.2 Key Word Search
4.3 Nearest-Neighbor Methods
4.4 Measuring Similarity
4.4.1 Shared Word Count
4.4.2 Word Count and Bonus
4.4.3 Cosine Similarity
4.5 Web-based Document Search
4.5.1 Link Analysis
4.6 Document Matching
4.7 Inverted Lists
4.8 Evaluation of Performance
4.9 Summary
4.10 Historical and Bibliographical Remarks
4.11 Questions and Exercises

5 Finding Structure in a Document Collection
5.1 Clustering Documents by Similarity
5.2 Similarity of Composite Documents
5.2.1 k-Means Clustering
5.2.2 Hierarchical Clustering
5.2.3 The EM Algorithm
5.3 What Do a Cluster's Labels Mean?
5.4 Applications
5.5 Evaluation of Performance
5.6 Summary
5.7 Historical and Bibliographical Remarks
5.8 Questions and Exercises

6 Looking for Information in Documents
6.1 Goals of Information Extraction
6.2 Finding Patterns and Entities from Text
6.2.1 Entity Extraction as Sequential Tagging
6.2.2 Tag Prediction as Classification
6.2.3 The Maximum Entropy Method
6.2.4 Linguistic Features and Encoding
6.2.5 Local Sequence Prediction Models
6.2.6 Global Sequence Prediction Models
6.3 Coreference and Relationship Extraction
6.3.1 Coreference Resolution
6.3.2 Relationship Extraction
6.4 Template Filling and Database Construction
6.5 Applications
6.5.1 Information Retrieval
6.5.2 Commercial Extraction Systems
6.5.3 Criminal Justice
6.5.4 Intelligence
6.6 Summary
6.7 Historical and Bibliographical Remarks
6.8 Questions and Exercises

7 Data Sources for Prediction: Databases, Hybrid Data and the Web
7.1 Ideal Models of Data
7.1.1 Ideal Data for Prediction
7.1.2 Ideal Data for Text and Unstructured Data
7.1.3 Hybrid and Mixed Data
7.2 Practical Data Sourcing
7.3 Prototypical Examples
7.3.1 Web-based Spreadsheet Data
7.3.2 Web-based XML Data
7.3.3 Opinion Data and Sentiment Analysis
7.4 Hybrid Example: Independent Sources of Numerical and Text Data
7.5 Mixed Data in Standard Table Format
7.6 Summary
7.7 Historical and Bibliographical Remarks
7.8 Questions and Exercises

8 Case Studies
8.1 Market Intelligence from the Web
8.1.1 The Problem
8.1.2 Solution Overview
8.1.3 Methods and Procedures
8.1.4 System Deployment
8.2 Lightweight Document Matching for Digital Libraries
8.2.1 The Problem
8.2.2 Solution Overview
8.2.3 Methods and Procedures
8.2.4 System Deployment
8.3 Generating Model Cases for Help Desk Applications
8.3.1 The Problem
8.3.2 Solution Overview
8.3.3 Methods and Procedures
8.3.4 System Deployment
8.4 Assigning Topics to News Articles
8.4.1 The Problem
8.4.2 Solution Overview
8.4.3 Methods and Procedures
8.4.4 System Deployment
8.5 E-mail Filtering
8.5.1 The Problem
8.5.2 Solution Overview
8.5.3 Methods and Procedures
8.5.4 System Deployment
8.6 Search Engines
8.6.1 The Problem
8.6.2 Solution Overview
8.6.3 Methods and Procedures
8.6.4 System Deployment
8.7 Extracting Named Entities from Documents
8.7.1 The Problem
8.7.2 Solution Overview
8.7.3 Methods and Procedures
8.7.4 System Deployment
8.8 Customized Newspapers
8.8.1 The Problem
8.8.2 Solution Overview
8.8.3 Methods and Procedures
8.8.4 System Deployment
8.9 Summary
8.10 Historical and Bibliographical Remarks
8.11 Questions and Exercises

9 Emerging Directions
9.1 Summarization
9.2 Active Learning
9.3 Learning with Unlabeled Data
9.4 Different Ways of Collecting Samples
9.4.1 Ensembles and Voting Methods
9.4.2 Online Learning
9.4.3 Cost-Sensitive Learning
9.4.4 Unbalanced Samples and Rare Events
9.5 Distributed Text Mining
9.6 Learning to Rank
9.7 Question Answering
9.8 Summary
9.9 Historical and Bibliographical Remarks
9.10 Questions and Exercises

A Software Notes
A.1 Summary of Software
A.2 Requirements
A.3 Download Instructions

References
Author Index
Subject Index
