Fundamentals of Predictive Text Mining 2010 [Hardcover]

  • Format: Hardback, 226 pages, height x width x thickness: 234x156x14 mm, weight: 1140 g, biography
  • Series: Texts in Computer Science v. 41
  • Publication date: 16-Jun-2010
  • Publisher: Springer London Ltd
  • ISBN-10: 1849962251
  • ISBN-13: 9781849962254
  • Hardcover
  • Price: 53,29 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 62,70 €
  • You save 15%
  • Delivery from the publisher takes approximately 2-4 weeks
  • Free delivery

This is an introductory textbook and guide to the rapidly evolving field of predictive text mining. There are chapter summaries, historical and bibliographic remarks, and classroom-tested exercises for each chapter. Descriptive case studies are also included.



One consequence of the pervasive use of computers is that most documents originate in digital form. Widespread use of the Internet makes them readily available. Text mining - the process of analyzing unstructured natural-language text - is concerned with how to extract information from these documents.

Developed from the authors' highly successful Springer reference on text mining, Fundamentals of Predictive Text Mining is an introductory textbook and guide to this rapidly evolving field. Integrating topics spanning the varied disciplines of data mining, machine learning, databases, and computational linguistics, this uniquely useful book also provides practical advice for text mining. In-depth discussions are presented on issues of document classification, information retrieval, clustering and organizing documents, information extraction, web-based data-sourcing, and prediction and evaluation. Background on data mining is beneficial, but not essential. Where advanced concepts are discussed that require mathematical maturity for a proper understanding, intuitive explanations are also provided for less advanced readers.

Topics and features:
  • presents a comprehensive, practical and easy-to-read introduction to text mining;
  • includes chapter summaries, useful historical and bibliographic remarks, and classroom-tested exercises for each chapter;
  • explores the application and utility of each method, as well as the optimum techniques for specific scenarios;
  • provides several descriptive case studies that take readers from problem description to systems deployment in the real world;
  • includes access to industrial-strength text-mining software that runs on any computer;
  • describes methods that rely on basic statistical techniques, thus allowing for relevance to all languages (not just English);
  • contains links to free downloadable software and other supplementary instruction material.

Fundamentals of Predictive Text Mining is an essential resource for IT professionals and managers, as well as a key text for advanced undergraduate computer science students and beginning graduate students.

Dr. Sholom M. Weiss is a Research Staff Member with the IBM Predictive Modeling group, in Yorktown Heights, New York, and Professor Emeritus of Computer Science at Rutgers University. Dr. Nitin Indurkhya is Professor at the School of Computer Science and Engineering, University of New South Wales, Australia, as well as founder and president of data-mining consulting company Data-Miner Pty Ltd. Dr. Tong Zhang is Associate Professor at the Department of Statistics and Biostatistics at Rutgers University, New Jersey.

Reviews

From the reviews:

"This is a practical, up-to-date account of the various techniques for dealing intelligently with free text. It would be an invaluable resource to any advanced undergraduate student interested in information retrieval." (Patrick Oladimeji, Times Higher Education, 26 May 2011)

"This is a well-written and interesting text for information technology (IT) professionals and computer science students. It seems to address all of the topics related to the fields that, when integrated, are known as knowledge engineering. ... Without a doubt, the authors' experience in the field makes this book a successful contribution to the literature that targets the interests of the IT community and beyond." (Jolanta Mizera-Pietraszko, ACM Computing Reviews, June, 2011)

"This well-written work, which offers a unifying view of text mining through a systematic introduction to solving real-world problems. ... The uniqueness of this book is the recourse to the prediction problem, which, by providing practical advice, allows for the integration of related topics. ... The book is accompanied by a software implementation of the main algorithmic practices introduced. This is the icing on the cake for both beginners and expert readers ... . This is the book ... I have always wanted to read." (Ernesto D'Avenzo, ACM Computing Reviews, August, 2012)

1 Overview of Text Mining
1.1 What's Special About Text Mining?
1.1.1 Structured or Unstructured Data?
1.1.2 Is Text Different from Numbers?
1.2 What Types of Problems Can Be Solved?
1.3 Document Classification
1.4 Information Retrieval
1.5 Clustering and Organizing Documents
1.6 Information Extraction
1.7 Prediction and Evaluation
1.8 The Next Chapters
1.9 Summary
1.10 Historical and Bibliographical Remarks
1.11 Questions and Exercises
2 From Textual Information to Numerical Vectors
2.1 Collecting Documents
2.2 Document Standardization
2.3 Tokenization
2.4 Lemmatization
2.4.1 Inflectional Stemming
2.4.2 Stemming to a Root
2.5 Vector Generation for Prediction
2.5.1 Multiword Features
2.5.2 Labels for the Right Answers
2.5.3 Feature Selection by Attribute Ranking
2.6 Sentence Boundary Determination
2.7 Part-of-Speech Tagging
2.8 Word Sense Disambiguation
2.9 Phrase Recognition
2.10 Named Entity Recognition
2.11 Parsing
2.12 Feature Generation
2.13 Summary
2.14 Historical and Bibliographical Remarks
2.15 Questions and Exercises
3 Using Text for Prediction
3.1 Recognizing that Documents Fit a Pattern
3.2 How Many Documents Are Enough?
3.3 Document Classification
3.4 Learning to Predict from Text
3.4.1 Similarity and Nearest-Neighbor Methods
3.4.2 Document Similarity
3.4.3 Decision Rules
3.4.4 Decision Trees
3.4.5 Scoring by Probabilities
3.4.6 Linear Scoring Methods
3.5 Evaluation of Performance
3.5.1 Estimating Current and Future Performance
3.5.2 Getting the Most from a Learning Method
3.6 Applications
3.7 Summary
3.8 Historical and Bibliographical Remarks
3.9 Questions and Exercises
4 Information Retrieval and Text Mining
4.1 Is Information Retrieval a Form of Text Mining?
4.2 Key Word Search
4.3 Nearest-Neighbor Methods
4.4 Measuring Similarity
4.4.1 Shared Word Count
4.4.2 Word Count and Bonus
4.4.3 Cosine Similarity
4.5 Web-based Document Search
4.5.1 Link Analysis
4.6 Document Matching
4.7 Inverted Lists
4.8 Evaluation of Performance
4.9 Summary
4.10 Historical and Bibliographical Remarks
4.11 Questions and Exercises
5 Finding Structure in a Document Collection
5.1 Clustering Documents by Similarity
5.2 Similarity of Composite Documents
5.2.1 k-Means Clustering
5.2.2 Hierarchical Clustering
5.2.3 The EM Algorithm
5.3 What Do a Cluster's Labels Mean?
5.4 Applications
5.5 Evaluation of Performance
5.6 Summary
5.7 Historical and Bibliographical Remarks
5.8 Questions and Exercises
6 Looking for Information in Documents
6.1 Goals of Information Extraction
6.2 Finding Patterns and Entities from Text
6.2.1 Entity Extraction as Sequential Tagging
6.2.2 Tag Prediction as Classification
6.2.3 The Maximum Entropy Method
6.2.4 Linguistic Features and Encoding
6.2.5 Local Sequence Prediction Models
6.2.6 Global Sequence Prediction Models
6.3 Coreference and Relationship Extraction
6.3.1 Coreference Resolution
6.3.2 Relationship Extraction
6.4 Template Filling and Database Construction
6.5 Applications
6.5.1 Information Retrieval
6.5.2 Commercial Extraction Systems
6.5.3 Criminal Justice
6.5.4 Intelligence
6.6 Summary
6.7 Historical and Bibliographical Remarks
6.8 Questions and Exercises
7 Data Sources for Prediction: Databases, Hybrid Data and the Web
7.1 Ideal Models of Data
7.1.1 Ideal Data for Prediction
7.1.2 Ideal Data for Text and Unstructured Data
7.1.3 Hybrid and Mixed Data
7.2 Practical Data Sourcing
7.3 Prototypical Examples
7.3.1 Web-based Spreadsheet Data
7.3.2 Web-based XML Data
7.3.3 Opinion Data and Sentiment Analysis
7.4 Hybrid Example: Independent Sources of Numerical and Text Data
7.5 Mixed Data in Standard Table Format
7.6 Summary
7.7 Historical and Bibliographical Remarks
7.8 Questions and Exercises
8 Case Studies
8.1 Market Intelligence from the Web
8.1.1 The Problem
8.1.2 Solution Overview
8.1.3 Methods and Procedures
8.1.4 System Deployment
8.2 Lightweight Document Matching for Digital Libraries
8.2.1 The Problem
8.2.2 Solution Overview
8.2.3 Methods and Procedures
8.2.4 System Deployment
8.3 Generating Model Cases for Help Desk Applications
8.3.1 The Problem
8.3.2 Solution Overview
8.3.3 Methods and Procedures
8.3.4 System Deployment
8.4 Assigning Topics to News Articles
8.4.1 The Problem
8.4.2 Solution Overview
8.4.3 Methods and Procedures
8.4.4 System Deployment
8.5 E-mail Filtering
8.5.1 The Problem
8.5.2 Solution Overview
8.5.3 Methods and Procedures
8.5.4 System Deployment
8.6 Search Engines
8.6.1 The Problem
8.6.2 Solution Overview
8.6.3 Methods and Procedures
8.6.4 System Deployment
8.7 Extracting Named Entities from Documents
8.7.1 The Problem
8.7.2 Solution Overview
8.7.3 Methods and Procedures
8.7.4 System Deployment
8.8 Customized Newspapers
8.8.1 The Problem
8.8.2 Solution Overview
8.8.3 Methods and Procedures
8.8.4 System Deployment
8.9 Summary
8.10 Historical and Bibliographical Remarks
8.11 Questions and Exercises
9 Emerging Directions
9.1 Summarization
9.2 Active Learning
9.3 Learning with Unlabeled Data
9.4 Different Ways of Collecting Samples
9.4.1 Ensembles and Voting Methods
9.4.2 Online Learning
9.4.3 Cost-Sensitive Learning
9.4.4 Unbalanced Samples and Rare Events
9.5 Distributed Text Mining
9.6 Learning to Rank
9.7 Question Answering
9.8 Summary
9.9 Historical and Bibliographical Remarks
9.10 Questions and Exercises
A Software Notes
A.1 Summary of Software
A.2 Requirements
A.3 Download Instructions
References
Author Index
Subject Index