  • Format: PDF+DRM
  • Price: 55,56 €*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.
  • Series: Texts in Computer Science
  • Publication date: 07-Sep-2015
  • Publisher: Springer London Ltd
  • Language: English
  • ISBN-13: 9781447167501

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has supplied this e-book in encrypted form, which means that you must install special software in order to read it. You will also need to create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you need to install Adobe Digital Editions (this is a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

This successful textbook on predictive text mining offers a unified perspective on a rapidly evolving field, integrating topics spanning the varied disciplines of data science, machine learning, databases, and computational linguistics. Serving also as a practical guide, this unique book provides helpful advice illustrated by examples and case studies. This highly anticipated second edition has been thoroughly revised and expanded with new material on deep learning, graph models, mining social media, errors and pitfalls in big data evaluation, Twitter sentiment analysis, and a discussion of dependency parsing. The fully updated content also features in-depth discussions on issues of document classification, information retrieval, clustering and organizing documents, information extraction, web-based data sourcing, and prediction and evaluation. Features: includes chapter summaries and exercises; explores the application of each method; provides several case studies; contains links to free text-mining software.

Reviews

Fundamentals of Predictive Text Mining is a second edition that is designed as a textbook, with questions and exercises in each chapter. The book can be used with data mining software for hands-on experience for students. The book will be very useful for people planning to go into this field or to learn techniques that could be used in a big data environment. (S. Srinivasan, Computing Reviews, February, 2016)

1 Overview of Text Mining
1.1 What's Special About Text Mining?
1.1.1 Structured or Unstructured Data?
1.1.2 Is Text Different from Numbers?
1.2 What Types of Problems Can Be Solved?
1.3 Document Classification
1.4 Information Retrieval
1.5 Clustering and Organizing Documents
1.6 Information Extraction
1.7 Prediction and Evaluation
1.8 The Next Chapters
1.9 Summary
1.10 Historical and Bibliographical Remarks
1.11 Questions and Exercises
2 From Textual Information to Numerical Vectors
2.1 Collecting Documents
2.2 Document Standardization
2.3 Tokenization
2.4 Lemmatization
2.4.1 Inflectional Stemming
2.4.2 Stemming to a Root
2.5 Vector Generation for Prediction
2.5.1 Multiword Features
2.5.2 Labels for the Right Answers
2.5.3 Feature Selection by Attribute Ranking
2.6 Sentence Boundary Determination
2.7 Part-of-Speech Tagging
2.8 Word Sense Disambiguation
2.9 Phrase Recognition
2.10 Named Entity Recognition
2.11 Parsing
2.12 Feature Generation
2.13 Summary
2.14 Historical and Bibliographical Remarks
2.15 Questions and Exercises
3 Using Text for Prediction
3.1 Recognizing that Documents Fit a Pattern
3.2 How Many Documents Are Enough?
3.3 Document Classification
3.4 Learning to Predict from Text
3.4.1 Similarity and Nearest-Neighbor Methods
3.4.2 Document Similarity
3.4.3 Decision Rules
3.4.4 Decision Trees
3.4.5 Scoring by Probabilities
3.4.6 Linear Scoring Methods
3.5 Evaluation of Performance
3.5.1 Estimating Current and Future Performance
3.5.2 Getting the Most from a Learning Method
3.5.3 Errors and Pitfalls in Big Data Evaluation
3.6 Applications
3.7 Graph Models for Social Networks
3.8 Summary
3.9 Historical and Bibliographical Remarks
3.10 Questions and Exercises
4 Information Retrieval and Text Mining
4.1 Is Information Retrieval a Form of Text Mining?
4.2 Key Word Search
4.3 Nearest-Neighbor Methods
4.4 Measuring Similarity
4.4.1 Shared Word Count
4.4.2 Word Count and Bonus
4.4.3 Cosine Similarity
4.5 Web-Based Document Search
4.5.1 Link Analysis
4.6 Document Matching
4.7 Inverted Lists
4.8 Evaluation of Performance
4.9 Summary
4.10 Historical and Bibliographical Remarks
4.11 Questions and Exercises
5 Finding Structure in a Document Collection
5.1 Clustering Documents by Similarity
5.2 Similarity of Composite Documents
5.2.1 k-Means Clustering
5.2.2 Hierarchical Clustering
5.2.3 The EM Algorithm
5.3 What Do a Cluster's Labels Mean?
5.4 Applications
5.5 Evaluation of Performance
5.6 Summary
5.7 Historical and Bibliographical Remarks
5.8 Questions and Exercises
6 Looking for Information in Documents
6.1 Goals of Information Extraction
6.2 Finding Patterns and Entities from Text
6.2.1 Entity Extraction as Sequential Tagging
6.2.2 Tag Prediction as Classification
6.2.3 The Maximum Entropy Method
6.2.4 Linguistic Features and Encoding
6.2.5 Local Sequence Prediction Models
6.2.6 Global Sequence Prediction Models
6.3 Coreference and Relationship Extraction
6.3.1 Coreference Resolution
6.3.2 Relationship Extraction
6.4 Template Filling and Database Construction
6.5 Applications
6.5.1 Information Retrieval
6.5.2 Commercial Extraction Systems
6.5.3 Criminal Justice
6.5.4 Intelligence
6.6 Summary
6.7 Historical and Bibliographical Remarks
6.8 Questions and Exercises
7 Data Sources for Prediction: Databases, Hybrid Data and the Web
7.1 Ideal Models of Data
7.1.1 Ideal Data for Prediction
7.1.2 Ideal Data for Text and Unstructured Data
7.1.3 Hybrid and Mixed Data
7.2 Practical Data Sourcing
7.3 Prototypical Examples
7.3.1 Web-Based Spreadsheet Data
7.3.2 Web-Based XML Data
7.3.3 Opinion Data and Sentiment Analysis
7.4 Hybrid Example: Independent Sources of Numerical and Text Data
7.5 Mixed Data in Standard Table Format
7.6 Summary
7.7 Historical and Bibliographical Remarks
7.8 Questions and Exercises
8 Case Studies
8.1 Market Intelligence from the Web
8.1.1 The Problem
8.1.2 Solution Overview
8.1.3 Methods and Procedures
8.1.4 System Deployment
8.2 Lightweight Document Matching for Digital Libraries
8.2.1 The Problem
8.2.2 Solution Overview
8.2.3 Methods and Procedures
8.2.4 System Deployment
8.3 Generating Model Cases for Help Desk Applications
8.3.1 The Problem
8.3.2 Solution Overview
8.3.3 Methods and Procedures
8.3.4 System Deployment
8.4 Assigning Topics to News Articles
8.4.1 The Problem
8.4.2 Solution Overview
8.4.3 Methods and Procedures
8.4.4 System Deployment
8.5 E-mail Filtering
8.5.1 The Problem
8.5.2 Solution Overview
8.5.3 Methods and Procedures
8.5.4 System Deployment
8.6 Search Engines
8.6.1 The Problem
8.6.2 Solution Overview
8.6.3 Methods and Procedures
8.6.4 System Deployment
8.7 Extracting Named Entities from Documents
8.7.1 The Problem
8.7.2 Solution Overview
8.7.3 Methods and Procedures
8.7.4 System Deployment
8.8 Mining Social Media
8.8.1 The Problem
8.8.2 Solution Overview
8.8.3 Methods and Procedures
8.8.4 System Deployment
8.9 Customized Newspapers
8.9.1 The Problem
8.9.2 Solution Overview
8.9.3 Methods and Procedures
8.9.4 System Deployment
8.10 Summary
8.11 Historical and Bibliographical Remarks
8.12 Questions and Exercises
9 Emerging Directions
9.1 Summarization
9.2 Active Learning
9.3 Learning with Unlabeled Data
9.4 Different Ways of Collecting Samples
9.4.1 Ensembles and Voting Methods
9.4.2 Online Learning
9.4.3 Deep Learning
9.4.4 Cost-Sensitive Learning
9.4.5 Unbalanced Samples and Rare Events
9.5 Distributed Text Mining
9.6 Learning to Rank
9.7 Question Answering
9.8 Summary
9.9 Historical and Bibliographical Remarks
9.10 Questions and Exercises
References
Author Index
Subject Index
Dr. Sholom M. Weiss is a Professor Emeritus of Computer Science at Rutgers University, a Fellow of the Association for the Advancement of Artificial Intelligence, and co-founder of AI Data-Miner LLC, New York.

Dr. Nitin Indurkhya is a faculty member at the School of Computer Science and Engineering, University of New South Wales, Australia, and the Institute of Statistical Education, Arlington, VA, USA. He is also a co-founder of AI Data-Miner LLC, New York.

Dr. Tong Zhang is a Professor of Statistics and Biostatistics at Rutgers University.