Foreword  xi
1.1.1 What is Text Mining in Practice?  2
1.1.2 Where Does Text Mining Fit?  2
1.2 Why We Care About Text Mining  2
1.2.1 What Are the Consequences of Ignoring Text?  3
1.2.2 What Are the Benefits of Text Mining?  5
1.2.3 Setting Expectations: When Text Mining Should (and Should Not) Be Used  6
1.3 A Basic Workflow -- How the Process Works  9
1.4 What Tools Do I Need to Get Started with This?  12
1.6 A Real World Use Case  13
2.1 What is Text Mining in a Practical Sense?  17
2.2 Types of Text Mining: Bag of Words  20
2.2.1 Types of Text Mining: Syntactic Parsing  22
2.3 The Text Mining Process in Context  24
2.4 String Manipulation: Number of Characters and Substitutions  25
2.4.1 String Manipulations: Paste, Character Splits and Extractions  29
2.6 String Packages stringr and stringi  36
2.7 Preprocessing Steps for Bag of Words Text Mining  37
2.9 Frequent Terms and Associations  47
3 Common Text Mining Visualizations  51
3.1 A Tale of Two (or Three) Cultures  51
3.2 Simple Exploration: Term Frequency, Associations and Word Networks  53
3.3 Simple Word Clusters: Hierarchical Dendrograms  67
3.4 Word Clouds: Overused but Effective  73
3.4.1 One Corpus Word Clouds  74
3.4.2 Comparing and Contrasting Corpora in Word Clouds  75
4.1 What is Sentiment Analysis?  85
4.2 Sentiment Scoring: Parlor Trick or Insightful?  88
4.3 Polarity: Simple Sentiment Scoring  89
4.3.1 Subjectivity Lexicons  89
4.3.2 Qdap's Scoring for Positive and Negative Word Choice  93
4.3.3 Revisiting Word Clouds -- Sentiment Word Clouds  96
4.4 Emoticons -- Dealing with These Perplexing Clues  103
4.4.1 Symbol-Based Emoticons Native to R  105
4.4.2 Punctuation-Based Emoticons  106
4.5 R's Archived Sentiment Scoring Library  113
4.6 Sentiment the Tidytext Way  118
4.7 Airbnb.com Boston Wrap Up  126
5 Hidden Structures: Clustering, String Distance, Text Vectors and Topic Modeling  129
5.1.2 Spherical K-Means Clustering  139
5.1.3 K-Medoid Clustering  144
5.1.4 Evaluating the Cluster Approaches  145
5.2 Calculating and Exploring String Distance  147
5.2.1 What is String Distance?  148
5.2.2 Fuzzy Matching -- amatch, ain  151
5.2.3 Similarity Distances -- stringdist, stringdistmatrix  152
5.3 LDA Topic Modeling Explained  154
5.3.1 Topic Modeling Case Study  156
5.4 Text to Vectors using text2vec  169
6 Document Classification: Finding Clickbait from Headlines  181
6.1 What is Document Classification?  181
6.2.1 Session and Data Set-Up  185
6.2.3 GLMNet Test Predictions  196
6.2.4 Test Set Evaluation  198
6.2.5 Finding the Most Impactful Words  200
6.2.6 Case Study Wrap Up: Model Accuracy and Improving Performance Recommendations  206
7 Predictive Modeling: Using Text for Classifying and Predicting Outcomes  209
7.1 Classification vs Prediction  209
7.2 Case Study I: Will This Patient Come Back to the Hospital?  210
7.2.1 Patient Readmission in the Text Mining Workflow  211
7.2.2 Session and Data Set-Up  211
7.2.4 More Model KPIs: AUC, Recall, Precision and F1  216
7.2.4.1 Additional Evaluation Metrics  218
7.2.5 Apply the Model to New Patients  222
7.2.6 Patient Readmission Conclusion  223
7.3 Case Study II: Predicting Box Office Success  224
7.3.1 Opening Weekend Revenue in the Text Mining Workflow  225
7.3.2 Session and Data Set-Up  225
7.3.3 Opening Weekend Modeling  228
7.3.5 Apply the Model to New Movie Reviews  234
7.3.6 Movie Revenue Conclusion  235
8.1 What is the OpenNLP project?  237
8.3 Named Entities in Hillary Clinton's Email  242
8.3.2 Minor Text Cleaning  245
8.3.3 Using OpenNLP on a single email  246
8.3.4 Using OpenNLP on Multiple Documents  251
8.3.5 Revisiting the Text Mining Workflow  254
8.4 Analyzing the Named Entities  255
8.4.1 Worldwide Map of Hillary Clinton's Location Mentions  256
8.4.2 Mapping Only European Locations  259
8.4.3 Entities and Polarity: How Does Hillary Clinton Feel About an Entity?  262
8.4.4 Stock Charts for Entities  266
8.4.5 Reach an Insight or Conclusion About Hillary Clinton's Emails  268
9.2.1 Web Scraping a Single Page with rvest  272
9.2.2 Web Scraping Multiple Pages with rvest  276
9.2.3 Application Program Interfaces (APIs)  282
9.2.4 Newspaper Articles from the Guardian Newspaper  283
9.2.5 Tweets Using the twitteR Package  285
9.2.6 Calling an API Without a Dedicated R Package  287
9.2.7 Using jsonlite to Access the New York Times  288
9.2.8 Using RCurl and XML to Parse Google Newsfeeds  290
9.2.9 The tm Library Web-Mining Plugin  292
9.3 Getting Text from File Sources  293
9.3.1 Individual CSV, TXT and Microsoft Office Files  294
9.3.2 Reading Multiple Files Quickly  296
9.3.3 Extracting Text from PDFs  298
9.3.4 Optical Character Recognition: Extracting Text from Images  299
Index  305