E-book: Practical Text Mining with Perl

4.11/5 (17 ratings from Goodreads)

Roger Bilisoly (Central Connecticut State University)

Other formats

Other digital carrier (Price: 107.42 €) - 07-Feb-2008

Format: PDF+DRM
Series: Wiley Series on Methods and Applications in Data Mining
Publication date: 28-Aug-2008
Publisher: John Wiley & Sons Inc
Language: English
ISBN-13: 9780470382851

Other books on the subject:

Data mining

Format - PDF+DRM
Price: 137.02 €*
* The price is final, i.e. no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital Rights Management (DRM)
The publisher has supplied this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it is not to be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Bilisoly (Central Connecticut State U.) has written this introductory guide to text mining through the use of Perl, an open-source programming tool that can be downloaded from online sources at no cost to the user. By covering such basics as regular expressions, text pattern methodology and quantitative text summaries, the author provides a tutorial on efficient and thorough text mining applications, including the bag-of-words model, TF-IDF similarity measure, concordance lines and corpus linguistics. Designed primarily for text mining students and professionals who wish to enhance their information access, this book also explores the use of multivariate techniques such as correlation, principal components analysis and clustering. Annotation ©2008 Book News, Inc., Portland, OR (booknews.com)

Provides readers with the methods, algorithms, and means to perform text mining tasks

This book is devoted to the fundamentals of text mining using Perl, an open-source programming tool that is freely available via the Internet (www.perl.org). It covers mining ideas from several perspectives--statistics, data mining, linguistics, and information retrieval--and provides readers with the means to successfully complete text mining tasks on their own.

The book begins with an introduction to regular expressions, a text pattern methodology, and quantitative text summaries, all of which are fundamental tools of analyzing text. Then, it builds upon this foundation to explore:

Probability and texts, including the bag-of-words model
Information retrieval techniques such as the TF-IDF similarity measure
Concordance lines and corpus linguistics
Multivariate techniques such as correlation, principal components analysis, and clustering
Perl modules, German, and permutation tests

Each chapter is devoted to a single key topic, and the author carefully and thoughtfully introduces mathematical concepts as they arise, allowing readers to learn as they go without having to refer to additional books. The inclusion of numerous exercises and worked-out examples further complements the book's student-friendly format.

Practical Text Mining with Perl is ideal as a textbook for undergraduate and graduate courses in text mining and as a reference for a variety of professionals who are interested in extracting information from text documents.

Reviews

"Practical Text Mining with Perl is an excellent book for readers at a variety of different programming skill levels ... Bilisoly's book would serve as a good text for an introductory text mining course, and could be supplemented with lecture notes for Web mining or data mining courses." (Journal of Statistical Software, January 2009)

List of Figures
List of Tables
Preface
Acknowledgments

Introduction
Overview of this Book
Text Mining and Related Fields
Pattern Matching
Data Structures
Probability
Information Retrieval
Corpus Linguistics
Multivariate Statistics
Clustering
Three Additional Topics
Advice for Reading this Book

Text Patterns
Introduction
Regular Expressions
First Regex: Finding the Word Cat
Character Ranges and Finding Telephone Numbers
Testing Regexes with Perl
Finding Words in a Text
Regex Summary
Nineteenth-Century Literature
Perl Variables and the Function split
Match Variables
Decomposing Poe's "The Tell-Tale Heart" into Words
Dashes and String Substitutions
Hyphens
Apostrophes
A Simple Concordance
Command Line Arguments
Writing to Files
First Attempt at Extracting Sentences
Sentence Segmentation Preliminaries
Sentence Segmentation for A Christmas Carol
Leftmost Greediness and Sentence Segmentation
Regex Odds and Ends
Match Variables and Backreferences
Regular Expression Operators and Their Output
Lookaround
References
Problems

Quantitative Text Summaries
Introduction
Scalars, Interpolation, and Context in Perl
Arrays and Context in Perl
Word Lengths in Poe's "The Tell-Tale Heart"
Arrays and Functions
Adding and Removing Entries from Arrays
Selecting Subsets of an Array
Sorting an Array
Hashes
Using a Hash
Two Text Applications
Zipf's Law for A Christmas Carol
Perl for Word Games
An Aid to Crossword Puzzles
Word Anagrams
Finding Words in a Set of Letters
Complex Data Structures
References and Pointers
Arrays of Arrays and Beyond
Application: Comparing the Words in Two Poe Stories
References
First Transition
Problems

Probability and Text Sampling
Introduction
Probability
Probability and Coin Flipping
Probabilities and Texts
Estimating Letter Probabilities for Poe and Dickens
Estimating Letter Bigram Probabilities
Conditional Probability
Independence
Mean and Variance of Random Variables
Sampling and Error Estimates
The Bag-of-Words Model for Poe's "The Black Cat"
The Effect of Sample Size
Tokens vs. Types in Poe's "Hans Pfaall"
References
Problems

Applying Information Retrieval to Text Mining
Introduction
Counting Letters and Words
Counting Letters in Poe with Perl
Counting Pronouns Occurring in Poe
Text Counts and Vectors
Vectors and Angles for Two Poe Stories
Computing Angles Between Vectors
Subroutines in Perl
Computing the Angle between Vectors
The Term-Document Matrix Applied to Poe
Matrix Multiplication
Matrix Multiplications Applied to Poe
Function of Counts
Document Similarity
Inverse Document Frequency
Poe Story Angles Revisited
References
Problems

Concordance Lines and Corpus Linguistics
Introduction
Sampling
Statistical Survey Sampling
Text Sampling
Corpus as Baseline
Function vs. Content Words in Dickens, London, and Shelley
Concordancing
Sorting Concordance Lines
Code for Sorting Concordance Lines
Applications: Word Usage Differences between London and Shelley
Application: Word Morphology of Adverbs
Collocations and Concordance Lines
More Ways to Sort Concordance Lines
Application: Phrasal Verbs in The Call of the Wild
Grouping Words: Colors in The Call of the Wild
Applications with References
Second Transition
Problems

Multivariate Techniques with Text
Introduction
Basic Statistics
z-Scores Applied to Poe
Word Correlations among Poe's Short Stories
Correlations and Cosines
Correlations and Covariances
Basic Linear Algebra
2 by 2 Correlation Matrices
Principal Components Analysis
Finding the Principal Components
PCA Applied to the 68 Poe Short Stories
Another PCA Example with Poe's Short Stories
Rotations
Text Applications
A Word on Factor Analysis
Applications and References
Problems

Text Clustering
Introduction
Clustering
Two-Variable Example of k-Means
k-Means with R
He versus She in Poe's Short Stories
Poe Clusters Using Eight Pronouns
Clustering Poe Using Principal Components
Hierarchical Clustering of Poe's Short Stories
A Note on Classification
Decision Trees and Overfitting
References
Last Transition
Problems

A Sample of Additional Topics
Introduction
Perl Modules
Modules for Number Words
The StopWords Module
The Sentence Segmentation Module
An Object-Oriented Module for Tagging
Miscellaneous Modules
Other Languages: Analyzing Goethe in German
Permutation Tests
Runs and Hypothesis Testing
Distribution of Character Names in Dickens and London
References

Appendix A: Overview of Perl for Text Mining
Basic Data Structures
Special Variables and Arrays
Operators
Branching and Looping
A Few Perl Functions
Introduction to Regular Expressions

Appendix B: Summary of R used in this Book
Basics of R
Data Entry
Basic Operators
Matrix Manipulation
This Book's R Code

References
Index

Roger Bilisoly, PhD, is an Assistant Professor of Statistics at Central Connecticut State University, where he developed and teaches a new graduate-level course in text mining for the school's data mining program.

Permalink: https://www.kriso.ee/db/97804703828512e.html
