|
|
|
|
|
|
|
|
|
|
|
1.1 Overview of this Book. |
|
|
|
1.2 Text Mining and Related Fields. |
|
|
|
1.3 Advice for Reading this Book. |
|
|
|
|
|
|
|
|
|
2.3 Finding Words in a Text. |
|
|
|
2.4 Decomposing Poe's "The Tell-Tale Heart" into Words. |
|
|
|
2.5 A Simple Concordance. |
|
|
|
2.6 First Attempt at Extracting Sentences. |
|
|
|
|
|
|
|
3. Quantitative Text Summaries. |
|
|
|
|
|
3.2 Scalars, Interpolation, and Context in Perl. |
|
|
|
3.3 Arrays and Context in Perl. |
|
|
|
3.4 Word Lengths in Poe's "The Tell-Tale Heart". |
|
|
|
3.5 Arrays and Functions. |
|
|
|
|
|
3.7 Two Text Applications. |
|
|
|
3.8 Complex Data Structures. |
|
|
|
|
|
|
|
4. Probability and Text Sampling. |
|
|
|
|
|
|
|
4.3 Conditional Probability. |
|
|
|
4.4 Mean and Variance of Random Variables. |
|
|
|
4.5 The Bag-of-Words Model for Poe's "The Black Cat". |
|
|
|
4.6 The Effect of Sample Size. |
|
|
|
|
|
5. Applying Information Retrieval to Text Mining. |
|
|
|
|
|
5.2 Counting Letters and Words. |
|
|
|
5.3 Text Counts and Vectors. |
|
|
|
5.4 The Term-Document Matrix Applied to Poe. |
|
|
|
5.5 Matrix Multiplication. |
|
|
|
|
|
|
|
|
|
6. Concordance Lines and Corpus Linguistics. |
|
|
|
|
|
|
|
|
|
|
|
6.5 Collocations and Concordance Lines. |
|
|
|
6.6 Applications with References. |
|
|
|
|
|
7. Multivariate Techniques with Text. |
|
|
|
|
|
|
|
7.3 Basic Linear Algebra. |
|
|
|
7.4 Principal Component Matrices. |
|
|
|
|
|
7.6 Applications and References. |
|
|
|
|
|
|
|
|
|
8.3 A Note on Classification. |
|
|
|
|
|
|
|
9. A Sample of Additional Topics. |
|
|
|
|
|
|
|
9.3 Other Languages: Analyzing Goethe in German. |
|
|
|
|
|
|
|
Appendix A. Overview of Perl for Text Mining. |
|
|
|
A.1 Basic Data Structures. |
|
|
|
|
|
A.3 Branching and Looping. |
|
|
|
|
|
A.5 Introduction to Regular Expressions. |
|
|
|
Appendix B. Summary of R Used in this Book. |
|
|
|
|
|
|
|
|
|
|