If you are an NLP or machine learning enthusiast with some or no experience in text processing, then this book is for you. This book is also ideal for expert Python programmers who want to learn NLTK quickly.

Preface
Chapter 1 Introduction to Natural Language Processing
    Let's start playing with Python!
Chapter 2 Text Wrangling and Cleansing
Chapter 3 Part of Speech Tagging
    What is Part of speech tagging
    Diving deep into a tagger
    Machine learning based tagger
    Named Entity Recognition (NER)
Chapter 4 Parsing Structure in Text
    Shallow versus deep parsing
    The two approaches in parsing
    Different types of parsers
    A recursive descent parser
    Named-entity recognition (NER)
Chapter 5 NLP Applications
    Building your first NLP application
    Statistical machine translation
    Question answering systems
    Word sense disambiguation
    Optical character recognition
Chapter 6 Text Classification
    Stochastic gradient descent
    The Random forest algorithm
Chapter 7
    Writing your first crawler
Chapter 8 Using NLTK with Other Python Libraries
    Extracting data from an array
    Complex matrix operations
    Eigenvalues and eigenvectors
Chapter 9 Social Media Mining in Python
Chapter 10 Text Mining at Scale
    Different ways of using Python on Hadoop
Index

Nitin Hardeniya is a data scientist with more than 4 years of experience working with companies such as Fidelity, Groupon, and [24]7 Inc. He has worked on a variety of business problems across different domains. He holds a master's degree in computational linguistics from IIIT-H and is the author of 5 patents in the field of customer experience. He is passionate about language processing and large unstructured data, and has used Python in his day-to-day work for almost 5 years. He believes that Python could be a single-point solution to most problems in data science. In this book, he puts on his hacker's hat to introduce the sophisticated tools of NLP and machine learning in a simplified form, and provides workarounds using the capabilities of Python libraries such as NLTK, scikit-learn, pandas, and NumPy.