Foreword   xiii
Preface   xv
Acknowledgments   xxi
About this book   xxiv
About the authors   xxvii
About the cover illustration   xxix
|
|
Part 1 Wordy machines   1
|
1 Packets of thought (NLP overview)   3
    1.1 Natural language vs. programming language   4
    1.3 Practical applications   8
    1.4 Language through a computer's "eyes"   9
    1.5 A brief overflight of hyperspace   19
    1.6 Word order and grammar   21
    1.7 A chatbot natural language pipeline   22
|
2 Build your vocabulary (word tokenization)   30
    2.1 Challenges (a preview of stemming)   32
    2.2 Building your vocabulary with a tokenizer   33
        Measuring bag-of-words overlap   42
        Extending your vocabulary with n-grams   48
        Normalizing your vocabulary   54
        VADER---A rule-based sentiment analyzer   64
|
3 Math with words (TF-IDF vectors)   70
|
4 Finding meaning in word counts (semantic analysis)   97
    4.1 From word counts to topic scores   98
        TF-IDF vectors and lemmatization   99
        An algorithm for scoring topics   105
    4.2 Latent semantic analysis   111
        Your thought experiment made real   113
    4.3 Singular value decomposition   116
        U---left singular vectors   118
        VT---right singular vectors   120
    4.4 Principal component analysis   123
        Stop horsing around and get back to NLP   126
        Using PCA for SMS message semantic analysis   128
        Using truncated SVD for SMS message semantic analysis   130
        How well does LSA work for spam classification?   131
    4.5 Latent Dirichlet allocation (LDiA)   134
        LDiA topic model for SMS messages   137
        LDiA + LDA = spam classifier   140
        A fairer comparison: 32 LDiA topics   142
    4.6 Distance and similarity   143
    4.7 Steering with feedback   146
        Linear discriminant analysis   147
|
Part 2 Deeper learning (neural networks)   153
|
5 Baby steps with neural networks (perceptrons and backpropagation)   155
    5.1 Neural networks, the ingredient list   156
        Let's go skiing---the error surface   172
        Off the chair lift, onto the slope   173
        Let's shake things up a bit   174
        Keras: neural networks in Python   175
        Normalization: input with style   179
|
6 Reasoning with word vectors (Word2vec)   181
    6.1 Semantic queries and analogies   182
        Vector-oriented reasoning   187
        How to compute Word2vec representations   191
        How to use the gensim.word2vec module   200
        How to generate your own word vector representations   202
        Word2vec vs. GloVe (Global Vectors)   205
        Visualizing word relationships   207
        Document similarity with Doc2vec   215
|
7 Getting words in order with convolutional neural networks (CNNs)   218
    7.3 Convolutional neural nets   222
    7.4 Narrow windows indeed   228
        Implementation in Keras: prepping the data   230
        Convolutional neural network architecture   235
        Let's get to learning (training)   241
        Using the model in a pipeline   243
        Where do you go from here?   244
|
8 Loopy (recurrent) neural networks (RNNs)   247
    8.1 Remembering with recurrent networks   250
        Backpropagation through time   255
        Recurrent neural net with Keras   260
    8.2 Putting things together   264
    8.3 Let's get to learning our past selves   266
|
9 Improving retention with long short-term memory networks   274
        Backpropagation through time   284
        Where does the rubber hit the road?   287
        Words are hard. Letters are easier   292
        My turn to speak more clearly   300
        Learned how to say, but not yet what   308
|
10 Sequence-to-sequence models and attention   311
    10.1 Encoder-decoder architecture   312
        Sequence-to-sequence conversation   316
    10.2 Assembling a sequence-to-sequence pipeline   318
        Preparing your dataset for the sequence-to-sequence training   318
        Sequence-to-sequence model in Keras   320
        Assembling the sequence-to-sequence network   323
    10.3 Training the sequence-to-sequence network   324
        Generate output sequences   325
    10.4 Building a chatbot using sequence-to-sequence networks   326
        Preparing the corpus for your training   326
        Building your character dictionary   327
        Generate one-hot encoded training sets   328
        Train your sequence-to-sequence chatbot   329
        Assemble the model for sequence generation   330
        Converse with your chatbot   331
        Reduce training complexity with bucketing   332
|
Part 3 Getting real (real-world NLP challenges)   337
|
11 Information extraction (named entity extraction and question answering)   339
    11.1 Named entities and relations   339
        Information extraction as ML feature extraction   345
    11.3 Information worth extracting   346
    11.4 Extracting relationships (relations)   352
        Part-of-speech (POS) tagging   353
        Entity name normalization   357
        Relation normalization and extraction   358
        Why won't split('.!?') work?   360
        Sentence segmentation with regular expressions   361
|
12 Getting chatty (dialog engines)   365
    12.2 Pattern-matching approach   373
        A pattern-matching chatbot with AIML   375
        A network view of pattern matching   381
        Example retrieval-based chatbot   386
        Pros and cons of each approach   394
        Ask questions with predictable answers   399
        When all else fails, search   400
|
13 Scaling up (optimization, parallelization, and batch processing)   403
    13.1 Too much of a good thing (data)   404
    13.2 Optimizing NLP algorithms   404
        Advanced indexing with Annoy   408
        Why use approximate indexes at all?   412
        An indexing workaround: discretizing   413
    13.3 Constant RAM algorithms   414
    13.4 Parallelizing your NLP computations   416
        Training NLP models on GPUs   416
    13.5 Reducing the memory footprint during model training   419
    13.6 Gaining model insights with TensorBoard   422
        How to visualize word embeddings   423

Appendix A Your NLP tools   427
Appendix B Playful Python and regular expressions   434
Appendix C Vectors and matrices (linear algebra fundamentals)   440
Appendix D Machine learning tools and techniques   446
Appendix E Setting up your AWS GPU   459
Appendix F Locality sensitive hashing   473
Resources   481
Glossary   490
Index   497