Preface | xiii
Acknowledgments | xv
About this book | xvii
About the author | xxii
About the cover illustration | xxiii

1 Introduction | 1
1.1 A brief history of NLP | 2
1.2 Typical tasks | 5
  Information search | 5
  Advanced information search: Asking the machine precise questions | 16
  Conversational agents and intelligent virtual assistants | 18
  Text prediction and language generation | 20
  Spam filtering | 25
  Machine translation | 26
  Spell- and grammar checking | 28

2 Your first NLP example | 31
2.1 Introducing NLP in practice: Spam filtering | 31
2.2 Understanding the task | 36
  Step 1 Define the data and classes | 37
  Step 2 Split the text into words | 37
  Step 3 Extract and normalize the features | 42
  Step 4 Train a classifier | 43
  Step 5 Evaluate the classifier | 45
2.3 Implementing your own spam filter | 46
  Step 1 Define the data and classes | 46
  Step 2 Split the text into words | 49
  Step 3 Extract and normalize the features | 50
  Step 4 Train the classifier | 53
  Step 5 Evaluate your classifier | 62
2.4 Deploying your spam filter in practice | 65

3 Introduction to information search | 71
3.1 Understanding the task | 72
  Data and data structures | 75
  Boolean search algorithm | 83
3.2 Processing the data further | 87
  Preselecting the words that matter: Stopwords removal | 87
  Matching forms of the same word: Morphological processing | 90
3.3 Information weighing | 96
  Weighing words with term frequency | 97
  Weighing words with inverse document frequency | 100
3.4 Practical use of the search algorithm | 103
  Retrieval of the most similar documents | 104
  Evaluation of the results | 106
  Deploying the search algorithm in practice | 111

4 Information extraction | 114
4.1 Use cases | 116
4.2 Understanding the task | 120
4.3 Detecting word types with part-of-speech tagging | 124
  Part-of-speech tagging with spaCy | 128
4.4 Understanding sentence structure with syntactic parsing | 137
  Why sentence structure is important | 137
  Dependency parsing with spaCy | 139
4.5 Building your own information extraction algorithm | 144

5 Author profiling as a machine-learning task | 151
5.1 Understanding the task | 153
  Case 1 Authorship attribution | 154
  Case 2 User profiling | 155
5.2 Machine-learning pipeline at first glance | 157
  Testing generalization behavior | 163
5.3 A closer look at the machine-learning pipeline | 175
  Decision Trees classifier basics | 175
  Evaluating which tree is better using node impurity | 178
  Selection of the best split in Decision Trees | 184
  Decision Trees on language data | 185

6 Linguistic feature engineering for author profiling | 194
6.1 Another close look at the machine-learning pipeline | 196
  Evaluating the performance of your classifier | 196
  Further evaluation measures | 197
6.2 Feature engineering for authorship attribution | 200
  Word and sentence length statistics as features | 201
  Counts of stopwords and proportion of stopwords as features | 207
  Distributions of parts of speech as features | 212
  Distribution of word suffixes as features | 219
6.3 Practical use of authorship attribution and user profiling | 226

7 Your first sentiment analyzer using sentiment lexicons | 229
7.1 Use cases | 231
7.2 Understanding your task | 234
  Aggregating sentiment score with the help of a lexicon | 235
  Learning to detect sentiment in a data-driven way | 237
7.3 Setting up the pipeline: Data loading and analysis | 239
  Data loading and preprocessing | 240
  A closer look into the data | 243
7.4 Aggregating sentiment scores with a sentiment lexicon | 251
  Collecting sentiment scores from a lexicon | 252
  Applying sentiment scores to detect review polarity | 255

8 Sentiment analysis with a data-driven approach | 263
8.1 Addressing multiple senses of a word with SentiWordNet | 266
8.2 Addressing dependence on context with machine learning | 277
  Extracting features from text | 284
  Scikit-learn's machine-learning pipeline | 289
  Full-scale evaluation with cross-validation | 292
8.3 Varying the length of the sentiment-bearing features | 295
8.4 Negation handling for sentiment analysis | 298
|
9 Topic analysis | 304
9.1 Topic classification as a supervised machine-learning task | 307
  Topic classification with Naive Bayes | 312
  Evaluation of the results | 320
9.2 Topic discovery as an unsupervised machine-learning task | 325
  Unsupervised ML approaches | 325
  Clustering for topic discovery | 330
  Evaluation of the topic clustering algorithm | 338
|
10 Topic modeling | 346
10.1 Topic modeling with latent Dirichlet allocation | 349
  Exercise 10.1 Question 1 solution | 349
  Exercise 10.1 Question 2 solution | 351
  Estimating parameters for the LDA | 352
  LDA as a generative model | 356
10.2 Implementation of the topic modeling algorithm | 360
|
11 Named-entity recognition | 384
11.1 Named entity recognition: Definitions and challenges | 388
  Challenges in named entity recognition | 390
11.2 Named-entity recognition as a sequence labeling task | 392
  What does it mean for a task to be sequential? | 395
  Sequential solution for NER | 397
11.3 Practical applications of NER | 403
  Data loading and exploration | 403
  Named entity types exploration with spaCy | 406
  Information extraction revisited | 410
  Named entities visualization | 416

Appendix Installation instructions | 422
Index | 423