Contributors
Preface
|
Part A Linguistic Principles and Computational Resources
|
1 Linguistics: Core Concepts and Principles
2 Subfields of Linguistics
|
1.1 Three Aspects of Languages
3 Grammar Classes and Corresponding Languages
3.2 Context-Free Languages
3.4 Context-Sensitive Languages
3.5 Recursively Enumerable and Recursive Languages
4 A Simplistic Context-Free Grammar for English Language
|
3 Open-Source Libraries, Application Frameworks, and Workflow Systems for NLP
5 Software Libraries and Frameworks for Machine Learning
5.2 Deep Learning for Java
5.4 The Microsoft Cognitive Toolkit
6 Software Libraries and Frameworks for NLP
6.1 Natural Language Toolkit
6.2 Stanford CoreNLP Toolset
6.4 General Architecture for Text Engineering
6.5 Machine Learning for Language Toolkit
6.6 Tools for Social Media NLP
7 Task-Specific NLP Tools
7.1 Language Identification
7.2 Sentence Segmentation
7.4 Part-of-Speech Tagging
7.6 Named Entity Recognition
7.7 Semantic Role Labeling
7.8 Information Extraction
|
Part B Mathematical and Machine Learning Foundations
|
4 Mathematical Essentials
Sesha Phani Deepika Vadlamudi
3.1 Vector Spaces and Subspaces
3.2 Linearly Independent Sets and Bases
3.3 Dimension of a Vector Space
3.5 Linear Transformations and Change of Bases
3.6 Eigenvalues and Eigen-Vectors
4.2 Average Self-Information
4.3 Conditional Self-Information
4.8 Average Mutual Information
|
3 Conditional Probability
5 R-Valued Random Variables
5.1 Probability Distributions
5.2 Expectation and Variance
5.3 Some Common Discrete Random Variables
5.4 Some Common Continuous Distributions
6 Rⁿ-Valued Random Variables
6.2 Expectation and Covariance
6.3 Conditional Distributions and Bayes Theorem
7 Independent Random Variables
|
6 Inference and Prediction
5.1 Method of Moments Estimator
5.2 Maximum Likelihood Estimator
5.3 Iterative Algorithms for MLE
5.4 Finite Sample Properties
5.5 Asymptotic Properties
5.6 Bootstrap and Jackknife Resampling
6.1 Likelihood Ratio Test
6.2 Other Large Sample Tests
6.3 Power Function and Decision Making
6.4 Two-Sample Comparisons
7.1 Finding Interval Estimators
7.2 Evaluating Interval Estimators
8.1 Prior and Posterior Distributions
8.4 Bayesian Hypothesis Testing
8.6 Bayes Sampling Methods
9 Prediction and Model Selection
9.3 From Bagging to Random Forests
|
1.1 Bayes' Rule: Discrete Case
1.2 Bayes' Rule: Continuous Case
2.1 Inference in Bayesian Networks
2.2 Bayesian Parameter Estimation
4.1 Discrete Markov Network
5 Inference in Markov Networks
5.1 Inference as Optimization
5.2 Sampling-Based Approximate Inference
5.3 Markov Chain Monte Carlo Methods
5.4 Markov Chains for Graphical Models
5.6 Parameter Estimation in Markov Networks
|
1 Introduction to Machine Learning
1.2 Unsupervised Learning
1.3 Semi-supervised Learning
1.4 Reinforcement Learning
3 Regularization and Bias-Variance Trade-Off
4 Evaluating Machine Learning Algorithms
4.6 k-Fold Cross-validation
4.7 Stratified k-Fold Cross-validation
4.8 Advantage and Disadvantage of Cross-validation
4.9 Bootstrapping and Bagging
6 Classification Algorithms
6.1 Decision Tree Algorithm
6.2 Naive Bayesian Classification
6.3 Support Vector Machine
7.2 Hierarchical Clustering
9.1 Challenges and Opportunities
|
9 Deep Neural Networks for Natural Language Processing
2 Word Vectors Representations
3 Feedforward Neural Networks
4 Training Deep Models and Optimization
5 Regularization for Deep Learning
5.1 Parameter Norm Penalties
5.2 Sparse Representation
6 Sequence Modeling (Language Modeling)
6.1 Count-Based Models or n-Grams
6.2 Recurrent Neural Networks Language Models
6.3 Bidirectional Neural Networks
6.4 Vanishing and Exploding Gradient Problem
6.5 The Long Short-Term Memory and Gated RNNs
6.6 Encoder-Decoder Sequence-to-Sequence Architectures
6.7 Recursive Neural Networks
7 Convolutional Neural Networks
7.2 Tricks to Improve the Performance
7.3 Narrow vs Wide Convolution
7.5 Application of CNN as Input for RNN
8.2 Register Machines (RAM)
8.3 Neural Pushdown Automata
|
10 Deep Learning for Natural Language Processing
2 Survey of Deep Learning Techniques on NLP
3 Sentence Embedding Based on SOM
4 Representing, Visualizing, and Processing Documents as Images
5 Discussion and Conclusion
|
Part C Applications and Linguistic Diversity
|
11 Information Retrieval: Concepts, Models, and Systems
2 A Reference Architecture for Current IR Systems
3.3 Stemming and Lemmatization
3.4 Stop Words, Accents, Case Folding, and Language Identification
4 Mini Gutenberg Text Corpus
4.1 Distribution of Characters
4.2 Unigrams, Bigrams, and Trigrams
5 A Categorization of IR Models
7 Positional Index, Phrase, and Proximity Queries
7.1 Processing Boolean Queries Using the Positional Inverted Index
7.2 Processing Phrase Queries Using the Positional Inverted Index
7.3 Processing Proximity Queries Using the Positional Inverted Index
7.4 Recovering Document Source Text Using the Positional Inverted Index
8.1 Log Frequency Term Weighting
8.3 Term Discrimination Value
8.4 Document Length Normalization
10 Probabilistic IR Models
10.1 Binary Independence Model
11 Language Model-Based IR
11.1 Statistical Language Modeling
11.2 The Query Likelihood Model
12.1 Precision and Recall for Unranked Retrieval
12.3 Retrieval Effectiveness for Ranked Retrieval
12.4 Precision-Recall Graphs
12.6 Discounted Cumulative Gain
12.7 Eliciting Relevance Judgments Using Pooling
13 Relevance Feedback and Query Expansion
13.1 Modifying Query Representation
13.2 Modifying Document Representation
13.3 Pseudo-Relevance Feedback
13.4 Theoretical Optimal Query: Rocchio's Algorithm
14 IR Libraries, Frameworks, and Test Collections
14.1 Solr and Elasticsearch
14.2 Lucene Image Retrieval
15.1 Vocabulary, Faceted, and Exploratory Search
15.2 Information Architecture
15.3 Search Interfaces and User Modeling
15.4 Personal Information Management
15.5 Neural Network Approaches to IR
15.7 Information Extraction
15.9 Machine Translation-Based Approaches to IR
15.11 Dynamic Information Retrieval
15.13 Scholarly Collaboration Using Academic Social Web Platforms
15.15 Multimedia and Cross-Language Retrieval
15.16 Long-Range IR Challenges and Opportunities
16.3 Machine Learning and NLP
16.4 Journals and Conferences
|
12 Natural Language Core Tasks and Applications
2 Annotated Language Corpora
3 Language Identification
4 Text and Word Segmentation
5 Word-Sense Disambiguation (WSD)
7.1 Generative and Noisy-Channel Models
7.2 Multilayer Perceptron Neural Network Model for PoS
9 Named Entity Recognition
11 Information Extraction
13 Question-Answering Systems
14 Natural Language User Interfaces
|
13 Linguistic Elegance of the Languages of South India
2 History and Evolution of Dravidian Languages
2.1 History of Indo-European Languages
3 Linguistic Elegance and Language Traditions of South Indian Languages
4 Classical Languages of India
4.5 Classics in Malayalam
5 Influence of Other Languages on South Indian Languages
5.1 Propagation of Hindi in India
5.2 English as a Medium of Communication
5.3 Impact of Globalization
5.4 Symbiosis Between English and South Indian Languages
5.5 Promoting South Indian Languages
|
14 Text Mining for Modeling Cyberattacks
2 Anatomy of an Attack Pattern
3 Applying Attack Patterns to Scenarios
3.1 Resource Consumption Attacks
3.2 Attacks for Introduction-Based Routing
4 Mining Attack Pattern Text
4.1 Vector-Space Attack Pattern Model
4.2 Query Relevance Distance
4.3 Attack Pattern Distances
4.4 Attack Pattern Clustering
6 Attack Pattern Hierarchies