Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications, Volume 38 [Hardback]

Volume editor: Venkat N. Gudivada (Professor, East Carolina University, NC, USA). Series editor: C. R. Rao (University of Hyderabad Campus, India)
  • Format: Hardback, 537 pages, height x width: 229x152 mm, weight: 980 g
  • Series: Handbook of Statistics
  • Publication date: 29-Aug-2018
  • Publisher: North-Holland
  • ISBN-10: 0444640428
  • ISBN-13: 9780444640420

Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications, Volume 38, is the latest release in this monograph series. It provides a cohesive and integrated exposition of recent advances in natural language analysis and their associated applications, and includes new chapters on Linguistics: Core Concepts and Principles, Grammars, Open-Source Libraries, Application Frameworks, Workflow Systems, Mathematical Essentials, Probability, Inference and Prediction Methods, Random Processes, Bayesian Methods, Machine Learning, Artificial Neural Networks for Natural Language Processing, Information Retrieval, Language Core Tasks, Language Understanding Applications, and more.

The synergistic confluence of linguistics, statistics, big data, and high-performance computing is the underlying force for the recent and dramatic advances in analyzing and understanding natural languages, hence making this series all the more important.

  • Provides a thorough treatment of open-source libraries, application frameworks and workflow systems for natural language analysis and understanding
  • Presents new chapters on Linguistics: Core Concepts and Principles, Grammars, Open-Source Libraries, Application Frameworks, Workflow Systems, Mathematical Essentials, Probability, and more
Contributors
Preface
Part A Linguistic Principles and Computational Resources
1 Linguistics: Core Concepts and Principles
Akhil Gudivada, Dhana L. Rao, Venkat N. Gudivada
  1 Introduction
    1.1 Chapter Organization
  2 Subfields of Linguistics
  3 Variation in Languages
  4 Phonetics
  5 Phonology
  6 Morphology
  7 Syntax
  8 Semantics
  9 Summary
  Acknowledgment
  References
2 Languages and Grammar
Akhil Gudivada, Dhana L. Rao
  1 Introduction
    1.1 Three Aspects of Languages
    1.2 Chapter Organization
  2 Formal Grammars
  3 Grammar Classes and Corresponding Languages
    3.1 Regular Languages
    3.2 Context-Free Languages
    3.3 Parse Trees
    3.4 Context-Sensitive Languages
    3.5 Recursively Enumerable and Recursive Languages
  4 A Simplistic Context-Free Grammar for English Language
  5 Summary
  References
3 Open-Source Libraries, Application Frameworks, and Workflow Systems for NLP
Venkat N. Gudivada, Kamyar Arbabifard
  1 Introduction
  2 Corpus Datasets
    2.1 Corpus Tools
  3 NLP Datasets
  4 Treebanks
  5 Software Libraries and Frameworks for Machine Learning
    5.1 TensorFlow
    5.2 Deep Learning for Java
    5.3 Apache MXNet
    5.4 The Microsoft Cognitive Toolkit
    5.5 Keras
    5.6 Torch and PyTorch
    5.7 Scikit-Learn
    5.8 Caffe
    5.9 Accord.NET
    5.10 Spark MLlib
  6 Software Libraries and Frameworks for NLP
    6.1 Natural Language Toolkit
    6.2 Stanford CoreNLP Toolset
    6.3 Apache OpenNLP
    6.4 General Architecture for Text Engineering
    6.5 Machine Learning for Language Toolkit
    6.6 Tools for Social Media NLP
  7 Task-Specific NLP Tools
    7.1 Language Identification
    7.2 Sentence Segmentation
    7.3 Word Segmentation
    7.4 Part-of-Speech Tagging
    7.5 Parsing
    7.6 Named Entity Recognition
    7.7 Semantic Role Labeling
    7.8 Information Extraction
    7.9 Machine Translation
    7.10 Topic Modeling
  8 Workflow Systems
  9 Conclusions
  Acknowledgment
  References
Part B Mathematical and Machine Learning Foundations
4 Mathematical Essentials
China Venkaiah Vadlamudi, Sesha Phani Deepika Vadlamudi
  1 Introduction
    1.1 Functions
    1.2 Linear Algebra
    1.3 Information Theory
    1.4 Optimization
  2 Functions
    2.1 Operations
  3 Linear Algebra
    3.1 Vector Spaces and Subspaces
    3.2 Linearly Independent Sets and Bases
    3.3 Dimension of a Vector Space
    3.4 Orthogonality
    3.5 Linear Transformations and Change of Bases
    3.6 Eigenvalues and Eigen-Vectors
  4 Information Theory
    4.1 Self-Information
    4.2 Average Self-Information
    4.3 Conditional Self-Information
    4.4 Mutual Information
    4.5 Entropy
    4.6 Joint Entropy
    4.7 Conditional Entropy
    4.8 Average Mutual Information
  5 Optimization
  Further Reading
5 Probability Essentials
Paul Vos, Qiang Wu
  1 Preliminaries
  2 Formal Definitions
  3 Conditional Probability
  4 Bayes Theorem
  5 R-Valued Random Variables
    5.1 Probability Distributions
    5.2 Expectation and Variance
    5.3 Some Common Discrete Random Variables
    5.4 Some Common Continuous Distributions
  6 Rn-Valued Random Variables
    6.1 Joint Distributions
    6.2 Expectation and Covariance
    6.3 Conditional Distributions and Bayes Theorem
  7 Independent Random Variables
  8 Central Limit Theorem
  References
6 Inference and Prediction
Qiang Wu, Paul Vos
  1 Introduction
  2 Notation
  3 Sufficient Statistics
  4 Likelihood Principle
  5 Point Estimation
    5.1 Method of Moments Estimator
    5.2 Maximum Likelihood Estimator
    5.3 Iterative Algorithms for MLE
    5.4 Finite Sample Properties
    5.5 Asymptotic Properties
    5.6 Bootstrap and Jackknife Resampling
    5.7 Robust Estimators
  6 Hypothesis Testing
    6.1 Likelihood Ratio Test
    6.2 Other Large Sample Tests
    6.3 Power Function and Decision Making
    6.4 Two-Sample Comparisons
  7 Interval Estimation
    7.1 Finding Interval Estimators
    7.2 Evaluating Interval Estimators
  8 Bayesian Methods
    8.1 Prior and Posterior Distributions
    8.2 Improper Prior
    8.3 Bayesian Estimation
    8.4 Bayesian Hypothesis Testing
    8.5 Bayesian Prediction
    8.6 Bayes Sampling Methods
  9 Prediction and Model Selection
    9.1 Model Selection
    9.2 Cross-Validation
    9.3 From Bagging to Random Forests
  References
7 Bayesian Methods
Indranil Ghosh
  1 Bayesian Methods
    1.1 Bayes' Rule: Discrete Case
    1.2 Bayes' Rule: Continuous Case
  2 Bayesian Networks
    2.1 Inference in Bayesian Networks
    2.2 Bayesian Parameter Estimation
  3 Conceptual Exercises
  4 Markov Networks
    4.1 Discrete Markov Network
  5 Inference in Markov Networks
    5.1 Inference as Optimization
    5.2 Sampling-Based Approximate Inference
    5.3 Markov Chain Monte Carlo Methods
    5.4 Markov Chains for Graphical Models
    5.5 Gibbs Sampling
    5.6 Parameter Estimation in Markov Networks
  6 Conceptual Exercises
  7 Summary
  Acknowledgments
  References
8 Machine Learning
Gangadhar Shobha, Shanta Rangaswamy
  1 Introduction to Machine Learning
    1.1 Supervised Learning
    1.2 Unsupervised Learning
    1.3 Semi-supervised Learning
    1.4 Reinforcement Learning
  2 Terminologies
  3 Regularization and Bias-Variance Trade-Off
  4 Evaluating Machine Learning Algorithms
    4.1 Accuracy
    4.2 Confusion Matrix
    4.3 Precision and Recall
    4.4 F Measure
    4.5 Regression Metrics
    4.6 k-Fold Cross-validation
    4.7 Stratified k-Fold Cross-validation
    4.8 Advantage and Disadvantage of Cross-validation
    4.9 Bootstrapping and Bagging
  5 Regression Algorithms
  6 Classification Algorithms
    6.1 Decision Tree Algorithm
    6.2 Naive Bayesian Classification
    6.3 Support Vector Machine
  7 Clustering Algorithms
    7.1 k-Means Clustering
    7.2 Hierarchical Clustering
  8 Applications
  9 Conclusion
    9.1 Challenges and Opportunities
  Further Reading
9 Deep Neural Networks for Natural Language Processing
Ehsan Fathi, Babak Maleki Shoja
  1 Introduction
  2 Word Vectors Representations
  3 Feedforward Neural Networks
    3.1 Neural Networks
  4 Training Deep Models and Optimization
    4.1 Hidden Units
    4.2 Backpropagation
  5 Regularization for Deep Learning
    5.1 Parameter Norm Penalties
    5.2 Sparse Representation
  6 Sequence Modeling (Language Modeling)
    6.1 Count-Based Models or n-Grams
    6.2 Recurrent Neural Networks Language Models
    6.3 Bidirectional Neural Networks
    6.4 Vanishing and Exploding Gradient Problem
    6.5 The Long Short-Term Memory and Gated RNNs
    6.6 Encoder-Decoder Sequence-to-Sequence Architectures
    6.7 Recursive Neural Networks
  7 Convolutional Neural Networks
    7.1 What is Convolution?
    7.2 Tricks to Improve the Performance
    7.3 Narrow vs Wide Convolution
    7.4 Stride Size
    7.5 Application of CNN as Input for RNN
  8 Memory
    8.1 Attention (ROM)
    8.2 Register Machines (RAM)
    8.3 Neural Pushdown Automata
  9 Summary
  References
10 Deep Learning for Natural Language Processing
Ying Xie, Linh Le, Yiyun Zhou, Vijay V. Raghavan
  1 Introduction
  2 Survey of Deep Learning Techniques on NLP
  3 Sentence Embedding Based on SOM
    3.1 The Method
    3.2 Experiments
  4 Representing, Visualizing, and Processing Documents as Images
  5 Discussion and Conclusion
  References
  Further Reading
Part C Applications and Linguistic Diversity
11 Information Retrieval: Concepts, Models, and Systems
Venkat N. Gudivada, Dhana L. Rao, Amogh R. Gudivada
  1 Introduction
    1.1 Chapter Organization
  2 A Reference Architecture for Current IR Systems
  3 Document Preprocessing
    3.1 Document Granularity
    3.2 Tokenization
    3.3 Stemming and Lemmatization
    3.4 Stop Words, Accents, Case Folding, and Language Identification
  4 Mini Gutenberg Text Corpus
    4.1 Distribution of Characters
    4.2 Unigrams, Bigrams, and Trigrams
    4.3 Zipf's Law
  5 A Categorization of IR Models
  6 Boolean IR Model
  7 Positional Index, Phrase, and Proximity Queries
    7.1 Processing Boolean Queries Using the Positional Inverted Index
    7.2 Processing Phrase Queries Using the Positional Inverted Index
    7.3 Processing Proximity Queries Using the Positional Inverted Index
    7.4 Recovering Document Source Text Using the Positional Inverted Index
  8 Term Weighting
    8.1 Log Frequency Term Weighting
    8.2 TF-IDF Weighting
    8.3 Term Discrimination Value
    8.4 Document Length Normalization
    8.5 BM25 Term Weighting
  9 Vector Space IR Model
  10 Probabilistic IR Models
    10.1 Binary Independence Model
    10.2 Okapi BM25 Model
  11 Language Model-Based IR
    11.1 Statistical Language Modeling
    11.2 The Query Likelihood Model
  12 Evaluating IR Systems
    12.1 Precision and Recall for Unranked Retrieval
    12.2 The F-measure
    12.3 Retrieval Effectiveness for Ranked Retrieval
    12.4 Precision-Recall Graphs
    12.5 Reciprocal Rank
    12.6 Discounted Cumulative Gain
    12.7 Eliciting Relevance Judgments Using Pooling
  13 Relevance Feedback and Query Expansion
    13.1 Modifying Query Representation
    13.2 Modifying Document Representation
    13.3 Pseudo-Relevance Feedback
    13.4 Theoretical Optimal Query: Rocchio's Algorithm
  14 IR Libraries, Frameworks, and Test Collections
    14.1 Solr and Elasticsearch
    14.2 Lucene Image Retrieval
    14.3 Apache UIMA®
    14.4 Lemur and Wumpus
    14.5 NLP/IR Tools
    14.6 Test Collections
    14.7 TREC Collections
  15 Facets of IR Research
    15.1 Vocabulary, Faceted, and Exploratory Search
    15.2 Information Architecture
    15.3 Search Interfaces and User Modeling
    15.4 Personal Information Management
    15.5 Neural Network Approaches to IR
    15.6 Query Difficulty
    15.7 Information Extraction
    15.8 Text Simplification
    15.9 Machine Translation-Based Approaches to IR
    15.10 XML Retrieval
    15.11 Dynamic Information Retrieval
    15.12 Metasearch Engines
    15.13 Scholarly Collaboration Using Academic Social Web Platforms
    15.14 Access Control
    15.15 Multimedia and Cross-Language Retrieval
    15.16 Long-Range IR Challenges and Opportunities
  16 Additional Reading
    16.1 Earliest Books
    16.2 Recent Books
    16.3 Machine Learning and NLP
    16.4 Journals and Conferences
  Acknowledgments
  References
12 Natural Language Core Tasks and Applications
Venkat N. Gudivada
  1 Introduction
    1.1 Chapter Organization
  2 Annotated Language Corpora
  3 Language Identification
  4 Text and Word Segmentation
  5 Word-Sense Disambiguation (WSD)
  6 Language Modeling
  7 PoS Tagging
    7.1 Generative and Noisy-Channel Models
    7.2 Multilayer Perceptron Neural Network Model for PoS
  8 Parsing
  9 Named Entity Recognition
  10 Machine Translation
  11 Information Extraction
  12 Text Summarization
  13 Question-Answering Systems
  14 Natural Language User Interfaces
  15 Summary
  Acknowledgments
  References
13 Linguistic Elegance of the Languages of South India
Deepamala Nijagunappa
  1 Introduction
  2 History and Evolution of Dravidian Languages
    2.1 History of Indo-European Languages
    2.2 Brahmi Script
    2.3 Sanskrit Language
    2.4 Dravidian Languages
  3 Linguistic Elegance and Language Traditions of South Indian Languages
    3.1 Telugu
    3.2 Kannada
    3.3 Tamil
    3.4 Malayalam
  4 Classical Languages of India
    4.1 Classics in Sanskrit
    4.2 Classics in Telugu
    4.3 Classics in Kannada
    4.4 Classics in Tamil
    4.5 Classics in Malayalam
    4.6 Classics in Odiya
  5 Influence of Other Languages on South Indian Languages
    5.1 Propagation of Hindi in India
    5.2 English as a Medium of Communication
    5.3 Impact of Globalization
    5.4 Symbiosis Between English and South Indian Languages
    5.5 Promoting South Indian Languages
  6 Summary
  References
14 Text Mining for Modeling Cyberattacks
Steven Noel
  1 Introduction
  2 Anatomy of an Attack Pattern
  3 Applying Attack Patterns to Scenarios
    3.1 Resource Consumption Attacks
    3.2 Attacks for Introduction-Based Routing
  4 Mining Attack Pattern Text
    4.1 Vector-Space Attack Pattern Model
    4.2 Query Relevance Distance
    4.3 Attack Pattern Distances
    4.4 Attack Pattern Clustering
  5 Attack Chains
  6 Attack Pattern Hierarchies
  7 Analytic Environment
  8 Summary
  References
C. R. Rao is a world-famous statistician who earned a place in the history of statistics as one of those who developed statistics from its ad hoc origins into a firmly grounded mathematical science.

He was employed at the Indian Statistical Institute (ISI) in 1943 as a research scholar after obtaining an MA degree in mathematics with a first class and first rank from Andhra University in 1941 and an MA degree in statistics from Calcutta University in 1943 with a first class, first rank, gold medal and record marks that have remained unbroken for the last 73 years.

At the age of 28 he was made a full professor at ISI in recognition of his creativity. While at ISI, Rao went to Cambridge University (CU) in 1946 on an invitation to work on an anthropometric project using the methodology developed at ISI. Rao worked in the Duckworth Laboratory of CU's Museum of Archaeology and Anthropology during 1946-1948 as a paid visiting scholar. The results were reported in the book "Ancient Inhabitants of Jebel Moya", published by Cambridge University Press under the joint authorship of Rao and two anthropologists. On the basis of the work done at CU during the two-year period 1946-1948, Rao earned a Ph.D. degree and, a few years later, an Sc.D. degree from CU and the rare honor of a life fellowship of King's College, Cambridge.

He retired from ISI in 1980 at the mandatory age of 60 after working for 40 years, during which period he developed ISI into an international center for statistical education and research. He also took an active part in establishing state statistical bureaus to collect local statistics and transmit them to the Central Statistical Organization in New Delhi. Rao played a pivotal role in launching undergraduate and postgraduate courses at ISI. He is the author of 475 research publications and several breakthrough papers contributing to statistical theory and methodology for applications to problems in all areas of human endeavor. A number of classical statistical terms are named after him, the most popular of which are the Cramér-Rao inequality, Rao-Blackwellization, Rao's orthogonal arrays used in quality control, Rao's score test, Rao's quadratic entropy used in ecological work, and Rao's metric and distance, which are incorporated in most statistical books.

He is the author of 10 books, of which two important ones are Linear Statistical Inference, which has been translated into German, Russian, Czech, Polish and Japanese, and Statistics and Truth, which has been translated into French, German, Japanese, Mainland Chinese, Taiwan Chinese, Turkish and Korean.

He directed the research work of 50 students for their Ph.D. degrees, who in turn produced 500 Ph.D.s. Rao received 38 honorary doctorate degrees from universities in 19 countries spanning 6 continents. He received the highest awards in statistics in the USA, UK and India: the National Medal of Science awarded by the President of the USA, the Indian National Medal of Science awarded by the Prime Minister of India, and the Guy Medal in Gold awarded by the Royal Statistical Society, UK. Rao was a recipient in the first batch of Bhatnagar awards in 1959 for mathematical sciences, and of numerous medals in India and abroad from science academies. He is a Fellow of the Royal Society (FRS), UK, and a member of the National Academy of Sciences, USA, Lithuania and Europe. In his honor, a research institute named the C. R. Rao Advanced Institute of Mathematics, Statistics and Computer Science was established on the campus of Hyderabad University.

Venkat N. Gudivada is a professor and chair of the Computer Science Department at East Carolina University. Prior to this, he was a professor and founding chair of the Weisberg Division of Computer Science at Marshall University. His industry tenure spans over six years as a vice president for Wall Street companies in the New York City area, including Merrill Lynch (now Bank of America Merrill Lynch) and Financial Technologies International (now GoldenSource). Previous academic tenure includes work at the University of Michigan, University of Missouri, and Ohio University. He has published over 90 peer-reviewed technical articles and rendered professional service in various roles including conference program chair, keynote speaker, program committee member, and guest editor of IEEE journals. Gudivada's research sponsors include the National Science Foundation (NSF), National Aeronautics and Space Administration (NASA), U.S. Department of Energy, U.S. Department of Navy, U.S. Army Research Office, MU Foundation, and WV Division of Science and Research. His current research interests encompass Big Data Management, High Performance Computing, Information Retrieval, Image and Natural Language Processing, and Personalized Learning. Gudivada received a PhD degree in Computer Science from the University of Louisiana at Lafayette.