|
Part I Natural Language Processing Core-Technologies |
|
|
|
1 Linguistic Introduction: The Orthography, Morphology and Syntax of Semitic Languages |
|
|
3 | (40) |
|
|
|
|
|
|
|
3 | (2) |
|
|
5 | (8) |
|
|
6 | (1) |
|
1.2.2 Derivational Morphology |
|
|
7 | (2) |
|
1.2.3 Inflectional Morphology |
|
|
9 | (2) |
|
1.2.4 Basic Syntactic Structure |
|
|
11 | (2) |
|
|
13 | (6) |
|
|
14 | (1) |
|
|
15 | (3) |
|
1.3.3 Basic Syntactic Structure |
|
|
18 | (1) |
|
|
19 | (7) |
|
|
20 | (2) |
|
1.4.2 Derivational Morphology |
|
|
22 | (1) |
|
1.4.3 Inflectional Morphology |
|
|
23 | (2) |
|
1.4.4 Morphological Ambiguity |
|
|
25 | (1) |
|
1.4.5 Basic Syntactic Structure |
|
|
25 | (1) |
|
|
26 | (6) |
|
|
26 | (1) |
|
1.5.2 Derivational Morphology |
|
|
27 | (2) |
|
1.5.3 Inflectional Morphology |
|
|
29 | (1) |
|
1.5.4 Basic Syntactic Structure |
|
|
30 | (2) |
|
|
32 | (2) |
|
|
32 | (1) |
|
1.6.2 Derivational Morphology |
|
|
33 | (1) |
|
1.6.3 Inflectional Morphology |
|
|
33 | (1) |
|
|
34 | (1) |
|
|
34 | (4) |
|
|
34 | (1) |
|
|
35 | (1) |
|
|
36 | (1) |
|
|
37 | (1) |
|
|
37 | (1) |
|
|
38 | (5) |
|
|
38 | (5) |
|
2 Morphological Processing of Semitic Languages |
|
|
43 | (24) |
|
|
|
43 | (1) |
|
|
44 | (1) |
|
2.3 The Challenges of Morphological Processing |
|
|
45 | (2) |
|
2.4 Computational Approaches to Morphology |
|
|
47 | (4) |
|
2.4.1 Two-Level Morphology |
|
|
48 | (1) |
|
2.4.2 Multi-tape Automata |
|
|
48 | (1) |
|
|
49 | (1) |
|
2.4.4 Registered Automata |
|
|
50 | (1) |
|
2.4.5 Analysis by Generation |
|
|
50 | (1) |
|
2.4.6 Functional Morphology |
|
|
51 | (1) |
|
2.5 Morphological Analysis and Generation of Semitic Languages |
|
|
51 | (5) |
|
|
52 | (1) |
|
|
52 | (2) |
|
|
54 | (1) |
|
|
55 | (1) |
|
2.5.5 Related Applications |
|
|
55 | (1) |
|
2.6 Morphological Disambiguation of Semitic Languages |
|
|
56 | (2) |
|
|
58 | (9) |
|
|
58 | (9) |
|
3 Syntax and Parsing of Semitic Languages |
|
|
67 | (62) |
|
|
|
67 | (17) |
|
|
69 | (5) |
|
|
74 | (6) |
|
3.1.3 The Main Challenges |
|
|
80 | (4) |
|
3.1.4 Summary and Conclusion |
|
|
84 | (1) |
|
3.2 Case Study: Generative Probabilistic Parsing |
|
|
84 | (33) |
|
3.2.1 Formal Preliminaries |
|
|
85 | (6) |
|
3.2.2 An Architecture for Parsing Semitic Languages |
|
|
91 | (8) |
|
3.2.3 The Syntactic Model |
|
|
99 | (14) |
|
|
113 | (4) |
|
|
117 | (6) |
|
3.3.1 Parsing Modern Standard Arabic |
|
|
117 | (3) |
|
3.3.2 Parsing Modern Hebrew |
|
|
120 | (3) |
|
3.4 Conclusion and Future Work |
|
|
123 | (6) |
|
|
124 | (5) |
|
4 Semantic Processing of Semitic Languages |
|
|
129 | (32) |
|
|
|
|
129 | (1) |
|
4.2 Fundamentals of Semitic Language Meaning Units |
|
|
130 | (5) |
|
4.2.1 Morpho-Semantics: A Primer |
|
|
130 | (5) |
|
4.3 Meaning, Semantic Distance, Paraphrasing and Lexicon Generation |
|
|
135 | (4) |
|
|
136 | (2) |
|
|
138 | (1) |
|
|
138 | (1) |
|
4.4 Word Sense Disambiguation and Meaning Induction |
|
|
139 | (3) |
|
4.4.1 WSD Approaches in Semitic Languages |
|
|
140 | (1) |
|
4.4.2 WSI in Semitic Languages |
|
|
141 | (1) |
|
4.5 Multiword Expression Detection and Classification |
|
|
142 | (3) |
|
4.5.1 Approaches to Semitic MWE Processing and Resources |
|
|
143 | (2) |
|
4.6 Predicate–Argument Analysis |
|
|
145 | (7) |
|
4.6.1 Arabic Annotated Resources |
|
|
146 | (2) |
|
4.6.2 Systems for Semantic Role Labeling |
|
|
148 | (4) |
|
|
152 | (9) |
|
|
152 | (9) |
|
|
161 | (38) |
|
|
|
161 | (1) |
|
5.2 Evaluating Language Models with Perplexity |
|
|
162 | (2) |
|
5.3 N-Gram Language Modeling |
|
|
164 | (2) |
|
5.4 Smoothing: Discounting, Backoff, and Interpolation |
|
|
166 | (4) |
|
|
166 | (2) |
|
5.4.2 Combining Discounting with Backoff |
|
|
168 | (1) |
|
|
168 | (2) |
|
5.5 Extensions to N-Gram Language Modeling |
|
|
170 | (17) |
|
5.5.1 Skip N-Grams and Flex Grams |
|
|
170 | (1) |
|
5.5.2 Variable-Length Language Models |
|
|
171 | (2) |
|
5.5.3 Class-Based Language Models |
|
|
173 | (1) |
|
5.5.4 Factored Language Models |
|
|
174 | (1) |
|
5.5.5 Neural Network Language Models |
|
|
175 | (2) |
|
5.5.6 Syntactic or Structured Language Models |
|
|
177 | (1) |
|
5.5.7 Tree-Based Language Models |
|
|
178 | (1) |
|
5.5.8 Maximum-Entropy Language Models |
|
|
178 | (2) |
|
5.5.9 Discriminative Language Models |
|
|
180 | (3) |
|
5.5.10 LSA Language Models |
|
|
183 | (1) |
|
5.5.11 Bayesian Language Models |
|
|
184 | (3) |
|
5.6 Modeling Semitic Languages |
|
|
187 | (6) |
|
|
188 | (1) |
|
|
189 | (2) |
|
|
191 | (1) |
|
|
191 | (1) |
|
|
192 | (1) |
|
5.6.6 Other Morphologically Rich Languages |
|
|
192 | (1) |
|
|
193 | (6) |
|
|
193 | (6) |
|
Part II Natural Language Processing Applications |
|
|
|
6 Statistical Machine Translation |
|
|
199 | (22) |
|
|
|
|
199 | (1) |
|
6.2 Machine Translation Approaches |
|
|
200 | (4) |
|
6.2.1 Machine Translation Paradigms |
|
|
200 | (2) |
|
6.2.2 Rule-Based Machine Translation |
|
|
202 | (1) |
|
6.2.3 Example-Based Machine Translation |
|
|
202 | (1) |
|
6.2.4 Statistical Machine Translation |
|
|
203 | (1) |
|
6.2.5 Machine Translation for Semitic Languages |
|
|
203 | (1) |
|
6.3 Overview of Statistical Machine Translation |
|
|
204 | (5) |
|
6.3.1 Word-Based Translation Models |
|
|
204 | (1) |
|
|
205 | (1) |
|
6.3.3 Phrase Extraction Techniques |
|
|
206 | (1) |
|
|
207 | (1) |
|
|
207 | (1) |
|
|
208 | (1) |
|
6.4 Machine Translation Evaluation Metrics |
|
|
209 | (1) |
|
6.5 Machine Translation for Semitic Languages |
|
|
210 | (3) |
|
|
210 | (1) |
|
6.5.2 Word Alignment and Reordering |
|
|
211 | (1) |
|
6.5.3 Gender-Number Agreement |
|
|
212 | (1) |
|
6.6 Building Phrase-Based SMT Systems |
|
|
213 | (1) |
|
|
213 | (1) |
|
|
213 | (1) |
|
|
214 | (1) |
|
6.7 SMT Software Resources |
|
|
214 | (1) |
|
6.7.1 SMT Moses Framework |
|
|
214 | (1) |
|
6.7.2 Language Modeling Toolkits |
|
|
214 | (1) |
|
6.7.3 Morphological Analysis |
|
|
215 | (1) |
|
6.8 Building a Phrase-Based SMT System: Step-by-Step Guide |
|
|
215 | (3) |
|
6.8.1 Machine Preparation |
|
|
215 | (1) |
|
|
216 | (1) |
|
|
216 | (1) |
|
|
216 | (1) |
|
|
217 | (1) |
|
|
217 | (1) |
|
|
217 | (1) |
|
|
218 | (1) |
|
|
218 | (3) |
|
|
218 | (3) |
|
7 Named Entity Recognition |
|
|
221 | (26) |
|
|
|
221 | (1) |
|
7.2 The Named Entity Recognition Task |
|
|
222 | (8) |
|
|
222 | (1) |
|
7.2.2 Challenges in Named Entity Recognition |
|
|
223 | (1) |
|
7.2.3 Rule-Based Named Entity Recognition |
|
|
224 | (1) |
|
7.2.4 Statistical Named Entity Recognition |
|
|
225 | (3) |
|
|
228 | (1) |
|
7.2.6 Evaluation and Shared Tasks |
|
|
228 | (1) |
|
7.2.7 Evaluation Campaigns |
|
|
229 | (1) |
|
7.2.8 Beyond Traditional Named Entity Recognition |
|
|
230 | (1) |
|
7.3 Named Entity Recognition for Semitic Languages |
|
|
230 | (3) |
|
7.3.1 Challenges in Semitic Named Entity Recognition |
|
|
231 | (1) |
|
7.3.2 Approaches to Semitic Named Entity Recognition |
|
|
232 | (1) |
|
|
233 | (3) |
|
7.4.1 Learning Algorithms |
|
|
234 | (1) |
|
|
234 | (1) |
|
|
235 | (1) |
|
|
236 | (3) |
|
7.5.1 Named Entity Translation and Transliteration |
|
|
236 | (2) |
|
7.5.2 Entity Detection and Tracking |
|
|
238 | (1) |
|
|
238 | (1) |
|
7.6 Labeled Named Entity Recognition Corpora |
|
|
239 | (1) |
|
7.7 Future Challenges and Opportunities |
|
|
240 | (1) |
|
|
241 | (6) |
|
|
241 | (6) |
|
|
247 | (32) |
|
|
|
8.1 Introduction: Anaphora and Anaphora Resolution |
|
|
247 | (1) |
|
|
248 | (1) |
|
8.2.1 Pronominal Anaphora |
|
|
248 | (1) |
|
|
249 | (1) |
|
8.2.3 Comparative Anaphora |
|
|
249 | (1) |
|
8.3 Determinants in Anaphora Resolution |
|
|
249 | (7) |
|
8.3.1 Eliminating Factors |
|
|
250 | (1) |
|
8.3.2 Preferential Factors |
|
|
251 | (1) |
|
8.3.3 Implementing Features in AR (Anaphora Resolution) Systems |
|
|
252 | (4) |
|
8.4 The Process of Anaphora Resolution |
|
|
256 | (1) |
|
8.5 Different Approaches to Anaphora Resolution |
|
|
257 | (5) |
|
8.5.1 Knowledge-Intensive Versus Knowledge-Poor Approaches |
|
|
257 | (2) |
|
8.5.2 Traditional Approach |
|
|
259 | (1) |
|
8.5.3 Statistical Approach |
|
|
259 | (1) |
|
8.5.4 Linguistic Approach to Anaphora Resolution |
|
|
260 | (2) |
|
8.6 Recent Work in Anaphora and Coreference Resolution |
|
|
262 | (3) |
|
8.6.1 Mention-Synchronous Coreference Resolution Algorithm Based on the Bell Tree [24] |
|
|
262 | (1) |
|
8.6.2 A Twin-Candidate Model for Learning-Based Anaphora Resolution [47, 48] |
|
|
263 | (1) |
|
8.6.3 Improving Machine Learning Approaches to Coreference Resolution [36] |
|
|
264 | (1) |
|
8.7 Evaluation of Anaphora Resolution Systems |
|
|
265 | (4) |
|
|
265 | (2) |
|
|
267 | (1) |
|
|
267 | (1) |
|
|
268 | (1) |
|
|
269 | (1) |
|
8.8 Anaphora in Semitic Languages |
|
|
269 | (3) |
|
8.8.1 Anaphora Resolution in Arabic |
|
|
270 | (2) |
|
8.9 Difficulties with AR in Semitic Languages |
|
|
272 | (2) |
|
8.9.1 The Morphology of the Language |
|
|
272 | (1) |
|
8.9.2 Complex Sentence Structure |
|
|
273 | (1) |
|
|
273 | (1) |
|
8.9.4 The Lack of Corpora Annotated with Anaphoric Links |
|
|
273 | (1) |
|
|
274 | (5) |
|
|
274 | (5) |
|
|
279 | (20) |
|
|
|
|
279 | (1) |
|
|
280 | (1) |
|
9.3 Approaches to Relation Extraction |
|
|
281 | (10) |
|
9.3.1 Feature-Based Classifiers |
|
|
281 | (4) |
|
9.3.2 Kernel-Based Methods |
|
|
285 | (3) |
|
9.3.3 Semi-supervised and Adaptive Learning |
|
|
288 | (3) |
|
9.4 Language-Specific Issues |
|
|
291 | (1) |
|
|
292 | (2) |
|
|
294 | (1) |
|
|
295 | (4) |
|
|
295 | (4) |
|
|
299 | (36) |
|
|
|
299 | (1) |
|
10.2 The Information Retrieval Task |
|
|
299 | (10) |
|
|
301 | (1) |
|
10.2.2 The General Architecture of an IR System |
|
|
302 | (1) |
|
|
303 | (2) |
|
|
305 | (4) |
|
10.3 Semitic Language Retrieval |
|
|
309 | (9) |
|
10.3.1 The Major Known Challenges |
|
|
309 | (4) |
|
10.3.2 Survey of Existing Literature |
|
|
313 | (3) |
|
10.3.3 Best Arabic Index Terms |
|
|
316 | (2) |
|
10.3.4 Best Hebrew Index Terms |
|
|
318 | (1) |
|
10.3.5 Best Amharic Index Terms |
|
|
318 | (1) |
|
10.4 Available IR Test Collections |
|
|
318 | (1) |
|
|
318 | (1) |
|
|
319 | (1) |
|
|
319 | (1) |
|
|
319 | (10) |
|
10.5.1 Arabic–English CLIR |
|
|
320 | (2) |
|
10.5.2 Arabic OCR Text Retrieval |
|
|
322 | (4) |
|
10.5.3 Arabic Social Search |
|
|
326 | (2) |
|
|
328 | (1) |
|
|
329 | (6) |
|
|
329 | (6) |
|
|
335 | (36) |
|
|
|
|
|
|
|
|
335 | (1) |
|
11.2 The Question Answering Task |
|
|
336 | (8) |
|
|
336 | (2) |
|
11.2.2 The Major Known Challenges |
|
|
338 | (1) |
|
11.2.3 The General Architecture of a QA System |
|
|
339 | (2) |
|
11.2.4 Answering Definition Questions and Query Expansion Techniques |
|
|
341 | (2) |
|
11.2.5 How to Benchmark QA System Performance: Evaluation Measure for QA |
|
|
343 | (1) |
|
11.3 The Case of Semitic Languages |
|
|
344 | (3) |
|
11.3.1 NLP for Semitic Languages |
|
|
344 | (1) |
|
11.3.2 QA for Semitic Languages |
|
|
345 | (2) |
|
11.4 Building Arabic QA Specific Modules |
|
|
347 | (19) |
|
11.4.1 Answering Definition Questions in Arabic |
|
|
347 | (6) |
|
11.4.2 Query Expansion for Arabic QA |
|
|
353 | (13) |
|
|
366 | (5) |
|
|
367 | (4) |
|
12 Automatic Summarization |
|
|
371 | (38) |
|
|
|
|
|
|
|
|
371 | (1) |
|
12.2 Text Summarization Aspects |
|
|
372 | (4) |
|
12.2.1 Types of Summaries |
|
|
374 | (1) |
|
12.2.2 Extraction vs. Abstraction |
|
|
375 | (1) |
|
12.2.3 The Major Known Challenges |
|
|
376 | (1) |
|
12.3 How to Evaluate Summarization Systems |
|
|
376 | (2) |
|
12.3.1 Insights from the Evaluation Campaigns |
|
|
377 | (1) |
|
12.3.2 Evaluation Measures for Summarization |
|
|
377 | (1) |
|
12.4 Single Document Summarization Approaches |
|
|
378 | (2) |
|
12.4.1 Numerical Approach |
|
|
379 | (1) |
|
|
379 | (1) |
|
|
380 | (1) |
|
12.5 Multiple Document Summarization Approaches |
|
|
380 | (5) |
|
12.5.1 Numerical Approach |
|
|
381 | (1) |
|
|
382 | (1) |
|
|
383 | (2) |
|
12.6 Case of Semitic Languages |
|
|
385 | (4) |
|
12.6.1 Language-Independent Systems |
|
|
385 | (1) |
|
|
386 | (2) |
|
|
388 | (1) |
|
|
388 | (1) |
|
|
389 | (1) |
|
12.7 Case Study: Building an Arabic Summarization System (L.A.E) |
|
|
389 | (13) |
|
12.7.1 L.A.E System Architecture |
|
|
390 | (1) |
|
12.7.2 Source Text Segmentation |
|
|
390 | (10) |
|
|
400 | (1) |
|
12.7.4 Evaluation and Discussion |
|
|
401 | (1) |
|
|
402 | (7) |
|
|
403 | (6) |
|
13 Automatic Speech Recognition |
|
|
409 | |
|
|
|
|
|
|
|
|
|
409 | (4) |
|
13.1.1 Automatic Speech Recognition |
|
|
410 | (1) |
|
13.1.2 Introduction to Arabic: A Speech Recognition Perspective |
|
|
411 | (1) |
|
|
412 | (1) |
|
|
413 | (15) |
|
13.2.1 Language-Independent Techniques |
|
|
413 | (5) |
|
|
418 | (5) |
|
13.2.3 Modeling of Arabic Dialects in Decision Trees |
|
|
423 | (5) |
|
|
428 | (6) |
|
13.3.1 Language-Independent Techniques for Language Modeling |
|
|
428 | (4) |
|
13.3.2 Language-Specific Techniques for Language Modeling |
|
|
432 | (2) |
|
13.4 IBM GALE 2011 System Description |
|
|
434 | (9) |
|
|
434 | (5) |
|
|
439 | (2) |
|
13.4.3 System Combination |
|
|
441 | (1) |
|
13.4.4 System Architecture |
|
|
441 | (2) |
|
13.5 From MSA to Dialects |
|
|
443 | (10) |
|
13.5.1 Dialect Identification |
|
|
443 | (3) |
|
13.5.2 ASR and Dialect ID Data Selection |
|
|
446 | (1) |
|
13.5.3 Dialect Identification on GALE Data |
|
|
447 | (1) |
|
13.5.4 Acoustic Modeling Experiments |
|
|
448 | (4) |
|
13.5.5 Dialect ID Based on Text Only |
|
|
452 | (1) |
|
|
453 | (2) |
|
13.6.1 Acoustic Training Data |
|
|
453 | (1) |
|
13.6.2 Training Data for Language Modeling |
|
|
454 | (1) |
|
13.6.3 Vowelization Resources |
|
|
454 | (1) |
|
13.7 Comparing Arabic and Hebrew ASR |
|
|
455 | (1) |
|
|
456 | |
|
|
457 | |