
Natural Language Processing of Semitic Languages 2014 ed. [Hardcover]

4.00/5 (8 ratings from Goodreads)

Edited by Imed Zitouni

Format: Hardback, 459 pages, height x width: 235x155 mm, weight: 8454 g, 23 Illustrations, color; 38 Illustrations, black and white; XXIV, 459 p. 61 illus., 23 illus. in color., 1 Hardback
Series: Theory and Applications of Natural Language Processing
Publication date: 12-May-2014
Publisher: Springer-Verlag Berlin and Heidelberg GmbH & Co. K
ISBN-10: 3642453570
ISBN-13: 9783642453571

Other books on this subject:

Natural language & machine translation
Computational linguistics - (currently in store: 1 title)

Hardcover
Price: 104,29 €*
* the price is final, i.e. no further discounts apply
List price: 122,69 €
You save 15%
Delivery from the publisher takes approximately 2-4 weeks
Free delivery

Permalink: https://www.kriso.ee/db/9783642453571.html

Research in Natural Language Processing (NLP) has rapidly advanced in recent years, resulting in exciting algorithms for sophisticated processing of text and speech in various languages. Much of this work focuses on English; in this book we address another group of interesting and challenging languages for NLP research: the Semitic languages. The Semitic group of languages includes Arabic (206 million native speakers), Amharic (27 million), Hebrew (7 million), Tigrinya (6.7 million), Syriac (1 million) and Maltese (419 thousand). Semitic languages exhibit unique morphological processes, challenging syntactic constructions and various other phenomena that are less prevalent in other natural languages. These challenges call for unique solutions, many of which are described in this book.

The 13 chapters presented in this book bring together leading scientists from several universities and research institutes worldwide. While this book devotes some attention to cutting-edge algorithms and techniques, its primary purpose is a thorough explication of best practices in the field. Furthermore, every chapter describes how the techniques discussed apply to Semitic languages. The book covers both statistical approaches to NLP, which are dominant across various applications nowadays, and the more traditional rule-based approaches, which have proven useful for several other application domains. We hope that this book will provide a "one-stop shop" for all the requisite background and practical advice when building NLP applications for Semitic languages.

Part I Natural Language Processing Core-Technologies

1 Linguistic Introduction: The Orthography, Morphology and Syntax of Semitic Languages

Ray Fabri, Michael Gasser, Nizar Habash, George Kiraz, Shuly Wintner

1.1 Introduction

1.2 Amharic

1.2.1 Orthography

1.2.2 Derivational Morphology

1.2.3 Inflectional Morphology

1.2.4 Basic Syntactic Structure

1.3 Arabic

1.3.1 Orthography

1.3.2 Morphology

1.3.3 Basic Syntactic Structure

1.4 Hebrew

1.4.1 Orthography

1.4.2 Derivational Morphology

1.4.3 Inflectional Morphology

1.4.4 Morphological Ambiguity

1.4.5 Basic Syntactic Structure

1.5 Maltese

1.5.1 Orthography

1.5.2 Derivational Morphology

1.5.3 Inflectional Morphology

1.5.4 Basic Syntactic Structure

1.6 Syriac

1.6.1 Orthography

1.6.2 Derivational Morphology

1.6.3 Inflectional Morphology

1.6.4 Syntax

1.7 Contrastive Analysis

1.7.1 Orthography

1.7.2 Phonology

1.7.3 Morphology

1.7.4 Syntax

1.7.5 Lexicon

1.8 Conclusion

References

2 Morphological Processing of Semitic Languages

Shuly Wintner

2.1 Introduction

2.2 Basic Notions

2.3 The Challenges of Morphological Processing

2.4 Computational Approaches to Morphology

2.4.1 Two-Level Morphology

2.4.2 Multi-tape Automata

2.4.3 The Xerox Approach

2.4.4 Registered Automata

2.4.5 Analysis by Generation

2.4.6 Functional Morphology

2.5 Morphological Analysis and Generation of Semitic Languages

2.5.1 Amharic

2.5.2 Arabic

2.5.3 Hebrew

2.5.4 Other Languages

2.5.5 Related Applications

2.6 Morphological Disambiguation of Semitic Languages

2.7 Future Directions

References

3 Syntax and Parsing of Semitic Languages

Reut Tsarfaty

3.1 Introduction

3.1.1 Parsing Systems

3.1.2 Semitic Languages

3.1.3 The Main Challenges

3.1.4 Summary and Conclusion

3.2 Case Study: Generative Probabilistic Parsing

3.2.1 Formal Preliminaries

3.2.2 An Architecture for Parsing Semitic Languages

3.2.3 The Syntactic Model

3.2.4 The Lexical Model

3.3 Empirical Results

3.3.1 Parsing Modern Standard Arabic

3.3.2 Parsing Modern Hebrew

3.4 Conclusion and Future Work

References

4 Semantic Processing of Semitic Languages

Mona Diab, Yuval Marton

4.1 Introduction

4.2 Fundamentals of Semitic Language Meaning Units

4.2.1 Morpho-Semantics: A Primer

4.3 Meaning, Semantic Distance, Paraphrasing and Lexicon Generation

4.3.1 Semantic Distance

4.3.2 Textual Entailment

4.3.3 Lexicon Creation

4.4 Word Sense Disambiguation and Meaning Induction

4.4.1 WSD Approaches in Semitic Languages

4.4.2 WSI in Semitic Languages

4.5 Multiword Expression Detection and Classification

4.5.1 Approaches to Semitic MWE Processing and Resources

4.6 Predicate-Argument Analysis

4.6.1 Arabic Annotated Resources

4.6.2 Systems for Semantic Role Labeling

4.7 Conclusion

References

5 Language Modeling

Ilana Heintz

5.1 Introduction

5.2 Evaluating Language Models with Perplexity

5.3 N-Gram Language Modeling

5.4 Smoothing: Discounting, Backoff, and Interpolation

5.4.1 Discounting

5.4.2 Combining Discounting with Backoff

5.4.3 Interpolation

5.5 Extensions to N-Gram Language Modeling

5.5.1 Skip N-Grams and Flex Grams

5.5.2 Variable-Length Language Models

5.5.3 Class-Based Language Models

5.5.4 Factored Language Models

5.5.5 Neural Network Language Models

5.5.6 Syntactic or Structured Language Models

5.5.7 Tree-Based Language Models

5.5.8 Maximum-Entropy Language Models

5.5.9 Discriminative Language Models

5.5.10 LSA Language Models

5.5.11 Bayesian Language Models

5.6 Modeling Semitic Languages

5.6.1 Arabic

5.6.2 Amharic

5.6.3 Hebrew

5.6.4 Maltese

5.6.5 Syriac

5.6.6 Other Morphologically Rich Languages

5.7 Summary

References

Part II Natural Language Processing Applications

6 Statistical Machine Translation

Hany Hassan, Kareem Darwish

6.1 Introduction

6.2 Machine Translation Approaches

6.2.1 Machine Translation Paradigms

6.2.2 Rule-Based Machine Translation

6.2.3 Example-Based Machine Translation

6.2.4 Statistical Machine Translation

6.2.5 Machine Translation for Semitic Languages

6.3 Overview of Statistical Machine Translation

6.3.1 Word-Based Translation Models

6.3.2 Phrase-Based SMT

6.3.3 Phrase Extraction Techniques

6.3.4 SMT Reordering

6.3.5 Language Modeling

6.3.6 SMT Decoding

6.4 Machine Translation Evaluation Metrics

6.5 Machine Translation for Semitic Languages

6.5.1 Word Segmentation

6.5.2 Word Alignment and Reordering

6.5.3 Gender-Number Agreement

6.6 Building Phrase-Based SMT Systems

6.6.1 Data

6.6.2 Parallel Data

6.6.3 Monolingual Data

6.7 SMT Software Resources

6.7.1 SMT Moses Framework

6.7.2 Language Modeling Toolkits

6.7.3 Morphological Analysis

6.8 Building a Phrase-Based SMT System: Step-by-Step Guide

6.8.1 Machine Preparation

6.8.2 Data

6.8.3 Data Preprocessing

6.8.4 Words Segmentation

6.8.5 Language Model

6.8.6 Translation Model

6.8.7 Parameter Tuning

6.8.8 System Decoding

6.9 Summary

References

7 Named Entity Recognition

Behrang Mohit

7.1 Introduction

7.2 The Named Entity Recognition Task

7.2.1 Definition

7.2.2 Challenges in Named Entity Recognition

7.2.3 Rule-Based Named Entity Recognition

7.2.4 Statistical Named Entity Recognition

7.2.5 Hybrid Systems

7.2.6 Evaluation and Shared Tasks

7.2.7 Evaluation Campaigns

7.2.8 Beyond Traditional Named Entity Recognition

7.3 Named Entity Recognition for Semitic Languages

7.3.1 Challenges in Semitic Named Entity Recognition

7.3.2 Approaches to Semitic Named Entity Recognition

7.4 Case Studies

7.4.1 Learning Algorithms

7.4.2 Features

7.4.3 Experiments

7.5 Relevant Problems

7.5.1 Named Entity Translation and Transliteration

7.5.2 Entity Detection and Tracking

7.5.3 Projection

7.6 Labeled Named Entity Recognition Corpora

7.7 Future Challenges and Opportunities

7.8 Summary

References

8 Anaphora Resolution

Khadiga Mahmoud Seddik, Ali Farghaly

8.1 Introduction: Anaphora and Anaphora Resolution

8.2 Types of Anaphora

8.2.1 Pronominal Anaphora

8.2.2 Lexical Anaphora

8.2.3 Comparative Anaphora

8.3 Determinants in Anaphora Resolution

8.3.1 Eliminating Factors

8.3.2 Preferential Factors

8.3.3 Implementing Features in AR (Anaphora Resolution) Systems

8.4 The Process of Anaphora Resolution

8.5 Different Approaches to Anaphora Resolution

8.5.1 Knowledge-Intensive Versus Knowledge-Poor Approaches

8.5.2 Traditional Approach

8.5.3 Statistical Approach

8.5.4 Linguistic Approach to Anaphora Resolution

8.6 Recent Work in Anaphora and Coreference Resolution

8.6.1 Mention-Synchronous Coreference Resolution Algorithm Based on the Bell Tree [24]

8.6.2 A Twin-Candidate Model for Learning-Based Anaphora Resolution [47, 48]

8.6.3 Improving Machine Learning Approaches to Coreference Resolution [36]

8.7 Evaluation of Anaphora Resolution Systems

8.7.1 MUC [45]

8.7.2 B-Cube [2]

8.7.3 ACE (NIST 2003)

8.7.4 CEAF [23]

8.7.5 BLANC [40]

8.8 Anaphora in Semitic Languages

8.8.1 Anaphora Resolution in Arabic

8.9 Difficulties with AR in Semitic Languages

8.9.1 The Morphology of the Language

8.9.2 Complex Sentence Structure

8.9.3 Hidden Antecedents

8.9.4 The Lack of Corpora Annotated with Anaphoric Links

8.10 Summary

References

9 Relation Extraction

Vittorio Castelli, Imed Zitouni

9.1 Introduction

9.2 Relations

9.3 Approaches to Relation Extraction

9.3.1 Feature-Based Classifiers

9.3.2 Kernel-Based Methods

9.3.3 Semi-supervised and Adaptive Learning

9.4 Language-Specific Issues

9.5 Data

9.6 Results

9.7 Summary

References

10 Information Retrieval

Kareem Darwish

10.1 Introduction

10.2 The Information Retrieval Task

10.2.1 Task Definition

10.2.2 The General Architecture of an IR System

10.2.3 Retrieval Models

10.2.4 IR Evaluation

10.3 Semitic Language Retrieval

10.3.1 The Major Known Challenges

10.3.2 Survey of Existing Literature

10.3.3 Best Arabic Index Terms

10.3.4 Best Hebrew Index Terms

10.3.5 Best Amharic Index Terms

10.4 Available IR Test Collections

10.4.1 Arabic

10.4.2 Hebrew

10.4.3 Amharic

10.5 Domain-Specific IR

10.5.1 Arabic-English CLIR

10.5.2 Arabic OCR Text Retrieval

10.5.3 Arabic Social Search

10.5.4 Arabic Web Search

10.6 Summary

References

11 Question Answering

Yassine Benajiba, Paolo Rosso, Lahsen Abouenour, Omar Trigui, Karim Bouzoubaa, Lamia Belguith

11.1 Introduction

11.2 The Question Answering Task

11.2.1 Task Definition

11.2.2 The Major Known Challenges

11.2.3 The General Architecture of a QA System

11.2.4 Answering Definition Questions and Query Expansion Techniques

11.2.5 How to Benchmark QA System Performance: Evaluation Measure for QA

11.3 The Case of Semitic Languages

11.3.1 NLP for Semitic Languages

11.3.2 QA for Semitic Languages

11.4 Building Arabic QA Specific Modules

11.4.1 Answering Definition Questions in Arabic

11.4.2 Query Expansion for Arabic QA

11.5 Summary

References

12 Automatic Summarization

Lamia Hadrich Belguith, Mariem Ellouze, Mohamed Hedi Maaloul, Maher Jaoua, Fatma Kallel Jaoua, Philippe Blache

12.1 Introduction

12.2 Text Summarization Aspects

12.2.1 Types of Summaries

12.2.2 Extraction vs. Abstraction

12.2.3 The Major Known Challenges

12.3 How to Evaluate Summarization Systems

12.3.1 Insights from the Evaluation Campaigns

12.3.2 Evaluation Measures for Summarization

12.4 Single Document Summarization Approaches

12.4.1 Numerical Approach

12.4.2 Symbolic Approach

12.4.3 Hybrid Approach

12.5 Multiple Document Summarization Approaches

12.5.1 Numerical Approach

12.5.2 Symbolic Approach

12.5.3 Hybrid Approach

12.6 Case of Semitic Languages

12.6.1 Language-Independent Systems

12.6.2 Arabic Systems

12.6.3 Hebrew Systems

12.6.4 Maltese Systems

12.6.5 Amharic Systems

12.7 Case Study: Building an Arabic Summarization System (L.A.E)

12.7.1 L.A.E System Architecture

12.7.2 Source Text Segmentation

12.7.3 Interface

12.7.4 Evaluation and Discussion

12.8 Summary

References

13 Automatic Speech Recognition

Hagen Soltau, George Saon, Lidia Mangu, Hong-Kwang Kuo, Brian Kingsbury, Stephen Chu, Fadi Biadsy

13.1 Introduction

13.1.1 Automatic Speech Recognition

13.1.2 Introduction to Arabic: A Speech Recognition Perspective

13.1.3 Overview

13.2 Acoustic Modeling

13.2.1 Language-Independent Techniques

13.2.2 Vowelization

13.2.3 Modeling of Arabic Dialects in Decision Trees

13.3 Language Modeling

13.3.1 Language-Independent Techniques for Language Modeling

13.3.2 Language-Specific Techniques for Language Modeling

13.4 IBM GALE 2011 System Description

13.4.1 Acoustic Models

13.4.2 Language Models

13.4.3 System Combination

13.4.4 System Architecture

13.5 From MSA to Dialects

13.5.1 Dialect Identification

13.5.2 ASR and Dialect ID Data Selection

13.5.3 Dialect Identification on GALE Data

13.5.4 Acoustic Modeling Experiments

13.5.5 Dialect ID Based on Text Only

13.6 Resources

13.6.1 Acoustic Training Data

13.6.2 Training Data for Language Modeling

13.6.3 Vowelization Resources

13.7 Comparing Arabic and Hebrew ASR

13.8 Summary

References

Dr. Imed Zitouni is a Principal Researcher at Microsoft leading the Relevance Measurement Sciences group. Imed received his M.Sc. and Ph.D. with the highest honors from the University of Nancy 1, France.

In 1995, he obtained an MEng degree in computer science from ENSI in Tunisia. He is a senior member of IEEE, served as a member of the IEEE Speech and Language Processing Technical Committee (99-11), Information Officer of the ACL SIG on Semitic Languages, and associate editor of the ACM journal TALIP, and is a member of ISCA and ACL. Imed has served as chair and reviewing committee member of several conferences and journals, and he is the author or co-author of more than 100 patents and papers in international conferences and journals. Imed's research interest is in the area of Multilingual Natural Language Processing (NLP), including Information Retrieval, Information Extraction, Language Modeling, etc. Imed has a particular interest in advancing state-of-the-art technology in the area of Semitic NLP, especially Arabic.

Imed's current research interest is in the area of Multilingual Information Retrieval, focusing on the use of statistics and machine learning techniques to develop web-scale offline and online metrics. He is also working on the use of NLP to add a layer of semantics and understanding to search engines. Prior to joining Microsoft, Imed was a Senior Researcher at IBM for almost a decade, where he led several Multilingual NLP projects, including Arabic NLP, information extraction, semantic role labeling, language modeling, machine translation and speech recognition. Prior to IBM, Imed was a researcher at Bell Laboratories, Lucent Technologies, for almost half a dozen years, working on language modeling, speech recognition, spoken dialog systems and speech understanding. Imed also gained startup experience at DIALOCA in Paris, France, working on e-mail steering and language modeling, and served as a temporary assistant professor at the University of Nancy 1, France.
