Multiword Expressions Acquisition: A Generic and Open Framework 2015 ed. [Hardcover]

  • Format: Hardback, 230 pages, height x width: 235x155 mm, weight: 4912 g, 17 Illustrations, black and white; XIV, 230 p. 17 illus., 1 Hardback
  • Series: Theory and Applications of Natural Language Processing
  • Publication date: 08-Oct-2014
  • Publisher: Springer International Publishing AG
  • ISBN-10: 3319092065
  • ISBN-13: 9783319092065
  • Hardcover
  • Price: 95,02 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 111,79 €
  • You save 15%
  • Delivery of the book from the publisher takes approximately 2-4 weeks
  • Free delivery
  • Order lead time: 2-4 weeks

This book is an excellent introduction to multiword expressions. It provides a unique, comprehensive and up-to-date overview of this exciting topic in computational linguistics. The first part describes the diversity and richness of multiword expressions, including many examples in several languages. These constructions are not only complex and arbitrary, but also much more frequent than one would guess, making them a real nightmare for natural language processing applications.

The second part introduces a new generic framework for automatic acquisition of multiword expressions from texts. Furthermore, it describes the accompanying free software tool, the mwetoolkit, which comes in handy when looking for expressions in texts (regardless of the language). Evaluation is greatly emphasized, underlining the fact that results depend on parameters like corpus size, language, MWE type, etc. The last part contains solid experimental results and evaluates the mwetoolkit, demonstrating its usefulness for computer-assisted lexicography and machine translation.

This is the first book to cover the whole pipeline of multiword expression acquisition in a single volume. It addresses the needs of students and researchers in computational and theoretical linguistics, cognitive sciences, artificial intelligence and computer science. Its good balance between computational and linguistic views makes it the perfect starting point for anyone interested in multiword expressions, language and text processing in general.

Reviews

The motivating idea behind this work is to explore and compare approaches to MWE, involving various tools as well as human resources. Much information is given to enable other researchers to investigate MWEs. The book contains a vast amount of information. An extensive bibliography follows each chapter. There are helpful appendices, including a list of standard part of speech tags. (Alice Davison, Computing Reviews, September, 2015)

1 Introduction
1.1 Motivations
1.1.1 What Are Multiword Expressions?
1.1.2 Why Do They Matter?
1.1.3 What Happens If We Ignore Them?
1.2 A New Framework for MWE Treatment
1.2.1 Hypotheses
1.2.2 Goals
1.2.3 Guiding Principles
1.3 Chapters Outline
1.4 Summary
References
Part I Multiword Expressions: A Tough Nut to Crack
2 Definitions and Characteristics
2.1 A Brief History
2.1.1 Theoretical Linguistics
2.1.2 Computational Linguistics
2.2 Defining MWEs
2.2.1 What Is a Word?
2.2.2 What Is a MWE?
2.2.3 A Note on Terminology
2.3 Characteristics and Characterisations
2.3.1 The Compositionality Continuum
2.3.2 Derived MWE Properties
2.3.3 Existing MWE Typologies
2.3.4 A Simplified Typology
2.4 A Snapshot of the Research Field
2.5 Summary
References
3 State of the Art in MWE Processing
3.1 Elementary Notions
3.1.1 Linguistic Processing: Analysis
3.1.2 Word Frequency Distributions
3.1.3 N-Grams, Language Models and Suffix Arrays
3.1.4 Lexical Association Measures
3.2 Methods for Automatic MWE Acquisition
3.2.1 Monolingual Methods
3.2.2 Bi- and Multilingual Methods
3.2.3 Existing Tools
3.3 Other Tasks Related to MWE Processing
3.3.1 Interpretation
3.3.2 Disambiguation
3.3.3 Representation
3.3.4 Applications
3.4 Summary
References
Part II MWE Acquisition
4 Evaluation of MWE Acquisition
4.1 Evaluation Context
4.1.1 Evaluation Axes
4.1.2 Evaluation Measures
4.1.3 Annotation
4.2 Acquisition Contexts
4.2.1 Characteristics of Target Constructions
4.2.2 Characteristics of Corpora
4.2.3 Existing Resources
4.3 Discussion
4.4 Summary
References
5 A New Framework for MWE Acquisition
5.1 The mwetoolkit Framework
5.1.1 General Architecture
5.1.2 Modules
5.1.3 Discussion
5.2 A Toy Experiment
5.2.1 Candidate Extraction
5.2.2 Candidate Filtering
5.2.3 Results
5.3 Comparison with Related Approaches
5.3.1 Related Approaches
5.3.2 Comparison Setup
5.3.3 Results
5.4 Summary
References
Part III Applications
6 Application 1: Lexicography
6.1 A Dictionary of Nominal Compounds in Greek
6.1.1 Greek Nominal Compounds
6.1.2 Automatic Acquisition Setup
6.1.3 Results
6.2 A Dictionary of Complex Predicates in Portuguese
6.2.1 Portuguese Complex Predicates
6.2.2 Automatic Acquisition Setup
6.2.3 Results
6.3 Summary
References
7 Application 2: Machine Translation
7.1 A Brief Introduction to SMT
7.2 Evaluation of Phrasal Verb Translation
7.2.1 English Phrasal Verbs
7.2.2 Translation Setup
7.2.3 Results
7.3 Summary
References
8 Conclusions
References
A Extended List of Translation Examples
B Resources Used in the Experiments
B.1 Data
B.1.1 Monolingual Corpora
B.1.2 Multilingual Corpora
B.2 Software
B.2.1 Analysis Tools
C The mwetoolkit: Documentation
C.1 Design Choices
C.2 Installing the mwetoolkit
C.2.1 Windows
C.2.2 Linux and Mac OS
C.2.3 Mac OS Dependencies
C.2.4 Testing Your Installation
C.3 Getting Started
C.3.1 An Example
C.4 Defining Patterns for Extraction
C.4.1 Literal Matches
C.4.2 Repetitions and Optional Elements
C.4.3 Ignoring Parts of the Match
C.4.4 Backpatterns
C.4.5 Syntactic Patterns
C.5 Preprocessing a Corpus Using TreeTagger
C.5.1 Installing TreeTagger
C.5.2 Converting TreeTagger's Output to XML
C.6 Preprocessing a Corpus Using RASP
C.6.1 Installing RASP
C.6.2 Converting RASP's Output to XML
C.7 Examples of XML Files
C.8 Developers
D Tagsets for POS and Syntax
D.1 Generic POS Tagset
D.2 RASP English POS Tagset
D.3 RASP English Grammatical Relations
D.4 TreeTagger English POS Tagset
E Detailed Lexicon Descriptions
E.1 Sentiment Verbs Extracted from Brazilian WordNet
E.2 Sentiment Nouns
Carlos Ramisch is a researcher and lecturer at the Aix-Marseille University (France). He holds a double PhD in computer science from Grenoble University (France) and UFRGS (Brazil). His research interests are multiword expressions, semantics and multilingualism. Carlos coordinated many events, including the MWE workshops (2010, 2011, 2013) and the ACM TSLP special issue. He is the creator and developer of the mwetoolkit.