E-book: Collaborative Annotation for Reliable Natural Language Processing: Technical and Sociological Aspects [Wiley Online]

  • Length: 192 pages
  • Publication date: 07-Jun-2016
  • Publisher: ISTE Ltd and John Wiley & Sons Inc
  • ISBN-10: 1119306698
  • ISBN-13: 9781119306696
  • Wiley Online
  • Price: 174.45 €*
  • * price granting access for an unlimited number of concurrent users for an unlimited period

This book presents a unique opportunity for constructing a consistent image of collaborative manual annotation for Natural Language Processing (NLP). NLP has witnessed two major evolutions in the past 25 years: first, the extraordinary success of machine learning, which is now, for better or for worse, overwhelmingly dominant in the field, and second, the multiplication of evaluation campaigns and shared tasks. Both rely on manually annotated corpora, for training and evaluating the systems.

These corpora have progressively become the hidden pillars of our domain, providing food for our hungry machine learning algorithms and reference for evaluation. Annotation is now the place where linguistics hides in NLP. However, manual annotation has largely been ignored for some time, and it has taken a while even for annotation guidelines to be recognized as essential.

Although some efforts have recently been made to address the issues raised by manual annotation, little research has yet been done on the subject. This book aims to provide some useful insights into it.

Manual corpus annotation is now at the heart of NLP, and is still largely unexplored. There is a need for manual annotation engineering (in the sense of a precisely formalized process), and this book aims to provide a first step towards a holistic methodology, with a global view on annotation.
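
Since the evaluation of annotation quality revisited in Chapter 1 (sections 1.4.2 and 1.4.3) starts from inter-annotator agreement coefficients such as Cohen's kappa, the following minimal Python sketch may help fix ideas. The label set and the two annotators' data are hypothetical, invented for illustration, and are not taken from the book.

    from collections import Counter

    def cohen_kappa(labels_a, labels_b):
        """Cohen's kappa between two annotators labeling the same items."""
        n = len(labels_a)
        # Observed agreement: share of items on which the annotators coincide.
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Expected (chance) agreement from each annotator's label distribution.
        dist_a, dist_b = Counter(labels_a), Counter(labels_b)
        p_e = sum(dist_a[label] * dist_b[label] for label in dist_a) / (n * n)
        return (p_o - p_e) / (1 - p_e)  # undefined if p_e == 1 (degenerate case)

    # Hypothetical annotations of the same ten tokens by two annotators.
    ann_1 = ["N", "V", "N", "A", "N", "V", "N", "N", "A", "V"]
    ann_2 = ["N", "V", "N", "N", "N", "V", "A", "N", "A", "V"]
    print(f"kappa = {cohen_kappa(ann_1, ann_2):.3f}")  # kappa = 0.677

Interpreting such a coefficient is another matter: section 1.4.3, "Beyond kappas", is a reminder that these measures are a starting point rather than the whole story, and section 1.4.4 is devoted to giving meaning to the metrics.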

Preface ix
List of Acronyms xi
Chapter 1 Annotating Collaboratively 1(76)
1.1 The annotation process (re)visited 1(23)
1.1.1 Building consensus 1(2)
1.1.2 Existing methodologies 3(4)
1.1.3 Preparatory work 7(6)
1.1.4 Pre-campaign 13(4)
1.1.5 Annotation 17(4)
1.1.6 Finalization 21(3)
1.2 Annotation complexity 24(19)
1.2.1 Example overview 25(3)
1.2.2 What to annotate? 28(2)
1.2.3 How to annotate? 30(6)
1.2.4 The weight of the context 36(2)
1.2.5 Visualization 38(2)
1.2.6 Elementary annotation tasks 40(3)
1.3 Annotation tools 43(12)
1.3.1 To be or not to be an annotation tool 43(3)
1.3.2 Much more than prototypes 46(3)
1.3.3 Addressing the new annotation challenges 49(5)
1.3.4 The impossible dream tool 54(1)
1.4 Evaluating the annotation quality 55(20)
1.4.1 What is annotation quality? 55(1)
1.4.2 Understanding the basics 56(7)
1.4.3 Beyond kappas 63(4)
1.4.4 Giving meaning to the metrics 67(8)
1.5 Conclusion 75(2)
Chapter 2 Crowdsourcing Annotation 77(38)
2.1 What is crowdsourcing and why should we be interested in it? 77(4)
2.1.1 A moving target 77(3)
2.1.2 A massive success 80(1)
2.2 Deconstructing the myths 81(12)
2.2.1 Crowdsourcing is a recent phenomenon 81(2)
2.2.2 Crowdsourcing involves a crowd (of non-experts) 83(4)
2.2.3 "Crowdsourcing involves (a crowd of) non-experts" 87(6)
2.3 Playing with a purpose 93(8)
2.3.1 Using the players' innate capabilities and world knowledge 94(2)
2.3.2 Using the players' school knowledge 96(1)
2.3.3 Using the players' learning capacities 97(4)
2.4 Acknowledging crowdsourcing specifics 101(8)
2.4.1 Motivating the participants 101(6)
2.4.2 Producing quality data 107(2)
2.5 Ethical issues 109(6)
2.5.1 Game ethics 109(2)
2.5.2 What's wrong with Amazon Mechanical Turk? 111(2)
2.5.3 A charter to rule them all 113(2)
Conclusion 115(2)
Appendix 117(24)
Glossary 141(2)
Bibliography 143(20)
Index 163
Karën Fort is an Associate Professor at Paris-Sorbonne University (Paris 4), working in the STIH (meaning, text, computer science, history) team. Her current research interests include collaborative manual annotation, crowdsourcing and ethics.