Collaboratively Constructed Language Resources (CCLRs) such as Wikipedia, Wiktionary, Linked Open Data, and various resources developed using crowdsourcing techniques such as Games with a Purpose and Mechanical Turk have contributed substantially to research in natural language processing (NLP). Various NLP tasks use such resources to substitute for or supplement conventional lexical semantic resources and linguistically annotated corpora. These resources also provide an extensive body of texts from which valuable knowledge is mined. A growing number of community efforts aim to link and maintain multiple linguistic resources.
This book offers comprehensive coverage of CCLR-related topics, including their construction, their utilization in NLP tasks, and their interlinkage and management. Bachelor's, Master's, and Ph.D. programs in natural language processing, computational linguistics, and knowledge discovery can use this book either as the main text or as supplementary reading. The book also serves as a valuable reference guide for researchers and professionals working on these topics.
This book examines the challenges to the research field brought about by the emergence of collaboratively constructed language resources (CCLRs), such as Wikipedia and Wiktionary. It offers a comprehensive overview of the significant work in CCLRs and NLP.
Part I: Approaches to Collaboratively Constructed Language Resources
1. Using Games to Create Language Resources: Successes and Limitations of the Approach. J. Chamberlain, K. Fort, U. Kruschwitz, M. Lafourcade and M. Poesio.
2. Senso Comune: A Collaborative Knowledge Resource for Italian. Al. Oltramari, G. Vetere, I. Chiari, E. Jezek, F. M. Zanzotto, M. Nissim, and A. Gangemi.
3. Building Multilingual Language Resources in Web Localisation: A Crowdsourcing Approach. A. Wasala, R. Schäler, J. Buckley, R. Weerasinghe and C. Exton.
4. Reciprocal Enrichment Between Basque Wikipedia and Machine Translation. I. Alegria, U. Cabezon, U. Fernandez de Betoño, G. Labaka, A. Mayor, K. Sarasola and A. Zubiaga.

Part II: Mining Knowledge From and Using Collaboratively Constructed Language Resources
5. A Survey of NLP Methods and Resources for Analyzing the Collaborative Writing Process in Wikipedia. O. Ferschke, J. Daxenberger and I. Gurevych.
6. ConceptNet 5: A Large Semantic Network for Relational Knowledge. R. Speer and C. Havasi.
7. An Overview of BabelNet and its API for Multilingual Language Processing. R. Navigli and S. P. Ponzetto.
8. Hierarchical Organization of Collaboratively Constructed Content. J. Yu, Z.-J. Zha, and T.-S. Chua.
9. Word Sense Disambiguation using Wikipedia. B. Dandala, R. Mihalcea, and R. Bunescu.

Part III: Interconnecting and Managing Collaboratively Constructed Language Resources
10. An Open Linguistic Infrastructure for Annotated Corpora. N. Ide.
11. Towards Web-Scale Collaborative Knowledge Extraction. S. Hellmann, S. Auer.
12. Building a Linked Open Data Cloud of Linguistic Resources: Motivations and Developments. C. Chiarcos, S. Moran, P. N. Mendes, S. Nordhoff, R. Littauer.
13. Community Efforts around the ISOcat Data Category Registry. S. E. Wright, M. Windhouwer, I. Schuurman, M. Kemps-Snijders.

Index.
Iryna Gurevych leads the UKP Lab in the Department of Computer Science of the Technische Universität Darmstadt (UKP-TUDA) and at the Institute for Educational Research and Educational Information (UKP-DIPF) in Frankfurt, Germany. She holds an endowed Lichtenberg-Chair "Ubiquitous Knowledge Processing" of the Volkswagen Foundation. Her research in NLP primarily concerns applied lexical semantic algorithms, such as computing semantic relatedness of words or paraphrase recognition, and their use to enhance the performance of NLP tasks, such as information retrieval, question answering, or summarization.
Jungi Kim is a postdoctoral researcher at UKP Lab in the Department of Computer Science of the Technische Universität Darmstadt, Germany (UKP-TUDA). His primary research interests are in semantic resources, algorithms, and evaluations for multilingual natural language processing. His previous research includes multilingual sentiment analysis, statistical machine translation, and various NLP topics involving multiple languages, especially East Asian languages.