E-book: Practical Corpus Linguistics: An Introduction to Corpus-Based Language Analysis

3.50/5 (4 ratings from Goodreads)

Martin Weisser

Format: PDF+DRM
Publication date: 02-Dec-2015
Publisher: Wiley-Blackwell
Language: eng
ISBN-13: 9781118831915

Other books on the subject:

linguistics

Format - PDF+DRM
Price: 107,38 €*
* The price is final; no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital rights management (DRM)
The publisher has released this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on mobile devices (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; not to be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

This is the first book of its kind to provide a practical and student-friendly guide to corpus linguistics that explains the nature of electronic data and how it can be collected and analyzed.

Designed to equip readers with the technical skills necessary to analyze and interpret language data, both written and (orthographically) transcribed
Introduces a number of easy-to-use, yet powerful, free analysis resources consisting of standalone programs and web interfaces for use with Windows, Mac OS X, and Linux
Each section includes practical exercises, a list of sources and further reading, and illustrated step-by-step introductions to analysis tools
Requires only a basic knowledge of computer concepts in order to develop the specific linguistic analysis skills required for understanding/analyzing corpus data

Reviews

"This textbook makes Practical Corpus Linguistics accessible to everyone. The focus on methodological and technical aspects and the instructive dimension of the book (nothing is considered obvious or already known) make it very useful to any corpus linguist aiming at a better understanding of his/her data...Through the various exercises, it is very easy to test one's comprehension and the reader gradually gains confidence. The educational, sometimes entertaining tone as well as the glossary also contribute to gradually enhance the reader's learning capacities in a field in which many feel insecure...It should accompany scholars at the beginning of any research to raise awareness about technical issues that are too often overlooked..." - Robert A. Cote for The LINGUIST List, December 2016

List of Figures

List of Tables

Acknowledgements

1 Introduction

1.1 Linguistic Data Analysis

1.1.1 What's data?

1.1.2 Forms of data

1.1.3 Collecting and analysing data

1.2 Outline of the Book

1.3 Conventions Used in this Book

1.4 A Note for Teachers

1.5 Online Resources

2 What's Out There?

2.1 What's a Corpus?

2.2 Corpus Formats

2.3 Synchronic vs. Diachronic Corpora

2.3.1 'Early' synchronic corpora

2.3.2 Mixed corpora

2.3.3 Examples of diachronic corpora

2.4 General vs. Specific Corpora

2.4.1 Examples of specific corpora

2.5 Static Versus Dynamic Corpora

2.6 Other Sources for Corpora

Solutions to/Comments on the Exercises

Note

Sources and Further Reading

3 Understanding Corpus Design

3.1 Food for Thought - General Issues in Corpus Design

3.1.1 Sampling

3.1.2 Size

3.1.3 Balance and representativeness

3.1.4 Legal issues

3.2 What's in a Text? - Understanding Document Structure

3.2.1 Headers, 'footers' and meta-data

3.2.2 The structure of the (text) body

3.2.3 What's (in) an electronic text? - understanding file formats and their properties

3.3 Understanding Encoding: Character Sets, File Size, etc.

3.3.1 ASCII and legacy encodings

3.3.2 Unicode

3.3.3 File sizes

Solutions to/Comments on the Exercises

Sources and Further Reading

4 Finding and Preparing Your Data

4.1 Finding Suitable Materials for Analysis

4.1.1 Retrieving data from text archives

4.1.2 Obtaining materials from Project Gutenberg

4.1.3 Obtaining materials from the Oxford Text Archive

4.2 Collecting Written Materials Yourself ('Web as Corpus')

4.2.1 A brief note on plain-text editors

4.2.2 Browser text export

4.2.3 Browser HTML export

4.2.4 Getting web data using ICEweb

4.2.5 Downloading other types of files

4.3 Collecting Spoken Data

4.4 Preparing Written Data for Analysis

4.4.1 'Cleaning up' your data

4.4.2 Extracting text from proprietary document formats

4.4.3 Removing unnecessary header and 'footer' information

4.4.4 Documenting what you've collected

4.4.5 Preparing your data for distribution or archiving

Solutions to/Comments on the Exercises

Sources and Further Reading

5 Concordancing

5.1 What's Concordancing?

5.2 Concordancing with AntConc

5.2.1 Sorting results

5.2.2 Saving, pruning and reusing your results

Solutions to/Comments on the Exercises

Sources and Further Reading

6 Regular Expressions

6.1 Character Classes

6.2 Negative Character Classes

6.3 Quantification

6.4 Anchoring, Grouping and Alternation

6.4.1 Anchoring

6.4.2 Grouping and alternation

6.4.3 Quoting and using special characters

6.4.4 Constraining the context further

6.5 Further Exercises

Solutions to/Comments on the Exercises

Sources and Further Reading

7 Understanding Part-of-Speech Tagging and Its Uses

7.1 A Brief Introduction to (Morpho-Syntactic) Tagsets

7.2 Tagging Your Own Data

Solutions to/Comments on the Exercises

Sources and Further Reading

8 Using Online Interfaces to Query Mega Corpora

8.1 Searching the BNC with BNCweb

8.1.1 What is BNCweb?

8.1.2 Basic standard queries

8.1.3 Navigating through and exploring search results

8.1.4 More advanced standard query options

8.1.5 Wildcards

8.1.6 Word and phrase alternation

8.1.7 Restricting searches through PoS tags

8.1.8 Headword and lemma queries

8.2 Exploring COCA through the BYU Web-Interface

8.2.1 The basic syntax

8.2.2 Comparing corpora in the BYU interface

Solutions to/Comments on the Exercises

Sources and Further Reading

9 Basic Frequency Analysis - or What Can (Single) Words Tell Us About Texts?

9.1 Understanding Basic Units in Texts

9.1.1 What's a word?

9.1.2 Types and tokens

9.2 Word (Frequency) Lists in AntConc

9.2.1 Stop words - good or bad?

9.2.2 Defining and using stop words in AntConc

9.3 Word Lists in BNCweb

9.3.1 Standard options

9.3.2 Investigating subcorpora

9.3.3 Keyword lists

9.4 Keyword Lists in AntConc and BNCweb

9.4.1 Keyword lists in AntConc

9.4.2 Keyword lists in BNCweb

9.5 Comparing and Reporting Frequency Counts

9.6 Investigating Genre-Specific Distributions in COCA

Solutions to/Comments on the Exercises

Sources and Further Reading

10 Exploring Words in Context

10.1 Understanding Extended Units of Text

10.2 Text Segmentation

10.3 N-Grams, Word Clusters and Lexical Bundles

10.4 Exploring (Relatively) Fixed Sequences in BNCweb

10.5 Simple, Sequential Collocations and Colligations

10.5.1 'Simple' collocations

10.5.2 Colligations

10.5.3 Contextually constrained and proximity searches

10.6 Exploring Colligations in COCA

10.7 N-grams and Clusters in AntConc

10.8 Investigating Collocations Based on Statistical Measures in AntConc, BNCweb and COCA

10.8.1 Calculating collocations

10.8.2 Computing collocations in AntConc

10.8.3 Computing collocations in BNCweb

10.8.4 Computing collocations in COCA

Solutions to/Comments on the Exercises

Sources and Further Reading

11 Understanding Markup and Annotation

11.1 From SGML to XML - A Brief Timeline

11.2 XML for Linguistics

11.2.1 Why bother?

11.2.2 What does markup/annotation look like?

11.2.3 The 'history' and development of (linguistic) markup

11.2.4 XML and style sheets

11.3 'Simple XML' for Linguistic Annotation

11.4 Colour Coding and Visualisation

11.5 More Complex Forms of Annotation

Solutions to/Comments on the Exercises

Sources and Further Reading

12 Conclusion and Further Perspectives

Appendix A: The CLAWS C5 Tagset

Appendix B: The Annotated Dialogue File

Appendix C: The CSS Style Sheet

Glossary

References

Index

Martin Weisser is a Professor in the National Key Research Center for Linguistics and Applied Linguistics at Guangdong University of Foreign Studies, China. He is the author of Essential Programming for Linguistics (2009), and has published numerous articles and book chapters, including contributions to The Encyclopedia of Applied Linguistics (Wiley, 2012) and Corpus Pragmatics: A Handbook (2014).

Permalink: https://www.kriso.ee/db/97811188319152e.html

Keywords:

Corpora Linguistics
