E-book: Practical Corpus Linguistics: An Introduction to Corpus-Based Language Analysis

  • Format: PDF+DRM
  • Publication date: 02-Dec-2015
  • Publisher: Wiley-Blackwell
  • Language: English
  • ISBN-13: 9781118831915
  • Price: 107.38 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorised with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

This is the first book of its kind to provide a practical and student-friendly guide to corpus linguistics that explains the nature of electronic data and how it can be collected and analyzed.
  • Designed to equip readers with the technical skills necessary to analyze and interpret language data, both written and (orthographically) transcribed
  • Introduces a number of easy-to-use, yet powerful, free analysis resources consisting of standalone programs and web interfaces for use with Windows, Mac OS X, and Linux
  • Each section includes practical exercises, a list of sources and further reading, and illustrated step-by-step introductions to analysis tools
  • Requires only a basic knowledge of computer concepts in order to develop the specific linguistic analysis skills required for understanding/analyzing corpus data

Reviews

"This textbook makes Practical Corpus Linguistics accessible to everyone. The focus on methodological and technical aspects and the instructive dimension of the book (nothing is considered obvious or already known) make it very useful to any corpus linguist aiming at a better understanding of his/her data... Through the various exercises, it is very easy to test one's comprehension and the reader gradually gains confidence. The educational, sometimes entertaining tone as well as the glossary also contribute to gradually enhance the reader's learning capacities in a field in which many feel insecure... It should accompany scholars at the beginning of any research to raise awareness about technical issues that are too often overlooked..." - Robert A. Cote for The LINGUIST List, December 2016

List of Figures
List of Tables
Acknowledgements
1 Introduction
1.1 Linguistic Data Analysis
1.1.1 What's data?
1.1.2 Forms of data
1.1.3 Collecting and analysing data
1.2 Outline of the Book
1.3 Conventions Used in this Book
1.4 A Note for Teachers
1.5 Online Resources
2 What's Out There?
2.1 What's a Corpus?
2.2 Corpus Formats
2.3 Synchronic vs. Diachronic Corpora
2.3.1 'Early' synchronic corpora
2.3.2 Mixed corpora
2.3.3 Examples of diachronic corpora
2.4 General vs. Specific Corpora
2.4.1 Examples of specific corpora
2.5 Static Versus Dynamic Corpora
2.6 Other Sources for Corpora
Solutions to/Comments on the Exercises
Note
Sources and Further Reading
3 Understanding Corpus Design
3.1 Food for Thought - General Issues in Corpus Design
3.1.1 Sampling
3.1.2 Size
3.1.3 Balance and representativeness
3.1.4 Legal issues
3.2 What's in a Text? - Understanding Document Structure
3.2.1 Headers, 'footers' and meta-data
3.2.2 The structure of the (text) body
3.2.3 What's (in) an electronic text? - understanding file formats and their properties
3.3 Understanding Encoding: Character Sets, File Size, etc.
3.3.1 ASCII and legacy encodings
3.3.2 Unicode
3.3.3 File sizes
Solutions to/Comments on the Exercises
Sources and Further Reading
4 Finding and Preparing Your Data
4.1 Finding Suitable Materials for Analysis
4.1.1 Retrieving data from text archives
4.1.2 Obtaining materials from Project Gutenberg
4.1.3 Obtaining materials from the Oxford Text Archive
4.2 Collecting Written Materials Yourself ('Web as Corpus')
4.2.1 A brief note on plain-text editors
4.2.2 Browser text export
4.2.3 Browser HTML export
4.2.4 Getting web data using ICEweb
4.2.5 Downloading other types of files
4.3 Collecting Spoken Data
4.4 Preparing Written Data for Analysis
4.4.1 'Cleaning up' your data
4.4.2 Extracting text from proprietary document formats
4.4.3 Removing unnecessary header and 'footer' information
4.4.4 Documenting what you've collected
4.4.5 Preparing your data for distribution or archiving
Solutions to/Comments on the Exercises
Sources and Further Reading
5 Concordancing
5.1 What's Concordancing?
5.2 Concordancing with AntConc
5.2.1 Sorting results
5.2.2 Saving, pruning and reusing your results
Solutions to/Comments on the Exercises
Sources and Further Reading
6 Regular Expressions
6.1 Character Classes
6.2 Negative Character Classes
6.3 Quantification
6.4 Anchoring, Grouping and Alternation
6.4.1 Anchoring
6.4.2 Grouping and alternation
6.4.3 Quoting and using special characters
6.4.4 Constraining the context further
6.5 Further Exercises
Solutions to/Comments on the Exercises
Sources and Further Reading
7 Understanding Part-of-Speech Tagging and Its Uses
7.1 A Brief Introduction to (Morpho-Syntactic) Tagsets
7.2 Tagging Your Own Data
Solutions to/Comments on the Exercises
Sources and Further Reading
8 Using Online Interfaces to Query Mega Corpora
8.1 Searching the BNC with BNCweb
8.1.1 What is BNCweb?
8.1.2 Basic standard queries
8.1.3 Navigating through and exploring search results
8.1.4 More advanced standard query options
8.1.5 Wildcards
8.1.6 Word and phrase alternation
8.1.7 Restricting searches through PoS tags
8.1.8 Headword and lemma queries
8.2 Exploring COCA through the BYU Web-Interface
8.2.1 The basic syntax
8.2.2 Comparing corpora in the BYU interface
Solutions to/Comments on the Exercises
Sources and Further Reading
9 Basic Frequency Analysis - or What Can (Single) Words Tell Us About Texts?
9.1 Understanding Basic Units in Texts
9.1.1 What's a word?
9.1.2 Types and tokens
9.2 Word (Frequency) Lists in AntConc
9.2.1 Stop words - good or bad?
9.2.2 Defining and using stop words in AntConc
9.3 Word Lists in BNCweb
9.3.1 Standard options
9.3.2 Investigating subcorpora
9.3.3 Keyword lists
9.4 Keyword Lists in AntConc and BNCweb
9.4.1 Keyword lists in AntConc
9.4.2 Keyword lists in BNCweb
9.5 Comparing and Reporting Frequency Counts
9.6 Investigating Genre-Specific Distributions in COCA
Solutions to/Comments on the Exercises
Sources and Further Reading
10 Exploring Words in Context
10.1 Understanding Extended Units of Text
10.2 Text Segmentation
10.3 N-Grams, Word Clusters and Lexical Bundles
10.4 Exploring (Relatively) Fixed Sequences in BNCweb
10.5 Simple, Sequential Collocations and Colligations
10.5.1 'Simple' collocations
10.5.2 Colligations
10.5.3 Contextually constrained and proximity searches
10.6 Exploring Colligations in COCA
10.7 N-grams and Clusters in AntConc
10.8 Investigating Collocations Based on Statistical Measures in AntConc, BNCweb and COCA
10.8.1 Calculating collocations
10.8.2 Computing collocations in AntConc
10.8.3 Computing collocations in BNCweb
10.8.4 Computing collocations in COCA
Solutions to/Comments on the Exercises
Sources and Further Reading
11 Understanding Markup and Annotation
11.1 From SGML to XML - A Brief Timeline
11.2 XML for Linguistics
11.2.1 Why bother?
11.2.2 What does markup/annotation look like?
11.2.3 The 'history' and development of (linguistic) markup
11.2.4 XML and style sheets
11.3 'Simple XML' for Linguistic Annotation
11.4 Colour Coding and Visualisation
11.5 More Complex Forms of Annotation
Solutions to/Comments on the Exercises
Sources and Further Reading
12 Conclusion and Further Perspectives
Appendix A: The CLAWS C5 Tagset
Appendix B: The Annotated Dialogue File
Appendix C: The CSS Style Sheet
Glossary
References
Index
Martin Weisser is a Professor in the National Key Research Center for Linguistics and Applied Linguistics at Guangdong University of Foreign Studies, China. He is the author of Essential Programming for Linguistics (2009), and has published numerous articles and book chapters, including contributions to The Encyclopedia of Applied Linguistics (Wiley, 2012) and Corpus Pragmatics: A Handbook (2014).