Corpus Linguistics and Statistics with R: Introduction to Quantitative Methods in Linguistics 1st ed. 2017 [Hardback]

  • Format: Hardback, XIII + 353 pages, height x width: 254 x 178 mm, weight: 8342 g, 98 illustrations (55 in color, 43 black and white)
  • Series: Quantitative Methods in the Humanities and Social Sciences
  • Publication date: 07-Dec-2017
  • Publisher: Springer International Publishing AG
  • ISBN-10: 3319645706
  • ISBN-13: 9783319645704
  • Price: €141.35*
  • * Final price; no further discounts apply
  • Regular price: €166.29
  • You save 15%
  • Delivery from the publisher takes approximately 2-4 weeks
  • Free delivery
This textbook examines empirical linguistics from a theoretical linguist's perspective. It provides both a theoretical discussion of what quantitative corpus linguistics entails and detailed, hands-on, step-by-step instructions for implementing the techniques in the field. The statistical methodology and R-based coding in this book teach readers basic and then more advanced skills for working with large data sets in their linguistics research and studies. Massive data sets are now, more than ever, the basis for work ranging from usage-based linguistics to the far reaches of applied linguistics. This book presents much of the methodology in a corpus-based approach. However, the corpus-based methods in this book are also essential components of recent developments in sociolinguistics, historical linguistics, computational linguistics, and psycholinguistics. Material from the book will also appeal to researchers in digital humanities and the many non-linguistic fields that use textual data analysis and text-based sensorimetrics. Chapters cover topics including corpus processing, frequency data, and clustering methods. Each chapter is illustrated with case studies and comes with accompanying data sets, R code, and exercises for readers. This book may be used in advanced undergraduate courses, graduate courses, and self-study.

Reviews

The fine expository qualities of CLSR, together with the sheer enthusiasm and generosity of its author, do everything to encourage its readers to use the rich and various resources R makes available to explore and discover new ways of addressing familiar problems. (Graham Ranger, Corpora, Vol. 14 (2), 2019)

1 Introduction
1.1 From Introspective to Corpus-Informed Judgments
1.2 Looking for Corpus Linguistics
1.2.1 What Counts as a Corpus
1.2.2 What Linguists Do with the Corpus
1.2.3 How Central the Corpus Is to a Linguist's Work
References
Part I Methods in Corpus Linguistics
2 R Fundamentals
2.1 Introduction
2.2 Downloads and Installs
2.2.1 Downloading and Installing R
2.2.2 Downloading and Installing RStudio
2.2.3 Downloading the Book Materials
2.3 Setting the Working Directory
2.4 R Scripts
2.5 Packages
2.5.1 Downloading Packages
2.5.2 Loading Packages
2.6 Simple Commands
2.7 Variables and Assignment
2.8 Functions and Arguments
2.8.1 Ready-Made Functions
2.8.2 User-Defined Functions
2.9 R Objects
2.9.1 Vectors
2.9.2 Lists
2.9.3 Matrices
2.9.4 Data Frames (and Factors)
2.10 For Loops
2.11 If and if ... else Statements
2.11.1 If Statements
2.11.2 If ... else Statements
2.12 Cleanup
2.13 Common Mistakes and How to Avoid Them
2.14 Further Reading
Exercises
References
3 Digital Corpora
3.1 A Short Typology
3.2 Corpus Compilation: Kennedy's Five Steps
3.3 Unannotated Corpora
3.3.1 Collecting Textual Data
3.3.2 Character Encoding Issues
3.3.3 Creating an Unannotated Corpus
3.4 Annotated Corpora
3.4.1 Markup
3.4.2 POS-Tagging
3.4.3 POS-Tagging in R
3.4.4 Semantic Tagging
3.5 Obtaining Corpora
Exercise
References
4 Processing and Manipulating Character Strings
4.1 Introduction
4.2 Character Strings
4.2.1 Definition
4.2.2 Loading Several Text Files
4.3 First Forays into Character String Processing
4.3.1 Splitting
4.3.2 Matching
4.3.3 Replacing and Deleting
4.3.4 Limitations
4.4 Regular Expressions
4.4.1 Overview
4.4.2 Literals vs. Metacharacters
4.4.3 Line Anchors
4.4.4 Quantifiers
4.4.5 Alternations and Groupings
4.4.6 Character Classes
4.4.7 Lazy vs. Greedy Matching
4.4.8 Backreference
4.4.9 Exact Matching with strapply()
4.4.10 Lookaround
Exercises
5 Applied Character String Processing
5.1 Introduction
5.2 Concordances
5.2.1 A Concordance Based on an Unannotated Corpus
5.2.2 A Concordance Based on an Annotated Corpus
5.3 Making a Data Frame from an Annotated Corpus
5.3.1 Planning the Data Frame
5.3.2 Compiling the Data Frame
5.3.3 The Full Script
5.4 Frequency Lists
5.4.1 A Frequency List of a Raw Text File
5.4.2 A Frequency List of an Annotated File
Exercises
References
6 Summary Graphics for Frequency Data
6.1 Introduction
6.2 Plots, Barplots, and Histograms
6.3 Word Clouds
6.4 Dispersion Plots
6.5 Strip Charts
6.6 Reshaping Tabulated Data
6.7 Motion Charts
Exercises
References
Part II Statistics for Corpus Linguistics
7 Descriptive Statistics
7.1 Variables
7.2 Central Tendency
7.2.1 The Mean
7.2.2 The Median
7.2.3 The Mode
7.3 Dispersion
7.3.1 Quantiles
7.3.2 Boxplots
7.3.3 Variance and Standard Deviation
Exercises
8 Notions of Statistical Testing
8.1 Introduction
8.2 Probabilities
8.2.1 Definition
8.2.2 Simple Probabilities
8.2.3 Joint and Marginal Probabilities
8.2.4 Union vs. Intersection
8.2.5 Conditional Probabilities
8.2.6 Independence
8.3 Populations, Samples, and Individuals
8.4 Random Variables
8.5 Response/Dependent vs. Explanatory/Descriptive/Independent Variables
8.6 Hypotheses
8.7 Hypothesis Testing
8.8 Probability Distributions
8.8.1 Discrete Distributions
8.8.2 Continuous Distributions
8.9 The χ² Test
8.9.1 A Case Study: The Quotative System in British and Canadian Youth
8.10 The Fisher Exact Test of Independence
8.11 Correlation
8.11.1 Pearson's r
8.11.2 Kendall's τ
8.11.3 Spearman's ρ
8.11.4 Correlation Is Not Causation
Exercises
References
9 Association and Productivity
9.1 Introduction
9.2 Cooccurrence Phenomena
9.2.1 Collocation
9.2.2 Colligation
9.2.3 Collostruction
9.3 Association Measures
9.3.1 Measuring Significant Co-occurrences
9.3.2 The Logic of Association Measures
9.3.3 A Quick Inventory of Association Measures
9.3.4 A Loop for Association Measures
9.3.5 There Is No Perfect Association Measure
9.3.6 Collostructions
9.3.7 Asymmetric Association Measures
9.4 Lexical Richness and Productivity
9.4.1 Hapax-Based Measures
9.4.2 Types, Tokens, and Type-Token Ratio
9.4.3 Vocabulary Growth Curves
Exercise
References
10 Clustering Methods
10.1 Introduction
10.1.1 Multidimensional Data
10.1.2 Visualization
10.2 Principal Component Analysis
10.2.1 Principles of Principal Component Analysis
10.2.2 A Case Study: Characterizing Genres with Prosody in Spoken French
10.2.3 How PCA Works
10.3 An Alternative to PCA: t-SNE
10.4 Correspondence Analysis
10.4.1 Principles of Correspondence Analysis
10.4.2 Case Study: General Extenders in the Speech of English Teenagers
10.4.3 How CA Works
10.4.4 Supplementary Variables
10.5 Multiple Correspondence Analysis
10.5.1 Principles of Multiple Correspondence Analysis
10.5.2 Case Study: Predeterminer vs. Preadjectival Uses of Quite and Rather
10.5.3 Confidence Ellipses
10.5.4 Beyond MCA
10.6 Hierarchical Cluster Analysis
10.6.1 The Principles of Hierarchical Cluster Analysis
10.6.2 Case Study: Clustering English Intensifiers
10.6.3 Cluster Classes
10.6.4 Standardizing Variables
10.7 Networks
10.7.1 What Is a Graph?
10.7.2 The Linguistic Relevance of Graphs
Exercises
References
A Appendix
A.1 Chapter 6
A.1.1 Dispersion Plots
A.2 Chapter 8
A.2.1 Contingency Table
A.2.2 Discrete Probability Distributions
A.2.3 A χ² Distribution Table
B Bibliography
Solutions
Index
Guillaume Desagulier is an Associate Professor of English grammar and linguistics at Paris 8 University and President of the French Cognitive Linguistics Association (2015-).