Corpus Linguistics and Statistics with R: Introduction to Quantitative Methods in Linguistics 1st ed. 2017 [Hardcover]

4.50/5 (4 ratings from Goodreads)

Guillaume Desagulier

Format: Hardback, 353 pages (XIII, 353 p.), height x width: 254x178 mm, weight: 8342 g, 98 illustrations (55 in color, 43 black and white)
Series: Quantitative Methods in the Humanities and Social Sciences
Publication date: 07-Dec-2017
Publisher: Springer International Publishing AG
ISBN-10: 3319645706
ISBN-13: 9783319645704

Other books on the subject:

Computational linguistics - (Currently in store: 1 title)
Mathematical & statistical software - (Currently in store: 1 title)
Probability & statistics - (Currently in store: 2 titles)

Hardcover
Price: 141,35 €*
* The price is final, i.e. no other discounts apply
Regular price: 166,29 €
You save 15%
Delivery of the book from the publisher takes approximately 2-4 weeks

Permalink: https://www.kriso.ee/db/9783319645704.html

This textbook examines empirical linguistics from a theoretical linguist’s perspective. It provides both a theoretical discussion of what quantitative corpus linguistics entails and detailed, hands-on, step-by-step instructions to implement the techniques in the field. The statistical methodology and R-based coding from this book teach readers the basic and then more advanced skills to work with large data sets in their linguistics research and studies. Massive data sets are now more than ever the basis for work that ranges from usage-based linguistics to the far reaches of applied linguistics. This book presents much of the methodology in a corpus-based approach. However, the corpus-based methods in this book are also essential components of recent developments in sociolinguistics, historical linguistics, computational linguistics, and psycholinguistics. Material from the book will also be appealing to researchers in digital humanities and the many non-linguistic fields that use textual data analysis and text-based sensorimetrics. Chapters cover topics including corpus processing, frequency data, and clustering methods. Case studies illustrate each chapter with accompanying data sets, R code, and exercises for use by readers. This book may be used in advanced undergraduate courses, graduate courses, and self-study.

Reviews

The fine expository qualities of CLSR, together with the sheer enthusiasm and generosity of its author, do everything to encourage its readers to use the rich and various resources R makes available to explore and discover new ways of addressing familiar problems. (Graham Ranger, Corpora, Vol. 14 (2), 2019)

1 Introduction
1.1 From Introspective to Corpus-Informed Judgments
1.2 Looking for Corpus Linguistics
1.2.1 What Counts as a Corpus
1.2.2 What Linguists Do with the Corpus
1.2.3 How Central the Corpus Is to a Linguist's Work
References

Part I Methods in Corpus Linguistics

2 R Fundamentals
2.1 Introduction
2.2 Downloads and Installs
2.2.1 Downloading and Installing R
2.2.2 Downloading and Installing RStudio
2.2.3 Downloading the Book Materials
2.3 Setting the Working Directory
2.4 R Scripts
2.5 Packages
2.5.1 Downloading Packages
2.5.2 Loading Packages
2.6 Simple Commands
2.7 Variables and Assignment
2.8 Functions and Arguments
2.8.1 Ready-Made Functions
2.8.2 User-Defined Functions
2.9 R Objects
2.9.1 Vectors
2.9.2 Lists
2.9.3 Matrices
2.9.4 Data Frames (and Factors)
2.10 For Loops
2.11 If and if ... else Statements
2.11.1 If Statements
2.11.2 If ... else Statements
2.12 Cleanup
2.13 Common Mistakes and How to Avoid Them
2.14 Further Reading
Exercises
References

3 Digital Corpora
3.1 A Short Typology
3.2 Corpus Compilation: Kennedy's Five Steps
3.3 Unannotated Corpora
3.3.1 Collecting Textual Data
3.3.2 Character Encoding Issues
3.3.3 Creating an Unannotated Corpus
3.4 Annotated Corpora
3.4.1 Markup
3.4.2 POS-Tagging
3.4.3 POS-Tagging in R
3.4.4 Semantic Tagging
3.5 Obtaining Corpora
Exercise
References

4 Processing and Manipulating Character Strings
4.1 Introduction
4.2 Character Strings
4.2.1 Definition
4.2.2 Loading Several Text Files
4.3 First Forays into Character String Processing
4.3.1 Splitting
4.3.2 Matching
4.3.3 Replacing and Deleting
4.3.4 Limitations
4.4 Regular Expressions
4.4.1 Overview
4.4.2 Literals vs. Metacharacters
4.4.3 Line Anchors
4.4.4 Quantifiers
4.4.5 Alternations and Groupings
4.4.6 Character Classes
4.4.7 Lazy vs. Greedy Matching
4.4.8 Backreference
4.4.9 Exact Matching with strapply()
4.4.10 Lookaround
Exercises

5 Applied Character String Processing
5.1 Introduction
5.2 Concordances
5.2.1 A Concordance Based on an Unannotated Corpus
5.2.2 A Concordance Based on an Annotated Corpus
5.3 Making a Data Frame from an Annotated Corpus
5.3.1 Planning the Data Frame
5.3.2 Compiling the Data Frame
5.3.3 The Full Script
5.4 Frequency Lists
5.4.1 A Frequency List of a Raw Text File
5.4.2 A Frequency List of an Annotated File
Exercises
References

6 Summary Graphics for Frequency Data
6.1 Introduction
6.2 Plots, Barplots, and Histograms
6.3 Word Clouds
6.4 Dispersion Plots
6.5 Strip Charts
6.6 Reshaping Tabulated Data
6.7 Motion Charts
Exercises
References

Part II Statistics for Corpus Linguistics

7 Descriptive Statistics
7.1 Variables
7.2 Central Tendency
7.2.1 The Mean
7.2.2 The Median
7.2.3 The Mode
7.3 Dispersion
7.3.1 Quantiles
7.3.2 Boxplots
7.3.3 Variance and Standard Deviation
Exercises

8 Notions of Statistical Testing
8.1 Introduction
8.2 Probabilities
8.2.1 Definition
8.2.2 Simple Probabilities
8.2.3 Joint and Marginal Probabilities
8.2.4 Union vs. Intersection
8.2.5 Conditional Probabilities
8.2.6 Independence
8.3 Populations, Samples, and Individuals
8.4 Random Variables
8.5 Response/Dependent vs. Explanatory/Descriptive/Independent Variables
8.6 Hypotheses
8.7 Hypothesis Testing
8.8 Probability Distributions
8.8.1 Discrete Distributions
8.8.2 Continuous Distributions
8.9 The χ² Test
8.9.1 A Case Study: The Quotative System in British and Canadian Youth
8.10 The Fisher Exact Test of Independence
8.11 Correlation
8.11.1 Pearson's r
8.11.2 Kendall's τ
8.11.3 Spearman's ρ
8.11.4 Correlation Is Not Causation
Exercises
References

9 Association and Productivity
9.1 Introduction
9.2 Cooccurrence Phenomena
9.2.1 Collocation
9.2.2 Colligation
9.2.3 Collostruction
9.3 Association Measures
9.3.1 Measuring Significant Co-occurrences
9.3.2 The Logic of Association Measures
9.3.3 A Quick Inventory of Association Measures
9.3.4 A Loop for Association Measures
9.3.5 There Is No Perfect Association Measure
9.3.6 Collostructions
9.3.7 Asymmetric Association Measures
9.4 Lexical Richness and Productivity
9.4.1 Hapax-Based Measures
9.4.2 Types, Tokens, and Type-Token Ratio
9.4.3 Vocabulary Growth Curves
Exercise
References

10 Clustering Methods
10.1 Introduction
10.1.1 Multidimensional Data
10.1.2 Visualization
10.2 Principal Component Analysis
10.2.1 Principles of Principal Component Analysis
10.2.2 A Case Study: Characterizing Genres with Prosody in Spoken French
10.2.3 How PCA Works
10.3 An Alternative to PCA: t-SNE
10.4 Correspondence Analysis
10.4.1 Principles of Correspondence Analysis
10.4.2 Case Study: General Extenders in the Speech of English Teenagers
10.4.3 How CA Works
10.4.4 Supplementary Variables
10.5 Multiple Correspondence Analysis
10.5.1 Principles of Multiple Correspondence Analysis
10.5.2 Case Study: Predeterminer vs. Preadjectival Uses of Quite and Rather
10.5.3 Confidence Ellipses
10.5.4 Beyond MCA
10.6 Hierarchical Cluster Analysis
10.6.1 The Principles of Hierarchical Cluster Analysis
10.6.2 Case Study: Clustering English Intensifiers
10.6.3 Cluster Classes
10.6.4 Standardizing Variables
10.7 Networks
10.7.1 What Is a Graph?
10.7.2 The Linguistic Relevance of Graphs
Exercises
References

A Appendix
A.1 Chapter 6
A.1.1 Dispersion Plots
A.2 Chapter 8
A.2.1 Contingency Table
A.2.2 Discrete Probability Distributions
A.2.3 A χ² Distribution Table

B Bibliography
Solutions
Index

Guillaume Desagulier is an Associate Professor of English grammar and linguistics at Paris 8 University and President of the French Cognitive Linguistics Association (2015-).
