
E-book: Statistics in Corpus Linguistics Research: A New Approach

Sean Wallis

Format: 382 pages
Publication date: 22-Nov-2020
Publisher: Routledge
ISBN-13: 9780429958663

Format: EPUB+DRM
Price: 48.09 €*
* The price is final; no further discounts apply.
This e-book is for personal use only. E-books cannot be returned.


DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorised with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Traditional approaches focused on significance tests have often been difficult for linguistics researchers to visualize. Statistics in Corpus Linguistics Research: A new approach breaks these significance tests down for researchers in corpus linguistics and linguistic analysis, promoting a visual approach to understanding the performance of tests with real data, and demonstrating how to derive new intervals and tests.

Software agnostic, this book discusses the "why" behind the statistical model, allowing readers a greater facility for choosing their own methodologies. Accessibly written for those with little to no mathematical or statistical background, it explains the mathematical fundamentals of simple significance tests by relating them to confidence intervals. With sample data sets and easy-to-read visuals, this book focuses on practical issues, such as how to:

• pose research questions in terms of choice and constraint,

• employ confidence intervals correctly (including in graph plots),

• select optimal significance tests (and what results mean),

• measure the size of the effect of one variable on another,

• estimate the similarity of distribution patterns, and

• evaluate whether the results of two experiments significantly differ.

Appropriate for anyone from the student just beginning their career to the seasoned researcher, this book is both a practical overview and valuable resource.

Reviews

"Perfect for corpus linguists: an introduction to statistics written by one of them."

Christian Mair, University of Freiburg, Germany

Preface
1 Why Do We Need Another Book on Statistics?
2 Statistics and Scientific Rigour
3 Why Is Statistics Difficult?
4 Looking Down the Observer's End of the Telescope
5 What Do Linguists Need to Know About Statistics?
Acknowledgments
A Note on Terminology and Notation
Contingency Tests for Different Purposes

PART 1 Motivations

1 What Might Corpora Tell Us About Language?
1.1 Introduction
1.2 What Might a Corpus Tell Us?
1.3 The 3A Cycle
1.3.1 Annotation, Abstraction and Analysis
1.3.2 The Problem of Representational Plurality
1.3.3 ICECUP: A Platform for Treebank Research
1.4 What Might a Richly Annotated Corpus Tell Us?
1.5 External Influences: Modal Shall / Will Over Time
1.6 Interacting Grammatical Decisions: NP Premodification
1.7 Framing Constraints and Interaction Evidence
1.7.1 Framing Frequency Evidence
1.7.2 Framing Interaction Evidence
1.7.3 Framing and Annotation
1.7.4 Framing and Sampling
1.8 Conclusions

PART 2 Designing Experiments with Corpora

2 The Idea of Corpus Experiments
2.1 Introduction
2.2 Experimentation and Observation
2.2.1 Obtaining Data
2.2.2 Research Questions and Hypotheses
2.2.3 From Hypothesis to Experiment
2.3 Evaluating a Hypothesis
2.3.1 The Chi-Square Test
2.3.2 Extracting Data
2.3.3 Visualising Proportions, Probabilities and Significance
2.4 Refining the Experiment
2.5 Correlations and Causes
2.6 A Linguistic Interaction Experiment
2.7 Experiments and Disproof
2.8 What Is the Point of an Experiment?
2.9 Conclusions

3 That Vexed Problem of Choice
3.1 Introduction
3.1.1 The Traditional 'Per Million Words' Approach
3.1.2 How Did Per Million Word Statistics Become Dominant?
3.1.3 Choice Models and Linguistic Theory
3.1.4 The Vexed Problem of Choice
3.1.5 Exposure Rates and Other Experimental Models
3.1.6 What Do We Mean by 'Choice'?
3.2 Parameters of Choice
3.2.1 Types of Mutual Substitution
3.2.2 Multi-Way Choices and Decision Trees
3.2.3 Binomial Statistics, Tests and Time Series
3.2.4 Lavandera's Dangerous Hypothesis
3.3 A Methodological Progression?
3.3.1 Per Million Words
3.3.2 Selecting a More Plausible Baseline
3.3.3 Enumerating Alternates
3.3.4 Linguistically Restricting the Sample
3.3.5 Eliminating Non-Alternating Cases
3.3.6 A Methodological Progression
3.4 Objections to Variationism
3.4.1 Feasibility
3.4.2 Arbitrariness
3.4.3 Oversimplification
3.4.4 The Problem of Polysemy
3.4.5 A Complex Ecology?
3.4.6 Necessary Reductionism Versus Complex Statistical Models
3.4.7 Discussion
3.5 Conclusions

4 Choice Versus Meaning
4.1 Introduction
4.2 The Meanings of Very
4.3 The Choices of Very
4.4 Refining Baselines by Type
4.5 Conclusions

5 Balanced Samples and Imagined Populations
5.1 Introduction
5.2 A Study in Genre Variation
5.3 Imagining Populations
5.4 Multi-Variate and Multi-Level Modelling
5.5 More Texts -- or Longer Ones?
5.6 Conclusions

PART 3 Confidence Intervals and Significance Tests

6 Introducing Inferential Statistics
6.1 Why Is Statistics Difficult?
6.2 The Idea of Inferential Statistics
6.3 The Randomness of Life
6.3.1 The Binomial Distribution
6.3.2 The Ideal Binomial Distribution
6.3.3 Skewed Distributions
6.3.4 From Binomial to Normal
6.3.5 From Gauss to Wilson
6.3.6 Scatter and Confidence
6.4 Conclusions

7 Plotting With Confidence
7.1 Introduction
7.1.1 Visualising Data
7.1.2 Comparing Observations and Identifying Significant Differences
7.2 Plotting the Graph
7.2.1 Step 1. Gather Raw Data
7.2.2 Step 2. Calculate Basic Wilson Score Interval Terms
7.2.3 Step 3. Calculate the Wilson Interval
7.2.4 Step 4. Plotting Intervals on Graphs
7.3 Comparing and Plotting Change
7.3.1 The Newcombe-Wilson Interval
7.3.2 Comparing Intervals: An Illustration
7.3.3 What Does the Newcombe-Wilson Interval Represent?
7.3.4 Comparing Multiple Points
7.3.5 Plotting Percentage Difference
7.3.6 Floating Bar Charts
7.4 An Apparent Paradox
7.5 Conclusions

8 From Intervals to Tests
8.1 Introduction
8.1.1 Binomial Intervals and Tests
8.1.2 Sampling Assumptions
8.1.3 Deriving a Binomial Distribution
8.1.4 Some Example Data
8.2 Tests for a Single Binomial Proportion
8.2.1 The Single-Sample z Test
8.2.2 The 2 × 1 Goodness of Fit χ² Test
8.2.3 The Wilson Score Interval
8.2.4 Correcting for Continuity
8.2.5 The 'Exact' Binomial Test
8.2.6 The Clopper-Pearson Interval
8.2.7 The Log-Likelihood Test
8.2.8 A Simple Performance Comparison
8.3 Tests for Comparing Two Observed Proportions
8.3.1 The 2 × 2 χ² and z Test for Two Independent Proportions
8.3.2 The z Test for Two Independent Proportions from Independent Populations
8.3.3 The z Test for Two Independent Proportions with a Given Difference in Population Means
8.3.4 Continuity-Corrected 2 × 2 Tests
8.3.5 The Fisher 'Exact' Test
8.4 Applying Contingency Tests
8.4.1 Selecting Tests
8.4.2 Analysing Larger Tables
8.4.3 Linguistic Choice
8.4.4 Case Interaction
8.4.5 Large Samples and Small Populations
8.5 Comparing the Results of Experiments
8.6 Conclusions

9 Comparing Frequencies in the Same Distribution
9.1 Introduction
9.2 The Single-Sample z Test
9.2.1 Comparing Frequency Pairs for Significant Difference
9.2.2 Performing the Test
9.3 Testing and Interpreting Intervals
9.3.1 The Wilson Interval Comparison Heuristic
9.3.2 Visualising the Test
9.4 Conclusions

10 Reciprocating the Wilson Interval
10.1 Introduction
10.2 The Wilson Interval of Mean Utterance Length
10.2.1 Scatter and Confidence
10.2.2 From Length to Proportion
10.2.3 Example: Confidence Intervals on Mean Length of Utterance
10.2.4 Plotting the Results
10.3 Intervals on Monotonic Functions of p
10.4 Conclusions

11 Competition Between Choices Over Time
11.1 Introduction
11.2 The 'S Curve'
11.3 Boundaries and Confidence Intervals
11.3.1 Confidence Intervals for p
11.3.2 Logistic Curves and Wilson Intervals
11.4 Logistic Regression
11.4.1 From Linear to Logistic Regression
11.4.2 Logit-Wilson Regression
11.4.3 Example 1: The Decline of the To-infinitive Perfect
11.4.4 Example 2: Catenative Verbs in Competition
11.4.5 Review
11.5 Impossible Logistic Multinomials
11.5.1 Binomials
11.5.2 Impossible Multinomials
11.5.3 Possible Hierarchical Multinomials
11.5.4 A Hierarchical Reanalysis of Example 2
11.5.5 The Three-Body Problem
11.6 Conclusions

12 The Replication Crisis and the New Statistics
12.1 Introduction
12.2 A Corpus Linguistics Debate
12.3 Psychology Lessons?
12.4 The Road Not Travelled
12.5 What Does This Mean for Corpus Linguistics?
12.6 Some Recommendations
12.6.1 Recommendation 1: Include a Replication Step
12.6.2 Recommendation 2: Focus on Large Effects - and Clear Visualisations
12.6.3 Recommendation 3: Play Devil's Advocate
12.6.4 A Checklist for Empirical Linguistics
12.7 Conclusions

13 Choosing the Right Test
13.1 Introduction
13.1.1 Choosing a Dependent Variable and Baseline
13.1.2 Choosing Independent Variables
13.2 Tests for Categorical Data
13.2.1 Two Types of Contingency Test
13.2.2 The Benefits of Simple Tests
13.2.3 Visualising Uncertainty
13.2.4 When to Use Goodness of Fit Tests
13.2.5 Tests for Comparing Results
13.2.6 Optimum Methods of Calculation
13.3 Tests for Other Types of Data
13.3.1 T Tests for Comparing Two Independent Samples of Numeric Data
13.3.2 Reversing Tests
13.3.3 Tests for Other Types of Variables
13.3.4 Quantisation
13.4 Conclusions

PART 4 Effect Sizes and Meta-Tests

14 The Size of an Effect
14.1 Introduction
14.2 Effect Sizes for Two-Variable Tables
14.2.1 Simple Difference
14.2.2 The Problem of Prediction
14.2.3 Cramér's φ
14.2.4 Other Probabilistic Approaches to Dependent Probability
14.3 Confidence Intervals on φ
14.3.1 Confidence Intervals on 2 × 2 φ
14.3.2 Confidence Intervals for Cramér's φ
14.3.3 Example: Investigating Grammatical Priming
14.4 Goodness of Fit Effect Sizes
14.4.1 Unweighted φp
14.4.2 Variance-Weighted φe
14.4.3 Example: Correlating the Present Perfect
14.5 Conclusions

15 Meta-Tests for Comparing Tables of Results
15.1 Introduction
15.1.1 How Not to Compare Test Results
15.1.2 Comparing Sizes of Effect
15.1.3 Other Meta-Tests
15.2 Some Preliminaries
15.2.1 Test Assumptions
15.2.2 Correcting for Continuity
15.2.3 Example Data and Notation
15.3 Point and Multi-Point Tests for Homogeneity Tables
15.3.1 Reorganising Contingency Tables for 2 × 1 Tests
15.3.2 The Newcombe-Wilson Point Test
15.3.3 The Gaussian Point Test
15.3.4 The Multi-Point Test for r × c Homogeneity Tables
15.4 Gradient Tests for Homogeneity Tables
15.4.1 The 2 × 2 Newcombe-Wilson Gradient Test
15.4.2 Cramér's φ Interval and Test
15.4.3 R × 2 Homogeneity Gradient Tests
15.4.4 Interpreting Gradient Meta-Tests for Large Tables
15.5 Gradient Tests for Goodness of Fit Tables
15.5.1 The 2 × 1 Wilson Interval Gradient Test
15.5.2 R × 1 Goodness of Fit Gradient Tests
15.6 Subset Tests
15.6.1 Point Tests for Subsets
15.6.2 Multi-Point Subset Tests
15.6.3 Gradient Subset Tests
15.6.4 Goodness of Fit Subset Tests
15.7 Conclusions

PART 5 Statistical Solutions for Corpus Samples

16 Conducting Research with Imperfect Data
16.1 Introduction
16.2 Reviewing Subsamples
16.2.1 Example 1: Get Versus Be Passive
16.2.2 Subsampling and Reviewing
16.2.3 Estimating the Observed Probability p
16.2.4 Contingency Tests and Multinomial Dependent Variables
16.3 Reviewing Preliminary Analyses
16.3.1 Example 2: Embedded and Sequential Postmodifiers
16.3.2 Testing the Worst-Case Scenario
16.3.3 Combining Subsampling with Worst-Case Analysis
16.3.4 Ambiguity and Error
16.4 Resampling and p-hacking
16.5 Conclusions

17 Adjusting Intervals for Random-Text Samples
17.1 Introduction
17.2 Recalibrating Binomial Models
17.3 Examples with Large Samples
17.3.1 Example 1: Interrogative Clause Proportion, 'Direct Conversations'
17.3.2 Example 2: Clauses Per Word, 'Direct Conversations'
17.3.3 Uneven-Size Subsamples
17.3.4 Example 1 Revisited, Across ICE-GB
17.4 Alternation Studies with Small Samples
17.4.1 Applying the Large Sample Method
17.4.2 Singletons, Partitioning and Pooling
17.4.3 Review
17.5 Conclusions

PART 6 Concluding Remarks

18 Plotting the Wilson Distribution
18.1 Introduction
18.2 Plotting the Distribution
18.2.1 Calculating w⁻(α) from the Standard Normal Distribution
18.2.2 Plotting Points
18.2.3 Delta Approximation
18.3 Example Plots
18.3.1 Sample Size n = 10, Observed Proportion p = 0.5
18.3.2 Properties of Wilson Areas
18.3.3 The Effect of p Tending to Extremes
18.3.4 The Effect of Very Small n
18.4 Further Perspectives on Wilson Distributions
18.4.1 Percentiles of Wilson Distributions
18.4.2 The Logit-Wilson Distribution
18.5 Alternative Distributions
18.5.1 Continuity-Corrected Wilson Distributions
18.5.2 Clopper-Pearson Distributions
18.6 Conclusions

19 In Conclusion

Appendices

A The Interval Equality Principle
1 Introduction
1.1 Axiom
1.2 Functional Notation
2 Applications
2.1 Wilson Score Interval
2.2 Wilson Score Interval with Continuity Correction
2.3 Binomial and Clopper-Pearson Intervals
2.4 Log-Likelihood and Other Significance Test Functions
3 Searching for Interval Bounds with a Computer

B Pseudo-Code for Computational Procedures
1 Simple Logistic Regression Algorithm with Logit-Wilson Variance
1.1 Calculate Sum of Squared Errors e for Known m and k
1.2 Find Optimum Value of k by Search for Smallest Error e for Gradient m
1.3 Find Optimum Values of m and k by the Method of Least Squares
1.4 Perform Regression
2 Binomial and Fisher Functions
2.1 Core Functions
2.2 The Clopper-Pearson Interval

Glossary
References
Index

Sean Wallis is Principal Research Fellow and Deputy Director of the Survey of English Usage at UCL.

Permalink: https://www.kriso.ee/db/97804299586636e.html
