E-book: Statistics in Corpus Linguistics Research: A New Approach

  • Length: 382 pages
  • Publication date: 22-Nov-2020
  • Publisher: Routledge
  • ISBN-13: 9780429958663
  • Format: EPUB+DRM
  • Price: 48,09 €*
  • * the price is final, i.e. no further discounts apply
  • Add to cart
  • Add to wish list
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not permitted

  • Printing:

    not permitted

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you need to install dedicated software to read it. You also need to create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorised with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, install Adobe Digital Editions. (This is a free application designed specifically for reading e-books. It should not be confused with Adobe Reader, which is probably already installed on your computer.)

    This e-book cannot be read on an Amazon Kindle.

"Traditional approaches focused on significance tests have often been difficult for linguistics researchers to visualize. Statistics in Corpus Linguistics Research: A new approach breaks these significance tests down for researchers in corpus linguisticsand linguistic analysis, promoting a visual approach to understanding the performance of tests with real data, and demonstrating how to derive new intervals and tests. Software agnostic, this book discusses the "why" behind the statistical model, allowing readers a greater facility for choosing their own methodologies. Accessibly written for those with little to no mathematical or statistical background, it explains the mathematical fundamentals of simple significance tests by relating them to confidenceintervals. With sample data sets and easy-to-read visuals, this book focuses on practical issues, such as how to: pose research questions in terms of choice and constraint, employ confidence intervals correctly (including in graph plots), select optimal significance tests (and what results mean), measure the size of the effect of one variable on another, estimate the similarity of distribution patterns, and evaluate whether the results of two experiments significantly differ. Appropriate for anyone from the student just beginning their career to the seasoned researcher, this book is both a practical overview and valuable resource"--

Traditional approaches focused on significance tests have often been difficult for linguistics researchers to visualize. Statistics in Corpus Linguistics Research: A new approach breaks these significance tests down for researchers in corpus linguistics and linguistic analysis, promoting a visual approach to understanding the performance of tests with real data, and demonstrating how to derive new intervals and tests.

Software agnostic, this book discusses the "why" behind the statistical model, allowing readers a greater facility for choosing their own methodologies. Accessibly written for those with little to no mathematical or statistical background, it explains the mathematical fundamentals of simple significance tests by relating them to confidence intervals. With sample data sets and easy-to-read visuals, this book focuses on practical issues, such as how to:

• pose research questions in terms of choice and constraint,

• employ confidence intervals correctly (including in graph plots),

• select optimal significance tests (and what results mean),

• measure the size of the effect of one variable on another,

• estimate the similarity of distribution patterns, and

• evaluate whether the results of two experiments significantly differ.

Appropriate for anyone from the student just beginning their career to the seasoned researcher, this book is both a practical overview and valuable resource.
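
The description above stresses confidence intervals, and the contents below show how central the Wilson score interval is to the book's approach (see, for example, Chapters 7, 8 and 18). As a minimal illustrative sketch only, not code from the book (which is software agnostic), the interval for a single observed proportion can be computed as follows; the Python function, its name and the example figures (45 cases out of 100) are assumptions of this illustration:

    from math import sqrt

    def wilson_interval(f, n, z=1.96):
        # Illustrative sketch, not from the book: the Wilson score interval for
        # an observed proportion p = f/n, where f is the frequency of the chosen
        # form, n the number of cases, and z the critical value of the Normal
        # distribution (1.96 for a 95% interval).
        p = f / n
        centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
        spread = (z / (1 + z**2 / n)) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
        return centre - spread, centre + spread

    # Hypothetical example: 45 of 100 sampled clauses take the form under study.
    lower, upper = wilson_interval(45, 100)
    print(f"95% Wilson interval for p = 0.45: ({lower:.3f}, {upper:.3f})")

Unlike the simpler Gaussian (Wald) approximation, an interval computed this way cannot cross 0 or 1, which is one reason it suits plots of observed proportions.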

Reviews

"Perfect for corpus linguists: an introduction to statistics written by one of them."

Christian Mair, University of Freiburg, Germany

Contents

Preface
1 Why Do We Need Another Book on Statistics?
2 Statistics and Scientific Rigour
3 Why Is Statistics Difficult?
4 Looking Down the Observer's End of the Telescope
5 What Do Linguists Need to Know About Statistics?
Acknowledgments
A Note on Terminology and Notation
Contingency Tests for Different Purposes
PART 1 Motivations
1 What Might Corpora Tell Us About Language?
1.1 Introduction
1.2 What Might a Corpus Tell Us?
1.3 The 3A Cycle
1.3.1 Annotation, Abstraction and Analysis
1.3.2 The Problem of Representational Plurality
1.3.3 ICECUP: A Platform for Treebank Research
1.4 What Might a Richly Annotated Corpus Tell Us?
1.5 External Influences: Modal Shall / Will Over Time
1.6 Interacting Grammatical Decisions: NP Premodification
1.7 Framing Constraints and Interaction Evidence
1.7.1 Framing Frequency Evidence
1.7.2 Framing Interaction Evidence
1.7.3 Framing and Annotation
1.7.4 Framing and Sampling
1.8 Conclusions
PART 2 Designing Experiments with Corpora
2 The Idea of Corpus Experiments
2.1 Introduction
2.2 Experimentation and Observation
2.2.1 Obtaining Data
2.2.2 Research Questions and Hypotheses
2.2.3 From Hypothesis to Experiment
2.3 Evaluating a Hypothesis
2.3.1 The Chi-Square Test
2.3.2 Extracting Data
2.3.3 Visualising Proportions, Probabilities and Significance
2.4 Refining the Experiment
2.5 Correlations and Causes
2.6 A Linguistic Interaction Experiment
2.7 Experiments and Disproof
2.8 What Is the Point of an Experiment?
2.9 Conclusions
3 That Vexed Problem of Choice
3.1 Introduction
3.1.1 The Traditional 'Per Million Words' Approach
3.1.2 How Did Per Million Word Statistics Become Dominant?
3.1.3 Choice Models and Linguistic Theory
3.1.4 The Vexed Problem of Choice
3.1.5 Exposure Rates and Other Experimental Models
3.1.6 What Do We Mean by 'Choice'?
3.2 Parameters of Choice
3.2.1 Types of Mutual Substitution
3.2.2 Multi-Way Choices and Decision Trees
3.2.3 Binomial Statistics, Tests and Time Series
3.2.4 Lavandera's Dangerous Hypothesis
3.3 A Methodological Progression?
3.3.1 Per Million Words
3.3.2 Selecting a More Plausible Baseline
3.3.3 Enumerating Alternates
3.3.4 Linguistically Restricting the Sample
3.3.5 Eliminating Non-Alternating Cases
3.3.6 A Methodological Progression
3.4 Objections to Variationism
3.4.1 Feasibility
3.4.2 Arbitrariness
3.4.3 Oversimplification
3.4.4 The Problem of Polysemy
3.4.5 A Complex Ecology?
3.4.6 Necessary Reductionism Versus Complex Statistical Models
3.4.7 Discussion
3.5 Conclusions
4 Choice Versus Meaning
4.1 Introduction
4.2 The Meanings of Very
4.3 The Choices of Very
4.4 Refining Baselines by Type
4.5 Conclusions
5 Balanced Samples and Imagined Populations
5.1 Introduction
5.2 A Study in Genre Variation
5.3 Imagining Populations
5.4 Multi-Variate and Multi-Level Modelling
5.5 More Texts -- or Longer Ones?
5.6 Conclusions
PART 3 Confidence Intervals and Significance Tests
6 Introducing Inferential Statistics
6.1 Why Is Statistics Difficult?
6.2 The Idea of Inferential Statistics
6.3 The Randomness of Life
6.3.1 The Binomial Distribution
6.3.2 The Ideal Binomial Distribution
6.3.3 Skewed Distributions
6.3.4 From Binomial to Normal
6.3.5 From Gauss to Wilson
6.3.6 Scatter and Confidence
6.4 Conclusions
7 Plotting With Confidence
7.1 Introduction
7.1.1 Visualising Data
7.1.2 Comparing Observations and Identifying Significant Differences
7.2 Plotting the Graph
7.2.1 Step 1. Gather Raw Data
7.2.2 Step 2. Calculate Basic Wilson Score Interval Terms
7.2.3 Step 3. Calculate the Wilson Interval
7.2.4 Step 4. Plotting Intervals on Graphs
7.3 Comparing and Plotting Change
7.3.1 The Newcombe-Wilson Interval
7.3.2 Comparing Intervals: An Illustration
7.3.3 What Does the Newcombe-Wilson Interval Represent?
7.3.4 Comparing Multiple Points
7.3.5 Plotting Percentage Difference
7.3.6 Floating Bar Charts
7.4 An Apparent Paradox
7.5 Conclusions
8 From Intervals to Tests
8.1 Introduction
8.1.1 Binomial Intervals and Tests
8.1.2 Sampling Assumptions
8.1.3 Deriving a Binomial Distribution
8.1.4 Some Example Data
8.2 Tests for a Single Binomial Proportion
8.2.1 The Single-Sample z Test
8.2.2 The 2 × 1 Goodness of Fit χ² Test
8.2.3 The Wilson Score Interval
8.2.4 Correcting for Continuity
8.2.5 The 'Exact' Binomial Test
8.2.6 The Clopper-Pearson Interval
8.2.7 The Log-Likelihood Test
8.2.8 A Simple Performance Comparison
8.3 Tests for Comparing Two Observed Proportions
8.3.1 The 2 × 2 χ² and z Test for Two Independent Proportions
8.3.2 The z Test for Two Independent Proportions from Independent Populations
8.3.3 The z Test for Two Independent Proportions with a Given Difference in Population Means
8.3.4 Continuity-Corrected 2 × 2 Tests
8.3.5 The Fisher 'Exact' Test
8.4 Applying Contingency Tests
8.4.1 Selecting Tests
8.4.2 Analysing Larger Tables
8.4.3 Linguistic Choice
8.4.4 Case Interaction
8.4.5 Large Samples and Small Populations
8.5 Comparing the Results of Experiments
8.6 Conclusions
9 Comparing Frequencies in the Same Distribution
9.1 Introduction
9.2 The Single-Sample z Test
9.2.1 Comparing Frequency Pairs for Significant Difference
9.2.2 Performing the Test
9.3 Testing and Interpreting Intervals
9.3.1 The Wilson Interval Comparison Heuristic
9.3.2 Visualising the Test
9.4 Conclusions
10 Reciprocating the Wilson Interval
10.1 Introduction
10.2 The Wilson Interval of Mean Utterance Length
10.2.1 Scatter and Confidence
10.2.2 From Length to Proportion
10.2.3 Example: Confidence Intervals on Mean Length of Utterance
10.2.4 Plotting the Results
10.3 Intervals on Monotonic Functions of p
10.4 Conclusions
11 Competition Between Choices Over Time
11.1 Introduction
11.2 The 'S Curve'
11.3 Boundaries and Confidence Intervals
11.3.1 Confidence Intervals for p
11.3.2 Logistic Curves and Wilson Intervals
11.4 Logistic Regression
11.4.1 From Linear to Logistic Regression
11.4.2 Logit-Wilson Regression
11.4.3 Example 1: The Decline of the To-infinitive Perfect
11.4.4 Example 2: Catenative Verbs in Competition
11.4.5 Review
11.5 Impossible Logistic Multinomials
11.5.1 Binomials
11.5.2 Impossible Multinomials
11.5.3 Possible Hierarchical Multinomials
11.5.4 A Hierarchical Reanalysis of Example 2
11.5.5 The Three-Body Problem
11.6 Conclusions
12 The Replication Crisis and the New Statistics
12.1 Introduction
12.2 A Corpus Linguistics Debate
12.3 Psychology Lessons?
12.4 The Road Not Travelled
12.5 What Does This Mean for Corpus Linguistics?
12.6 Some Recommendations
12.6.1 Recommendation 1: Include a Replication Step
12.6.2 Recommendation 2: Focus on Large Effects - and Clear Visualisations
12.6.3 Recommendation 3: Play Devil's Advocate
12.6.4 A Checklist for Empirical Linguistics
12.7 Conclusions
13 Choosing the Right Test
13.1 Introduction
13.1.1 Choosing a Dependent Variable and Baseline
13.1.2 Choosing Independent Variables
13.2 Tests for Categorical Data
13.2.1 Two Types of Contingency Test
13.2.2 The Benefits of Simple Tests
13.2.3 Visualising Uncertainty
13.2.4 When to Use Goodness of Fit Tests
13.2.5 Tests for Comparing Results
13.2.6 Optimum Methods of Calculation
13.3 Tests for Other Types of Data
13.3.1 T Tests for Comparing Two Independent Samples of Numeric Data
13.3.2 Reversing Tests
13.3.3 Tests for Other Types of Variables
13.3.4 Quantisation
13.4 Conclusions
PART 4 Effect Sizes and Meta-Tests
14 The Size of an Effect
14.1 Introduction
14.2 Effect Sizes for Two-Variable Tables
14.2.1 Simple Difference
14.2.2 The Problem of Prediction
14.2.3 Cramer's φ
14.2.4 Other Probabilistic Approaches to Dependent Probability
14.3 Confidence Intervals on φ
14.3.1 Confidence Intervals on 2 × 2 φ
14.3.2 Confidence Intervals for Cramer's φ
14.3.3 Example: Investigating Grammatical Priming
14.4 Goodness of Fit Effect Sizes
14.4.1 Unweighted φp
14.4.2 Variance-Weighted φe
14.4.3 Example: Correlating the Present Perfect
14.5 Conclusions
15 Meta-Tests for Comparing Tables of Results
15.1 Introduction
15.1.1 How Not to Compare Test Results
15.1.2 Comparing Sizes of Effect
15.1.3 Other Meta-Tests
15.2 Some Preliminaries
15.2.1 Test Assumptions
15.2.2 Correcting for Continuity
15.2.3 Example Data and Notation
15.3 Point and Multi-Point Tests for Homogeneity Tables
15.3.1 Reorganising Contingency Tables for 2 × 1 Tests
15.3.2 The Newcombe-Wilson Point Test
15.3.3 The Gaussian Point Test
15.3.4 The Multi-Point Test for r × c Homogeneity Tables
15.4 Gradient Tests for Homogeneity Tables
15.4.1 The 2 × 2 Newcombe-Wilson Gradient Test
15.4.2 Cramer's φ Interval and Test
15.4.3 r × 2 Homogeneity Gradient Tests
15.4.4 Interpreting Gradient Meta-Tests for Large Tables
15.5 Gradient Tests for Goodness of Fit Tables
15.5.1 The 2 × 1 Wilson Interval Gradient Test
15.5.2 r × 1 Goodness of Fit Gradient Tests
15.6 Subset Tests
15.6.1 Point Tests for Subsets
15.6.2 Multi-Point Subset Tests
15.6.3 Gradient Subset Tests
15.6.4 Goodness of Fit Subset Tests
15.7 Conclusions
PART 5 Statistical Solutions for Corpus Samples
16 Conducting Research with Imperfect Data
16.1 Introduction
16.2 Reviewing Subsamples
16.2.1 Example 1: Get Versus Be Passive
16.2.2 Subsampling and Reviewing
16.2.3 Estimating the Observed Probability p
16.2.4 Contingency Tests and Multinomial Dependent Variables
16.3 Reviewing Preliminary Analyses
16.3.1 Example 2: Embedded and Sequential Postmodifiers
16.3.2 Testing the Worst-Case Scenario
16.3.3 Combining Subsampling with Worst-Case Analysis
16.3.4 Ambiguity and Error
16.4 Resampling and p-hacking
16.5 Conclusions
17 Adjusting Intervals for Random-Text Samples
17.1 Introduction
17.2 Recalibrating Binomial Models
17.3 Examples with Large Samples
17.3.1 Example 1: Interrogative Clause Proportion, 'Direct Conversations'
17.3.2 Example 2: Clauses Per Word, 'Direct Conversations'
17.3.3 Uneven-Size Subsamples
17.3.4 Example 1 Revisited, Across ICE-GB
17.4 Alternation Studies with Small Samples
17.4.1 Applying the Large Sample Method
17.4.2 Singletons, Partitioning and Pooling
17.4.3 Review
17.5 Conclusions
PART 6 Concluding Remarks
18 Plotting the Wilson Distribution
18.1 Introduction
18.2 Plotting the Distribution
18.2.1 Calculating w⁻(α) from the Standard Normal Distribution
18.2.2 Plotting Points
18.2.3 Delta Approximation
18.3 Example Plots
18.3.1 Sample Size n = 10, Observed Proportion p = 0.5
18.3.2 Properties of Wilson Areas
18.3.3 The Effect of p Tending to Extremes
18.3.4 The Effect of Very Small n
18.4 Further Perspectives on Wilson Distributions
18.4.1 Percentiles of Wilson Distributions
18.4.2 The Logit-Wilson Distribution
18.5 Alternative Distributions
18.5.1 Continuity-Corrected Wilson Distributions
18.5.2 Clopper-Pearson Distributions
18.6 Conclusions
19 In Conclusion
Appendices
A The Interval Equality Principle
1 Introduction
1.1 Axiom
1.2 Functional Notation
2 Applications
2.1 Wilson Score Interval
2.2 Wilson Score Interval with Continuity Correction
2.3 Binomial and Clopper-Pearson Intervals
2.4 Log-Likelihood and Other Significance Test Functions
3 Searching for Interval Bounds with a Computer
B Pseudo-Code for Computational Procedures
1 Simple Logistic Regression Algorithm with Logit-Wilson Variance
1.1 Calculate Sum of Squared Errors e for Known m and k
1.2 Find Optimum Value of k by Search for Smallest Error e for Gradient m
1.3 Find Optimum Values of m and k by the Method of Least Squares
1.4 Perform Regression
2 Binomial and Fisher Functions
2.1 Core Functions
2.2 The Clopper-Pearson Interval
Glossary
References
Index
Sean Wallis is Principal Research Fellow and Deputy Director of the Survey of English Usage at UCL.