Why Use This Book |
|
xxi | |
Simplified Notation |
|
xxiv | |
Acknowledgments |
|
xxv | |
|
|
1 | (168) |
|
|
3 | (27) |
|
|
4 | (1) |
|
|
5 | (2) |
|
1.A1 Case Study -- Finding a Good Deal among Hotels: Data Collection |
|
|
6 | (1) |
|
|
7 | (4) |
|
1.B1 Case Study -- Comparing Online and Offline Prices: Data Collection |
|
|
9 | (1) |
|
1.C1 Case Study -- Management Quality and Firm Performance: Data Collection |
|
|
10 | (1) |
|
1.4 How Data Is Born: The Big Picture |
|
|
11 | (1) |
|
1.5 Collecting Data from Existing Sources |
|
|
12 | (4) |
|
1.A2 Case Study -- Finding a Good Deal among Hotels: Data Collection |
|
|
14 | (1) |
|
1.B2 Case Study -- Comparing Online and Offline Prices: Data Collection |
|
|
15 | (1) |
|
|
16 | (2) |
|
1.C2 Case Study -- Management Quality and Firm Size: Data Collection |
|
|
18 | (1) |
|
|
18 | (1) |
|
|
19 | (3) |
|
1.B3 Case Study -- Comparing Online and Offline Prices: Data Collection |
|
|
21 | (1) |
|
1.C3 Case Study -- Management Quality and Firm Size: Data Collection |
|
|
21 | (1) |
|
|
22 | (2) |
|
1.10 Good Practices in Data Collection |
|
|
24 | (2) |
|
1.11 Ethical and Legal Issues of Data Collection |
|
|
26 | (1) |
|
|
27 | (3) |
|
|
27 | (1) |
|
|
28 | (1) |
|
References and Further Reading |
|
|
28 | (2) |
|
2 Preparing Data for Analysis |
|
|
30 | (28) |
|
|
31 | (2) |
|
2.2 Stock Variables, Flow Variables |
|
|
33 | (1) |
|
2.3 Types of Observations |
|
|
33 | (2) |
|
|
35 | (2) |
|
2.A1 Case Study -- Finding a Good Deal among Hotels: Data Preparation |
|
|
36 | (1) |
|
2.5 Tidy Approach for Multi-dimensional Data |
|
|
37 | (1) |
|
2.B1 Case Study -- Displaying Immunization Rates across Countries |
|
|
37 | (1) |
|
2.6 Relational Data and Linking Data Tables |
|
|
38 | (4) |
|
2.C1 Case Study -- Identifying Successful Football Managers |
|
|
40 | (2) |
|
2.7 Entity Resolution: Duplicates, Ambiguous Identification, and Non-entity Rows |
|
|
42 | (2) |
|
2.C2 Case Study -- Identifying Successful Football Managers |
|
|
43 | (1) |
|
2.8 Discovering Missing Values |
|
|
44 | (2) |
|
2.9 Managing Missing Values |
|
|
46 | (2) |
|
2.A2 Case Study -- Finding a Good Deal among Hotels: Data Preparation |
|
|
47 | (1) |
|
2.10 The Process of Cleaning Data |
|
|
48 | (1) |
|
2.11 Reproducible Workflow: Write Code and Document Your Steps |
|
|
49 | (1) |
|
2.12 Organizing Data Tables for a Project |
|
|
50 | (4) |
|
2.C3 Case Study -- Identifying Successful Football Managers |
|
|
52 | (1) |
|
2.C4 Case Study -- Identifying Successful Football Managers |
|
|
53 | (1) |
|
|
54 | (4) |
|
|
54 | (1) |
|
|
55 | (1) |
|
References and Further Reading |
|
|
56 | (1) |
|
2.U1 Under the Hood: Naming Files |
|
|
56 | (2) |
|
3 Exploratory Data Analysis |
|
|
58 | (38) |
|
3.1 Why Do Exploratory Data Analysis? |
|
|
59 | (1) |
|
3.2 Frequencies and Probabilities |
|
|
60 | (1) |
|
3.3 Visualizing Distributions |
|
|
61 | (4) |
|
3.A1 Case Study -- Finding a Good Deal among Hotels: Data Exploration |
|
|
62 | (3) |
|
|
65 | (3) |
|
3.A2 Case Study -- Finding a Good Deal among Hotels: Data Exploration |
|
|
66 | (2) |
|
3.5 Good Graphs: Guidelines for Data Visualization |
|
|
68 | (4) |
|
3.A3 Case Study -- Finding a Good Deal among Hotels: Data Exploration |
|
|
71 | (1) |
|
3.6 Summary Statistics for Quantitative Variables |
|
|
72 | (5) |
|
3.B1 Case Study -- Comparing Hotel Prices in Europe: Vienna vs. London |
|
|
74 | (3) |
|
3.7 Visualizing Summary Statistics |
|
|
77 | (3) |
|
3.C1 Case Study -- Measuring Home Team Advantage in Football |
|
|
78 | (2) |
|
|
80 | (3) |
|
3.C2 Case Study -- Measuring Home Team Advantage in Football |
|
|
82 | (1) |
|
3.9 Theoretical Distributions |
|
|
83 | (4) |
|
3.D1 Case Study -- Distributions of Body Height and Income |
|
|
85 | (2) |
|
3.10 Steps of Exploratory Data Analysis |
|
|
87 | (1) |
|
|
88 | (8) |
|
|
88 | (1) |
|
|
89 | (1) |
|
References and Further Reading |
|
|
90 | (1) |
|
3.U1 Under the Hood: More on Theoretical Distributions |
|
|
90 | (1) |
|
|
91 | (1) |
|
|
91 | (1) |
|
|
92 | (1) |
|
|
92 | (4) |
|
4 Comparison and Correlation |
|
|
96 | (22) |
|
|
97 | (3) |
|
4.A1 Case Study -- Management Quality and Firm Size: Describing Patterns of Association |
|
|
98 | (2) |
|
|
100 | (1) |
|
4.3 Conditional Probabilities |
|
|
101 | (2) |
|
4.A2 Case Study -- Management Quality and Firm Size: Describing Patterns of Association |
|
|
102 | (1) |
|
4.4 Conditional Distribution, Conditional Expectation |
|
|
103 | (1) |
|
4.5 Conditional Distribution, Conditional Expectation with Quantitative x |
|
|
104 | (4) |
|
4.A3 Case Study -- Management Quality and Firm Size: Describing Patterns of Association |
|
|
105 | (3) |
|
4.6 Dependence, Covariance, Correlation |
|
|
108 | (2) |
|
4.7 From Latent Variables to Observed Variables |
|
|
110 | (3) |
|
4.A4 Case Study -- Management Quality and Firm Size: Describing Patterns of Association |
|
|
111 | (2) |
|
4.8 Sources of Variation in x |
|
|
113 | (1) |
|
|
114 | (4) |
|
|
115 | (1) |
|
|
115 | (1) |
|
References and Further Reading |
|
|
116 | (1) |
|
4.U1 Under the Hood: Inverse Conditional Probabilities, Bayes' Rule |
|
|
116 | (2) |
|
|
118 | (25) |
|
5.1 When to Generalize and to What? |
|
|
119 | (3) |
|
5.A1 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio? |
|
|
121 | (1) |
|
5.2 Repeated Samples, Sampling Distribution, Standard Error |
|
|
122 | (3) |
|
5.A2 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio? |
|
|
123 | (2) |
|
5.3 Properties of the Sampling Distribution |
|
|
125 | (3) |
|
5.A3 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio? |
|
|
127 | (1) |
|
5.4 The Confidence Interval |
|
|
128 | (1) |
|
5.A4 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio? |
|
|
129 | (1) |
|
5.5 Discussion of the CI: Confidence or Probability? |
|
|
129 | (1) |
|
5.6 Estimating the Standard Error with the Bootstrap Method |
|
|
130 | (3) |
|
5.A5 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio? |
|
|
132 | (1) |
|
5.7 The Standard Error Formula |
|
|
133 | (2) |
|
5.A6 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio? |
|
|
134 | (1) |
|
|
135 | (2) |
|
5.A7 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio? |
|
|
136 | (1) |
|
5.9 Big Data, Statistical Inference, External Validity |
|
|
137 | (1) |
|
|
138 | (5) |
|
|
138 | (1) |
|
|
139 | (1) |
|
References and Further Reading |
|
|
139 | (1) |
|
5.U1 Under the Hood: The Law of Large Numbers and the Central Limit Theorem |
|
|
140 | (3) |
|
|
143 | (26) |
|
6.1 The Logic of Testing Hypotheses |
|
|
144 | (4) |
|
6.A1 Case Study -- Comparing Online and Offline Prices: Testing the Difference |
|
|
145 | (3) |
|
6.2 Null Hypothesis, Alternative Hypothesis |
|
|
148 | (1) |
|
|
149 | (1) |
|
6.4 Making a Decision; False Negatives, False Positives |
|
|
150 | (4) |
|
|
154 | (3) |
|
6.A2 Case Study -- Comparing Online and Offline Prices: Testing the Difference |
|
|
155 | (2) |
|
6.6 Steps of Hypothesis Testing |
|
|
157 | (1) |
|
6.7 One-Sided Alternatives |
|
|
158 | (2) |
|
6.B1 Case Study -- Testing the Likelihood of Loss on a Stock Portfolio |
|
|
159 | (1) |
|
6.8 Testing Multiple Hypotheses |
|
|
160 | (2) |
|
6.A3 Case Study -- Comparing Online and Offline Prices: Testing the Difference |
|
|
161 | (1) |
|
|
162 | (2) |
|
6.10 Testing Hypotheses with Big Data |
|
|
164 | (1) |
|
|
165 | (4) |
|
|
165 | (1) |
|
|
166 | (1) |
|
References and Further Reading |
|
|
167 | (2) |
|
|
169 | (194) |
|
|
171 | (29) |
|
7.1 When and Why Do Simple Regression Analysis? |
|
|
172 | (1) |
|
7.2 Regression: Definition |
|
|
172 | (2) |
|
7.3 Non-parametric Regression |
|
|
174 | (4) |
|
7.A1 Case Study -- Finding a Good Deal among Hotels with Simple Regression |
|
|
175 | (3) |
|
7.4 Linear Regression: Introduction |
|
|
178 | (1) |
|
7.5 Linear Regression: Coefficient Interpretation |
|
|
179 | (1) |
|
7.6 Linear Regression with a Binary Explanatory Variable |
|
|
180 | (1) |
|
|
181 | (3) |
|
7.A2 Case Study -- Finding a Good Deal among Hotels with Simple Regression |
|
|
183 | (1) |
|
7.8 Predicted Dependent Variable and Regression Residual |
|
|
184 | (4) |
|
7.A3 Case Study -- Finding a Good Deal among Hotels with Simple Regression |
|
|
185 | (3) |
|
7.9 Goodness of Fit, R-Squared |
|
|
188 | (1) |
|
7.10 Correlation and Linear Regression |
|
|
189 | (1) |
|
7.11 Regression Analysis, Regression toward the Mean, Mean Reversion |
|
|
190 | (1) |
|
7.12 Regression and Causation |
|
|
190 | (2) |
|
7.A4 Case Study -- Finding a Good Deal among Hotels with Simple Regression |
|
|
192 | (1) |
|
|
192 | (8) |
|
|
193 | (1) |
|
|
193 | (1) |
|
References and Further Reading |
|
|
194 | (1) |
|
7.U1 Under the Hood: Derivation of the OLS Formulae for the Intercept and Slope Coefficients |
|
|
194 | (3) |
|
7.U2 Under the Hood: More on Residuals and Predicted Values with OLS |
|
|
197 | (3) |
|
8 Complicated Patterns and Messy Data |
|
|
200 | (36) |
|
8.1 When and Why Care about the Shape of the Association between y and x? |
|
|
201 | (1) |
|
8.2 Taking Relative Differences or Log |
|
|
202 | (2) |
|
8.3 Log Transformation and Non-positive Values |
|
|
204 | (2) |
|
8.4 Interpreting Log Values in a Regression |
|
|
206 | (4) |
|
8.A1 Case Study -- Finding a Good Deal among Hotels with Nonlinear Function |
|
|
207 | (3) |
|
8.5 Other Transformations of Variables |
|
|
210 | (5) |
|
8.B1 Case Study -- How is Life Expectancy Related to the Average Income of a Country? |
|
|
210 | (5) |
|
8.6 Regression with a Piecewise Linear Spline |
|
|
215 | (1) |
|
8.7 Regression with Polynomial |
|
|
216 | (2) |
|
8.8 Choosing a Functional Form in a Regression |
|
|
218 | (3) |
|
8.B2 Case Study -- How is Life Expectancy Related to the Average Income of a Country? |
|
|
219 | (2) |
|
8.9 Extreme Values and Influential Observations |
|
|
221 | (1) |
|
8.10 Measurement Error in Variables |
|
|
222 | (1) |
|
8.11 Classical Measurement Error |
|
|
223 | (4) |
|
8.C1 Case Study -- Hotel Ratings and Measurement Error |
|
|
225 | (2) |
|
8.12 Non-classical Measurement Error and General Advice |
|
|
227 | (1) |
|
8.13 Using Weights in Regression Analysis |
|
|
228 | (2) |
|
8.B3 Case Study -- How is Life Expectancy Related to the Average Income of a Country? |
|
|
229 | (1) |
|
|
230 | (6) |
|
|
231 | (1) |
|
|
232 | (1) |
|
References and Further Reading |
|
|
232 | (1) |
|
8.U1 Under the Hood: Details of the Log Approximation |
|
|
233 | (1) |
|
8.U2 Under the Hood: Deriving the Consequences of Classical Measurement Error |
|
|
234 | (2) |
|
9 Generalizing Results of a Regression |
|
|
236 | (30) |
|
9.1 Generalizing Linear Regression Coefficients |
|
|
237 | (1) |
|
9.2 Statistical Inference: CI and SE of Regression Coefficients |
|
|
238 | (5) |
|
9.A1 Case Study -- Estimating Gender and Age Differences in Earnings |
|
|
240 | (3) |
|
9.3 Intervals for Predicted Values |
|
|
243 | (6) |
|
9.A2 Case Study -- Estimating Gender and Age Differences in Earnings |
|
|
245 | (4) |
|
9.4 Testing Hypotheses about Regression Coefficients |
|
|
249 | (2) |
|
9.5 Testing More Complex Hypotheses |
|
|
251 | (2) |
|
9.A3 Case Study -- Estimating Gender and Age Differences in Earnings |
|
|
252 | (1) |
|
9.6 Presenting Regression Results |
|
|
253 | (3) |
|
9.A4 Case Study -- Estimating Gender and Age Differences in Earnings |
|
|
254 | (2) |
|
9.7 Data Analysis to Help Assess External Validity |
|
|
256 | (4) |
|
9.B1 Case Study -- How Stable is the Hotel Price-Distance to Center Relationship? |
|
|
256 | (4) |
|
|
260 | (6) |
|
|
261 | (1) |
|
|
261 | (1) |
|
References and Further Reading |
|
|
262 | (1) |
|
9.U1 Under the Hood: The Simple SE Formula for Regression Intercept |
|
|
262 | (1) |
|
9.U2 Under the Hood: The Law of Large Numbers for β |
|
|
263 | (1) |
|
9.U3 Under the Hood: Deriving SE(β) with the Central Limit Theorem |
|
|
264 | (1) |
|
9.U4 Under the Hood: Degrees of Freedom Adjustment for the SE Formula |
|
|
265 | (1) |
|
10 Multiple Linear Regression |
|
|
266 | (31) |
|
10.1 Multiple Regression: Why and When? |
|
|
267 | (1) |
|
10.2 Multiple Linear Regression with Two Explanatory Variables |
|
|
267 | (1) |
|
10.3 Multiple Regression and Simple Regression: Omitted Variable Bias |
|
|
268 | (4) |
|
10.A1 Case Study -- Understanding the Gender Difference in Earnings |
|
|
270 | (2) |
|
10.4 Multiple Linear Regression Terminology |
|
|
272 | (1) |
|
10.5 Standard Errors and Confidence Intervals in Multiple Linear Regression |
|
|
273 | (2) |
|
10.6 Hypothesis Testing in Multiple Linear Regression |
|
|
275 | (1) |
|
10.A2 Case Study -- Understanding the Gender Difference in Earnings |
|
|
275 | (1) |
|
10.7 Multiple Linear Regression with Three or More Explanatory Variables |
|
|
276 | (1) |
|
10.8 Nonlinear Patterns and Multiple Linear Regression |
|
|
277 | (2) |
|
10.A3 Case Study -- Understanding the Gender Difference in Earnings |
|
|
278 | (1) |
|
10.9 Qualitative Right-Hand-Side Variables |
|
|
279 | (3) |
|
10.A4 Case Study -- Understanding the Gender Difference in Earnings |
|
|
280 | (2) |
|
10.10 Interactions: Uncovering Different Slopes across Groups |
|
|
282 | (4) |
|
10.A5 Case Study -- Understanding the Gender Difference in Earnings |
|
|
284 | (2) |
|
10.11 Multiple Regression and Causal Analysis |
|
|
286 | (4) |
|
10.A6 Case Study -- Understanding the Gender Difference in Earnings |
|
|
287 | (3) |
|
10.12 Multiple Regression and Prediction |
|
|
290 | (4) |
|
10.B1 Case Study -- Finding a Good Deal among Hotels with Multiple Regression |
|
|
292 | (2) |
|
|
294 | (3) |
|
|
294 | (1) |
|
|
295 | (1) |
|
References and Further Reading |
|
|
296 | (1) |
|
10.U1 Under the Hood: A Two-Step Procedure to Get the Multiple Regression Coefficient |
|
|
296 | (1) |
|
11 Modeling Probabilities |
|
|
297 | (32) |
|
11.1 The Linear Probability Model |
|
|
298 | (1) |
|
11.2 Predicted Probabilities in the Linear Probability Model |
|
|
299 | (8) |
|
11.A1 Case Study -- Does Smoking Pose a Health Risk? |
|
|
301 | (6) |
|
|
307 | (2) |
|
11.A2 Case Study -- Does Smoking Pose a Health Risk? |
|
|
308 | (1) |
|
11.4 Marginal Differences |
|
|
309 | (3) |
|
11.A3 Case Study -- Does Smoking Pose a Health Risk? |
|
|
311 | (1) |
|
11.5 Goodness of Fit: R-Squared and Alternatives |
|
|
312 | (2) |
|
11.6 The Distribution of Predicted Probabilities |
|
|
314 | (1) |
|
11.7 Bias and Calibration |
|
|
314 | (3) |
|
11.B1 Case Study -- Are Australian Weather Forecasts Well Calibrated? |
|
|
315 | (2) |
|
|
317 | (4) |
|
11.A4 Case Study -- Does Smoking Pose a Health Risk? |
|
|
318 | (3) |
|
11.9 Using Probability Models for Other Kinds of y Variables |
|
|
321 | (2) |
|
|
323 | (6) |
|
|
323 | (1) |
|
|
324 | (1) |
|
References and Further Reading |
|
|
325 | (1) |
|
11.U1 Under the Hood: Saturated Models |
|
|
325 | (1) |
|
11.U2 Under the Hood: Maximum Likelihood Estimation and Search Algorithms |
|
|
326 | (1) |
|
11.U3 Under the Hood: From Logit and Probit Coefficients to Marginal Differences |
|
|
327 | (2) |
|
12 Regression with Time Series Data |
|
|
329 | (34) |
|
12.1 Preparation of Time Series Data |
|
|
330 | (2) |
|
12.2 Trend and Seasonality |
|
|
332 | (1) |
|
12.3 Stationarity, Non-stationarity, Random Walk |
|
|
333 | (5) |
|
12.A1 Case Study -- Returns on a Company Stock and Market Returns |
|
|
335 | (3) |
|
12.4 Time Series Regression |
|
|
338 | (5) |
|
12.A2 Case Study -- Returns on a Company Stock and Market Returns |
|
|
339 | (4) |
|
12.5 Trends, Seasonality, Random Walks in a Regression |
|
|
343 | (6) |
|
12.B1 Case Study -- Electricity Consumption and Temperature |
|
|
346 | (3) |
|
|
349 | (1) |
|
12.7 Dealing with Serial Correlation in Time Series Regressions |
|
|
350 | (5) |
|
12.B2 Case Study -- Electricity Consumption and Temperature |
|
|
352 | (3) |
|
12.8 Lags of x in a Time Series Regression |
|
|
355 | (4) |
|
12.B3 Case Study -- Electricity Consumption and Temperature |
|
|
357 | (2) |
|
12.9 The Process of Time Series Regression Analysis |
|
|
359 | (1) |
|
|
360 | (3) |
|
|
360 | (1) |
|
|
361 | (1) |
|
References and Further Reading |
|
|
362 | (1) |
|
12.U1 Under the Hood: Testing for Unit Root |
|
|
362 | (1) |
|
|
363 | (154) |
|
13 A Framework for Prediction |
|
|
365 | (26) |
|
|
366 | (1) |
|
13.2 Various Kinds of Prediction |
|
|
367 | (2) |
|
13.A1 Case Study -- Predicting Used Car Value with Linear Regressions |
|
|
369 | (1) |
|
13.3 The Prediction Error and Its Components |
|
|
369 | (4) |
|
13.A2 Case Study -- Predicting Used Car Value with Linear Regressions |
|
|
371 | (2) |
|
|
373 | (2) |
|
13.5 Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) |
|
|
375 | (1) |
|
13.6 Bias and Variance of Predictions |
|
|
376 | (1) |
|
13.7 The Task of Finding the Best Model |
|
|
377 | (2) |
|
13.8 Finding the Best Model by Best Fit and Penalty: The BIC |
|
|
379 | (1) |
|
13.9 Finding the Best Model by Training and Test Samples |
|
|
380 | (2) |
|
13.10 Finding the Best Model by Cross-Validation |
|
|
382 | (2) |
|
13.A3 Case Study -- Predicting Used Car Value with Linear Regressions |
|
|
383 | (1) |
|
13.11 External Validity and Stable Patterns |
|
|
384 | (3) |
|
13.A4 Case Study -- Predicting Used Car Value with Linear Regressions |
|
|
386 | (1) |
|
13.12 Machine Learning and the Role of Algorithms |
|
|
387 | (2) |
|
|
389 | (2) |
|
|
389 | (1) |
|
|
390 | (1) |
|
References and Further Reading |
|
|
390 | (1) |
|
14 Model Building for Prediction |
|
|
391 | (26) |
|
|
392 | (1) |
|
|
393 | (1) |
|
14.3 Label Engineering and Predicting Log y |
|
|
394 | (3) |
|
14.A1 Case Study -- Predicting Used Car Value: Log Prices |
|
|
395 | (2) |
|
14.4 Feature Engineering: Dealing with Missing Values |
|
|
397 | (1) |
|
14.5 Feature Engineering: What x Variables to Have and in What Functional Form |
|
|
398 | (4) |
|
14.B1 Case Study -- Predicting Airbnb Apartment Prices: Selecting a Regression Model |
|
|
399 | (3) |
|
14.6 We Can't Try Out All Possible Models |
|
|
402 | (1) |
|
14.7 Evaluating the Prediction Using a Holdout Set |
|
|
403 | (4) |
|
14.B2 Case Study -- Predicting Airbnb Apartment Prices: Selecting a Regression Model |
|
|
404 | (3) |
|
14.8 Selecting Variables in Regressions by LASSO |
|
|
407 | (3) |
|
14.B3 Case Study -- Predicting Airbnb Apartment Prices: Selecting a Regression Model |
|
|
409 | (1) |
|
|
410 | (2) |
|
14.B4 Case Study -- Predicting Airbnb Apartment Prices: Selecting a Regression Model |
|
|
411 | (1) |
|
14.10 Prediction with Big Data |
|
|
412 | (2) |
|
|
414 | (3) |
|
|
414 | (1) |
|
|
415 | (1) |
|
References and Further Reading |
|
|
415 | (1) |
|
14.U1 Under the Hood: Text Parsing |
|
|
415 | (1) |
|
14.U2 Under the Hood: Log Correction |
|
|
416 | (1) |
|
|
417 | (21) |
|
15.1 The Case for Regression Trees |
|
|
418 | (1) |
|
15.2 Regression Tree Basics |
|
|
419 | (1) |
|
15.3 Measuring Fit and Stopping Rules |
|
|
420 | (5) |
|
15.A1 Case Study -- Predicting Used Car Value with a Regression Tree |
|
|
421 | (4) |
|
15.4 Regression Tree with Multiple Predictor Variables |
|
|
425 | (1) |
|
15.5 Pruning a Regression Tree |
|
|
426 | (1) |
|
15.6 A Regression Tree is a Non-parametric Regression |
|
|
426 | (4) |
|
15.A2 Case Study -- Predicting Used Car Value with a Regression Tree |
|
|
427 | (3) |
|
|
430 | (1) |
|
15.8 Pros and Cons of Using a Regression Tree for Prediction |
|
|
431 | (4) |
|
15.A3 Case Study -- Predicting Used Car Value with a Regression Tree |
|
|
433 | (2) |
|
|
435 | (3) |
|
|
435 | (1) |
|
|
436 | (1) |
|
References and Further Reading |
|
|
437 | (1) |
|
16 Random Forest and Boosting |
|
|
438 | (19) |
|
16.1 From a Tree to a Forest: Ensemble Methods |
|
|
439 | (1) |
|
|
440 | (2) |
|
16.3 The Practice of Prediction with Random Forest |
|
|
442 | (2) |
|
16.A1 Case Study -- Predicting Airbnb Apartment Prices with Random Forest |
|
|
443 | (1) |
|
16.4 Diagnostics: The Variable Importance Plot |
|
|
444 | (1) |
|
16.5 Diagnostics: The Partial Dependence Plot |
|
|
445 | (1) |
|
16.6 Diagnostics: Fit in Various Subsets |
|
|
446 | (3) |
|
16.A2 Case Study -- Predicting Airbnb Apartment Prices with Random Forest |
|
|
446 | (3) |
|
16.7 An Introduction to Boosting and the GBM Model |
|
|
449 | (3) |
|
16.A3 Case Study -- Predicting Airbnb Apartment Prices with Random Forest |
|
|
450 | (2) |
|
16.8 A Review of Different Approaches to Predict a Quantitative y |
|
|
452 | (2) |
|
|
454 | (3) |
|
|
454 | (1) |
|
|
455 | (1) |
|
References and Further Reading |
|
|
456 | (1) |
|
17 Probability Prediction and Classification |
|
|
457 | (30) |
|
17.1 Predicting a Binary y. Probability Prediction and Classification |
|
|
458 | (4) |
|
17.A1 Case Study -- Predicting Firm Exit: Probability and Classification |
|
|
459 | (3) |
|
17.2 The Practice of Predicting Probabilities |
|
|
462 | (4) |
|
17.A2 Case Study -- Predicting Firm Exit: Probability and Classification |
|
|
463 | (3) |
|
17.3 Classification and the Confusion Table |
|
|
466 | (2) |
|
17.4 Illustrating the Trade-Off between Different Classification Thresholds: The ROC Curve |
|
|
468 | (3) |
|
17.A3 Case Study -- Predicting Firm Exit: Probability and Classification |
|
|
469 | (2) |
|
17.5 Loss Function and Finding the Optimal Classification Threshold |
|
|
471 | (4) |
|
17.A4 Case Study -- Predicting Firm Exit: Probability and Classification |
|
|
473 | (2) |
|
17.6 Probability Prediction and Classification with Random Forest |
|
|
475 | (5) |
|
17.A5 Case Study -- Predicting Firm Exit: Probability and Classification |
|
|
477 | (3) |
|
|
480 | (1) |
|
17.8 The Process of Prediction with a Binary Target Variable |
|
|
481 | (1) |
|
|
482 | (5) |
|
|
482 | (1) |
|
|
483 | (1) |
|
References and Further Reading |
|
|
484 | (1) |
|
17.U1 Under the Hood: The Gini Node Impurity Measure and MSE |
|
|
484 | (1) |
|
17.U2 Under the Hood: On the Method of Finding an Optimal Threshold |
|
|
485 | (2) |
|
18 Forecasting from Time Series Data |
|
|
487 | (30) |
|
18.1 Forecasting: Prediction Using Time Series Data |
|
|
488 | (1) |
|
18.2 Holdout, Training, and Test Samples in Time Series Data |
|
|
489 | (2) |
|
18.3 Long-Horizon Forecasting: Seasonality and Predictable Events |
|
|
491 | (1) |
|
18.4 Long-Horizon Forecasting: Trends |
|
|
492 | (8) |
|
18.A1 Case Study -- Forecasting Daily Ticket Volumes for a Swimming Pool |
|
|
494 | (6) |
|
18.5 Forecasting for a Short Horizon Using the Patterns of Serial Correlation |
|
|
500 | (1) |
|
18.6 Modeling Serial Correlation: AR(1) |
|
|
500 | (1) |
|
18.7 Modeling Serial Correlation: ARIMA |
|
|
501 | (4) |
|
18.B1 Case Study -- Forecasting a Home Price Index |
|
|
503 | (2) |
|
18.8 VAR: Vector Autoregressions |
|
|
505 | (4) |
|
18.B2 Case Study -- Forecasting a Home Price Index |
|
|
507 | (2) |
|
18.9 External Validity of Forecasts |
|
|
509 | (3) |
|
18.B3 Case Study -- Forecasting a Home Price Index |
|
|
510 | (2) |
|
|
512 | (5) |
|
|
512 | (1) |
|
|
513 | (1) |
|
References and Further Reading |
|
|
514 | (1) |
|
18.U1 Under the Hood: Details of the ARIMA Model |
|
|
514 | (2) |
|
18.U2 Under the Hood: Auto-Arima |
|
|
516 | (1) |
|
|
517 | (187) |
|
19 A Framework for Causal Analysis |
|
|
519 | (36) |
|
19.1 Intervention, Treatment, Subjects, Outcomes |
|
|
520 | (2) |
|
|
522 | (1) |
|
19.3 The Individual Treatment Effect |
|
|
523 | (1) |
|
19.4 Heterogeneous Treatment Effects |
|
|
524 | (1) |
|
19.5 ATE: The Average Treatment Effect |
|
|
525 | (2) |
|
19.6 Average Effects in Subgroups and ATET |
|
|
527 | (1) |
|
19.7 Quantitative Causal Variables |
|
|
527 | (3) |
|
19.A1 Case Study -- Food and Health |
|
|
528 | (2) |
|
19.8 Ceteris Paribus: Other Things Being the Same |
|
|
530 | (1) |
|
|
531 | (2) |
|
19.10 Comparing Different Observations to Uncover Average Effects |
|
|
533 | (2) |
|
|
535 | (1) |
|
19.12 Sources of Variation in the Causal Variable |
|
|
536 | (3) |
|
19.A2 Case Study -- Food and Health |
|
|
537 | (2) |
|
19.13 Experimenting versus Conditioning |
|
|
539 | (2) |
|
19.14 Confounders in Observational Data |
|
|
541 | (2) |
|
19.15 From Latent Variables to Measured Variables |
|
|
543 | (1) |
|
19.16 Bad Conditioners: Variables Not to Condition On |
|
|
544 | (5) |
|
19.A3 Case Study -- Food and Health |
|
|
545 | (4) |
|
19.17 External Validity, Internal Validity |
|
|
549 | (2) |
|
19.18 Constructive Skepticism |
|
|
551 | (1) |
|
|
552 | (3) |
|
|
552 | (1) |
|
|
553 | (1) |
|
References and Further Reading |
|
|
554 | (1) |
|
20 Designing and Analyzing Experiments |
|
|
555 | (33) |
|
20.1 Randomized Experiments and Potential Outcomes |
|
|
556 | (1) |
|
20.2 Field Experiments, A/B Testing, Survey Experiments |
|
|
557 | (3) |
|
20.A1 Case Study -- Working from Home and Employee Performance |
|
|
558 | (1) |
|
20.B1 Case Study -- Fine Tuning Social Media Advertising |
|
|
559 | (1) |
|
20.3 The Experimental Setup: Definitions |
|
|
560 | (1) |
|
20.4 Random Assignment in Practice |
|
|
560 | (2) |
|
20.5 Number of Subjects and Proportion Treated |
|
|
562 | (1) |
|
20.6 Random Assignment and Covariate Balance |
|
|
563 | (4) |
|
20.A2 Case Study -- Working from Home and Employee Performance |
|
|
565 | (2) |
|
20.7 Imperfect Compliance and Intent-to-Treat |
|
|
567 | (3) |
|
20.A3 Case Study -- Working from Home and Employee Performance |
|
|
569 | (1) |
|
20.8 Estimation and Statistical Inference |
|
|
570 | (2) |
|
20.B2 Case Study -- Fine Tuning Social Media Advertising |
|
|
571 | (1) |
|
20.9 Including Covariates in a Regression |
|
|
572 | (4) |
|
20.A4 Case Study -- Working from Home and Employee Performance |
|
|
573 | (3) |
|
|
576 | (1) |
|
20.11 Additional Threats to Internal Validity |
|
|
577 | (4) |
|
20.A5 Case Study -- Working from Home and Employee Performance |
|
|
579 | (2) |
|
20.12 External Validity, and How to Use the Results in Decision Making |
|
|
581 | (2) |
|
20.A6 Case Study -- Working from Home and Employee Performance |
|
|
582 | (1) |
|
|
583 | (5) |
|
|
584 | (1) |
|
|
585 | (1) |
|
References and Further Reading |
|
|
585 | (1) |
|
20.U1 Under the Hood: LATE: The Local Average Treatment Effect |
|
|
586 | (1) |
|
20.U2 Under the Hood: The Formula for Sample Size Calculation |
|
|
586 | (2) |
|
21 Regression and Matching with Observational Data |
|
|
588 | (32) |
|
|
589 | (2) |
|
21.A1 Case Study -- Founder/Family Ownership and Quality of Management |
|
|
590 | (1) |
|
21.2 Variables to Condition on. Variables Not to Condition On |
|
|
591 | (4) |
|
21.A2 Case Study -- Founder/Family Ownership and Quality of Management |
|
|
592 | (3) |
|
21.3 Conditioning on Confounders by Regression |
|
|
595 | (2) |
|
21.4 Selection of Variables and Functional Form in a Regression for Causal Analysis |
|
|
597 | (4) |
|
21.A3 Case Study -- Founder/Family Ownership and Quality of Management |
|
|
598 | (3) |
|
|
601 | (2) |
|
|
603 | (1) |
|
21.7 Matching on the Propensity Score |
|
|
604 | (3) |
|
21.A4 Case Study -- Founder/Family Ownership and Quality of Management |
|
|
605 | (2) |
|
21.8 Comparing Linear Regression and Matching |
|
|
607 | (3) |
|
21.A5 Case Study -- Founder/Family Ownership and Quality of Management |
|
|
609 | (1) |
|
21.9 Instrumental Variables |
|
|
610 | (3) |
|
21.10 Regression-Discontinuity |
|
|
613 | (1) |
|
|
614 | (6) |
|
|
614 | (1) |
|
|
615 | (1) |
|
References and Further Reading |
|
|
616 | (1) |
|
21.U1 Under the Hood: Unobserved Heterogeneity and Endogenous x in a Regression |
|
|
616 | (2) |
|
21.U2 Under the Hood: LATE is IV |
|
|
618 | (2) |
|
22 Difference-in-Differences |
|
|
620 | (29) |
|
22.1 Conditioning on Pre-intervention Outcomes |
|
|
621 | (1) |
|
22.2 Basic Difference-in-Differences Analysis: Comparing Average Changes |
|
|
622 | (7) |
|
22.A1 Case Study -- How Does a Merger between Airlines Affect Prices? |
|
|
625 | (4) |
|
22.3 The Parallel Trends Assumption |
|
|
629 | (4) |
|
22.A2 Case Study -- How Does a Merger between Airlines Affect Prices? |
|
|
631 | (2) |
|
22.4 Conditioning on Additional Confounders in Diff-in-Diffs Regressions |
|
|
633 | (4) |
|
22.A3 Case Study -- How Does a Merger between Airlines Affect Prices? |
|
|
635 | (2) |
|
22.5 Quantitative Causal Variable |
|
|
637 | (3) |
|
22.A4 Case Study -- How Does a Merger between Airlines Affect Prices? |
|
|
638 | (2) |
|
22.6 Difference-in-Differences with Pooled Cross-Sections |
|
|
640 | (5) |
|
22.A5 Case Study -- How Does a Merger between Airlines Affect Prices? |
|
|
643 | (2) |
|
|
645 | (4) |
|
|
646 | (1) |
|
|
647 | (1) |
|
References and Further Reading |
|
|
648 | (1) |
|
23 Methods for Panel Data |
|
|
649 | (32) |
|
23.1 Multiple Time Periods Can Be Helpful |
|
|
650 | (1) |
|
23.2 Estimating Effects Using Observational Time Series |
|
|
651 | (2) |
|
23.3 Lags to Estimate the Time Path of Effects |
|
|
653 | (1) |
|
23.4 Leads to Examine Pre-trends and Reverse Effects |
|
|
653 | (1) |
|
23.5 Pooled Time Series to Estimate the Effect for One Unit |
|
|
654 | (5) |
|
23.A1 Case Study -- Import Demand and Industrial Production |
|
|
656 | (3) |
|
23.6 Panel Regression with Fixed Effects |
|
|
659 | (2) |
|
|
661 | (4) |
|
23.B1 Case Study -- Immunization against Measles and Saving Children |
|
|
662 | (3) |
|
23.8 Clustered Standard Errors |
|
|
665 | (1) |
|
23.9 Panel Regression in First Differences |
|
|
666 | (1) |
|
23.10 Lags and Leads in FD Panel Regressions |
|
|
667 | (4) |
|
23.B2 Case Study -- Immunization against Measles and Saving Children |
|
|
669 | (2) |
|
23.11 Aggregate Trend and Individual Trends in FD Models |
|
|
671 | (3) |
|
23.B3 Case Study -- Immunization against Measles and Saving Children |
|
|
672 | (2) |
|
23.12 Panel Regressions and Causality |
|
|
674 | (1) |
|
23.13 First Differences or Fixed Effects? |
|
|
675 | (2) |
|
23.14 Dealing with Unbalanced Panels |
|
|
677 | (1) |
|
|
678 | (3) |
|
|
678 | (2) |
|
|
680 | (1) |
|
References and Further Reading |
|
|
680 | (1) |
|
24 Appropriate Control Groups for Panel Data |
|
|
681 | (23) |
|
24.1 When and Why to Select a Control Group in xt Panel Data |
|
|
682 | (1) |
|
24.2 Comparative Case Studies |
|
|
682 | (1) |
|
24.3 The Synthetic Control Method |
|
|
683 | (4) |
|
24.A1 Case Study -- Estimating the Effect of the 2010 Haiti Earthquake on GDP |
|
|
684 | (3) |
|
|
687 | (7) |
|
24.B1 Case Study -- Estimating the Impact of Replacing Football Team Managers |
|
|
690 | (4) |
|
24.5 Selecting a Control Group in Event Studies |
|
|
694 | (7) |
|
24.B2 Case Study -- Estimating the Impact of Replacing Football Team Managers |
|
|
696 | (5) |
|
|
701 | (3) |
|
|
701 | (1) |
|
|
702 | (1) |
|
References and Further Reading |
|
|
703 | (1) |
References |
|
704 | (5) |
Index |
|
709 | |