Data Analysis for Business, Economics, and Policy [Paperback]

Gábor Békés, Gábor Kézdi (University of Michigan, Ann Arbor)
  • Format: Paperback / softback, 738 pages, height x width x thickness: 246x190x33 mm, weight: 1590 g, Worked examples or Exercises
  • Publication date: 06-May-2021
  • Publisher: Cambridge University Press
  • ISBN-10: 1108716202
  • ISBN-13: 9781108716208
This textbook provides future data analysts with the tools, methods, and skills needed to answer data-focused, real-life questions; to carry out data analysis; and to visualize and interpret results to support better decisions in business, economics, and public policy. Data wrangling and exploration, regression analysis, machine learning, and causal analysis are comprehensively covered, as well as when, why, and how the methods work, and how they relate to each other. As the most effective way to communicate data analysis, running case studies play a central role in this textbook. Each case starts with an industry-relevant question and answers it by using real-world data and applying the tools and methods covered in the textbook. Learning is then consolidated by 360 practice questions and 120 data exercises. Extensive online resources, including raw and cleaned data and codes for all analysis in Stata, R, and Python, can be found at www.gabors-data-analysis.com.

Reviews

'This exciting new text covers everything today's aspiring data scientist needs to know, managing to be comprehensive as well as accessible. Like a good confidence interval, the Gabors have got you almost completely covered!' Joshua Angrist, Massachusetts Institute of Technology, winner of the Nobel Memorial Prize in Economic Sciences

'This is an excellent book for students learning the art of modern data analytics. It combines the latest techniques with practical applications, replicating the implementation side of classroom teaching that is typically missing in textbooks. For example, they used the World Management Survey data to generate exercises on firm performance for students to gain experience in handling real data, with all its quirks, problems, and issues. For students looking to learn data analysis from one textbook, this is a great way to proceed.' Nicholas Bloom, Stanford University

'I know of few books about data analysis and visualization that are as comprehensive, deep, practical, and current as this one; and I know of almost none that are as fun to read. Gábor Békés and Gábor Kézdi have created a most unusual and most compelling beast: a textbook that teaches you the subject matter well and that, at the same time, you can enjoy reading cover to cover.' Alberto Cairo, University of Miami

'A beautiful integration of econometrics and data science that provides a direct path from data collection and exploratory analysis to conventional regression modeling, then on to prediction and causal modeling. Exactly what is needed to equip the next generation of students with the tools and insights from the two fields.' David Card, University of California, Berkeley, winner of the Nobel Memorial Prize in Economic Sciences

'This textbook is excellent at dissecting and explaining the underlying process of data analysis. Békés and Kézdi have masterfully woven into their instruction a comprehensive range of case studies. The result is a rigorous textbook grounded in real-world learning, at once accessible and engaging to novice scholars and advanced practitioners alike. I have every confidence it will be valued by future generations.' Kerwin K. Charles, Yale School of Management

'This book takes you by the hand in a journey that will bring you to understand the core value of data in the fields of machine learning and economics. The large amount of accessible examples combined with the intuitive explanation of foundational concepts is an ideal mix for anyone who wants to do data analysis. It is highly recommended to anyone interested in the new way in which data will be analyzed in the social sciences in the next years.' Christian Fons-Rosen, Barcelona Graduate School of Economics

'This sophisticatedly simple book is ideal for undergraduate- or Master's-level Data Analytics courses with a broad audience. The authors discuss the key aspects of examining data, regression analysis, prediction, Lasso, and random forests, and more, with using elegant prose instead of algebra. Using well-chosen case studies, they illustrate the techniques and discuss all of them patiently and thoroughly.' Carter Hill, Louisiana State University

'This is not an econometrics textbook. It is a data analysis textbook. And a highly unusual one - written in plain English, based on simplified notation, and full of case studies. An excellent starting point for future data analysts or anyone interested in finding out what data can tell us.' Beata Javorcik, University of Oxford

'A multifaceted book that considers many sides of data analysis, all of them important for the contemporary student and practitioner. It brings together classical statistics, regression, and causal inference, sending the message that awareness of all three aspects is important for success in this field. Many 'best practices' are discussed in accessible language, and illustrated using interesting datasets.' Ilya Ryzhov, University of Maryland

'This is a fantastic book to have. Strong data skills are critical for modern business and economic research, and this text provides a thorough and practical guide to acquiring them. Highly recommended.' John van Reenen, MIT Sloan

'Energy and climate change is one of the most important public policy challenges, and high-quality data and its empirical analysis is a foundation of solid policy. Data Analysis for Business, Economics, and Policy will make an important contribution to this with its innovative approach. In addition to the comprehensive treatment of modern econometric techniques, the book also covers the less glamorous but crucial aspects of procuring and cleaning data, and drawing useful inferences from less-than-perfect datasets. As the center of gravity of the energy system shifts to developing economies where data quality is still an issue, this will provide an important and practical combination for both academic and policy professionals.' Laszlo Varro, Chief Economist, International Energy Agency

'The content is comprehensive and well-organized, with useful examples and case studies. The level is suitable for my students. The layout is clean and easy to navigate, and the availability in both print and digital formats is convenient. The supporting materials, particularly the Python codes provided, greatly enhance the learning experience by offering practical, hands-on examples.' Ejiro Esohwode, Anglia Ruskin University

Other information

A comprehensive textbook on data analysis for business, applied economics and public policy that uses case studies with real-world data.
Why Use This Book xxi
Simplified Notation xxiv
Acknowledgments xxv
I DATA EXPLORATION
1(168)
1 Origins of Data
3(27)
1.1 What Is Data?
4(1)
1.2 Data Structures
5(2)
1.A1 Case Study -- Finding a Good Deal among Hotels: Data Collection
6(1)
1.3 Data Quality
7(4)
1.B1 Case Study -- Comparing Online and Offline Prices: Data Collection
9(1)
1.C1 Case Study -- Management Quality and Firm Performance: Data Collection
10(1)
1.4 How Data Is Born: The Big Picture
11(1)
1.5 Collecting Data from Existing Sources
12(4)
1.A2 Case Study -- Finding a Good Deal among Hotels: Data Collection
14(1)
1.B2 Case Study -- Comparing Online and Offline Prices: Data Collection
15(1)
1.6 Surveys
16(2)
1.C2 Case Study -- Management Quality and Firm Size: Data Collection
18(1)
1.7 Sampling
18(1)
1.8 Random Sampling
19(3)
1.B3 Case Study -- Comparing Online and Offline Prices: Data Collection
21(1)
1.C3 Case Study -- Management Quality and Firm Size: Data Collection
21(1)
1.9 Big Data
22(2)
1.10 Good Practices in Data Collection
24(2)
1.11 Ethical and Legal Issues of Data Collection
26(1)
1.12 Main Takeaways
27(3)
Practice Questions
27(1)
Data Exercises
28(1)
References and Further Reading
28(2)
2 Preparing Data for Analysis
30(28)
2.1 Types of Variables
31(2)
2.2 Stock Variables, Flow Variables
33(1)
2.3 Types of Observations
33(2)
2.4 Tidy Data
35(2)
2.A1 Case Study -- Finding a Good Deal among Hotels: Data Preparation
36(1)
2.5 Tidy Approach for Multi-dimensional Data
37(1)
2.B1 Case Study -- Displaying Immunization Rates across Countries
37(1)
2.6 Relational Data and Linking Data Tables
38(4)
2.C1 Case Study -- Identifying Successful Football Managers
40(2)
2.7 Entity Resolution: Duplicates, Ambiguous Identification, and Non-entity Rows
42(2)
2.C2 Case Study -- Identifying Successful Football Managers
43(1)
2.8 Discovering Missing Values
44(2)
2.9 Managing Missing Values
46(2)
2.A2 Case Study -- Finding a Good Deal among Hotels: Data Preparation
47(1)
2.10 The Process of Cleaning Data
48(1)
2.11 Reproducible Workflow: Write Code and Document Your Steps
49(1)
2.12 Organizing Data Tables for a Project
50(4)
2.C3 Case Study -- Identifying Successful Football Managers
52(1)
2.C4 Case Study -- Identifying Successful Football Managers
53(1)
2.13 Main Takeaways
54(4)
Practice Questions
54(1)
Data Exercises
55(1)
References and Further Reading
56(1)
2.U1 Under the Hood: Naming Files
56(2)
3 Exploratory Data Analysis
58(38)
3.1 Why Do Exploratory Data Analysis?
59(1)
3.2 Frequencies and Probabilities
60(1)
3.3 Visualizing Distributions
61(4)
3.A1 Case Study -- Finding a Good Deal among Hotels: Data Exploration
62(3)
3.4 Extreme Values
65(3)
3.A2 Case Study -- Finding a Good Deal among Hotels: Data Exploration
66(2)
3.5 Good Graphs: Guidelines for Data Visualization
68(4)
3.A3 Case Study -- Finding a Good Deal among Hotels: Data Exploration
71(1)
3.6 Summary Statistics for Quantitative Variables
72(5)
3.B1 Case Study -- Comparing Hotel Prices in Europe: Vienna vs. London
74(3)
3.7 Visualizing Summary Statistics
77(3)
3.C1 Case Study -- Measuring Home Team Advantage in Football
78(2)
3.8 Good Tables
80(3)
3.C2 Case Study -- Measuring Home Team Advantage in Football
82(1)
3.9 Theoretical Distributions
83(4)
3.D1 Case Study -- Distributions of Body Height and Income
85(2)
3.10 Steps of Exploratory Data Analysis
87(1)
3.11 Main Takeaways
88(8)
Practice Questions
88(1)
Data Exercises
89(1)
References and Further Reading
90(1)
3.U1 Under the Hood: More on Theoretical Distributions
90(1)
Bernoulli Distribution
91(1)
Binomial Distribution
91(1)
Uniform Distribution
92(1)
Power-Law Distribution
92(4)
4 Comparison and Correlation
96(22)
4.1 The y and the x
97(3)
4.A1 Case Study -- Management Quality and Firm Size: Describing Patterns of Association
98(2)
4.2 Conditioning
100(1)
4.3 Conditional Probabilities
101(2)
4.A2 Case Study -- Management Quality and Firm Size: Describing Patterns of Association
102(1)
4.4 Conditional Distribution, Conditional Expectation
103(1)
4.5 Conditional Distribution, Conditional Expectation with Quantitative x
104(4)
4.A3 Case Study -- Management Quality and Firm Size: Describing Patterns of Association
105(3)
4.6 Dependence, Covariance, Correlation
108(2)
4.7 From Latent Variables to Observed Variables
110(3)
4.A4 Case Study -- Management Quality and Firm Size: Describing Patterns of Association
111(2)
4.8 Sources of Variation in x
113(1)
4.9 Main Takeaways
114(4)
Practice Questions
115(1)
Data Exercises
115(1)
References and Further Reading
116(1)
4.U1 Under the Hood: Inverse Conditional Probabilities, Bayes' Rule
116(2)
5 Generalizing from Data
118(25)
5.1 When to Generalize and to What?
119(3)
5.A1 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?
121(1)
5.2 Repeated Samples, Sampling Distribution, Standard Error
122(3)
5.A2 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?
123(2)
5.3 Properties of the Sampling Distribution
125(3)
5.A3 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?
127(1)
5.4 The Confidence Interval
128(1)
5.A4 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?
129(1)
5.5 Discussion of the CI: Confidence or Probability?
129(1)
5.6 Estimating the Standard Error with the Bootstrap Method
130(3)
5.A5 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?
132(1)
5.7 The Standard Error Formula
133(2)
5.A6 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?
134(1)
5.8 External Validity
135(2)
5.A7 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?
136(1)
5.9 Big Data, Statistical Inference, External Validity
137(1)
5.10 Main Takeaways
138(5)
Practice Questions
138(1)
Data Exercises
139(1)
References and Further Reading
139(1)
5.U1 Under the Hood: The Law of Large Numbers and the Central Limit Theorem
140(3)
6 Testing Hypotheses
143(26)
6.1 The Logic of Testing Hypotheses
144(4)
6.A1 Case Study -- Comparing Online and Offline Prices: Testing the Difference
145(3)
6.2 Null Hypothesis, Alternative Hypothesis
148(1)
6.3 The t-Test
149(1)
6.4 Making a Decision; False Negatives, False Positives
150(4)
6.5 The p-Value
154(3)
6.A2 Case Study -- Comparing Online and Offline Prices: Testing the Difference
155(2)
6.6 Steps of Hypothesis Testing
157(1)
6.7 One-Sided Alternatives
158(2)
6.B1 Case Study -- Testing the Likelihood of Loss on a Stock Portfolio
159(1)
6.8 Testing Multiple Hypotheses
160(2)
6.A3 Case Study -- Comparing Online and Offline Prices: Testing the Difference
161(1)
6.9 p-Hacking
162(2)
6.10 Testing Hypotheses with Big Data
164(1)
6.11 Main Takeaways
165(4)
Practice Questions
165(1)
Data Exercises
166(1)
References and Further Reading
167(2)
II REGRESSION ANALYSIS
169(194)
7 Simple Regression
171(29)
7.1 When and Why Do Simple Regression Analysis?
172(1)
7.2 Regression: Definition
172(2)
7.3 Non-parametric Regression
174(4)
7.A1 Case Study -- Finding a Good Deal among Hotels with Simple Regression
175(3)
7.4 Linear Regression: Introduction
178(1)
7.5 Linear Regression: Coefficient Interpretation
179(1)
7.6 Linear Regression with a Binary Explanatory Variable
180(1)
7.7 Coefficient Formula
181(3)
7.A2 Case Study -- Finding a Good Deal among Hotels with Simple Regression
183(1)
7.8 Predicted Dependent Variable and Regression Residual
184(4)
7.A3 Case Study -- Finding a Good Deal among Hotels with Simple Regression
185(3)
7.9 Goodness of Fit, R-Squared
188(1)
7.10 Correlation and Linear Regression
189(1)
7.11 Regression Analysis, Regression toward the Mean, Mean Reversion
190(1)
7.12 Regression and Causation
190(2)
7.A4 Case Study -- Finding a Good Deal among Hotels with Simple Regression
192(1)
7.13 Main Takeaways
192(8)
Practice Questions
193(1)
Data Exercises
193(1)
References and Further Reading
194(1)
7.U1 Under the Hood: Derivation of the OLS Formulae for the Intercept and Slope Coefficients
194(3)
7.U2 Under the Hood: More on Residuals and Predicted Values with OLS
197(3)
8 Complicated Patterns and Messy Data
200(36)
8.1 When and Why Care about the Shape of the Association between y and x?
201(1)
8.2 Taking Relative Differences or Log
202(2)
8.3 Log Transformation and Non-positive Values
204(2)
8.4 Interpreting Log Values in a Regression
206(4)
8.A1 Case Study -- Finding a Good Deal among Hotels with Nonlinear Function
207(3)
8.5 Other Transformations of Variables
210(5)
8.B1 Case Study -- How is Life Expectancy Related to the Average Income of a Country?
210(5)
8.6 Regression with a Piecewise Linear Spline
215(1)
8.7 Regression with Polynomial
216(2)
8.8 Choosing a Functional Form in a Regression
218(3)
8.B2 Case Study -- How is Life Expectancy Related to the Average Income of a Country?
219(2)
8.9 Extreme Values and Influential Observations
221(1)
8.10 Measurement Error in Variables
222(1)
8.11 Classical Measurement Error
223(4)
8.C1 Case Study -- Hotel Ratings and Measurement Error
225(2)
8.12 Non-classical Measurement Error and General Advice
227(1)
8.13 Using Weights in Regression Analysis
228(2)
8.B3 Case Study -- How is Life Expectancy Related to the Average Income of a Country?
229(1)
8.14 Main Takeaways
230(6)
Practice Questions
231(1)
Data Exercises
232(1)
References and Further Reading
232(1)
8.U1 Under the Hood: Details of the Log Approximation
233(1)
8.U2 Under the Hood: Deriving the Consequences of Classical Measurement Error
234(2)
9 Generalizing Results of a Regression
236(30)
9.1 Generalizing Linear Regression Coefficients
237(1)
9.2 Statistical Inference: CI and SE of Regression Coefficients
238(5)
9.A1 Case Study -- Estimating Gender and Age Differences in Earnings
240(3)
9.3 Intervals for Predicted Values
243(6)
9.A2 Case Study -- Estimating Gender and Age Differences in Earnings
245(4)
9.4 Testing Hypotheses about Regression Coefficients
249(2)
9.5 Testing More Complex Hypotheses
251(2)
9.A3 Case Study -- Estimating Gender and Age Differences in Earnings
252(1)
9.6 Presenting Regression Results
253(3)
9.A4 Case Study -- Estimating Gender and Age Differences in Earnings
254(2)
9.7 Data Analysis to Help Assess External Validity
256(4)
9.B1 Case Study -- How Stable is the Hotel Price-Distance to Center Relationship?
256(4)
9.8 Main Takeaways
260(6)
Practice Questions
261(1)
Data Exercises
261(1)
References and Further Reading
262(1)
9.U1 Under the Hood: The Simple SE Formula for Regression Intercept
262(1)
9.U2 Under the Hood: The Law of Large Numbers for β
263(1)
9.U3 Under the Hood: Deriving SE(β) with the Central Limit Theorem
264(1)
9.U4 Under the Hood: Degrees of Freedom Adjustment for the SE Formula
265(1)
10 Multiple Linear Regression
266(31)
10.1 Multiple Regression: Why and When?
267(1)
10.2 Multiple Linear Regression with Two Explanatory Variables
267(1)
10.3 Multiple Regression and Simple Regression: Omitted Variable Bias
268(4)
10.A1 Case Study -- Understanding the Gender Difference in Earnings
270(2)
10.4 Multiple Linear Regression Terminology
272(1)
10.5 Standard Errors and Confidence Intervals in Multiple Linear Regression
273(2)
10.6 Hypothesis Testing in Multiple Linear Regression
275(1)
10.A2 Case Study -- Understanding the Gender Difference in Earnings
275(1)
10.7 Multiple Linear Regression with Three or More Explanatory Variables
276(1)
10.8 Nonlinear Patterns and Multiple Linear Regression
277(2)
10.A3 Case Study -- Understanding the Gender Difference in Earnings
278(1)
10.9 Qualitative Right-Hand-Side Variables
279(3)
10.A4 Case Study -- Understanding the Gender Difference in Earnings
280(2)
10.10 Interactions: Uncovering Different Slopes across Groups
282(4)
10.A5 Case Study -- Understanding the Gender Difference in Earnings
284(2)
10.11 Multiple Regression and Causal Analysis
286(4)
10.A6 Case Study -- Understanding the Gender Difference in Earnings
287(3)
10.12 Multiple Regression and Prediction
290(4)
10.B1 Case Study -- Finding a Good Deal among Hotels with Multiple Regression
292(2)
10.13 Main Takeaways
294(3)
Practice Questions
294(1)
Data Exercises
295(1)
References and Further Reading
296(1)
10.U1 Under the Hood: A Two-Step Procedure to Get the Multiple Regression Coefficient
296(1)
11 Modeling Probabilities
297(32)
11.1 The Linear Probability Model
298(1)
11.2 Predicted Probabilities in the Linear Probability Model
299(8)
11.A1 Case Study -- Does Smoking Pose a Health Risk?
301(6)
11.3 Logit and Probit
307(2)
11.A2 Case Study -- Does Smoking Pose a Health Risk?
308(1)
11.4 Marginal Differences
309(3)
11.A3 Case Study -- Does Smoking Pose a Health Risk?
311(1)
11.5 Goodness of Fit: R-Squared and Alternatives
312(2)
11.6 The Distribution of Predicted Probabilities
314(1)
11.7 Bias and Calibration
314(3)
11.B1 Case Study -- Are Australian Weather Forecasts Well Calibrated?
315(2)
11.8 Refinement
317(4)
11.A4 Case Study -- Does Smoking Pose a Health Risk?
318(3)
11.9 Using Probability Models for Other Kinds of y Variables
321(2)
11.10 Main Takeaways
323(6)
Practice Questions
323(1)
Data Exercises
324(1)
References and Further Reading
325(1)
11.U1 Under the Hood: Saturated Models
325(1)
11.U2 Under the Hood: Maximum Likelihood Estimation and Search Algorithms
326(1)
11.U3 Under the Hood: From Logit and Probit Coefficients to Marginal Differences
327(2)
12 Regression with Time Series Data
329(34)
12.1 Preparation of Time Series Data
330(2)
12.2 Trend and Seasonality
332(1)
12.3 Stationarity, Non-stationarity, Random Walk
333(5)
12.A1 Case Study -- Returns on a Company Stock and Market Returns
335(3)
12.4 Time Series Regression
338(5)
12.A2 Case Study -- Returns on a Company Stock and Market Returns
339(4)
12.5 Trends, Seasonality, Random Walks in a Regression
343(6)
12.B1 Case Study -- Electricity Consumption and Temperature
346(3)
12.6 Serial Correlation
349(1)
12.7 Dealing with Serial Correlation in Time Series Regressions
350(5)
12.B2 Case Study -- Electricity Consumption and Temperature
352(3)
12.8 Lags of x in a Time Series Regression
355(4)
12.B3 Case Study -- Electricity Consumption and Temperature
357(2)
12.9 The Process of Time Series Regression Analysis
359(1)
12.10 Main Takeaways
360(3)
Practice Questions
360(1)
Data Exercises
361(1)
References and Further Reading
362(1)
12.U1 Under the Hood: Testing for Unit Root
362(1)
III PREDICTION
363(154)
13 A Framework for Prediction
365(26)
13.1 Prediction Basics
366(1)
13.2 Various Kinds of Prediction
367(2)
13.A1 Case Study -- Predicting Used Car Value with Linear Regressions
369(1)
13.3 The Prediction Error and Its Components
369(4)
13.A2 Case Study -- Predicting Used Car Value with Linear Regressions
371(2)
13.4 The Loss Function
373(2)
13.5 Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)
375(1)
13.6 Bias and Variance of Predictions
376(1)
13.7 The Task of Finding the Best Model
377(2)
13.8 Finding the Best Model by Best Fit and Penalty: The BIC
379(1)
13.9 Finding the Best Model by Training and Test Samples
380(2)
13.10 Finding the Best Model by Cross-Validation
382(2)
13.A3 Case Study -- Predicting Used Car Value with Linear Regressions
383(1)
13.11 External Validity and Stable Patterns
384(3)
13.A4 Case Study -- Predicting Used Car Value with Linear Regressions
386(1)
13.12 Machine Learning and the Role of Algorithms
387(2)
13.13 Main Takeaways
389(2)
Practice Questions
389(1)
Data Exercises
390(1)
References and Further Reading
390(1)
14 Model Building for Prediction
391(26)
14.1 Steps of Prediction
392(1)
14.2 Sample Design
393(1)
14.3 Label Engineering and Predicting Log y
394(3)
14.A1 Case Study -- Predicting Used Car Value: Log Prices
395(2)
14.4 Feature Engineering: Dealing with Missing Values
397(1)
14.5 Feature Engineering: What x Variables to Have and in What Functional Form
398(4)
14.B1 Case Study -- Predicting Airbnb Apartment Prices: Selecting a Regression Model
399(3)
14.6 We Can't Try Out All Possible Models
402(1)
14.7 Evaluating the Prediction Using a Holdout Set
403(4)
14.B2 Case Study -- Predicting Airbnb Apartment Prices: Selecting a Regression Model
404(3)
14.8 Selecting Variables in Regressions by LASSO
407(3)
14.B3 Case Study -- Predicting Airbnb Apartment Prices: Selecting a Regression Model
409(1)
14.9 Diagnostics
410(2)
14.B4 Case Study -- Predicting Airbnb Apartment Prices: Selecting a Regression Model
411(1)
14.10 Prediction with Big Data
412(2)
14.11 Main Takeaways
414(3)
Practice Questions
414(1)
Data Exercises
415(1)
References and Further Reading
415(1)
14.U1 Under the Hood: Text Parsing
415(1)
14.U2 Under the Hood: Log Correction
416(1)
15 Regression Trees
417(21)
15.1 The Case for Regression Trees
418(1)
15.2 Regression Tree Basics
419(1)
15.3 Measuring Fit and Stopping Rules
420(5)
15.A1 Case Study -- Predicting Used Car Value with a Regression Tree
421(4)
15.4 Regression Tree with Multiple Predictor Variables
425(1)
15.5 Pruning a Regression Tree
426(1)
15.6 A Regression Tree is a Non-parametric Regression
426(4)
15.A2 Case Study -- Predicting Used Car Value with a Regression Tree
427(3)
15.7 Variable Importance
430(1)
15.8 Pros and Cons of Using a Regression Tree for Prediction
431(4)
15.A3 Case Study -- Predicting Used Car Value with a Regression Tree
433(2)
15.9 Main Takeaways
435(3)
Practice Questions
435(1)
Data Exercises
436(1)
References and Further Reading
437(1)
16 Random Forest and Boosting
438(19)
16.1 From a Tree to a Forest: Ensemble Methods
439(1)
16.2 Random Forest
440(2)
16.3 The Practice of Prediction with Random Forest
442(2)
16.A1 Case Study -- Predicting Airbnb Apartment Prices with Random Forest
443(1)
16.4 Diagnostics: The Variable Importance Plot
444(1)
16.5 Diagnostics: The Partial Dependence Plot
445(1)
16.6 Diagnostics: Fit in Various Subsets
446(3)
16.A2 Case Study -- Predicting Airbnb Apartment Prices with Random Forest
446(3)
16.7 An Introduction to Boosting and the GBM Model
449(3)
16.A3 Case Study -- Predicting Airbnb Apartment Prices with Random Forest
450(2)
16.8 A Review of Different Approaches to Predict a Quantitative y
452(2)
16.9 Main Takeaways
454(3)
Practice Questions
454(1)
Data Exercises
455(1)
References and Further Reading
456(1)
17 Probability Prediction and Classification
457(30)
17.1 Predicting a Binary y. Probability Prediction and Classification
458(4)
17.A1 Case Study -- Predicting Firm Exit: Probability and Classification
459(3)
17.2 The Practice of Predicting Probabilities
462(4)
17.A2 Case Study -- Predicting Firm Exit: Probability and Classification
463(3)
17.3 Classification and the Confusion Table
466(2)
17.4 Illustrating the Trade-Off between Different Classification Thresholds: The ROC Curve
468(3)
17.A3 Case Study -- Predicting Firm Exit: Probability and Classification
469(2)
17.5 Loss Function and Finding the Optimal Classification Threshold
471(4)
17.A4 Case Study -- Predicting Firm Exit: Probability and Classification
473(2)
17.6 Probability Prediction and Classification with Random Forest
475(5)
17.A5 Case Study -- Predicting Firm Exit: Probability and Classification
477(3)
17.7 Class Imbalance
480(1)
17.8 The Process of Prediction with a Binary Target Variable
481(1)
17.9 Main Takeaways
482(5)
Practice Questions
482(1)
Data Exercises
483(1)
References and Further Reading
484(1)
17.U1 Under the Hood: The Gini Node Impurity Measure and MSE
484(1)
17.U2 Under the Hood: On the Method of Finding an Optimal Threshold
485(2)
18 Forecasting from Time Series Data
487(30)
18.1 Forecasting: Prediction Using Time Series Data
488(1)
18.2 Holdout, Training, and Test Samples in Time Series Data
489(2)
18.3 Long-Horizon Forecasting: Seasonality and Predictable Events
491(1)
18.4 Long-Horizon Forecasting: Trends
492(8)
18.A1 Case Study -- Forecasting Daily Ticket Volumes for a Swimming Pool
494(6)
18.5 Forecasting for a Short Horizon Using the Patterns of Serial Correlation
500(1)
18.6 Modeling Serial Correlation: AR(1)
500(1)
18.7 Modeling Serial Correlation: ARIMA
501(4)
18.B1 Case Study -- Forecasting a Home Price Index
503(2)
18.8 VAR: Vector Autoregressions
505(4)
18.B2 Case Study -- Forecasting a Home Price Index
507(2)
18.9 External Validity of Forecasts
509(3)
18.B3 Case Study -- Forecasting a Home Price Index
510(2)
18.10 Main Takeaways
512(5)
Practice Questions
512(1)
Data Exercises
513(1)
References and Further Reading
514(1)
18.U1 Under the Hood: Details of the ARIMA Model
514(2)
18.U2 Under the Hood: Auto-Arima
516(1)
IV CAUSAL ANALYSIS
517(187)
19 A Framework for Causal Analysis
519(36)
19.1 Intervention, Treatment, Subjects, Outcomes
520(2)
19.2 Potential Outcomes
522(1)
19.3 The Individual Treatment Effect
523(1)
19.4 Heterogeneous Treatment Effects
524(1)
19.5 ATE: The Average Treatment Effect
525(2)
19.6 Average Effects in Subgroups and ATET
527(1)
19.7 Quantitative Causal Variables
527(3)
19.A1 Case Study -- Food and Health
528(2)
19.8 Ceteris Paribus: Other Things Being the Same
530(1)
19.9 Causal Maps
531(2)
19.10 Comparing Different Observations to Uncover Average Effects
533(2)
19.11 Random Assignment
535(1)
19.12 Sources of Variation in the Causal Variable
536(3)
19.A2 Case Study -- Food and Health
537(2)
19.13 Experimenting versus Conditioning
539(2)
19.14 Confounders in Observational Data
541(2)
19.15 From Latent Variables to Measured Variables
543(1)
19.16 Bad Conditioners: Variables Not to Condition On
544(5)
19.A3 Case Study -- Food and Health
545(4)
19.17 External Validity, Internal Validity
549(2)
19.18 Constructive Skepticism
551(1)
19.19 Main Takeaways
552(3)
Practice Questions
552(1)
Data Exercises
553(1)
References and Further Reading
554(1)
20 Designing and Analyzing Experiments
555(33)
20.1 Randomized Experiments and Potential Outcomes
556(1)
20.2 Field Experiments, A/B Testing, Survey Experiments
557(3)
20.A1 Case Study -- Working from Home and Employee Performance
558(1)
20.B1 Case Study -- Fine Tuning Social Media Advertising
559(1)
20.3 The Experimental Setup: Definitions
560(1)
20.4 Random Assignment in Practice
560(2)
20.5 Number of Subjects and Proportion Treated
562(1)
20.6 Random Assignment and Covariate Balance
563(4)
20.A2 Case Study -- Working from Home and Employee Performance
565(2)
20.7 Imperfect Compliance and Intent-to-Treat
567(3)
20.A3 Case Study -- Working from Home and Employee Performance
569(1)
20.8 Estimation and Statistical Inference
570(2)
20.B2 Case Study -- Fine Tuning Social Media Advertising
571(1)
20.9 Including Covariates in a Regression
572(4)
20.A4 Case Study -- Working from Home and Employee Performance
573(3)
20.10 Spillovers
576(1)
20.11 Additional Threats to Internal Validity
577(4)
20.A5 Case Study -- Working from Home and Employee Performance
579(2)
20.12 External Validity, and How to Use the Results in Decision Making
581(2)
20.A6 Case Study -- Working from Home and Employee Performance
582(1)
20.13 Main Takeaways
583(5)
Practice Questions
584(1)
Data Exercises
585(1)
References and Further Reading
585(1)
20.U1 Under the Hood: LATE: The Local Average Treatment Effect
586(1)
20.U2 Under the Hood: The Formula for Sample Size Calculation
586(2)
21 Regression and Matching with Observational Data
588(32)
21.1 Thought Experiments
589(2)
21.A1 Case Study -- Founder/Family Ownership and Quality of Management
590(1)
21.2 Variables to Condition on. Variables Not to Condition On
591(4)
21.A2 Case Study -- Founder/Family Ownership and Quality of Management
592(3)
21.3 Conditioning on Confounders by Regression
595(2)
21.4 Selection of Variables and Functional Form in a Regression for Causal Analysis
597(4)
21.A3 Case Study -- Founder/Family Ownership and Quality of Management
598(3)
21.5 Matching
601(2)
21.6 Common Support
603(1)
21.7 Matching on the Propensity Score
604(3)
21.A4 Case Study -- Founder/Family Ownership and Quality of Management
605(2)
21.8 Comparing Linear Regression and Matching
607(3)
21.A5 Case Study -- Founder/Family Ownership and Quality of Management
609(1)
21.9 Instrumental Variables
610(3)
21.10 Regression-Discontinuity
613(1)
21.11 Main Takeaways
614(6)
Practice Questions
614(1)
Data Exercises
615(1)
References and Further Reading
616(1)
21.U1 Under the Hood: Unobserved Heterogeneity and Endogenous x in a Regression
616(2)
21.U2 Under the Hood: LATE is IV
618(2)
22 Difference-in-Differences
620(29)
22.1 Conditioning on Pre-intervention Outcomes
621(1)
22.2 Basic Difference-in-Differences Analysis: Comparing Average Changes
622(7)
22.A1 Case Study -- How Does a Merger between Airlines Affect Prices?
625(4)
22.3 The Parallel Trends Assumption
629(4)
22.A2 Case Study -- How Does a Merger between Airlines Affect Prices?
631(2)
22.4 Conditioning on Additional Confounders in Diff-in-Diffs Regressions
633(4)
22.A3 Case Study -- How Does a Merger between Airlines Affect Prices?
635(2)
22.5 Quantitative Causal Variable
637(3)
22.A4 Case Study -- How Does a Merger between Airlines Affect Prices?
638(2)
22.6 Difference-in-Differences with Pooled Cross-Sections
640(5)
22.A5 Case Study -- How Does a Merger between Airlines Affect Prices?
643(2)
22.7 Main Takeaways
645(4)
Practice Questions
646(1)
Data Exercises
647(1)
References and Further Reading
648(1)
23 Methods for Panel Data
649(32)
23.1 Multiple Time Periods Can Be Helpful
650(1)
23.2 Estimating Effects Using Observational Time Series
651(2)
23.3 Lags to Estimate the Time Path of Effects
653(1)
23.4 Leads to Examine Pre-trends and Reverse Effects
653(1)
23.5 Pooled Time Series to Estimate the Effect for One Unit
654(5)
23.A1 Case Study -- Import Demand and Industrial Production
656(3)
23.6 Panel Regression with Fixed Effects
659(2)
23.7 Aggregate Trend
661(4)
23.B1 Case Study -- Immunization against Measles and Saving Children
662(3)
23.8 Clustered Standard Errors
665(1)
23.9 Panel Regression in First Differences
666(1)
23.10 Lags and Leads in FD Panel Regressions
667(4)
23.B2 Case Study -- Immunization against Measles and Saving Children
669(2)
23.11 Aggregate Trend and Individual Trends in FD Models
671(3)
23.B3 Case Study -- Immunization against Measles and Saving Children
672(2)
23.12 Panel Regressions and Causality
674(1)
23.13 First Differences or Fixed Effects?
675(2)
23.14 Dealing with Unbalanced Panels
677(1)
23.15 Main Takeaways
678(3)
Practice Questions
678(2)
Data Exercises
680(1)
References and Further Reading
680(1)
24 Appropriate Control Groups for Panel Data
681(23)
24.1 When and Why to Select a Control Group in xt Panel Data
682(1)
24.2 Comparative Case Studies
682(1)
24.3 The Synthetic Control Method
683(4)
24.A1 Case Study -- Estimating the Effect of the 2010 Haiti Earthquake on GDP
684(3)
24.4 Event Studies
687(7)
24.B1 Case Study -- Estimating the Impact of Replacing Football Team Managers
690(4)
24.5 Selecting a Control Group in Event Studies
694(7)
24.B2 Case Study -- Estimating the Impact of Replacing Football Team Managers
696(5)
24.6 Main Takeaways
701(3)
Practice Questions
701(1)
Data Exercises
702(1)
References and Further Reading
703(1)
References 704(5)
Index 709
Gábor Békés is an assistant professor at the Department of Economics and Business of the Central European University, and Director of the Business Analytics Program. He is a senior fellow at KRTK and a research affiliate at the Center for Economic Policy Research (CEPR). He has published in top economics journals on multinational firm activities and productivity, business clusters, and innovation spillovers. He has managed international data collection projects on firm performance and supply chains. He has done policy advising (the European Commission, ECB) as well as private-sector consultancy (in finance, business intelligence, and real estate). He has taught graduate-level data analysis and economic geography courses since 2012.

Gábor Kézdi is a research associate professor at the University of Michigan's Institute for Social Research. He has published in top journals in economics, statistics, and political science on topics including household finances, health, education, demography, and ethnic disadvantages and prejudice. He has managed several data collection projects in Europe; currently, he is co-investigator of the Health and Retirement Study in the US. He has consulted for various governmental and non-governmental institutions on the disadvantage of the Roma minority and the evaluation of social interventions. He has taught data analysis, econometrics, and labor economics from undergraduate to Ph.D. levels since 2002, and supervised a number of MA and Ph.D. students.