Data Analysis for Business, Economics, and Policy [Paperback]

4.73/5 (12 ratings from Goodreads)

Gábor Békés, Gábor Kézdi (University of Michigan, Ann Arbor)

Format: Paperback / softback, 738 pages, height x width x thickness: 246x190x33 mm, weight: 1590 g, Worked examples or Exercises
Publication date: 06-May-2021
Publisher: Cambridge University Press
ISBN-10: 1108716202
ISBN-13: 9781108716208

Other books on the subject:

Econometrics
Economic statistics
Finance - (currently in store: 1 title)
Knowledge management
Insurance & actuarial studies
Data analysis: general - (currently in store: 1 title)

Paperback
Price: 73.00 €
Delivery from the publisher takes approximately 2-4 weeks

Permalink: https://www.kriso.ee/db/9781108716208.html

This textbook provides future data analysts with the tools, methods, and skills needed to answer data-focused, real-life questions; to carry out data analysis; and to visualize and interpret results to support better decisions in business, economics, and public policy. Data wrangling and exploration, regression analysis, machine learning, and causal analysis are comprehensively covered, as well as when, why, and how the methods work, and how they relate to each other. As the most effective way to communicate data analysis, running case studies play a central role in this textbook. Each case starts with an industry-relevant question and answers it by using real-world data and applying the tools and methods covered in the textbook. Learning is then consolidated by 360 practice questions and 120 data exercises. Extensive online resources, including raw and cleaned data and codes for all analysis in Stata, R, and Python, can be found at www.gabors-data-analysis.com.

Reviews

'This exciting new text covers everything today's aspiring data scientist needs to know, managing to be comprehensive as well as accessible. Like a good confidence interval, the Gabors have got you almost completely covered!' Joshua Angrist, Massachusetts Institute of Technology, winner of the Nobel Memorial Prize in Economic Sciences

'This is an excellent book for students learning the art of modern data analytics. It combines the latest techniques with practical applications, replicating the implementation side of classroom teaching that is typically missing in textbooks. For example, they used the World Management Survey data to generate exercises on firm performance for students to gain experience in handling real data, with all its quirks, problems, and issues. For students looking to learn data analysis from one textbook, this is a great way to proceed.' Nicholas Bloom, Stanford University

'I know of few books about data analysis and visualization that are as comprehensive, deep, practical, and current as this one; and I know of almost none that are as fun to read. Gábor Békés and Gábor Kézdi have created a most unusual and most compelling beast: a textbook that teaches you the subject matter well and that, at the same time, you can enjoy reading cover to cover.' Alberto Cairo, University of Miami

'A beautiful integration of econometrics and data science that provides a direct path from data collection and exploratory analysis to conventional regression modeling, then on to prediction and causal modeling. Exactly what is needed to equip the next generation of students with the tools and insights from the two fields.' David Card, University of California, Berkeley, winner of the Nobel Memorial Prize in Economic Sciences

'This textbook is excellent at dissecting and explaining the underlying process of data analysis. Békés and Kézdi have masterfully woven into their instruction a comprehensive range of case studies. The result is a rigorous textbook grounded in real-world learning, at once accessible and engaging to novice scholars and advanced practitioners alike. I have every confidence it will be valued by future generations.' Kerwin K. Charles, Yale School of Management

'This book takes you by the hand in a journey that will bring you to understand the core value of data in the fields of machine learning and economics. The large amount of accessible examples combined with the intuitive explanation of foundational concepts is an ideal mix for anyone who wants to do data analysis. It is highly recommended to anyone interested in the new way in which data will be analyzed in the social sciences in the next years.' Christian Fons-Rosen, Barcelona Graduate School of Economics

'This sophisticatedly simple book is ideal for undergraduate- or Master's-level Data Analytics courses with a broad audience. The authors discuss the key aspects of examining data, regression analysis, prediction, Lasso, and random forests, and more, with using elegant prose instead of algebra. Using well-chosen case studies, they illustrate the techniques and discuss all of them patiently and thoroughly.' Carter Hill, Louisiana State University

'This is not an econometrics textbook. It is a data analysis textbook. And a highly unusual one - written in plain English, based on simplified notation, and full of case studies. An excellent starting point for future data analysts or anyone interested in finding out what data can tell us.' Beata Javorcik, University of Oxford

'A multifaceted book that considers many sides of data analysis, all of them important for the contemporary student and practitioner. It brings together classical statistics, regression, and causal inference, sending the message that awareness of all three aspects is important for success in this field. Many 'best practices' are discussed in accessible language, and illustrated using interesting datasets.' Ilya Ryzhov, University of Maryland

'This is a fantastic book to have. Strong data skills are critical for modern business and economic research, and this text provides a thorough and practical guide to acquiring them. Highly recommended.' John van Reenen, MIT Sloan

'Energy and climate change is one of the most important public policy challenges, and high-quality data and its empirical analysis is a foundation of solid policy. Data Analysis for Business, Economics, and Policy will make an important contribution to this with its innovative approach. In addition to the comprehensive treatment of modern econometric techniques, the book also covers the less glamorous but crucial aspects of procuring and cleaning data, and drawing useful inferences from less-than-perfect datasets. As the center of gravity of the energy system shifts to developing economies where data quality is still an issue, this will provide an important and practical combination for both academic and policy professionals.' Laszlo Varro, Chief Economist, International Energy Agency

'The content is comprehensive and well-organized, with useful examples and case studies. The level is suitable for my students. The layout is clean and easy to navigate, and the availability in both print and digital formats is convenient. The supporting materials, particularly the Python codes provided, greatly enhance the learning experience by offering practical, hands-on examples.' Ejiro Esohwode, Anglia Ruskin University

Other info

A comprehensive textbook on data analysis for business, applied economics and public policy that uses case studies with real-world data.

Why Use This Book

xxi

Simplified Notation

xxiv

Acknowledgments

xxv

I DATA EXPLORATION

(168)

1 Origins of Data

(27)

1.1 What Is Data?

(1)

1.2 Data Structures

(2)

1.A1 Case Study -- Finding a Good Deal among Hotels: Data Collection

(1)

1.3 Data Quality

(4)

1.B1 Case Study -- Comparing Online and Offline Prices: Data Collection

(1)

1.C1 Case Study -- Management Quality and Firm Performance: Data Collection

(1)

1.4 How Data Is Born: The Big Picture

(1)

1.5 Collecting Data from Existing Sources

(4)

1.A2 Case Study -- Finding a Good Deal among Hotels: Data Collection

(1)

1.B2 Case Study -- Comparing Online and Offline Prices: Data Collection

(1)

1.6 Surveys

(2)

1.C2 Case Study -- Management Quality and Firm Size: Data Collection

(1)

1.7 Sampling

(1)

1.8 Random Sampling

(3)

1.B3 Case Study -- Comparing Online and Offline Prices: Data Collection

(1)

1.C3 Case Study -- Management Quality and Firm Size: Data Collection

(1)

1.9 Big Data

(2)

1.10 Good Practices in Data Collection

(2)

1.11 Ethical and Legal Issues of Data Collection

(1)

1.12 Main Takeaways

(3)

Practice Questions

(1)

Data Exercises

(1)

References and Further Reading

(2)

2 Preparing Data for Analysis

(28)

2.1 Types of Variables

(2)

2.2 Stock Variables, Flow Variables

(1)

2.3 Types of Observations

(2)

2.4 Tidy Data

(2)

2.A1 Case Study -- Finding a Good Deal among Hotels: Data Preparation

(1)

2.5 Tidy Approach for Multi-dimensional Data

(1)

2.B1 Case Study -- Displaying Immunization Rates across Countries

(1)

2.6 Relational Data and Linking Data Tables

(4)

2.C1 Case Study -- Identifying Successful Football Managers

(2)

2.7 Entity Resolution: Duplicates, Ambiguous Identification, and Non-entity Rows

(2)

2.C2 Case Study -- Identifying Successful Football Managers

(1)

2.8 Discovering Missing Values

(2)

2.9 Managing Missing Values

(2)

2.A2 Case Study -- Finding a Good Deal among Hotels: Data Preparation

(1)

2.10 The Process of Cleaning Data

(1)

2.11 Reproducible Workflow: Write Code and Document Your Steps

(1)

2.12 Organizing Data Tables for a Project

(4)

2.C3 Case Study -- Identifying Successful Football Managers

(1)

2.C4 Case Study -- Identifying Successful Football Managers

(1)

2.13 Main Takeaways

(4)

Practice Questions

(1)

Data Exercises

(1)

References and Further Reading

(1)

2.U1 Under the Hood: Naming Files

(2)

3 Exploratory Data Analysis

(38)

3.1 Why Do Exploratory Data Analysis?

(1)

3.2 Frequencies and Probabilities

(1)

3.3 Visualizing Distributions

(4)

3.A1 Case Study -- Finding a Good Deal among Hotels: Data Exploration

(3)

3.4 Extreme Values

(3)

3.A2 Case Study -- Finding a Good Deal among Hotels: Data Exploration

(2)

3.5 Good Graphs: Guidelines for Data Visualization

(4)

3.A3 Case Study -- Finding a Good Deal among Hotels: Data Exploration

(1)

3.6 Summary Statistics for Quantitative Variables

(5)

3.B1 Case Study -- Comparing Hotel Prices in Europe: Vienna vs. London

(3)

3.7 Visualizing Summary Statistics

(3)

3.C1 Case Study -- Measuring Home Team Advantage in Football

(2)

3.8 Good Tables

(3)

3.C2 Case Study -- Measuring Home Team Advantage in Football

(1)

3.9 Theoretical Distributions

(4)

3.D1 Case Study -- Distributions of Body Height and Income

(2)

3.10 Steps of Exploratory Data Analysis

(1)

3.11 Main Takeaways

(8)

Practice Questions

(1)

Data Exercises

(1)

References and Further Reading

(1)

3.U1 Under the Hood: More on Theoretical Distributions

(1)

Bernoulli Distribution

(1)

Binomial Distribution

(1)

Uniform Distribution

(1)

Power-Law Distribution

(4)

4 Comparison and Correlation

(22)

4.1 The y and the x

(3)

4.A1 Case Study -- Management Quality and Firm Size: Describing Patterns of Association

(2)

4.2 Conditioning

100

(1)

4.3 Conditional Probabilities

101

(2)

4.A2 Case Study -- Management Quality and Firm Size: Describing Patterns of Association

102

(1)

4.4 Conditional Distribution, Conditional Expectation

103

(1)

4.5 Conditional Distribution, Conditional Expectation with Quantitative x

104

(4)

4.A3 Case Study -- Management Quality and Firm Size: Describing Patterns of Association

105

(3)

4.6 Dependence, Covariance, Correlation

108

(2)

4.7 From Latent Variables to Observed Variables

110

(3)

4.A4 Case Study -- Management Quality and Firm Size: Describing Patterns of Association

111

(2)

4.8 Sources of Variation in x

113

(1)

4.9 Main Takeaways

114

(4)

Practice Questions

115

(1)

Data Exercises

115

(1)

References and Further Reading

116

(1)

4.U1 Under the Hood: Inverse Conditional Probabilities, Bayes' Rule

116

(2)

5 Generalizing from Data

118

(25)

5.1 When to Generalize and to What?

119

(3)

5.A1 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?

121

(1)

5.2 Repeated Samples, Sampling Distribution, Standard Error

122

(3)

5.A2 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?

123

(2)

5.3 Properties of the Sampling Distribution

125

(3)

5.A3 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?

127

(1)

5.4 The Confidence Interval

128

(1)

5.A4 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?

129

(1)

5.5 Discussion of the CI: Confidence or Probability?

129

(1)

5.6 Estimating the Standard Error with the Bootstrap Method

130

(3)

5.A5 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?

132

(1)

5.7 The Standard Error Formula

133

(2)

5.A6 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?

134

(1)

5.8 External Validity

135

(2)

5.A7 Case Study -- What Likelihood of Loss to Expect on a Stock Portfolio?

136

(1)

5.9 Big Data, Statistical Inference, External Validity

137

(1)

5.10 Main Takeaways

138

(5)

Practice Questions

138

(1)

Data Exercises

139

(1)

References and Further Reading

139

(1)

5.U1 Under the Hood: The Law of Large Numbers and the Central Limit Theorem

140

(3)

6 Testing Hypotheses

143

(26)

6.1 The Logic of Testing Hypotheses

144

(4)

6.A1 Case Study -- Comparing Online and Offline Prices: Testing the Difference

145

(3)

6.2 Null Hypothesis, Alternative Hypothesis

148

(1)

6.3 The t-Test

149

(1)

6.4 Making a Decision; False Negatives, False Positives

150

(4)

6.5 The p-Value

154

(3)

6.A2 Case Study -- Comparing Online and Offline Prices: Testing the Difference

155

(2)

6.6 Steps of Hypothesis Testing

157

(1)

6.7 One-Sided Alternatives

158

(2)

6.B1 Case Study -- Testing the Likelihood of Loss on a Stock Portfolio

159

(1)

6.8 Testing Multiple Hypotheses

160

(2)

6.A3 Case Study -- Comparing Online and Offline Prices: Testing the Difference

161

(1)

6.9 p-Hacking

162

(2)

6.10 Testing Hypotheses with Big Data

164

(1)

6.11 Main Takeaways

165

(4)

Practice Questions

165

(1)

Data Exercises

166

(1)

References and Further Reading

167

(2)

II REGRESSION ANALYSIS

169

(194)

7 Simple Regression

171

(29)

7.1 When and Why Do Simple Regression Analysis?

172

(1)

7.2 Regression: Definition

172

(2)

7.3 Non-parametric Regression

174

(4)

7.A1 Case Study -- Finding a Good Deal among Hotels with Simple Regression

175

(3)

7.4 Linear Regression: Introduction

178

(1)

7.5 Linear Regression: Coefficient Interpretation

179

(1)

7.6 Linear Regression with a Binary Explanatory Variable

180

(1)

7.7 Coefficient Formula

181

(3)

7.A2 Case Study -- Finding a Good Deal among Hotels with Simple Regression

183

(1)

7.8 Predicted Dependent Variable and Regression Residual

184

(4)

7.A3 Case Study -- Finding a Good Deal among Hotels with Simple Regression

185

(3)

7.9 Goodness of Fit, R-Squared

188

(1)

7.10 Correlation and Linear Regression

189

(1)

7.11 Regression Analysis, Regression toward the Mean, Mean Reversion

190

(1)

7.12 Regression and Causation

190

(2)

7.A4 Case Study -- Finding a Good Deal among Hotels with Simple Regression

192

(1)

7.13 Main Takeaways

192

(8)

Practice Questions

193

(1)

Data Exercises

193

(1)

References and Further Reading

194

(1)

7.U1 Under the Hood: Derivation of the OLS Formulae for the Intercept and Slope Coefficients

194

(3)

7.U2 Under the Hood: More on Residuals and Predicted Values with OLS

197

(3)

8 Complicated Patterns and Messy Data

200

(36)

8.1 When and Why Care about the Shape of the Association between y and x?

201

(1)

8.2 Taking Relative Differences or Log

202

(2)

8.3 Log Transformation and Non-positive Values

204

(2)

8.4 Interpreting Log Values in a Regression

206

(4)

8.A1 Case Study -- Finding a Good Deal among Hotels with Nonlinear Function

207

(3)

8.5 Other Transformations of Variables

210

(5)

8.B1 Case Study -- How is Life Expectancy Related to the Average Income of a Country?

210

(5)

8.6 Regression with a Piecewise Linear Spline

215

(1)

8.7 Regression with Polynomial

216

(2)

8.8 Choosing a Functional Form in a Regression

218

(3)

8.B2 Case Study -- How is Life Expectancy Related to the Average Income of a Country?

219

(2)

8.9 Extreme Values and Influential Observations

221

(1)

8.10 Measurement Error in Variables

222

(1)

8.11 Classical Measurement Error

223

(4)

8.C1 Case Study -- Hotel Ratings and Measurement Error

225

(2)

8.12 Non-classical Measurement Error and General Advice

227

(1)

8.13 Using Weights in Regression Analysis

228

(2)

8.B3 Case Study -- How is Life Expectancy Related to the Average Income of a Country?

229

(1)

8.14 Main Takeaways

230

(6)

Practice Questions

231

(1)

Data Exercises

232

(1)

References and Further Reading

232

(1)

8.U1 Under the Hood: Details of the Log Approximation

233

(1)

8.U2 Under the Hood: Deriving the Consequences of Classical Measurement Error

234

(2)

9 Generalizing Results of a Regression

236

(30)

9.1 Generalizing Linear Regression Coefficients

237

(1)

9.2 Statistical Inference: CI and SE of Regression Coefficients

238

(5)

9.A1 Case Study -- Estimating Gender and Age Differences in Earnings

240

(3)

9.3 Intervals for Predicted Values

243

(6)

9.A2 Case Study -- Estimating Gender and Age Differences in Earnings

245

(4)

9.4 Testing Hypotheses about Regression Coefficients

249

(2)

9.5 Testing More Complex Hypotheses

251

(2)

9.A3 Case Study -- Estimating Gender and Age Differences in Earnings

252

(1)

9.6 Presenting Regression Results

253

(3)

9.A4 Case Study -- Estimating Gender and Age Differences in Earnings

254

(2)

9.7 Data Analysis to Help Assess External Validity

256

(4)

9.B1 Case Study -- How Stable is the Hotel Price-Distance to Center Relationship?

256

(4)

9.8 Main Takeaways

260

(6)

Practice Questions

261

(1)

Data Exercises

261

(1)

References and Further Reading

262

(1)

9.U1 Under the Hood: The Simple SE Formula for Regression Intercept

262

(1)

9.U2 Under the Hood: The Law of Large Numbers for β

263

(1)

9.U3 Under the Hood: Deriving SE(β) with the Central Limit Theorem

264

(1)

9.U4 Under the Hood: Degrees of Freedom Adjustment for the SE Formula

265

(1)

10 Multiple Linear Regression

266

(31)

10.1 Multiple Regression: Why and When?

267

(1)

10.2 Multiple Linear Regression with Two Explanatory Variables

267

(1)

10.3 Multiple Regression and Simple Regression: Omitted Variable Bias

268

(4)

10.A1 Case Study -- Understanding the Gender Difference in Earnings

270

(2)

10.4 Multiple Linear Regression Terminology

272

(1)

10.5 Standard Errors and Confidence Intervals in Multiple Linear Regression

273

(2)

10.6 Hypothesis Testing in Multiple Linear Regression

275

(1)

10.A2 Case Study -- Understanding the Gender Difference in Earnings

275

(1)

10.7 Multiple Linear Regression with Three or More Explanatory Variables

276

(1)

10.8 Nonlinear Patterns and Multiple Linear Regression

277

(2)

10.A3 Case Study -- Understanding the Gender Difference in Earnings

278

(1)

10.9 Qualitative Right-Hand-Side Variables

279

(3)

10.A4 Case Study -- Understanding the Gender Difference in Earnings

280

(2)

10.10 Interactions: Uncovering Different Slopes across Groups

282

(4)

10.A5 Case Study -- Understanding the Gender Difference in Earnings

284

(2)

10.11 Multiple Regression and Causal Analysis

286

(4)

10.A6 Case Study -- Understanding the Gender Difference in Earnings

287

(3)

10.12 Multiple Regression and Prediction

290

(4)

10.B1 Case Study -- Finding a Good Deal among Hotels with Multiple Regression

292

(2)

10.13 Main Takeaways

294

(3)

Practice Questions

294

(1)

Data Exercises

295

(1)

References and Further Reading

296

(1)

10.U1 Under the Hood: A Two-Step Procedure to Get the Multiple Regression Coefficient

296

(1)

11 Modeling Probabilities

297

(32)

11.1 The Linear Probability Model

298

(1)

11.2 Predicted Probabilities in the Linear Probability Model

299

(8)

11.A1 Case Study -- Does Smoking Pose a Health Risk?

301

(6)

11.3 Logit and Probit

307

(2)

11.A2 Case Study -- Does Smoking Pose a Health Risk?

308

(1)

11.4 Marginal Differences

309

(3)

11.A3 Case Study -- Does Smoking Pose a Health Risk?

311

(1)

11.5 Goodness of Fit: R-Squared and Alternatives

312

(2)

11.6 The Distribution of Predicted Probabilities

314

(1)

11.7 Bias and Calibration

314

(3)

11.B1 Case Study -- Are Australian Weather Forecasts Well Calibrated?

315

(2)

11.8 Refinement

317

(4)

11.A4 Case Study -- Does Smoking Pose a Health Risk?

318

(3)

11.9 Using Probability Models for Other Kinds of y Variables

321

(2)

11.10 Main Takeaways

323

(6)

Practice Questions

323

(1)

Data Exercises

324

(1)

References and Further Reading

325

(1)

11.U1 Under the Hood: Saturated Models

325

(1)

11.U2 Under the Hood: Maximum Likelihood Estimation and Search Algorithms

326

(1)

11.U3 Under the Hood: From Logit and Probit Coefficients to Marginal Differences

327

(2)

12 Regression with Time Series Data

329

(34)

12.1 Preparation of Time Series Data

330

(2)

12.2 Trend and Seasonality

332

(1)

12.3 Stationarity, Non-stationarity, Random Walk

333

(5)

12.A1 Case Study -- Returns on a Company Stock and Market Returns

335

(3)

12.4 Time Series Regression

338

(5)

12.A2 Case Study -- Returns on a Company Stock and Market Returns

339

(4)

12.5 Trends, Seasonality, Random Walks in a Regression

343

(6)

12.B1 Case Study -- Electricity Consumption and Temperature

346

(3)

12.6 Serial Correlation

349

(1)

12.7 Dealing with Serial Correlation in Time Series Regressions

350

(5)

12.B2 Case Study -- Electricity Consumption and Temperature

352

(3)

12.8 Lags of x in a Time Series Regression

355

(4)

12.B3 Case Study -- Electricity Consumption and Temperature

357

(2)

12.9 The Process of Time Series Regression Analysis

359

(1)

12.10 Main Takeaways

360

(3)

Practice Questions

360

(1)

Data Exercises

361

(1)

References and Further Reading

362

(1)

12.U1 Under the Hood: Testing for Unit Root

362

(1)

III PREDICTION

363

(154)

13 A Framework for Prediction

365

(26)

13.1 Prediction Basics

366

(1)

13.2 Various Kinds of Prediction

367

(2)

13.A1 Case Study -- Predicting Used Car Value with Linear Regressions

369

(1)

13.3 The Prediction Error and Its Components

369

(4)

13.A2 Case Study -- Predicting Used Car Value with Linear Regressions

371

(2)

13.4 The Loss Function

373

(2)

13.5 Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)

375

(1)

13.6 Bias and Variance of Predictions

376

(1)

13.7 The Task of Finding the Best Model

377

(2)

13.8 Finding the Best Model by Best Fit and Penalty: The BIC

379

(1)

13.9 Finding the Best Model by Training and Test Samples

380

(2)

13.10 Finding the Best Model by Cross-Validation

382

(2)

13.A3 Case Study -- Predicting Used Car Value with Linear Regressions

383

(1)

13.11 External Validity and Stable Patterns

384

(3)

13.A4 Case Study -- Predicting Used Car Value with Linear Regressions

386

(1)

13.12 Machine Learning and the Role of Algorithms

387

(2)

13.13 Main Takeaways

389

(2)

Practice Questions

389

(1)

Data Exercises

390

(1)

References and Further Reading

390

(1)

14 Model Building for Prediction

391

(26)

14.1 Steps of Prediction

392

(1)

14.2 Sample Design

393

(1)

14.3 Label Engineering and Predicting Log y

394

(3)

14.A1 Case Study -- Predicting Used Car Value: Log Prices

395

(2)

14.4 Feature Engineering: Dealing with Missing Values

397

(1)

14.5 Feature Engineering: What x Variables to Have and in What Functional Form

398

(4)

14.B1 Case Study -- Predicting Airbnb Apartment Prices: Selecting a Regression Model

399

(3)

14.6 We Can't Try Out All Possible Models

402

(1)

14.7 Evaluating the Prediction Using a Holdout Set

403

(4)

14.B2 Case Study -- Predicting Airbnb Apartment Prices: Selecting a Regression Model

404

(3)

14.8 Selecting Variables in Regressions by LASSO

407

(3)

14.B3 Case Study -- Predicting Airbnb Apartment Prices: Selecting a Regression Model

409

(1)

14.9 Diagnostics

410

(2)

14.B4 Case Study -- Predicting Airbnb Apartment Prices: Selecting a Regression Model

411

(1)

14.10 Prediction with Big Data

412

(2)

14.11 Main Takeaways

414

(3)

Practice Questions

414

(1)

Data Exercises

415

(1)

References and Further Reading

415

(1)

14.U1 Under the Hood: Text Parsing

415

(1)

14.U2 Under the Hood: Log Correction

416

(1)

15 Regression Trees

417

(21)

15.1 The Case for Regression Trees

418

(1)

15.2 Regression Tree Basics

419

(1)

15.3 Measuring Fit and Stopping Rules

420

(5)

15.A1 Case Study -- Predicting Used Car Value with a Regression Tree

421

(4)

15.4 Regression Tree with Multiple Predictor Variables

425

(1)

15.5 Pruning a Regression Tree

426

(1)

15.6 A Regression Tree is a Non-parametric Regression

426

(4)

15.A2 Case Study -- Predicting Used Car Value with a Regression Tree

427

(3)

15.7 Variable Importance

430

(1)

15.8 Pros and Cons of Using a Regression Tree for Prediction

431

(4)

15.A3 Case Study -- Predicting Used Car Value with a Regression Tree

433

(2)

15.9 Main Takeaways

435

(3)

Practice Questions

435

(1)

Data Exercises

436

(1)

References and Further Reading

437

(1)

16 Random Forest and Boosting

438

(19)

16.1 From a Tree to a Forest: Ensemble Methods

439

(1)

16.2 Random Forest

440

(2)

16.3 The Practice of Prediction with Random Forest

442

(2)

16.A1 Case Study -- Predicting Airbnb Apartment Prices with Random Forest

443

(1)

16.4 Diagnostics: The Variable Importance Plot

444

(1)

16.5 Diagnostics: The Partial Dependence Plot

445

(1)

16.6 Diagnostics: Fit in Various Subsets

446

(3)

16.A2 Case Study -- Predicting Airbnb Apartment Prices with Random Forest

446

(3)

16.7 An Introduction to Boosting and the GBM Model

449

(3)

16.A3 Case Study -- Predicting Airbnb Apartment Prices with Random Forest

450

(2)

16.8 A Review of Different Approaches to Predict a Quantitative y

452

(2)

16.9 Main Takeaways

454

(3)

Practice Questions

454

(1)

Data Exercises

455

(1)

References and Further Reading

456

(1)

17 Probability Prediction and Classification

457

(30)

17.1 Predicting a Binary y. Probability Prediction and Classification

458

(4)

17.A1 Case Study -- Predicting Firm Exit: Probability and Classification

459

(3)

17.2 The Practice of Predicting Probabilities

462

(4)

17.A2 Case Study -- Predicting Firm Exit: Probability and Classification

463

(3)

17.3 Classification and the Confusion Table

466

(2)

17.4 Illustrating the Trade-Off between Different Classification Thresholds: The ROC Curve

468

(3)

17.A3 Case Study -- Predicting Firm Exit: Probability and Classification

469

(2)

17.5 Loss Function and Finding the Optimal Classification Threshold

471

(4)

17.A4 Case Study -- Predicting Firm Exit: Probability and Classification

473

(2)

17.6 Probability Prediction and Classification with Random Forest

475

(5)

17.A5 Case Study -- Predicting Firm Exit: Probability and Classification

477

(3)

17.7 Class Imbalance

480

(1)

17.8 The Process of Prediction with a Binary Target Variable

481

(1)

17.9 Main Takeaways

482

(5)

Practice Questions

482

(1)

Data Exercises

483

(1)

References and Further Reading

484

(1)

17.U1 Under the Hood: The Gini Node Impurity Measure and MSE

484

(1)

17.U2 Under the Hood: On the Method of Finding an Optimal Threshold

485

(2)

18 Forecasting from Time Series Data

487

(30)

18.1 Forecasting: Prediction Using Time Series Data

488

(1)

18.2 Holdout, Training, and Test Samples in Time Series Data

489

(2)

18.3 Long-Horizon Forecasting: Seasonality and Predictable Events

491

(1)

18.4 Long-Horizon Forecasting: Trends

492

(8)

18.A1 Case Study -- Forecasting Daily Ticket Volumes for a Swimming Pool

494

(6)

18.5 Forecasting for a Short Horizon Using the Patterns of Serial Correlation

500

(1)

18.6 Modeling Serial Correlation: AR(1)

500

(1)

18.7 Modeling Serial Correlation: ARIMA

501

(4)

18.B1 Case Study -- Forecasting a Home Price Index

503

(2)

18.8 VAR: Vector Autoregressions

505

(4)

18.B2 Case Study -- Forecasting a Home Price Index

507

(2)

18.9 External Validity of Forecasts

509

(3)

18.B3 Case Study -- Forecasting a Home Price Index

510

(2)

18.10 Main Takeaways

512

(5)

Practice Questions

512

(1)

Data Exercises

513

(1)

References and Further Reading

514

(1)

18.U1 Under the Hood: Details of the ARIMA Model

514

(2)

18.U2 Under the Hood: Auto-Arima

516

(1)

IV CAUSAL ANALYSIS

517

(187)

19 A Framework for Causal Analysis

519

(36)

19.1 Intervention, Treatment, Subjects, Outcomes

520

(2)

19.2 Potential Outcomes

522

(1)

19.3 The Individual Treatment Effect

523

(1)

19.4 Heterogeneous Treatment Effects

524

(1)

19.5 ATE: The Average Treatment Effect

525

(2)

19.6 Average Effects in Subgroups and ATET

527

(1)

19.7 Quantitative Causal Variables

527

(3)

19.A1 Case Study -- Food and Health

528

(2)

19.8 Ceteris Paribus: Other Things Being the Same

530

(1)

19.9 Causal Maps

531

(2)

19.10 Comparing Different Observations to Uncover Average Effects

533

(2)

19.11 Random Assignment

535

(1)

19.12 Sources of Variation in the Causal Variable

536

(3)

19.A2 Case Study -- Food and Health

537

(2)

19.13 Experimenting versus Conditioning

539

(2)

19.14 Confounders in Observational Data

541

(2)

19.15 From Latent Variables to Measured Variables

543

(1)

19.16 Bad Conditioners: Variables Not to Condition On

544

(5)

19.A3 Case Study -- Food and Health

545

(4)

19.17 External Validity, Internal Validity

549

(2)

19.18 Constructive Skepticism

551

(1)

19.19 Main Takeaways

552

(3)

Practice Questions

552

(1)

Data Exercises

553

(1)

References and Further Reading

554

(1)

20 Designing and Analyzing Experiments

20.1 Randomized Experiments and Potential Outcomes

20.2 Field Experiments, A/B Testing, Survey Experiments

20.A1 Case Study -- Working from Home and Employee Performance

20.B1 Case Study -- Fine Tuning Social Media Advertising

20.3 The Experimental Setup: Definitions

20.4 Random Assignment in Practice

20.5 Number of Subjects and Proportion Treated

20.6 Random Assignment and Covariate Balance

20.A2 Case Study -- Working from Home and Employee Performance

20.7 Imperfect Compliance and Intent-to-Treat

20.A3 Case Study -- Working from Home and Employee Performance

20.8 Estimation and Statistical Inference

20.B2 Case Study -- Fine Tuning Social Media Advertising

20.9 Including Covariates in a Regression

20.A4 Case Study -- Working from Home and Employee Performance

20.10 Spillovers

20.11 Additional Threats to Internal Validity

20.A5 Case Study -- Working from Home and Employee Performance

20.12 External Validity, and How to Use the Results in Decision Making

20.A6 Case Study -- Working from Home and Employee Performance

20.13 Main Takeaways

Practice Questions

Data Exercises

References and Further Reading

20.U1 Under the Hood: LATE: The Local Average Treatment Effect

20.U2 Under the Hood: The Formula for Sample Size Calculation

21 Regression and Matching with Observational Data

21.1 Thought Experiments

21.A1 Case Study -- Founder/Family Ownership and Quality of Management

21.2 Variables to Condition on. Variables Not to Condition On

21.A2 Case Study -- Founder/Family Ownership and Quality of Management

21.3 Conditioning on Confounders by Regression

21.4 Selection of Variables and Functional Form in a Regression for Causal Analysis

21.A3 Case Study -- Founder/Family Ownership and Quality of Management

21.5 Matching

21.6 Common Support

21.7 Matching on the Propensity Score

21.A4 Case Study -- Founder/Family Ownership and Quality of Management

21.8 Comparing Linear Regression and Matching

21.A5 Case Study -- Founder/Family Ownership and Quality of Management

21.9 Instrumental Variables

21.10 Regression-Discontinuity

21.11 Main Takeaways

Practice Questions

Data Exercises

References and Further Reading

21.U1 Under the Hood: Unobserved Heterogeneity and Endogenous x in a Regression

21.U2 Under the Hood: LATE is IV

22 Difference-in-Differences

22.1 Conditioning on Pre-intervention Outcomes

22.2 Basic Difference-in-Differences Analysis: Comparing Average Changes

22.A1 Case Study -- How Does a Merger between Airlines Affect Prices?

22.3 The Parallel Trends Assumption

22.A2 Case Study -- How Does a Merger between Airlines Affect Prices?

22.4 Conditioning on Additional Confounders in Diff-in-Diffs Regressions

22.A3 Case Study -- How Does a Merger between Airlines Affect Prices?

22.5 Quantitative Causal Variable

22.A4 Case Study -- How Does a Merger between Airlines Affect Prices?

22.6 Difference-in-Differences with Pooled Cross-Sections

22.A5 Case Study -- How Does a Merger between Airlines Affect Prices?

22.7 Main Takeaways

Practice Questions

Data Exercises

References and Further Reading

23 Methods for Panel Data

23.1 Multiple Time Periods Can Be Helpful

23.2 Estimating Effects Using Observational Time Series

23.3 Lags to Estimate the Time Path of Effects

23.4 Leads to Examine Pre-trends and Reverse Effects

23.5 Pooled Time Series to Estimate the Effect for One Unit

23.A1 Case Study -- Import Demand and Industrial Production

23.6 Panel Regression with Fixed Effects

23.7 Aggregate Trend

23.B1 Case Study -- Immunization against Measles and Saving Children

23.8 Clustered Standard Errors

23.9 Panel Regression in First Differences

23.10 Lags and Leads in FD Panel Regressions

23.B2 Case Study -- Immunization against Measles and Saving Children

23.11 Aggregate Trend and Individual Trends in FD Models

23.B3 Case Study -- Immunization against Measles and Saving Children

23.12 Panel Regressions and Causality

23.13 First Differences or Fixed Effects?

23.14 Dealing with Unbalanced Panels

23.15 Main Takeaways

Practice Questions

Data Exercises

References and Further Reading

24 Appropriate Control Groups for Panel Data

24.1 When and Why to Select a Control Group in xt Panel Data

24.2 Comparative Case Studies

24.3 The Synthetic Control Method

24.A1 Case Study -- Estimating the Effect of the 2010 Haiti Earthquake on GDP

24.4 Event Studies

24.B1 Case Study -- Estimating the Impact of Replacing Football Team Managers

24.5 Selecting a Control Group in Event Studies

24.B2 Case Study -- Estimating the Impact of Replacing Football Team Managers

24.6 Main Takeaways

Practice Questions

Data Exercises

References and Further Reading

References

Index

Gábor Békés is an assistant professor at the Department of Economics and Business of the Central European University, and Director of the Business Analytics Program. He is a senior fellow at KRTK and a research affiliate at the Center for Economic Policy Research (CEPR). He has published in top economics journals on multinational firm activities and productivity, business clusters, and innovation spillovers. He has managed international data collection projects on firm performance and supply chains. He has done policy advising (the European Commission, ECB) as well as private-sector consultancy (in finance, business intelligence, and real estate). He has taught graduate-level data analysis and economic geography courses since 2012.

Gábor Kézdi is a research associate professor at the University of Michigan's Institute for Social Research. He has published in top journals in economics, statistics, and political science on topics including household finances, health, education, demography, and ethnic disadvantages and prejudice. He has managed several data collection projects in Europe; currently, he is co-investigator of the Health and Retirement Study in the US. He has consulted for various governmental and non-governmental institutions on the disadvantage of the Roma minority and the evaluation of social interventions. He has taught data analysis, econometrics, and labor economics from undergraduate to Ph.D. levels since 2002, and supervised a number of MA and Ph.D. students.
