
Handbook of Regression Modeling in People Analytics: With Examples in R and Python [Hardcover]

  • Format: Hardback, 272 pages, height x width: 234x156 mm, weight: 576 g, 48 line drawings, color; 48 illustrations, color
  • Publication date: 30-Jul-2021
  • Publisher: Chapman & Hall/CRC
  • ISBN-10: 1032041749
  • ISBN-13: 9781032041742
"This book is a learning resource on inferential statistics and regression analysis. It teaches how to do a wide range of statistical analyses in both R and in Python, ranging from simple hypothesis testing to advanced multivariate modelling. Although itis primarily focused on examples related to the analysis of people and talent, the methods easily transfer to any discipline. The book hits a 'sweet spot' where there is just enough mathematical theory to support a strong understanding of the methods, but with a step-by-step guide and easily reproducible examples and code, so that the methods can be put into practice immediately. This makes the book accessible to a wide readership, from public and private sector analysts and practitioners to students andresearchers"--

This book is a learning resource on inferential statistics and regression analysis. It teaches how to do a wide range of statistical analyses in both R and in Python, ranging from simple hypothesis testing to advanced multivariate modelling.



Despite the recent rapid growth in machine learning and predictive analytics, many of the statistical questions that are faced by researchers and practitioners still involve explaining why something is happening. Regression analysis is the best ‘swiss army knife’ we have for answering these kinds of questions.

This book is a learning resource on inferential statistics and regression analysis. It teaches how to do a wide range of statistical analyses in both R and in Python, ranging from simple hypothesis testing to advanced multivariate modelling. Although it is primarily focused on examples related to the analysis of people and talent, the methods easily transfer to any discipline. The book hits a ‘sweet spot’ where there is just enough mathematical theory to support a strong understanding of the methods, but with a step-by-step guide and easily reproducible examples and code, so that the methods can be put into practice immediately. This makes the book accessible to a wide readership, from public and private sector analysts and practitioners to students and researchers.
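To give a flavour of the range described above, which starts at simple hypothesis testing, here is a minimal sketch in pure Python of Welch's t-test statistic, the test for a difference in means covered in the book's Section 3.3.1. This is an illustrative example, not code from the book, and the sample data are invented.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t-statistic and approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    n1, n2 = len(sample_a), len(sample_b)
    m1, m2 = mean(sample_a), mean(sample_b)
    v1, v2 = variance(sample_a), variance(sample_b)  # sample variances (n - 1 denominator)
    se_sq = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se_sq ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical example: engagement scores for two employee groups
group_a = [3.1, 3.4, 2.9, 3.8, 3.5, 3.2]
group_b = [2.7, 2.9, 3.0, 2.6, 3.1, 2.8]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.1f}")
```

In practice one would use a library implementation (for example `scipy.stats.ttest_ind` with `equal_var=False`) to obtain the p-value as well; the point here is only the shape of the calculation.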

Key Features:

• 16 accompanying datasets across a wide range of contexts (e.g. academic, corporate, sports, marketing).
• Clear step-by-step instructions on executing the analyses.
• Clear guidance on how to interpret results.
• Primary instruction in R, with added sections for Python coders.
• Discussion exercises and data exercises for each of the main chapters.
• A final chapter of practice material and datasets ideal for class homework or project work.
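The simplest of the book's models is the simple linear regression of Chapter 4, which fits a line by minimising the squared error. As an illustrative sketch (again not code from the book, with invented data), the closed-form least-squares fit can be written in pure Python:

```python
from statistics import mean

def least_squares_fit(x, y):
    """Ordinary least squares fit y ~ intercept + slope * x for paired data."""
    mx, my = mean(x), mean(y)
    # slope = covariance(x, y) / variance(x); common denominators cancel
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical example: test score against hours of study
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
b0, b1 = least_squares_fit(hours, score)
print(f"score = {b0:.1f} + {b1:.1f} * hours")  # score = 47.7 + 4.1 * hours
```

The book itself fits such models with R's `lm` (and, in the Python sections, statsmodels' OLS), which also report the coefficient confidence intervals and goodness-of-fit measures discussed in Chapter 4.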

Foreword xiii
Alexis Fink
Introduction xv
1 The Importance of Regression in People Analytics 1(8)
1.1 Why is regression modeling so important in people analytics? 2(1)
1.2 What do we mean by 'modeling'? 3(3)
1.2.1 The theory of inferential modeling 3(2)
1.2.2 The process of inferential modeling 5(1)
1.3 The structure, system and organization of this book 6(3)
2 The Basics of the R Programming Language 9(30)
2.1 What is R? 10(1)
2.2 How to start using R 10(1)
2.3 Data in R 11(7)
2.3.1 Data types 13(1)
2.3.2 Homogeneous data structures 14(2)
2.3.3 Heterogeneous data structures 16(2)
2.4 Working with dataframes 18(6)
2.4.1 Loading and tidying data in dataframes 18(4)
2.4.2 Manipulating dataframes 22(2)
2.5 Functions, packages and libraries 24(5)
2.5.1 Using functions 24(1)
2.5.2 Help with functions 25(1)
2.5.3 Writing your own functions 26(1)
2.5.4 Installing packages 26(1)
2.5.5 Using packages 27(1)
2.5.6 The pipe operator 28(1)
2.6 Errors, warnings and messages 29(2)
2.7 Plotting and graphing 31(3)
2.7.1 Plotting in base R 31(2)
2.7.2 Specialist plotting and graphing packages 33(1)
2.8 Documenting your work using R Markdown 34(3)
2.9 Learning exercises 37(2)
2.9.1 Discussion questions 37(1)
2.9.2 Data exercises 38(1)
3 Statistics Foundations 39(26)
3.1 Elementary descriptive statistics of populations and samples 40(6)
3.1.1 Mean, variance and standard deviation 40(3)
3.1.2 Covariance and correlation 43(3)
3.2 Distribution of random variables 46(3)
3.2.1 Sampling of random variables 46(1)
3.2.2 Standard errors, the t-distribution and confidence intervals 47(2)
3.3 Hypothesis testing 49(9)
3.3.1 Testing for a difference in means (Welch's t-test) 51(3)
3.3.2 Testing for a non-zero correlation between two variables (t-test for correlation) 54(2)
3.3.3 Testing for a difference in frequency distribution between different categories in a data set (Chi-square test) 56(2)
3.4 Foundational statistics in Python 58(4)
3.5 Learning exercises 62(3)
3.5.1 Discussion questions 62(1)
3.5.2 Data exercises 63(2)
4 Linear Regression for Continuous Outcomes 65(36)
4.1 When to use it 65(4)
4.1.1 Origins and intuition of linear regression 65(1)
4.1.2 Use cases for linear regression 66(1)
4.1.3 Walkthrough example 67(2)
4.2 Simple linear regression 69(7)
4.2.1 Linear relationship between a single input and an outcome 70(1)
4.2.2 Minimising the error 70(3)
4.2.3 Determining the best fit 73(1)
4.2.4 Measuring the fit of the model 74(2)
4.3 Multiple linear regression 76(6)
4.3.1 Running a multiple linear regression model and interpreting its coefficients 76(1)
4.3.2 Coefficient confidence 77(1)
4.3.3 Model 'goodness-of-fit' 78(3)
4.3.4 Making predictions from your model 81(1)
4.4 Managing inputs in linear regression 82(4)
4.4.1 Relevance of input variables 83(1)
4.4.2 Sparseness ('missingness') of data 83(1)
4.4.3 Transforming categorical inputs to dummy variables 84(2)
4.5 Testing your model assumptions 86(7)
4.5.1 Assumption of linearity and additivity 86(2)
4.5.2 Assumption of constant error variance 88(1)
4.5.3 Assumption of normally distributed errors 89(1)
4.5.4 Avoiding high collinearity and multicollinearity between input variables 90(3)
4.6 Extending multiple linear regression 93(4)
4.6.1 Interactions between input variables 93(3)
4.6.2 Quadratic and higher-order polynomial terms 96(1)
4.7 Learning exercises 97(4)
4.7.1 Discussion questions 97(1)
4.7.2 Data exercises 97(4)
5 Binomial Logistic Regression for Binary Outcomes 101(26)
5.1 When to use it 102(4)
5.1.1 Origins and intuition of binomial logistic regression 102(1)
5.1.2 Use cases for binomial logistic regression 103(1)
5.1.3 Walkthrough example 104(2)
5.2 Modeling probabilistic outcomes using a logistic function 106(6)
5.2.1 Deriving the concept of log odds 107(2)
5.2.2 Modeling the log odds and interpreting the coefficients 109(1)
5.2.3 Odds versus probability 110(2)
5.3 Running a multivariate binomial logistic regression model 112(10)
5.3.1 Running and interpreting a multivariate binomial logistic regression model 113(3)
5.3.2 Understanding the fit and goodness-of-fit of a binomial logistic regression model 116(4)
5.3.3 Model parsimony 120(2)
5.4 Other considerations in binomial logistic regression 122(2)
5.5 Learning exercises 124(3)
5.5.1 Discussion questions 124(1)
5.5.2 Data exercises 124(3)
6 Multinomial Logistic Regression for Nominal Category Outcomes 127(16)
6.1 When to use it 127(4)
6.1.1 Intuition for multinomial logistic regression 127(1)
6.1.2 Use cases for multinomial logistic regression 128(1)
6.1.3 Walkthrough example 128(3)
6.2 Running stratified binomial models 131(2)
6.2.1 Modeling the choice of Product A versus other products 131(2)
6.2.2 Modeling other choices 133(1)
6.3 Running a multinomial regression model 133(5)
6.3.1 Defining a reference level and running the model 134(2)
6.3.2 Interpreting the model 136(1)
6.3.3 Changing the reference 137(1)
6.4 Model simplification, fit and goodness-of-fit for multinomial logistic regression models 138(2)
6.4.1 Gradual safe elimination of variables 138(1)
6.4.2 Model fit and goodness-of-fit 139(1)
6.5 Learning exercises 140(3)
6.5.1 Discussion questions 140(1)
6.5.2 Data exercises 141(2)
7 Proportional Odds Logistic Regression for Ordered Category Outcomes 143(20)
7.1 When to use it 143(5)
7.1.1 Intuition for proportional odds logistic regression 143(2)
7.1.2 Use cases for proportional odds logistic regression 145(1)
7.1.3 Walkthrough example 145(3)
7.2 Modeling ordinal outcomes under the assumption of proportional odds 148(7)
7.2.1 Using a latent continuous outcome variable to derive a proportional odds model 148(2)
7.2.2 Running a proportional odds logistic regression model 150(3)
7.2.3 Calculating the likelihood of an observation being in a specific ordinal category 153(1)
7.2.4 Model diagnostics 154(1)
7.3 Testing the proportional odds assumption 155(4)
7.3.1 Sighting the coefficients of stratified binomial models 156(1)
7.3.2 The Brant-Wald test 157(1)
7.3.3 Alternatives to proportional odds models 158(1)
7.4 Learning exercises 159(4)
7.4.1 Discussion questions 159(1)
7.4.2 Data exercises 160(3)
8 Modeling Explicit and Latent Hierarchy in Data 163(24)
8.1 Mixed models for explicit hierarchy in data 164(6)
8.1.1 Fixed and random effects 164(1)
8.1.2 Running a mixed model 165(5)
8.2 Structural equation models for latent hierarchy in data 170(15)
8.2.1 Running and assessing the measurement model 173(7)
8.2.2 Running and interpreting the structural model 180(5)
8.3 Learning exercises 185(2)
8.3.1 Discussion questions 185(1)
8.3.2 Data exercises 185(2)
9 Survival Analysis for Modeling Singular Events Over Time 187(16)
9.1 Tracking and illustrating survival rates over the study period 189(4)
9.2 Cox proportional hazard regression models 193(4)
9.2.1 Running a Cox proportional hazard regression model 194(2)
9.2.2 Checking the proportional hazard assumption 196(1)
9.3 Frailty models 197(3)
9.4 Learning exercises 200(3)
9.4.1 Discussion questions 200(1)
9.4.2 Data exercises 201(2)
10 Alternative Technical Approaches in R and Python 203(18)
10.1 'Tidier' modeling approaches in R 204(5)
10.1.1 The broom package 204(4)
10.1.2 The parsnip package 208(1)
10.2 Inferential statistical modeling in Python 209(12)
10.2.1 Ordinary Least Squares (OLS) linear regression 209(2)
10.2.2 Binomial logistic regression 211(1)
10.2.3 Multinomial logistic regression 212(1)
10.2.4 Structural equation models 213(2)
10.2.5 Survival analysis 215(3)
10.2.6 Other model variants 218(3)
11 Power Analysis to Estimate Required Sample Sizes for Modeling 221(14)
11.1 Errors, effect sizes and statistical power 222(2)
11.2 Power analysis for simple hypothesis tests 224(4)
11.3 Power analysis for linear regression models 228(1)
11.4 Power analysis for log-likelihood regression models 229(2)
11.5 Power analysis for hierarchical regression models 231(1)
11.6 Power analysis using Python 232(3)
12 Further Exercises for Practice 235(12)
12.1 Analyzing graduate salaries 235(2)
12.1.1 The graduates data set 236(1)
12.1.2 Discussion questions 236(1)
12.1.3 Data exercises 236(1)
12.2 Analyzing a recruiting process 237(2)
12.2.1 The recruiting data set 238(1)
12.2.2 Discussion questions 238(1)
12.2.3 Data exercises 239(1)
12.3 Analyzing the drivers of performance ratings 239(2)
12.3.1 The employee performance data set 240(1)
12.3.2 Discussion questions 240(1)
12.3.3 Data exercises 241(1)
12.4 Analyzing promotion differences between groups 241(2)
12.4.1 The promotion data set 242(1)
12.4.2 Discussion questions 242(1)
12.4.3 Data exercises 242(1)
12.5 Analyzing feedback on learning programs 243(4)
12.5.1 The learning data set 243(1)
12.5.2 Discussion questions 244(1)
12.5.3 Data exercises 244(3)
References 247(2)
Glossary 249(4)
Index 253
Keith McNulty, PhD, is a leading practitioner of applied statistics, psychometrics and people analytics. He is currently Global Director of Talent Science and Analytics at McKinsey & Company.