| Preface |
|
xv | |
| Chapter 1 The Baseball Datasets |
|
1 | (32) |
|
|
|
1 | (1) |
|
1.2 The Lahman Database: Season-by-Season Data |
|
|
1 | (13) |
|
1.2.1 Bonds, Aaron, Ruth, and Rodriguez home run trajectories |
|
|
1 | (2) |
|
1.2.2 Obtaining the database |
|
|
3 | (2) |
|
|
|
5 | (1) |
|
|
|
6 | (1) |
|
|
|
7 | (4) |
|
|
|
11 | (2) |
|
|
|
13 | (1) |
|
|
|
13 | (1) |
|
1.3 Retrosheet Game-by-Game Data |
|
|
14 | (3) |
|
1.3.1 The 1998 McGwire and Sosa home run race |
|
|
14 | (1) |
|
|
|
15 | (1) |
|
|
|
15 | (1) |
|
1.3.4 Obtaining the game logs from Retrosheet |
|
|
16 | (1) |
|
|
|
16 | (1) |
|
|
|
17 | (1) |
|
1.4 Retrosheet Play-by-Play Data |
|
|
17 | (4) |
|
|
|
17 | (2) |
|
|
|
19 | (2) |
|
|
|
21 | (1) |
|
|
|
21 | (4) |
|
1.5.1 MLBAM Gameday and PITCHf/x |
|
|
21 | (1) |
|
|
|
22 | (2) |
|
|
|
24 | (1) |
|
1.6 Player Movement and Off-the-Bat Data |
|
|
25 | (2) |
|
|
|
25 | (1) |
|
1.6.2 Baseball Savant data |
|
|
25 | (2) |
|
|
|
27 | (1) |
|
|
|
27 | (1) |
|
|
|
28 | (1) |
|
|
|
28 | (5) |
| Chapter 2 Introduction to R |
|
33 | (34) |
|
|
|
33 | (1) |
|
2.2 Installing R and RStudio |
|
|
33 | (1) |
|
|
|
34 | (2) |
|
|
|
35 | (1) |
|
|
|
35 | (1) |
|
|
|
36 | (1) |
|
|
|
36 | (1) |
|
|
|
36 | (6) |
|
2.4.1 Career of Warren Spahn |
|
|
36 | (1) |
|
|
|
37 | (2) |
|
2.4.3 Manipulations with data frames |
|
|
39 | (2) |
|
2.4.4 Merging and selecting from data frames |
|
|
41 | (1) |
|
|
|
42 | (4) |
|
2.5.1 Defining and computing with vectors |
|
|
42 | (1) |
|
|
|
43 | (2) |
|
2.5.3 Vector index and logical variables |
|
|
45 | (1) |
|
2.6 Objects and Containers in R |
|
|
46 | (6) |
|
2.6.1 Character data and data frames |
|
|
47 | (2) |
|
|
|
49 | (2) |
|
|
|
51 | (1) |
|
2.7 Collection of R Commands |
|
|
52 | (3) |
|
|
|
52 | (1) |
|
|
|
53 | (2) |
|
2.8 Reading and Writing Data in R |
|
|
55 | (2) |
|
2.8.1 Importing data from a file |
|
|
55 | (1) |
|
|
|
56 | (1) |
|
|
|
57 | (1) |
|
2.10 Splitting, Applying, and Combining Data |
|
|
58 | (4) |
|
2.10.1 Iterating using map() |
|
|
59 | (1) |
|
|
|
60 | (2) |
|
|
|
62 | (1) |
|
|
|
63 | (1) |
|
|
|
63 | (4) |
| Chapter 3 Graphics |
|
67 | (26) |
|
|
|
67 | (1) |
|
|
|
67 | (3) |
|
|
|
67 | (2) |
|
3.2.2 Add axes labels and a title |
|
|
69 | (1) |
|
3.2.3 Other graphs of a character variable |
|
|
69 | (1) |
|
|
|
70 | (2) |
|
3.4 Numeric Variable: One-Dimensional Scatterplot and Histogram |
|
|
72 | (3) |
|
3.5 Two Numeric Variables |
|
|
75 | (6) |
|
|
|
75 | (1) |
|
3.5.2 Building a graph, step-by-step |
|
|
76 | (5) |
|
3.6 A Numeric Variable and a Factor Variable |
|
|
81 | (2) |
|
3.6.1 Parallel stripcharts |
|
|
81 | (1) |
|
|
|
81 | (2) |
|
3.7 Comparing Ruth, Aaron, Bonds, and A-Rod |
|
|
83 | (2) |
|
|
|
83 | (1) |
|
3.7.2 Creating the player data frames |
|
|
84 | (1) |
|
3.7.3 Constructing the graph |
|
|
85 | (1) |
|
3.8 The 1998 Home Run Race |
|
|
85 | (3) |
|
|
|
86 | (1) |
|
3.8.2 Extracting the variables |
|
|
86 | (1) |
|
3.8.3 Constructing the graph |
|
|
87 | (1) |
|
|
|
88 | (1) |
|
|
|
88 | (5) |
| Chapter 4 The Relation Between Runs and Wins |
|
93 | (18) |
|
|
|
93 | (1) |
|
4.2 The Teams Table in the Lahman Database |
|
|
93 | (2) |
|
|
|
95 | (4) |
|
4.4 The Pythagorean Formula for Winning Percentage |
|
|
99 | (7) |
|
4.4.1 The Exponent in the Pythagorean model |
|
|
101 | (1) |
|
4.4.2 Good and bad predictions by the Pythagorean model |
|
|
102 | (4) |
|
4.5 How Many Runs for a Win? |
|
|
106 | (2) |
|
|
|
108 | (1) |
|
|
|
109 | (2) |
| Chapter 5 Value of Plays Using Run Expectancy |
|
111 | (26) |
|
5.1 The Run Expectancy Matrix |
|
|
111 | (1) |
|
5.2 Runs Scored in the Remainder of the Inning |
|
|
112 | (1) |
|
|
|
113 | (3) |
|
5.4 Measuring Success of a Batting Play |
|
|
116 | (1) |
|
|
|
117 | (3) |
|
5.6 Opportunity and Success for All Hitters |
|
|
120 | (3) |
|
5.7 Position in the Batting Lineup |
|
|
123 | (2) |
|
5.8 Run Values of Different Base Hits |
|
|
125 | (6) |
|
5.8.1 Value of a home run |
|
|
126 | (2) |
|
|
|
128 | (3) |
|
5.9 Value of Base Stealing |
|
|
131 | (3) |
|
5.10 Further Reading and Software |
|
|
134 | (1) |
|
|
|
135 | (2) |
| Chapter 6 Balls and Strikes Effects |
|
137 | (26) |
|
|
|
137 | (1) |
|
6.2 Hitter's Counts and Pitcher's Counts |
|
|
137 | (12) |
|
6.2.1 An example for a single pitcher |
|
|
137 | (2) |
|
6.2.2 Pitch sequences from Retrosheet |
|
|
139 | (5) |
|
6.2.2.1 Functions for string manipulation |
|
|
141 | (1) |
|
6.2.2.2 Finding plate appearances going through a given count |
|
|
142 | (2) |
|
6.2.3 Expected run value by count |
|
|
144 | (4) |
|
6.2.4 The importance of the previous count |
|
|
148 | (1) |
|
|
|
149 | (11) |
|
6.3.1 Swinging tendencies by count |
|
|
150 | (6) |
|
6.3.1.1 Propensity to swing by location |
|
|
150 | (3) |
|
6.3.1.2 Effect of the ball/strike count |
|
|
153 | (3) |
|
6.3.2 Pitch selection by count |
|
|
156 | (2) |
|
6.3.3 Umpires' behavior by count |
|
|
158 | (2) |
|
|
|
160 | (1) |
|
|
|
161 | (2) |
| Chapter 7 Catcher Framing |
|
163 | (16) |
|
|
|
163 | (1) |
|
7.2 Acquiring Pitch-Level Data |
|
|
164 | (1) |
|
7.3 Where Is the Strike Zone? |
|
|
165 | (2) |
|
7.4 Modeling Called Strike Percentage |
|
|
167 | (5) |
|
7.4.1 Visualizing the estimates |
|
|
168 | (1) |
|
7.4.2 Visualizing the estimated surface |
|
|
169 | (1) |
|
7.4.3 Controlling for handedness |
|
|
169 | (3) |
|
7.5 Modeling Catcher Framing |
|
|
172 | (4) |
|
|
|
176 | (1) |
|
|
|
176 | (3) |
| Chapter 8 Career Trajectories |
|
179 | (22) |
|
|
|
179 | (1) |
|
8.2 Mickey Mantle's Batting Trajectory |
|
|
180 | (5) |
|
8.3 Comparing Trajectories |
|
|
185 | (8) |
|
8.3.1 Some preliminary work |
|
|
185 | (1) |
|
8.3.2 Computing career statistics |
|
|
186 | (1) |
|
8.3.3 Computing similarity scores |
|
|
187 | (1) |
|
8.3.4 Defining age, OBP, SLG, and OPS variables |
|
|
188 | (1) |
|
8.3.5 Fitting and plotting trajectories |
|
|
189 | (4) |
|
8.4 General Patterns of Peak Ages |
|
|
193 | (4) |
|
8.4.1 Computing all fitted trajectories |
|
|
193 | (2) |
|
8.4.2 Patterns of peak age over time |
|
|
195 | (1) |
|
8.4.3 Peak age and career at-bats |
|
|
195 | (2) |
|
8.5 Trajectories and Fielding Position |
|
|
197 | (1) |
|
|
|
198 | (1) |
|
|
|
199 | (2) |
| Chapter 9 Simulation |
|
201 | (28) |
|
|
|
201 | (1) |
|
9.2 Simulating a Half Inning |
|
|
202 | (13) |
|
|
|
202 | (1) |
|
9.2.2 Review of work in run expectancy |
|
|
203 | (1) |
|
9.2.3 Computing the transition probabilities |
|
|
204 | (2) |
|
9.2.4 Simulating the Markov chain |
|
|
206 | (4) |
|
9.2.5 Beyond run expectancy |
|
|
210 | (2) |
|
9.2.6 Transition probabilities for individual teams |
|
|
212 | (3) |
|
9.3 Simulating a Baseball Season |
|
|
215 | (11) |
|
9.3.1 The Bradley-Terry model |
|
|
215 | (1) |
|
9.3.2 Making up a schedule |
|
|
216 | (1) |
|
9.3.3 Simulating talents and computing win probabilities |
|
|
217 | (1) |
|
9.3.4 Simulating the regular season |
|
|
218 | (1) |
|
9.3.5 Simulating the post-season |
|
|
218 | (2) |
|
9.3.6 Function to simulate one season |
|
|
220 | (2) |
|
9.3.7 Simulating many seasons |
|
|
222 | (4) |
|
|
|
226 | (1) |
|
|
|
226 | (3) |
| Chapter 10 Exploring Streaky Performances |
|
229 | (22) |
|
|
|
229 | (1) |
|
|
|
230 | (4) |
|
10.2.1 Finding game hitting streaks |
|
|
230 | (2) |
|
10.2.2 Moving batting averages |
|
|
232 | (2) |
|
10.3 Streaks in Individual At-Bats |
|
|
234 | (8) |
|
10.3.1 Streaks of hits and outs |
|
|
234 | (2) |
|
10.3.2 Moving batting averages |
|
|
236 | (1) |
|
10.3.3 Finding hitting slumps for all players |
|
|
236 | (3) |
|
10.3.4 Were Ichiro Suzuki and Mike Trout unusually streaky? |
|
|
239 | (3) |
|
10.4 Local Patterns of Statcast Launch Velocity |
|
|
242 | (5) |
|
|
|
247 | (1) |
|
|
|
248 | (3) |
| Chapter 11 Using a Database to Compute Park Factors |
|
251 | (18) |
|
|
|
251 | (1) |
|
11.2 Installing MySQL and Creating a Database |
|
|
252 | (1) |
|
11.3 Connecting R to MySQL |
|
|
253 | (1) |
|
11.3.1 Connecting using RMySQL |
|
|
253 | (1) |
|
11.3.2 Connecting R to other SQL backends |
|
|
253 | (1) |
|
11.4 Filling a MySQL Game Log Database from R |
|
|
254 | (2) |
|
11.4.1 From Retrosheet to R |
|
|
254 | (1) |
|
|
|
255 | (1) |
|
11.5 Querying Data from R |
|
|
256 | (4) |
|
|
|
256 | (2) |
|
11.5.2 Coors Field and run scoring |
|
|
258 | (2) |
|
11.6 Building Your Own Baseball Database |
|
|
260 | (2) |
|
|
|
260 | (1) |
|
11.6.2 Retrosheet database |
|
|
261 | (1) |
|
|
|
261 | (1) |
|
|
|
261 | (1) |
|
11.7 Calculating Basic Park Factors |
|
|
262 | (5) |
|
11.7.1 Loading the data into R |
|
|
262 | (1) |
|
11.7.2 Home run park factor |
|
|
263 | (2) |
|
11.7.3 Assumptions of the proposed approach |
|
|
265 | (1) |
|
11.7.4 Applying park factors |
|
|
265 | (2) |
|
|
|
267 | (1) |
|
|
|
267 | (2) |
| Chapter 12 Batted Ball Data from Statcast |
|
269 | (24) |
|
|
|
269 | (1) |
|
|
|
269 | (5) |
|
12.2.1 Acquiring a year's worth of Statcast data |
|
|
271 | (2) |
|
12.2.2 Hitters' spray tendencies and infield defense |
|
|
273 | (1) |
|
12.3 Launch Angles and Exit Velocities |
|
|
274 | (2) |
|
12.3.1 Scatterplot of launch angle vs. exit velocity |
|
|
275 | (1) |
|
12.4 Modeling Home Run Probabilities |
|
|
276 | (6) |
|
12.4.1 Generalized additive model |
|
|
278 | (1) |
|
12.4.2 Smooth predictions |
|
|
279 | (1) |
|
12.4.3 Using this model to estimate 2017 home run production |
|
|
280 | (2) |
|
12.5 Are Launch Angles Skills? |
|
|
282 | (8) |
|
12.5.1 Distribution of launch angle |
|
|
282 | (4) |
|
12.5.2 Half-season correlation of launch angle |
|
|
286 | (4) |
|
|
|
290 | (1) |
|
|
|
290 | (3) |
| Appendix A Retrosheet Files Reference |
|
293 | (10) |
|
A.1 Downloading Play-by-Play Files |
|
|
293 | (3) |
|
|
|
293 | (1) |
|
|
|
293 | (1) |
|
A.1.3 Using a special function for a particular season |
|
|
293 | (1) |
|
A.1.4 Reading the files into R |
|
|
294 | (1) |
|
A.1.5 The function parse_retrosheet_pbp() |
|
|
294 | (2) |
|
A.2 Retrosheet Event Files: a Short Reference |
|
|
296 | (3) |
|
A.2.1 Game and event identifiers |
|
|
297 | (1) |
|
A.2.2 The state of the game |
|
|
297 | (2) |
|
A.3 Parsing Retrosheet Pitch Sequences |
|
|
299 | (4) |
|
|
|
299 | (1) |
|
|
|
299 | (1) |
|
A.3.3 Evaluating every count |
|
|
300 | (3) |
| Appendix B Accessing and Using MLBAM Gameday and PITCHf/x Data |
|
303 | (14) |
|
|
|
303 | (1) |
|
B.2 Where Are the Data Stored? |
|
|
303 | (3) |
|
B.3 Working with PITCHf/x Data |
|
|
306 | (3) |
|
B.3.1 Obtaining data from on-line resources |
|
|
306 | (1) |
|
|
|
306 | (1) |
|
|
|
307 | (1) |
|
B.3.4 An R package for PITCHf/x data: pitchRx
|
|
308 | (1) |
|
|
|
309 | (3) |
|
|
|
309 | (1) |
|
|
|
310 | (2) |
|
B.4.3 hip attributes (hit locations data) |
|
|
312 | (1) |
|
B.5 Notes About the Gameday and PITCHf/x Data |
|
|
312 | (1) |
|
|
|
313 | (4) |
|
B.6.1 Calculating the pitch trajectory |
|
|
313 | (1) |
|
B.6.2 Cross-referencing with other data sources |
|
|
314 | (1) |
|
|
|
314 | (3) |
| Appendix C Accessing and Using Statcast Data from Baseball-Savant |
|
317 | (6) |
|
|
|
317 | (1) |
|
C.2 Game Situation Variables |
|
|
317 | (1) |
|
|
|
318 | (1) |
|
|
|
318 | (1) |
|
C.5 Batted Ball Variables |
|
|
319 | (1) |
|
|
|
320 | (1) |
|
|
|
320 | (3) |
| References |
|
323 | (10) |
| Index |
|
333 | |