| Preface |
|
xv | |
|
|
|
1 | (28) |
|
|
|
2 | (1) |
|
1.2 The Lahman Database: Season-by-Season Data |
|
|
2 | (12) |
|
1.2.1 Bonds, Aaron, and Ruth home run trajectories |
|
|
2 | (2) |
|
1.2.2 Obtaining the database |
|
|
4 | (1) |
|
|
|
4 | (2) |
|
|
|
6 | (2) |
|
|
|
8 | (3) |
|
|
|
11 | (1) |
|
|
|
11 | (2) |
|
|
|
13 | (1) |
|
1.3 Retrosheet Game-by-Game Data |
|
|
14 | (4) |
|
1.3.1 The 1998 McGwire and Sosa home run race |
|
|
14 | (1) |
|
|
|
14 | (1) |
|
|
|
15 | (1) |
|
1.3.4 Obtaining the game logs from Retrosheet |
|
|
16 | (1) |
|
|
|
16 | (1) |
|
|
|
16 | (2) |
|
1.4 Retrosheet Play-by-Play Data |
|
|
18 | (3) |
|
|
|
18 | (1) |
|
|
|
19 | (1) |
|
|
|
20 | (1) |
|
|
|
21 | (4) |
|
1.5.1 MLBAM Gameday and PITCHf/x |
|
|
21 | (1) |
|
|
|
22 | (2) |
|
|
|
24 | (1) |
|
|
|
25 | (1) |
|
|
|
26 | (1) |
|
|
|
26 | (3) |
|
|
|
29 | (30) |
|
|
|
30 | (1) |
|
2.2 Installing R and RStudio |
|
|
30 | (1) |
|
|
|
31 | (5) |
|
2.3.1 Career of Warren Spahn |
|
|
31 | (1) |
|
2.3.2 Vectors: defining and calculations |
|
|
32 | (2) |
|
|
|
34 | (1) |
|
2.3.4 Vector index and logical variables |
|
|
35 | (1) |
|
2.4 Objects and Containers in R |
|
|
36 | (5) |
|
2.4.1 Character data and matrices |
|
|
37 | (1) |
|
|
|
38 | (2) |
|
|
|
40 | (1) |
|
2.5 Collection of R Commands |
|
|
41 | (2) |
|
|
|
41 | (1) |
|
|
|
42 | (1) |
|
2.6 Reading and Writing Data in R |
|
|
43 | (2) |
|
2.6.1 Importing data from a file |
|
|
43 | (2) |
|
|
|
45 | (1) |
|
|
|
45 | (5) |
|
|
|
45 | (2) |
|
2.7.2 Manipulations with data frames |
|
|
47 | (2) |
|
2.7.3 Merging and selecting from data frames |
|
|
49 | (1) |
|
|
|
50 | (1) |
|
2.9 Splitting, Applying, and Combining Data |
|
|
50 | (4) |
|
|
|
51 | (1) |
|
2.9.2 Using ddply in the plyr package |
|
|
52 | (2) |
|
|
|
54 | (1) |
|
|
|
55 | (1) |
|
|
|
55 | (4) |
|
|
|
59 | (28) |
|
|
|
59 | (1) |
|
|
|
60 | (2) |
|
|
|
60 | (1) |
|
3.2.2 Add axes labels and a title |
|
|
61 | (1) |
|
3.2.3 Other graphs of a factor |
|
|
62 | (1) |
|
|
|
62 | (1) |
|
|
|
63 | (2) |
|
3.5 Numeric Variable: Stripchart and Histogram |
|
|
65 | (2) |
|
3.6 Two Numeric Variables |
|
|
67 | (6) |
|
|
|
67 | (2) |
|
3.6.2 Building a graph, step-by-step |
|
|
69 | (4) |
|
3.7 A Numeric Variable and a Factor Variable |
|
|
73 | (3) |
|
3.7.1 Parallel stripcharts |
|
|
74 | (1) |
|
|
|
74 | (2) |
|
3.8 Comparing Ruth, Aaron, Bonds, and A-Rod |
|
|
76 | (3) |
|
|
|
76 | (2) |
|
3.8.2 Creating the player data frames |
|
|
78 | (1) |
|
3.8.3 Constructing the graph |
|
|
78 | (1) |
|
3.9 The 1998 Home Run Race |
|
|
79 | (3) |
|
|
|
79 | (2) |
|
3.9.2 Extracting the variables |
|
|
81 | (1) |
|
3.9.3 Constructing the graph |
|
|
82 | (1) |
|
|
|
82 | (1) |
|
|
|
83 | (4) |
|
4 The Relation Between Runs and Wins |
|
|
87 | (18) |
|
|
|
87 | (1) |
|
4.2 The Teams Table in Lahman's Database |
|
|
88 | (1) |
|
|
|
89 | (4) |
|
4.4 The Pythagorean Formula for Winning Percentage |
|
|
93 | (2) |
|
4.5 The Exponent in the Pythagorean Formula |
|
|
95 | (1) |
|
4.6 Good and Bad Predictions by the Pythagorean Formula |
|
|
96 | (3) |
|
4.7 How Many Runs for a Win? |
|
|
99 | (3) |
|
|
|
102 | (1) |
|
|
|
102 | (3) |
|
5 Value of Plays Using Run Expectancy |
|
|
105 | (24) |
|
5.1 The Runs Expectancy Matrix |
|
|
105 | (1) |
|
5.2 Runs Scored in the Remainder of the Inning |
|
|
106 | (1) |
|
|
|
107 | (3) |
|
5.4 Measuring Success of a Batting Play |
|
|
110 | (1) |
|
|
|
111 | (3) |
|
5.6 Opportunity and Success for All Hitters |
|
|
114 | (2) |
|
5.7 Position in the Batting Lineup |
|
|
116 | (3) |
|
5.8 Run Values of Different Base Hits |
|
|
119 | (4) |
|
5.8.1 Value of a home run |
|
|
119 | (2) |
|
|
|
121 | (2) |
|
5.9 Value of Base Stealing |
|
|
123 | (3) |
|
5.10 Further Reading and Software |
|
|
126 | (1) |
|
|
|
126 | (3) |
|
|
|
129 | (32) |
|
|
|
129 | (1) |
|
|
|
130 | (14) |
|
|
|
130 | (1) |
|
6.2.2 The verlander dataset |
|
|
130 | (2) |
|
6.2.3 Basic plotting with lattice |
|
|
132 | (1) |
|
6.2.4 Multipanel conditioning |
|
|
133 | (1) |
|
6.2.5 Superposing group elements |
|
|
134 | (1) |
|
6.2.6 Scatterplots and dot plots |
|
|
135 | (2) |
|
|
|
137 | (2) |
|
6.2.8 Building a graph, step-by-step |
|
|
139 | (5) |
|
|
|
144 | (13) |
|
|
|
144 | (1) |
|
6.3.2 The cabrera dataset |
|
|
145 | (1) |
|
|
|
146 | (2) |
|
|
|
148 | (1) |
|
6.3.5 Multipanel conditioning (faceting) |
|
|
149 | (1) |
|
|
|
150 | (1) |
|
6.3.7 Combining information |
|
|
151 | (1) |
|
6.3.8 Adding a smooth line with error bands |
|
|
151 | (2) |
|
6.3.9 Dealing with cluttered charts |
|
|
153 | (2) |
|
6.3.10 Adding a background image |
|
|
155 | (2) |
|
|
|
157 | (1) |
|
|
|
157 | (4) |
|
7 Balls and Strikes Effects |
|
|
161 | (26) |
|
|
|
161 | (1) |
|
7.2 Hitter's Counts and Pitcher's Counts |
|
|
162 | (11) |
|
|
|
162 | (1) |
|
7.2.2 An example for a single pitcher |
|
|
162 | (3) |
|
7.2.3 Pitch sequences on Retrosheet |
|
|
165 | (1) |
|
7.2.3.1 Functions for string manipulation |
|
|
165 | (2) |
|
7.2.3.2 Finding plate appearances going through a given count |
|
|
167 | (2) |
|
7.2.4 Expected run value by count |
|
|
169 | (1) |
|
7.2.5 The importance of the previous count |
|
|
170 | (3) |
|
|
|
173 | (11) |
|
7.3.1 Swinging tendencies by count |
|
|
173 | (1) |
|
7.3.1.1 Propensity to swing by location |
|
|
173 | (3) |
|
7.3.1.2 Effect of the ball/strike count |
|
|
176 | (2) |
|
7.3.2 Pitch selection by count |
|
|
178 | (3) |
|
7.3.3 Umpires' behavior by count |
|
|
181 | (3) |
|
|
|
184 | (1) |
|
|
|
185 | (2) |
|
|
|
187 | (24) |
|
|
|
187 | (1) |
|
8.2 Mickey Mantle's Batting Trajectory |
|
|
188 | (4) |
|
8.3 Comparing Trajectories |
|
|
192 | (10) |
|
8.3.1 Some preliminary work |
|
|
192 | (2) |
|
8.3.2 Computing career statistics |
|
|
194 | (1) |
|
8.3.3 Computing similarity scores |
|
|
195 | (2) |
|
8.3.4 Defining age, OBP, SLG, and OPS variables |
|
|
197 | (1) |
|
8.3.5 Fitting and plotting trajectories |
|
|
198 | (4) |
|
8.4 General Patterns of Peak Ages |
|
|
202 | (3) |
|
8.4.1 Computing all fitted trajectories |
|
|
202 | (1) |
|
8.4.2 Patterns of peak age over time |
|
|
203 | (1) |
|
8.4.3 Peak age and career at-bats |
|
|
204 | (1) |
|
8.5 Trajectories and Fielding Position |
|
|
205 | (3) |
|
|
|
208 | (1) |
|
|
|
209 | (2) |
|
|
|
211 | (26) |
|
|
|
211 | (1) |
|
9.2 Simulating a Half Inning |
|
|
212 | (11) |
|
|
|
212 | (1) |
|
9.2.2 Review of work in runs expectancy |
|
|
213 | (2) |
|
9.2.3 Computing the transition probabilities |
|
|
215 | (1) |
|
9.2.4 Simulating the Markov chain |
|
|
216 | (3) |
|
9.2.5 Beyond runs expectancy |
|
|
219 | (1) |
|
9.2.6 Transition probabilities for individual teams |
|
|
220 | (3) |
|
9.3 Simulating a Baseball Season |
|
|
223 | (8) |
|
9.3.1 The Bradley-Terry model |
|
|
223 | (1) |
|
9.3.2 Making up a schedule |
|
|
224 | (1) |
|
9.3.3 Simulating talents and computing win probabilities |
|
|
225 | (1) |
|
9.3.4 Simulating the regular season |
|
|
225 | (1) |
|
9.3.5 Simulating the post-season |
|
|
226 | (1) |
|
9.3.6 Function to simulate one season |
|
|
227 | (1) |
|
9.3.7 Simulating many seasons |
|
|
228 | (3) |
|
|
|
231 | (1) |
|
|
|
232 | (5) |
|
10 Exploring Streaky Performances |
|
|
237 | (22) |
|
|
|
237 | (1) |
|
|
|
238 | (4) |
|
10.2.1 Finding game hitting streaks |
|
|
238 | (2) |
|
10.2.2 Moving batting averages |
|
|
240 | (2) |
|
10.3 Streaks in Individual At-Bats |
|
|
242 | (7) |
|
10.3.1 Streaks of hits and outs |
|
|
242 | (1) |
|
10.3.2 Moving batting averages |
|
|
243 | (1) |
|
10.3.3 Finding hitting slumps for all players |
|
|
243 | (3) |
|
10.3.4 Were Suzuki and Ibanez unusually streaky? |
|
|
246 | (3) |
|
10.4 Local Patterns of Weighted On-Base Average |
|
|
249 | (6) |
|
|
|
255 | (2) |
|
|
|
257 | (2) |
|
11 Learning About Park Effects by Database Management Tools |
|
|
259 | (24) |
|
|
|
259 | (1) |
|
11.2 Installing MySQL and Creating a Database |
|
|
260 | (2) |
|
11.3 Connecting R to MySQL |
|
|
262 | (2) |
|
11.3.1 Connecting using package RMySQL |
|
|
262 | (1) |
|
11.3.2 Connecting using package RODBC |
|
|
263 | (1) |
|
11.4 Filling a MySQL Game Log Database from R |
|
|
264 | (4) |
|
11.4.1 From Retrosheet to R |
|
|
265 | (1) |
|
|
|
265 | (3) |
|
11.5 Querying Data from R |
|
|
268 | (5) |
|
|
|
268 | (3) |
|
11.5.2 Coors Field and run scoring |
|
|
271 | (2) |
|
11.6 Baseball Data as MySQL Dumps |
|
|
273 | (2) |
|
|
|
273 | (1) |
|
11.6.2 Retrosheet database |
|
|
274 | (1) |
|
|
|
274 | (1) |
|
11.7 Calculating Basic Park Factors |
|
|
275 | (4) |
|
11.7.1 Loading the data in R |
|
|
275 | (1) |
|
11.7.2 Home run park factor |
|
|
276 | (1) |
|
11.7.3 Assumptions of the proposed approach |
|
|
277 | (1) |
|
11.7.4 Applying park factors |
|
|
278 | (1) |
|
|
|
279 | (1) |
|
|
|
279 | (4) |
|
12 Exploring Fielding Metrics with Contributed R Packages |
|
|
283 | (18) |
|
|
|
283 | (1) |
|
12.2 A Motivating Example: Comparing Fielding Metrics |
|
|
284 | (10) |
|
|
|
284 | (1) |
|
12.2.2 The fielding metrics |
|
|
285 | (1) |
|
12.2.3 Reading an Excel spreadsheet (XLConnect) |
|
|
286 | (1) |
|
12.2.4 Summarizing multiple columns (doBy) |
|
|
287 | (1) |
|
12.2.5 Finding the most similar string (stringdist) |
|
|
288 | (3) |
|
12.2.6 Applying a function on multiple columns (plyr) |
|
|
291 | (1) |
|
12.2.7 Weighted correlations (weights) |
|
|
291 | (1) |
|
12.2.8 Displaying correlation matrices (ellipse) |
|
|
292 | (1) |
|
12.2.9 Evaluating the fielding metrics (psych) |
|
|
293 | (1) |
|
12.3 Comparing Two Shortstops |
|
|
294 | (3) |
|
12.3.1 Reshaping the data (reshape2) |
|
|
296 | (1) |
|
12.3.2 Plotting the data (ggplot2 and directlabels) |
|
|
296 | (1) |
|
|
|
297 | (1) |
|
|
|
298 | (3) |
|
A Retrosheet Files Reference |
|
|
301 | (10) |
|
A.1 Downloading Play-by-Play Files |
|
|
301 | (3) |
|
|
|
301 | (1) |
|
|
|
302 | (1) |
|
A.1.3 Using a special function for a particular season |
|
|
302 | (1) |
|
A.1.4 Reading the files into R |
|
|
302 | (1) |
|
A.1.5 The function parse.retrosheet.pbp |
|
|
302 | (2) |
|
A.2 Retrosheet Event Files: a Short Reference |
|
|
304 | (2) |
|
A.2.1 Game and event identifiers |
|
|
304 | (1) |
|
A.2.2 The state of the game |
|
|
305 | (1) |
|
A.3 Parsing Retrosheet Pitch Sequences |
|
|
306 | (5) |
|
|
|
306 | (1) |
|
|
|
306 | (1) |
|
A.3.3 Evaluating every count |
|
|
307 | (4) |
|
B Accessing and Using MLBAM Gameday and PITCHf/x Data |
|
|
311 | (14) |
|
|
|
311 | (1) |
|
B.2 Where are the Data Stored? |
|
|
312 | (2) |
|
B.3 Suitable Formats for PITCHf/x Data |
|
|
314 | (2) |
|
B.3.1 Obtaining data from on-line resources |
|
|
314 | (1) |
|
|
|
314 | (1) |
|
B.3.2.1 A wrapper function |
|
|
315 | (1) |
|
|
|
316 | (3) |
|
|
|
316 | (1) |
|
|
|
317 | (1) |
|
B.4.3 hip attributes (hit locations data) |
|
|
318 | (1) |
|
B.5 Special Notes About the Gameday and PITCHf/x Data |
|
|
319 | (1) |
|
|
|
320 | (5) |
|
B.6.1 Calculating the pitch trajectory |
|
|
320 | (1) |
|
B.6.2 An R package for getting and visualizing PITCHf/x data: pitchRx |
|
|
321 | (2) |
|
B.6.3 Cross-referencing with other data sources |
|
|
323 | (1) |
|
|
|
323 | (2) |
| Bibliography |
|
325 | (4) |
| Index |
|
329 | |