Analyzing Baseball Data with R, Second Edition [Hardcover]

Jim Albert, Max Marchi (Cleveland Indians, Ohio, USA), Ben Baumer (Smith College, Northampton, MA)
  • Format: Hardback, 342 pages, height x width: 234x156 mm, weight: 839 g
  • Series: Chapman & Hall/CRC The R Series
  • Publication date: 22-Nov-2018
  • Publisher: Chapman & Hall/CRC
  • ISBN-10: 0367024861
  • ISBN-13: 9780367024864
  • Hardcover
  • Price: 172,00 €*
  • * We will send you an offer for a used copy, whose price may differ from the price listed on the website.
  • This book is out of print, but we will send you an offer for a used copy.
Analyzing Baseball Data with R, Second Edition introduces R to sabermetricians, baseball enthusiasts, and students interested in exploring the richness of baseball data. It equips you with the skills and software tools needed for every step of an analysis: importing the data, transforming them into an appropriate format, visualizing them with graphs, and performing a statistical analysis.

The authors first present an overview of publicly available baseball datasets and a gentle introduction to R's data structures and its exploratory and data management capabilities. They also cover the ggplot2 graphics functions and employ a tidyverse-friendly workflow throughout. Much of the book illustrates the use of R through popular sabermetrics topics, including the Pythagorean formula, run expectancy, catcher framing, career trajectories, simulation of games and seasons, patterns of streaky behavior of players, and launch angles and exit velocities. All the datasets and R code used in the text are available online.

New to the second edition are a systematic adoption of the tidyverse and the incorporation of Statcast player tracking data (made available by Baseball Savant). All code from the first edition has been revised according to the principles of the tidyverse. Tidyverse packages, including dplyr, ggplot2, tidyr, purrr, and broom, are emphasized throughout the book. Two entirely new chapters are made possible by the availability of Statcast data: one explores the notion of catcher framing ability, and the other uses launch angle and exit velocity to estimate the probability of a home run. Through the book's various examples, you will learn about modern sabermetrics and how to conduct your own baseball analyses.

Max Marchi is a Baseball Analytics Analyst for the Cleveland Indians. He was a regular contributor to The Hardball Times and Baseball Prospectus websites and previously consulted for other MLB clubs.

Jim Albert is a Distinguished University Professor of statistics at Bowling Green State University. He has authored or coauthored several books including Curve Ball and Visualizing Baseball and was the editor of the Journal of Quantitative Analysis of Sports.

Ben Baumer is an assistant professor of statistical & data sciences at Smith College. Previously a statistical analyst for the New York Mets, he is a co-author of The Sabermetric Revolution and Modern Data Science with R.

Reviews

"Overall, the book meets its main aim of teaching the reader to analyze real data using R. It is well suited to baseball fans, who have a solid statistical background, and want to learn R or modernize their style of R programming. Baseball fans with a more basic statistical education will also learn from this book . . ." ~Tim Downie, Journal of Statistical Software

Preface
Chapter 1 The Baseball Datasets
1.1 Introduction
1.2 The Lahman Database: Season-by-Season Data
1.2.1 Bonds, Aaron, Ruth, and Rodriguez home run trajectories
1.2.2 Obtaining the database
1.2.3 The Master table
1.2.4 The Batting table
1.2.5 The Pitching table
1.2.6 The Fielding table
1.2.7 The Teams table
1.2.8 Baseball questions
1.3 Retrosheet Game-by-Game Data
1.3.1 The 1998 McGwire and Sosa home run race
1.3.2 Retrosheet
1.3.3 Game logs
1.3.4 Obtaining the game logs from Retrosheet
1.3.5 Game log example
1.3.6 Baseball questions
1.4 Retrosheet Play-by-Play Data
1.4.1 Event files
1.4.2 Event example
1.4.3 Baseball questions
1.5 Pitch-by-Pitch Data
1.5.1 MLBAM Gameday and PITCHf/x
1.5.2 PITCHf/x Example
1.5.3 Baseball questions
1.6 Player Movement and Off-the-Bat Data
1.6.1 Statcast
1.6.2 Baseball Savant data
1.6.3 Baseball questions
1.7 Summary
1.8 Further Reading
1.9 Exercises
Chapter 2 Introduction to R
2.1 Introduction
2.2 Installing R and RStudio
2.3 The Tidyverse
2.3.1 dplyr
2.3.2 The pipe
2.3.3 ggplot2
2.3.4 Other packages
2.4 Data Frames
2.4.1 Career of Warren Spahn
2.4.2 Introduction
2.4.3 Manipulations with data frames
2.4.4 Merging and selecting from data frames
2.5 Vectors
2.5.1 Defining and computing with vectors
2.5.2 Vector functions
2.5.3 Vector index and logical variables
2.6 Objects and Containers in R
2.6.1 Character data and data frames
2.6.2 Factors
2.6.3 Lists
2.7 Collection of R Commands
2.7.1 R scripts
2.7.2 R functions
2.8 Reading and Writing Data in R
2.8.1 Importing data from a file
2.8.2 Saving datasets
2.9 Packages
2.10 Splitting, Applying, and Combining Data
2.10.1 Iterating using map()
2.10.2 Another example
2.11 Getting Help
2.12 Further Reading
2.13 Exercises
Chapter 3 Graphics
3.1 Introduction
3.2 Character Variable
3.2.1 A bar graph
3.2.2 Add axes labels and a title
3.2.3 Other graphs of a character variable
3.3 Saving Graphs
3.4 Numeric Variable: One-Dimensional Scatterplot and Histogram
3.5 Two Numeric Variables
3.5.1 Scatterplot
3.5.2 Building a graph, step-by-step
3.6 A Numeric Variable and a Factor Variable
3.6.1 Parallel stripcharts
3.6.2 Parallel boxplots
3.7 Comparing Ruth, Aaron, Bonds, and A-Rod
3.7.1 Getting the data
3.7.2 Creating the player data frames
3.7.3 Constructing the graph
3.8 The 1998 Home Run Race
3.8.1 Getting the data
3.8.2 Extracting the variables
3.8.3 Constructing the graph
3.9 Further Reading
3.10 Exercises
Chapter 4 The Relation Between Runs and Wins
4.1 Introduction
4.2 The Teams Table in the Lahman Database
4.3 Linear Regression
4.4 The Pythagorean Formula for Winning Percentage
4.4.1 The Exponent in the Pythagorean model
4.4.2 Good and bad predictions by the Pythagorean model
4.5 How Many Runs for a Win?
4.6 Further Reading
4.7 Exercises
Chapter 5 Value of Plays Using Run Expectancy
5.1 The Run Expectancy Matrix
5.2 Runs Scored in the Remainder of the Inning
5.3 Creating the Matrix
5.4 Measuring Success of a Batting Play
5.5 Jose Altuve
5.6 Opportunity and Success for All Hitters
5.7 Position in the Batting Lineup
5.8 Run Values of Different Base Hits
5.8.1 Value of a home run
5.8.2 Value of a single
5.9 Value of Base Stealing
5.10 Further Reading and Software
5.11 Exercises
Chapter 6 Balls and Strikes Effects
6.1 Introduction
6.2 Hitter's Counts and Pitcher's Counts
6.2.1 An example for a single pitcher
6.2.2 Pitch sequences from Retrosheet
6.2.2.1 Functions for string manipulation
6.2.2.2 Finding plate appearances going through a given count
6.2.3 Expected run value by count
6.2.4 The importance of the previous count
6.3 Behaviors by Count
6.3.1 Swinging tendencies by count
6.3.1.1 Propensity to swing by location
6.3.1.2 Effect of the ball/strike count
6.3.2 Pitch selection by count
6.3.3 Umpires' behavior by count
6.4 Further Reading
6.5 Exercises
Chapter 7 Catcher Framing
7.1 Introduction
7.2 Acquiring Pitch-Level Data
7.3 Where Is the Strike Zone?
7.4 Modeling Called Strike Percentage
7.4.1 Visualizing the estimates
7.4.2 Visualizing the estimated surface
7.4.3 Controlling for handedness
7.5 Modeling Catcher Framing
7.6 Further Reading
7.7 Exercises
Chapter 8 Career Trajectories
8.1 Introduction
8.2 Mickey Mantle's Batting Trajectory
8.3 Comparing Trajectories
8.3.1 Some preliminary work
8.3.2 Computing career statistics
8.3.3 Computing similarity scores
8.3.4 Defining age, OBP, SLG, and OPS variables
8.3.5 Fitting and plotting trajectories
8.4 General Patterns of Peak Ages
8.4.1 Computing all fitted trajectories
8.4.2 Patterns of peak age over time
8.4.3 Peak age and career at-bats
8.5 Trajectories and Fielding Position
8.6 Further Reading
8.7 Exercises
Chapter 9 Simulation
9.1 Introduction
9.2 Simulating a Half Inning
9.2.1 Markov chains
9.2.2 Review of work in run expectancy
9.2.3 Computing the transition probabilities
9.2.4 Simulating the Markov chain
9.2.5 Beyond run expectancy
9.2.6 Transition probabilities for individual teams
9.3 Simulating a Baseball Season
9.3.1 The Bradley-Terry model
9.3.2 Making up a schedule
9.3.3 Simulating talents and computing win probabilities
9.3.4 Simulating the regular season
9.3.5 Simulating the post-season
9.3.6 Function to simulate one season
9.3.7 Simulating many seasons
9.4 Further Reading
9.5 Exercises
Chapter 10 Exploring Streaky Performances
10.1 Introduction
10.2 The Great Streak
10.2.1 Finding game hitting streaks
10.2.2 Moving batting averages
10.3 Streaks in Individual At-Bats
10.3.1 Streaks of hits and outs
10.3.2 Moving batting averages
10.3.3 Finding hitting slumps for all players
10.3.4 Were Ichiro Suzuki and Mike Trout unusually streaky?
10.4 Local Patterns of Statcast Launch Velocity
10.5 Further Reading
10.6 Exercises
Chapter 11 Using a Database to Compute Park Factors
11.1 Introduction
11.2 Installing MySQL and Creating a Database
11.3 Connecting R to MySQL
11.3.1 Connecting using RMySQL
11.3.2 Connecting R to other SQL backends
11.4 Filling a MySQL Game Log Database from R
11.4.1 From Retrosheet to R
11.4.2 From R to MySQL
11.5 Querying Data from R
11.5.1 Introduction
11.5.2 Coors Field and run scoring
11.6 Building Your Own Baseball Database
11.6.1 Lahman's database
11.6.2 Retrosheet database
11.6.3 PITCHf/x database
11.6.4 Statcast database
11.7 Calculating Basic Park Factors
11.7.1 Loading the data into R
11.7.2 Home run park factor
11.7.3 Assumptions of the proposed approach
11.7.4 Applying park factors
11.8 Further Reading
11.9 Exercises
Chapter 12 Batted Ball Data from Statcast
12.1 Introduction
12.2 Spray Charts
12.2.1 Acquiring a year's worth of Statcast data
12.2.2 Hitters' spray tendencies and infield defense
12.3 Launch Angles and Exit Velocities
12.3.1 Scatterplot of launch angle vs. exit velocity
12.4 Modeling Home Run Probabilities
12.4.1 Generalized additive model
12.4.2 Smooth predictions
12.4.3 Using this model to estimate 2017 home run production
12.5 Are Launch Angles Skills?
12.5.1 Distribution of launch angle
12.5.2 Half-season correlation of launch angle
12.6 Further Reading
12.7 Exercises
Appendix A Retrosheet Files Reference
A.1 Downloading Play-by-Play Files
A.1.1 Introduction
A.1.2 Setup
A.1.3 Using a special function for a particular season
A.1.4 Reading the files into R
A.1.5 The function parse_retrosheet_pbp()
A.2 Retrosheet Event Files: a Short Reference
A.2.1 Game and event identifiers
A.2.2 The state of the game
A.3 Parsing Retrosheet Pitch Sequences
A.3.1 Introduction
A.3.2 Setup
A.3.3 Evaluating every count
Appendix B Accessing and Using MLBAM Gameday and PITCHf/x Data
B.1 Introduction
B.2 Where Are the Data Stored?
B.3 Working with PITCHf/x Data
B.3.1 Obtaining data from on-line resources
B.3.2 Parsing in R
B.3.3 Flattening XML
B.3.4 An R package for PITCHf/x data: pitchRx
B.4 Details on the Data
B.4.1 at bat attributes
B.4.2 pitch attributes
B.4.3 hip attributes (hit locations data)
B.5 Notes About the Gameday and PITCHf/x Data
B.6 Miscellanea
B.6.1 Calculating the pitch trajectory
B.6.2 Cross-referencing with other data sources
B.6.3 Online resources
Appendix C Accessing and Using Statcast Data from Baseball-Savant
C.1 Introduction
C.2 Game Situation Variables
C.3 Pitch Variables
C.4 Play Event Variables
C.5 Batted Ball Variables
C.6 Derived Variables
C.7 Defense Variables
References
Index