Analyzing Baseball Data with R [Paperback]

Max Marchi (Cleveland Indians, Ohio, USA), Jim Albert
  • Format: Paperback / softback, 334 pages, height x width: 235x156 mm, weight: 486 g, 18 Tables, black and white; 50 Illustrations, black and white
  • Series: Chapman & Hall/CRC: The R Series
  • Publication date: 31-Oct-2013
  • Publisher: CRC Press Inc
  • ISBN-10: 1466570229
  • ISBN-13: 9781466570221
  • Paperback
  • Price: 55,89 €*
  • * We will send you an offer for a used copy, whose price may differ from the price listed on the website
  • This book is out of print, but we will send you an offer for a used copy.
  • Free delivery
With its flexible capabilities and open-source platform, R has become a major tool for analyzing detailed, high-quality baseball data. Analyzing Baseball Data with R provides an introduction to R for sabermetricians, baseball enthusiasts, and students interested in exploring the rich sources of baseball data. It equips readers with the necessary skills and software tools to perform all of the analysis steps, from gathering the datasets and entering them in a convenient format to visualizing the data via graphs to performing a statistical analysis.

The authors first present an overview of publicly available baseball datasets and a gentle introduction to the types of data structures and the exploratory and data management capabilities of R. They also cover the traditional graphics functions in the base package and introduce more sophisticated graphical displays available through the lattice and ggplot2 packages. Much of the book illustrates the use of R through popular sabermetrics topics, including the Pythagorean formula, runs expectancy, career trajectories, simulation of games and seasons, patterns of streaky behavior of players, and fielding measures. Each chapter contains exercises that encourage readers to perform their own analyses using R. All of the datasets and R code used in the text are available online.

This book helps readers answer questions about baseball teams, players, and strategy using large, publicly available datasets. It offers detailed instructions on downloading the datasets and putting them into formats that simplify data exploration and analysis. Through the book's various examples, readers will learn about modern sabermetrics and be able to conduct their own baseball analyses.

Reviews

"There are some great resources out there for learning R and for learning how to analyze baseball data with it. In fact, a few pretty smart people wrote a fantastic book on the subject, coincidentally titled Analyzing Baseball Data with R. I can't say enough about this book as a reference, both for baseball analysis and for R. Go and buy it." Bill Petti, The Hardball Times, September 2015

"The authors present a potpourri of well-conceived case-studies that give insight into both the game's complexity and R's simplicity. Virtually no previous knowledge of statistical theory and software is required to master the data analyses and to follow the explications in this book. The authors' style of writing is pleasurable and bespeaks their passion for the game. Narratives and R commands are so smoothly intermingled that the source code hardly disturbs the flow of reading, and a wealth of graphs break up the grey. A great asset of the book is that it encourages the reader to learn the ropes of sabermetrics by actually running the example analyses on one's own computer." Journal of the Royal Statistical Society, Series A, 2015

"If you are interested in statistics, especially baseball statistics, you will find this book fascinating and very useful. It provides many details, websites, and useful descriptions for using the R programming environment. This is not only a book on statistics; there are many references to famous player statistics, making this a very enjoyable book to read. And even if you don't like baseball but still find statistics very exciting, then this book provides a great introduction to R that can be used for any other type of statistical data set." IEEE Insulation Magazine, November/December 2014

"I have spent most of the past decade working in baseball as a statistical analyst for the New York Mets. This type of employment can be highly valued, especially among quantitatively inclined college students who are coincidentally passionate baseball fans. It is from these students from whom I am most frequently asked, 'what book would you recommend for someone who wants to get started in sabermetrics?' Invariably, my response has been [Jim Albert and Jay Bennett's] Curve Ball. I have a new response. I always felt that Curve Ball was the best place for a budding sabermetrician to start. However, it later dawned on me that while Curve Ball provided a sound framework for thinking probabilistically about baseball, I devoted a huge proportion of my time at work to computer programming. In their new book, Albert and Max Marchi, a native Italian who now works for the Cleveland Indians, have closed the loop by offering the aspiring sabermetrician a blueprint. The reader who digests this book alongside her keyboard will emerge as a practicing sabermetrician, having knowledge of the key ideas in sabermetric theory, a historical understanding of from whence those ideas came, and the practical ability to compute with baseball data. It is a sabermetric workshop in paperback." Ben S. Baumer, International Statistical Review (2014), 82

Preface
1 The Baseball Datasets
1.1 Introduction
1.2 The Lahman Database: Season-by-Season Data
1.2.1 Bonds, Aaron, and Ruth home run trajectories
1.2.2 Obtaining the database
1.2.3 The Master table
1.2.4 The Batting table
1.2.5 The Pitching table
1.2.6 The Fielding table
1.2.7 The Teams table
1.2.8 Baseball questions
1.3 Retrosheet Game-by-Game Data
1.3.1 The 1998 McGwire and Sosa home run race
1.3.2 Retrosheet
1.3.3 Game logs
1.3.4 Obtaining the game logs from Retrosheet
1.3.5 Game log example
1.3.6 Baseball questions
1.4 Retrosheet Play-by-Play Data
1.4.1 Event files
1.4.2 Event example
1.4.3 Baseball questions
1.5 Pitch-by-Pitch Data
1.5.1 MLBAM Gameday and PITCHf/x
1.5.2 PITCHf/x Example
1.5.3 Baseball questions
1.6 Summary
1.7 Further Reading
1.8 Exercises
2 Introduction to R
2.1 Introduction
2.2 Installing R and RStudio
2.3 Vectors
2.3.1 Career of Warren Spahn
2.3.2 Vectors: defining and calculations
2.3.3 Vector functions
2.3.4 Vector index and logical variables
2.4 Objects and Containers in R
2.4.1 Character data and matrices
2.4.2 Factors
2.4.3 Lists
2.5 Collection of R Commands
2.5.1 R scripts
2.5.2 R functions
2.6 Reading and Writing Data in R
2.6.1 Importing data from a file
2.6.2 Saving datasets
2.7 Data Frames
2.7.1 Introduction
2.7.2 Manipulations with data frames
2.7.3 Merging and selecting from data frames
2.8 Packages
2.9 Splitting, Applying, and Combining Data
2.9.1 Using sapply
2.9.2 Using ddply in the plyr package
2.10 Getting Help
2.11 Further Reading
2.12 Exercises
3 Traditional Graphics
3.1 Introduction
3.2 Factor Variable
3.2.1 A bar graph
3.2.2 Add axes labels and a title
3.2.3 Other graphs of a factor
3.3 Saving Graphs
3.4 Dot plots
3.5 Numeric Variable: Stripchart and Histogram
3.6 Two Numeric Variables
3.6.1 Scatterplot
3.6.2 Building a graph, step-by-step
3.7 A Numeric Variable and a Factor Variable
3.7.1 Parallel stripcharts
3.7.2 Parallel boxplots
3.8 Comparing Ruth, Aaron, Bonds, and A-Rod
3.8.1 Getting the data
3.8.2 Creating the player data frames
3.8.3 Constructing the graph
3.9 The 1998 Home Run Race
3.9.1 Getting the data
3.9.2 Extracting the variables
3.9.3 Constructing the graph
3.10 Further Reading
3.11 Exercises
4 The Relation Between Runs and Wins
4.1 Introduction
4.2 The Teams Table in Lahman's Database
4.3 Linear Regression
4.4 The Pythagorean Formula for Winning Percentage
4.5 The Exponent in the Pythagorean Formula
4.6 Good and Bad Predictions by the Pythagorean Formula
4.7 How Many Runs for a Win?
4.8 Further Reading
4.9 Exercises
5 Value of Plays Using Run Expectancy
5.1 The Runs Expectancy Matrix
5.2 Runs Scored in the Remainder of the Inning
5.3 Creating the Matrix
5.4 Measuring Success of a Batting Play
5.5 Albert Pujols
5.6 Opportunity and Success for All Hitters
5.7 Position in the Batting Lineup
5.8 Run Values of Different Base Hits
5.8.1 Value of a home run
5.8.2 Value of a single
5.9 Value of Base Stealing
5.10 Further Reading and Software
5.11 Exercises
6 Advanced Graphics
6.1 Introduction
6.2 The lattice Package
6.2.1 Introduction
6.2.2 The verlander dataset
6.2.3 Basic plotting with lattice
6.2.4 Multipanel conditioning
6.2.5 Superposing group elements
6.2.6 Scatterplots and dot plots
6.2.7 The panel function
6.2.8 Building a graph, step-by-step
6.3 The ggplot2 Package
6.3.1 Introduction
6.3.2 The cabrera dataset
6.3.3 The first layer
6.3.4 Grouping factors
6.3.5 Multipanel conditioning (faceting)
6.3.6 Adding elements
6.3.7 Combining information
6.3.8 Adding a smooth line with error bands
6.3.9 Dealing with cluttered charts
6.3.10 Adding a background image
6.4 Further Reading
6.5 Exercises
7 Balls and Strikes Effects
7.1 Introduction
7.2 Hitter's Counts and Pitcher's Counts
7.2.1 Introduction
7.2.2 An example for a single pitcher
7.2.3 Pitch sequences on Retrosheet
7.2.3.1 Functions for string manipulation
7.2.3.2 Finding plate appearances going through a given count
7.2.4 Expected run value by count
7.2.5 The importance of the previous count
7.3 Behaviors by Count
7.3.1 Swinging tendencies by count
7.3.1.1 Propensity to swing by location
7.3.1.2 Effect of the ball/strike count
7.3.2 Pitch selection by count
7.3.3 Umpires' behavior by count
7.4 Further Reading
7.5 Exercises
8 Career Trajectories
8.1 Introduction
8.2 Mickey Mantle's Batting Trajectory
8.3 Comparing Trajectories
8.3.1 Some preliminary work
8.3.2 Computing career statistics
8.3.3 Computing similarity scores
8.3.4 Defining age, OBP, SLG, and OPS variables
8.3.5 Fitting and plotting trajectories
8.4 General Patterns of Peak Ages
8.4.1 Computing all fitted trajectories
8.4.2 Patterns of peak age over time
8.4.3 Peak age and career at-bats
8.5 Trajectories and Fielding Position
8.6 Further Reading
8.7 Exercises
9 Simulation
9.1 Introduction
9.2 Simulating a Half Inning
9.2.1 Markov chains
9.2.2 Review of work in runs expectancy
9.2.3 Computing the transition probabilities
9.2.4 Simulating the Markov chain
9.2.5 Beyond runs expectancy
9.2.6 Transition probabilities for individual teams
9.3 Simulating a Baseball Season
9.3.1 The Bradley-Terry model
9.3.2 Making up a schedule
9.3.3 Simulating talents and computing win probabilities
9.3.4 Simulating the regular season
9.3.5 Simulating the post-season
9.3.6 Function to simulate one season
9.3.7 Simulating many seasons
9.4 Further Reading
9.5 Exercises
10 Exploring Streaky Performances
10.1 Introduction
10.2 The Great Streak
10.2.1 Finding game hitting streaks
10.2.2 Moving batting averages
10.3 Streaks in Individual At-Bats
10.3.1 Streaks of hits and outs
10.3.2 Moving batting averages
10.3.3 Finding hitting slumps for all players
10.3.4 Were Suzuki and Ibanez unusually streaky?
10.4 Local Patterns of Weighted On-Base Average
10.5 Further Reading
10.6 Exercises
11 Learning About Park Effects by Database Management Tools
11.1 Introduction
11.2 Installing MySQL and Creating a Database
11.3 Connecting R to MySQL
11.3.1 Connecting using package RMySQL
11.3.2 Connecting using Package RODBC
11.4 Filling a MySQL Game Log Database from R
11.4.1 From Retrosheet to R
11.4.2 From R to MySQL
11.5 Querying Data from R
11.5.1 Introduction
11.5.2 Coors Field and run scoring
11.6 Baseball Data as MySQL Dumps
11.6.1 Lahman's database
11.6.2 Retrosheet database
11.6.3 PITCHf/x database
11.7 Calculating Basic Park Factors
11.7.1 Loading the data in R
11.7.2 Home run park factor
11.7.3 Assumptions of the proposed approach
11.7.4 Applying park factors
11.8 Further Reading
11.9 Exercises
12 Exploring Fielding Metrics with Contributed R Packages
12.1 Introduction
12.2 A Motivating Example: Comparing Fielding Metrics
12.2.1 Introduction
12.2.2 The fielding metrics
12.2.3 Reading an Excel spreadsheet (XLConnect)
12.2.4 Summarizing multiple columns (doBy)
12.2.5 Finding the most similar string (stringdist)
12.2.6 Applying a function on multiple columns (plyr)
12.2.7 Weighted correlations (weights)
12.2.8 Displaying correlation matrices (ellipse)
12.2.9 Evaluating the fielding metrics (psych)
12.3 Comparing Two Shortstops
12.3.1 Reshaping the data (reshape2)
12.3.2 Plotting the data (ggplot2 and directlabels)
12.4 Further Reading
12.5 Exercises
A Retrosheet Files Reference
A.1 Downloading Play-by-Play Files
A.1.1 Introduction
A.1.2 Setup
A.1.3 Using a special function for a particular season
A.1.4 Reading the files into R
A.1.5 The function parse.retrosheet.pbp
A.2 Retrosheet Event Files: a Short Reference
A.2.1 Game and event identifiers
A.2.2 The state of the game
A.3 Parsing Retrosheet Pitch Sequences
A.3.1 Introduction
A.3.2 Setup
A.3.3 Evaluating every count
B Accessing and Using MLBAM Gameday and PITCHf/x Data
B.1 Introduction
B.2 Where are the Data Stored?
B.3 Suitable Formats for PITCHf/x Data
B.3.1 Obtaining data from on-line resources
B.3.2 Parsing in R
B.3.2.1 A wrapper function
B.4 Details on the Data
B.4.1 atbat attributes
B.4.2 pitch attributes
B.4.3 hip attributes (hit locations data)
B.5 Special Notes About the Gameday and PITCHf/x Data
B.6 Miscellanea
B.6.1 Calculating the pitch trajectory
B.6.2 An R package for getting and visualizing PITCHf/x data: pitchRx
B.6.3 Cross-referencing with other data sources
B.6.4 Online resources
Bibliography
Index
Max Marchi is a baseball analyst with the Cleveland Indians. He was previously a statistician at the Emilia-Romagna Regional Health Agency. He has been a regular contributor to The Hardball Times and Baseball Prospectus websites and has consulted for MLB clubs.

Jim Albert is a professor of statistics at Bowling Green State University. He has authored or coauthored several books and is the editor of the Journal of Quantitative Analysis of Sports. His interests include Bayesian modeling, statistics education, and the application of statistical thinking in sports.