Analyzing Baseball Data with R [Paperback]

4.24/5 (174 ratings from Goodreads)

Max Marchi (Cleveland Indians, Ohio, USA), Jim Albert

Format: Paperback / softback, 334 pages, height x width: 235x156 mm, weight: 486 g, 18 Tables, black and white; 50 Illustrations, black and white
Series: Chapman & Hall/CRC: The R Series
Publication date: 31-Oct-2013
Publisher: CRC Press Inc
ISBN-10: 1466570229
ISBN-13: 9781466570221

Other books on the subject:

Data mining

Paperback
Price: 55,89 €*
* We will send you an offer for a used copy, whose price may differ from the price listed on the website.
This book is out of print, but we will send you an offer for a used copy.

Permalink: https://www.kriso.ee/db/9781466570221.html

With its flexible capabilities and open-source platform, R has become a major tool for analyzing detailed, high-quality baseball data. Analyzing Baseball Data with R provides an introduction to R for sabermetricians, baseball enthusiasts, and students interested in exploring the rich sources of baseball data. It equips readers with the necessary skills and software tools to perform all of the analysis steps, from gathering the datasets and entering them in a convenient format to visualizing the data via graphs to performing a statistical analysis.

The authors first present an overview of publicly available baseball datasets and a gentle introduction to the type of data structures and exploratory and data management capabilities of R. They also cover the traditional graphics functions in the base package and introduce more sophisticated graphical displays available through the lattice and ggplot2 packages. Much of the book illustrates the use of R through popular sabermetrics topics, including the Pythagorean formula, runs expectancy, career trajectories, simulation of games and seasons, patterns of streaky behavior of players, and fielding measures. Each chapter contains exercises that encourage readers to perform their own analyses using R. All of the datasets and R code used in the text are available online.

This book helps readers answer questions about baseball teams, players, and strategy using large, publicly available datasets. It offers detailed instructions on downloading the datasets and putting them into formats that simplify data exploration and analysis. Through the book's various examples, readers will learn about modern sabermetrics and be able to conduct their own baseball analyses.

Reviews

"There are some great resources out there for learning R and for learning how to analyze baseball data with it. In fact, a few pretty smart people wrote a fantastic book on the subject, coincidentally titled Analyzing Baseball Data with R. I can't say enough about this book as a reference, both for baseball analysis and for R. Go and buy it." Bill Petti, The Hardball Times, September 2015

"The authors present a potpourri of well-conceived case-studies that give insight into both the game's complexity and R's simplicity. Virtually no previous knowledge of statistical theory and software is required to master the data analyses and to follow the explications in this book ... The authors' style of writing is pleasurable and bespeaks their passion for the game. Narratives and R commands are so smoothly intermingled that the source code hardly disturbs the flow of reading, and a wealth of graphs break up the grey. A great asset of the book is that it encourages the reader to learn the ropes of sabermetrics by actually running the example analyses on one's own computer." Journal of the Royal Statistical Society, Series A, 2015

"If you are interested in statistics, especially baseball statistics, you will find this book fascinating and very useful. It provides many details, websites, and useful descriptions for using the R programming environment. This is not only a book on statistics; there are many references to famous player statistics, making this a very enjoyable book to read. And even if you don't like baseball but still find statistics very exciting, then this book provides a great introduction to R that can be used for any other type of statistical data set." IEEE Insulation Magazine, November/December 2014

"I have spent most of the past decade working in baseball as a statistical analyst for the New York Mets. This type of employment can be highly valued, especially among quantitatively inclined college students who are coincidentally passionate baseball fans. It is from these students from whom I am most frequently asked, 'what book would you recommend for someone who wants to get started in sabermetrics?' Invariably, my response has been [Jim Albert and Jay Bennett's] Curve Ball. I have a new response. I always felt that Curve Ball was the best place for a budding sabermetrician to start ... However, it later dawned on me that while Curve Ball provided a sound framework for thinking probabilistically about baseball, I devoted a huge proportion of my time at work to computer programming. In their new book, Albert and Max Marchi, a native Italian who now works for the Cleveland Indians, have closed the loop by offering the aspiring sabermetrician a blueprint. The reader who digests this book alongside her keyboard will emerge as a practicing sabermetrician, having knowledge of the key ideas in sabermetric theory, a historical understanding of from whence those ideas came, and the practical ability to compute with baseball data. It is a sabermetric workshop in paperback." Ben S. Baumer, International Statistical Review (2014), 82

Preface

1 The Baseball Datasets

1.1 Introduction

1.2 The Lahman Database: Season-by-Season Data

1.2.1 Bonds, Aaron, and Ruth home run trajectories

1.2.2 Obtaining the database

1.2.3 The Master table

1.2.4 The Batting table

1.2.5 The Pitching table

1.2.6 The Fielding table

1.2.7 The Teams table

1.2.8 Baseball questions

1.3 Retrosheet Game-by-Game Data

1.3.1 The 1998 McGwire and Sosa home run race

1.3.2 Retrosheet

1.3.3 Game logs

1.3.4 Obtaining the game logs from Retrosheet

1.3.5 Game log example

1.3.6 Baseball questions

1.4 Retrosheet Play-by-Play Data

1.4.1 Event files

1.4.2 Event example

1.4.3 Baseball questions

1.5 Pitch-by-Pitch Data

1.5.1 MLBAM Gameday and PITCHf/x

1.5.2 PITCHf/x Example

1.5.3 Baseball questions

1.6 Summary

1.7 Further Reading

1.8 Exercises

2 Introduction to R

2.1 Introduction

2.2 Installing R and RStudio

2.3 Vectors

2.3.1 Career of Warren Spahn

2.3.2 Vectors: defining and calculations

2.3.3 Vector functions

2.3.4 Vector index and logical variables

2.4 Objects and Containers in R

2.4.1 Character data and matrices

2.4.2 Factors

2.4.3 Lists

2.5 Collection of R Commands

2.5.1 R scripts

2.5.2 R functions

2.6 Reading and Writing Data in R

2.6.1 Importing data from a file

2.6.2 Saving datasets

2.7 Data Frames

2.7.1 Introduction

2.7.2 Manipulations with data frames

2.7.3 Merging and selecting from data frames

2.8 Packages

2.9 Splitting, Applying, and Combining Data

2.9.1 Using sapply

2.9.2 Using ddply in the plyr package

2.10 Getting Help

2.11 Further Reading

2.12 Exercises

3 Traditional Graphics

3.1 Introduction

3.2 Factor Variable

3.2.1 A bar graph

3.2.2 Add axes labels and a title

3.2.3 Other graphs of a factor

3.3 Saving Graphs

3.4 Dot plots

3.5 Numeric Variable: Stripchart and Histogram

3.6 Two Numeric Variables

3.6.1 Scatterplot

3.6.2 Building a graph, step-by-step

3.7 A Numeric Variable and a Factor Variable

3.7.1 Parallel stripcharts

3.7.2 Parallel boxplots

3.8 Comparing Ruth, Aaron, Bonds, and A-Rod

3.8.1 Getting the data

3.8.2 Creating the player data frames

3.8.3 Constructing the graph

3.9 The 1998 Home Run Race

3.9.1 Getting the data

3.9.2 Extracting the variables

3.9.3 Constructing the graph

3.10 Further Reading

3.11 Exercises

4 The Relation Between Runs and Wins

4.1 Introduction

4.2 The Teams Table in Lahman's Database

4.3 Linear Regression

4.4 The Pythagorean Formula for Winning Percentage

4.5 The Exponent in the Pythagorean Formula

4.6 Good and Bad Predictions by the Pythagorean Formula

4.7 How Many Runs for a Win?

4.8 Further Reading

4.9 Exercises

5 Value of Plays Using Run Expectancy

5.1 The Runs Expectancy Matrix

5.2 Runs Scored in the Remainder of the Inning

5.3 Creating the Matrix

5.4 Measuring Success of a Batting Play

5.5 Albert Pujols

5.6 Opportunity and Success for All Hitters

5.7 Position in the Batting Lineup

5.8 Run Values of Different Base Hits

5.8.1 Value of a home run

5.8.2 Value of a single

5.9 Value of Base Stealing

5.10 Further Reading and Software

5.11 Exercises

6 Advanced Graphics

6.1 Introduction

6.2 The lattice Package

6.2.1 Introduction

6.2.2 The verlander dataset

6.2.3 Basic plotting with lattice

6.2.4 Multipanel conditioning

6.2.5 Superposing group elements

6.2.6 Scatterplots and dot plots

6.2.7 The panel function

6.2.8 Building a graph, step-by-step

6.3 The ggplot2 Package

6.3.1 Introduction

6.3.2 The cabrera dataset

6.3.3 The first layer

6.3.4 Grouping factors

6.3.5 Multipanel conditioning (faceting)

6.3.6 Adding elements

6.3.7 Combining information

6.3.8 Adding a smooth line with error bands

6.3.9 Dealing with cluttered charts

6.3.10 Adding a background image

6.4 Further Reading

6.5 Exercises

7 Balls and Strikes Effects

7.1 Introduction

7.2 Hitter's Counts and Pitcher's Counts

7.2.1 Introduction

7.2.2 An example for a single pitcher

7.2.3 Pitch sequences on Retrosheet

7.2.3.1 Functions for string manipulation

7.2.3.2 Finding plate appearances going through a given count

7.2.4 Expected run value by count

7.2.5 The importance of the previous count

7.3 Behaviors by Count

7.3.1 Swinging tendencies by count

7.3.1.1 Propensity to swing by location

7.3.1.2 Effect of the ball/strike count

7.3.2 Pitch selection by count

7.3.3 Umpires' behavior by count

7.4 Further Reading

7.5 Exercises

8 Career Trajectories

8.1 Introduction

8.2 Mickey Mantle's Batting Trajectory

8.3 Comparing Trajectories

8.3.1 Some preliminary work

8.3.2 Computing career statistics

8.3.3 Computing similarity scores

8.3.4 Defining age, OBP, SLG, and OPS variables

8.3.5 Fitting and plotting trajectories

8.4 General Patterns of Peak Ages

8.4.1 Computing all fitted trajectories

8.4.2 Patterns of peak age over time

8.4.3 Peak age and career at-bats

8.5 Trajectories and Fielding Position

8.6 Further Reading

8.7 Exercises

9 Simulation

9.1 Introduction

9.2 Simulating a Half Inning

9.2.1 Markov chains

9.2.2 Review of work in runs expectancy

9.2.3 Computing the transition probabilities

9.2.4 Simulating the Markov chain

9.2.5 Beyond runs expectancy

9.2.6 Transition probabilities for individual teams

9.3 Simulating a Baseball Season

9.3.1 The Bradley-Terry model

9.3.2 Making up a schedule

9.3.3 Simulating talents and computing win probabilities

9.3.4 Simulating the regular season

9.3.5 Simulating the post-season

9.3.6 Function to simulate one season

9.3.7 Simulating many seasons

9.4 Further Reading

9.5 Exercises

10 Exploring Streaky Performances

10.1 Introduction

10.2 The Great Streak

10.2.1 Finding game hitting streaks

10.2.2 Moving batting averages

10.3 Streaks in Individual At-Bats

10.3.1 Streaks of hits and outs

10.3.2 Moving batting averages

10.3.3 Finding hitting slumps for all players

10.3.4 Were Suzuki and Ibanez unusually streaky?

10.4 Local Patterns of Weighted On-Base Average

10.5 Further Reading

10.6 Exercises

11 Learning About Park Effects by Database Management Tools

11.1 Introduction

11.2 Installing MySQL and Creating a Database

11.3 Connecting R to MySQL

11.3.1 Connecting using package RMySQL

11.3.2 Connecting using Package RODBC

11.4 Filling a MySQL Game Log Database from R

11.4.1 From Retrosheet to R

11.4.2 From R to MySQL

11.5 Querying Data from R

11.5.1 Introduction

11.5.2 Coors Field and run scoring

11.6 Baseball Data as MySQL Dumps

11.6.1 Lahman's database

11.6.2 Retrosheet database

11.6.3 PITCHf/x database

11.7 Calculating Basic Park Factors

11.7.1 Loading the data in R

11.7.2 Home run park factor

11.7.3 Assumptions of the proposed approach

11.7.4 Applying park factors

11.8 Further Reading

11.9 Exercises

12 Exploring Fielding Metrics with Contributed R Packages

12.1 Introduction

12.2 A Motivating Example: Comparing Fielding Metrics

12.2.1 Introduction

12.2.2 The fielding metrics

12.2.3 Reading an Excel spreadsheet (XLConnect)

12.2.4 Summarizing multiple columns (doBy)

12.2.5 Finding the most similar string (stringdist)

12.2.6 Applying a function on multiple columns (plyr)

12.2.7 Weighted correlations (weights)

12.2.8 Displaying correlation matrices (ellipse)

12.2.9 Evaluating the fielding metrics (psych)

12.3 Comparing Two Shortstops

12.3.1 Reshaping the data (reshape2)

12.3.2 Plotting the data (ggplot2 and directlabels)

12.4 Further Reading

12.5 Exercises

A Retrosheet Files Reference

A.1 Downloading Play-by-Play Files

A.1.1 Introduction

A.1.2 Setup

A.1.3 Using a special function for a particular season

A.1.4 Reading the files into R

A.1.5 The function parse.retrosheet.pbp

A.2 Retrosheet Event Files: a Short Reference

A.2.1 Game and event identifiers

A.2.2 The state of the game

A.3 Parsing Retrosheet Pitch Sequences

A.3.1 Introduction

A.3.2 Setup

A.3.3 Evaluating every count

B Accessing and Using MLB AM Gameday and PITCHf/x Data

B.1 Introduction

B.2 Where are the Data Stored?

B.3 Suitable Formats for PITCHf/x Data

B.3.1 Obtaining data from on-line resources

B.3.2 Parsing in R

B.3.2.1 A wrapper function

B.4 Details on the Data

B.4.1 atbat attributes

B.4.2 pitch attributes

B.4.3 hip attributes (hit locations data)

B.5 Special Notes About the Gameday and PITCHf/x Data

B.6 Miscellanea

B.6.1 Calculating the pitch trajectory

B.6.2 An R package for getting and visualizing PITCHf/x data: pitchRx

B.6.3 Cross-referencing with other data sources

B.6.4 Online resources

Bibliography

Index

Max Marchi is a baseball analyst with the Cleveland Indians. He was previously a statistician at the Emilia-Romagna Regional Health Agency. He has been a regular contributor to The Hardball Times and Baseball Prospectus websites and has consulted for MLB clubs.

Jim Albert is a professor of statistics at Bowling Green State University. He has authored or coauthored several books and is the editor of the Journal of Quantitative Analysis of Sports. His interests include Bayesian modeling, statistics education, and the application of statistical thinking in sports.
