E-book: Statistics for Data Scientists: An Introduction to Probability, Statistics, and Data Analysis

3.00/5 (2 ratings from Goodreads)

Maurits Kaptein, Edwin van den Heuvel

Format: EPUB+DRM
Series: Undergraduate Topics in Computer Science
Publication date: 02-Feb-2022
Publisher: Springer Nature Switzerland AG
Language: English
ISBN-13: 9783030105310

Price: 49,39 €*
* The price is final; no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

This book provides an undergraduate introduction to analysing data for data science, computer science, and quantitative social science students. It uniquely combines a hands-on approach to data analysis – supported by numerous real data examples and reusable R code – with a rigorous treatment of probability and statistical principles.

Where contemporary undergraduate textbooks in probability theory or statistics often miss applications and an introductory treatment of modern methods (bootstrapping, Bayes, etc.), and where applied data analysis books often miss a rigorous theoretical treatment, this book provides an accessible but thorough introduction to data analysis using statistical methods, combining the two viewpoints. The book further focuses on methods for dealing with large datasets and streaming data, and hence provides a single-course introduction to statistical methods for data science.

Reviews

Having taught data analytics at the introductory graduate level, I welcome the authors' textbook as an essential resource for training well-grounded entry-level data scientists. A data scientist shall provide competent data science professional services to a client. Training in both the theory and practice of data analytics is a requirement for such competence. The authors' textbook definitely provides a valuable resource for such training. (Harry J. Foxwell, Computing Reviews, July 7, 2022)

1 A First Look at Data
1.1 Overview and Learning Goals
1.2 Getting Started with R
1.2.1 Opening a Dataset: face-data.csv
1.2.2 Some Useful Commands for Exploring a Dataset
1.2.3 Scalars, Vectors, Matrices, Data.frames, Objects
1.3 Measurement Levels
1.3.1 Outliers and Unrealistic Values
1.4 Describing Data
1.4.1 Frequency
1.4.2 Central Tendency
1.4.3 Dispersion, Skewness, and Kurtosis
1.4.4 A Note on Aggregated Data
1.5 Visualizing Data
1.5.1 Describing Nominal/ordinal Variables
1.5.2 Describing Interval/ratio Variables
1.5.3 Relations Between Variables
1.5.4 Multi-panel Plots
1.5.5 Plotting Mathematical Functions
1.5.6 Frequently Used Arguments
1.6 Other R Plotting Systems (And Installing Packages)
1.6.1 Lattice
1.6.2 GGplot2
Problems
References

2 Sampling Plans and Estimates
2.1 Introduction
2.2 Definitions and Standard Terminology
2.3 Non-representative Sampling
2.3.1 Convenience Sampling
2.3.2 Haphazard Sampling
2.3.3 Purposive Sampling
2.4 Representative Sampling
2.4.1 Simple Random Sampling
2.4.2 Systematic Sampling
2.4.3 Stratified Sampling
2.4.4 Cluster Sampling
2.5 Evaluating Estimators Given Different Sampling Plans
2.5.1 Generic Formulation of Sampling Plans
2.5.2 Bias, Standard Error, and Mean Squared Error
2.5.3 Illustration of a Comparison of Sampling Plans
2.5.4 Comparing Sampling Plans Using R
2.6 Estimation of the Population Mean
2.6.1 Simple Random Sampling
2.6.2 Systematic Sampling
2.6.3 Stratified Sampling
2.6.4 Cluster Sampling
2.7 Estimation of the Population Proportion
2.8 Estimation of the Population Variance
2.8.1 Estimation of the MSE
2.9 Conclusions
Problems
References

3 Probability Theory
3.1 Introduction
3.2 Definitions of Probability
3.3 Probability Axioms
3.3.1 Example: Using the Probability Axioms
3.4 Conditional Probability
3.4.1 Example: Using Conditional Probabilities
3.4.2 Computing Probabilities Using R
3.5 Measures of Risk
3.5.1 Risk Difference
3.5.2 Relative Risk
3.5.3 Odds Ratio
3.5.4 Example: Using Risk Measures
3.6 Sampling from Populations: Different Study Designs
3.6.1 Cross-Sectional Study
3.6.2 Cohort Study
3.6.3 Case-Control Study
3.7 Simpson's Paradox
3.8 Conclusion
Problems
References

4 Random Variables and Distributions
4.1 Introduction
4.2 Probability Density Functions
4.2.1 Normal Density Function
4.2.2 Lognormal Density Function
4.2.3 Uniform Density Function
4.2.4 Exponential Density Function
4.3 Distribution Functions and Continuous Random Variables
4.4 Expected Values of Continuous Random Variables
4.5 Distributions of Discrete Random Variables
4.6 Expected Values of Discrete Random Variables
4.7 Well-Known Discrete Distributions
4.7.1 Bernoulli Probability Mass Function
4.7.2 Binomial Probability Mass Function
4.7.3 Poisson Probability Mass Function
4.7.4 Negative Binomial Probability Mass Function
4.7.5 Overview of Moments for Well-Known Discrete Distributions
4.8 Working with Distributions in R
4.8.1 R Built-in Functions
4.8.2 Using Monte-Carlo Methods
4.8.3 Obtaining Draws from Distributions: Inverse Transform Sampling
4.9 Relationships Between Distributions
4.9.1 Binomial–Poisson
4.9.2 Binomial–Normal
4.10 Calculation Rules for Random Variables
4.10.1 Rules for Single Random Variables
4.10.2 Rules for Two Random Variables
4.11 Conclusion
Problems
References

5 Estimation
5.1 Introduction
5.2 From Population Characteristics to Sample Statistics
5.2.1 Population Characteristics
5.2.2 Sample Statistics Under Simple Random Sampling
5.3 Distributions of Sample Statistic Tₙ
5.3.1 Distribution of the Sample Maximum or Minimum
5.3.2 Distribution of the Sample Average X̄
5.3.3 Distribution of the Sample Variance S²
5.3.4 The Central Limit Theorem
5.3.5 Asymptotic Confidence Intervals
5.4 Normally Distributed Populations
5.4.1 Confidence Intervals for Normal Populations
5.4.2 Lognormally Distributed Populations
5.5 Methods of Estimation
5.5.1 Method of Moments
5.5.2 Maximum Likelihood Estimation
Problems
Reference

6 Multiple Random Variables
6.1 Introduction
6.2 Multivariate Distributions
6.2.1 Definition of Independence
6.2.2 Discrete Random Variables
6.2.3 Continuous Random Variables
6.3 Constructing Bivariate Probability Distributions
6.3.1 Using Sums of Random Variables
6.3.2 Using the Farlie-Gumbel-Morgenstern Family of Distributions
6.3.3 Using Mixtures of Probability Distributions
6.3.4 Using the Fréchet Family of Distributions
6.4 Properties of Multivariate Distributions
6.4.1 Expectations
6.4.2 Covariances
6.5 Measures of Association
6.5.1 Pearson's Correlation Coefficient
6.5.2 Kendall's Tau Correlation
6.5.3 Spearman's Rho Correlation
6.5.4 Cohen's Kappa Statistic
6.6 Estimators of Measures of Association
6.6.1 Pearson's Correlation Coefficient
6.6.2 Kendall's Tau Correlation Coefficient
6.6.3 Spearman's Rho Correlation Coefficient
6.6.4 Should We Use Pearson's Rho, Spearman's Rho or Kendall's Tau Correlation?
6.6.5 Cohen's Kappa Statistic
6.6.6 Risk Difference, Relative Risk, and Odds Ratio
6.7 Other Sample Statistics for Association
6.7.1 Nominal Association Statistics
6.7.2 Ordinal Association Statistics
6.7.3 Binary Association Statistics
6.8 Exploring Multiple Variables Using R
6.8.1 Associations Between Continuous Variables
6.8.2 Association Between Binary Variables
6.8.3 Association Between Categorical Variables
6.9 Conclusions
Problems
References

7 Making Decisions in Uncertainty
7.1 Introduction
7.2 Bootstrapping
7.2.1 The Basic Idea Behind the Bootstrap
7.2.2 Applying the Bootstrap: The Non-parametric Bootstrap
7.2.3 Applying the Bootstrap: The Parametric Bootstrap
7.2.4 Applying the Bootstrap: Bootstrapping Massive Datasets
7.2.5 A Critical Discussion of the Bootstrap
7.3 Hypothesis Testing
7.3.1 The One-Sided z-Test for a Single Mean
7.3.2 The Two-Sided z-Test for a Single Mean
7.3.3 Confidence Intervals and Hypothesis Testing
7.3.4 The t-Tests for Means
7.3.5 Non-parametric Tests for Medians
7.3.6 Tests for Equality of Variation from Two Independent Samples
7.3.7 Tests for Independence Between Two Variables
7.3.8 Tests for Normality
7.3.9 Tests for Outliers
7.3.10 Equivalence Testing
7.4 Conclusions
Problems
References

8 Bayesian Statistics
8.1 Introduction
8.2 Bayes' Theorem for Population Parameters
8.2.1 Bayes' Law for Multiple Events
8.2.2 Bayes' Law for Competing Hypotheses
8.2.3 Bayes' Law for Statistical Models
8.2.4 The Fundamentals of Bayesian Data Analysis
8.3 Bayesian Data Analysis by Example
8.3.1 Estimating the Parameter of a Bernoulli Population
8.3.2 Estimating the Parameters of a Normal Population
8.3.3 Bayesian Analysis for Normal Populations Based on Single Observation
8.3.4 Bayesian Analysis for Normal Populations Based on Multiple Observations
8.3.5 Bayesian Analysis for Normal Populations with Unknown Mean and Variance
8.4 Bayesian Decision-Making in Uncertainty
8.4.1 Providing Point Estimates of Parameters
8.4.2 Providing Interval Estimates of the Parameters
8.4.3 Testing Hypotheses
8.5 Challenges Involved in the Bayesian Approach
8.5.1 Choosing a Prior
8.5.2 Bayesian Computation
8.6 Software for Bayesian Analysis
8.6.1 A Simple Bernoulli Model Using Stan
8.7 Bayesian and Frequentist Thinking Compared
8.8 Conclusion
Problems
References

Prof. Dr. Maurits Kaptein works on statistical methods for sequential experimentation. He has extensive experience in research and education in the fields of statistics, machine learning, and research methodology. Maurits works for the Jheronimus Academy of Data Science and for the University of Tilburg. His work has been published in influential journals such as Bayesian Analysis and the Journal of Interactive Marketing.

Prof. Dr. Edwin van den Heuvel works on statistical methods for analyzing cross-sectional and longitudinal data from experimental and observational studies in the domain of health and life sciences. He has taught many different topics in statistics to PhD, master's, and bachelor's students from different backgrounds (medicine, engineering, mathematics, etc.). He is a full-time professor of statistics at Eindhoven University of Technology and has affiliations with other universities. He publishes mostly in influential peer-reviewed statistical, epidemiological, and medical journals.

Permalink: https://www.kriso.ee/db/97830301053106e.html