
E-book: Statistics for Data Scientists: An Introduction to Probability, Statistics, and Data Analysis

  • Format: EPUB+DRM
  • Price: €49.39*
  • * The price is final, i.e. no further discounts apply.
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you need to install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; not to be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

This book provides an undergraduate introduction to analysing data for data science, computer science, and quantitative social science students. It uniquely combines a hands-on approach to data analysis – supported by numerous real data examples and reusable R code – with a rigorous treatment of probability and statistical principles.

Contemporary undergraduate textbooks in probability theory or statistics often lack applications and an introductory treatment of modern methods (bootstrapping, Bayes, etc.), while applied data analysis books often lack a rigorous theoretical treatment. This book combines the two viewpoints, providing an accessible but thorough introduction to data analysis using statistical methods. It further focuses on methods for dealing with large datasets and streaming data, and hence provides a single-course introduction to statistical methods for data science.

Reviews

Having taught data analytics at the introductory graduate level, I welcome the authors' textbook as an essential resource for training well-grounded entry-level data scientists. A data scientist shall provide competent data science professional services to a client. Training in both the theory and practice of data analytics is a requirement for such competence. The authors' textbook definitely provides a valuable resource for such training. (Harry J. Foxwell, Computing Reviews, July 7, 2022)

Table of Contents
1 A First Look at Data
1.1 Overview and Learning Goals
1.2 Getting Started with R
1.2.1 Opening a Dataset: face-data.csv
1.2.2 Some Useful Commands for Exploring a Dataset
1.2.3 Scalars, Vectors, Matrices, Data.frames, Objects
1.3 Measurement Levels
1.3.1 Outliers and Unrealistic Values
1.4 Describing Data
1.4.1 Frequency
1.4.2 Central Tendency
1.4.3 Dispersion, Skewness, and Kurtosis
1.4.4 A Note on Aggregated Data
1.5 Visualizing Data
1.5.1 Describing Nominal/ordinal Variables
1.5.2 Describing Interval/ratio Variables
1.5.3 Relations Between Variables
1.5.4 Multi-panel Plots
1.5.5 Plotting Mathematical Functions
1.5.6 Frequently Used Arguments
1.6 Other R Plotting Systems (And Installing Packages)
1.6.1 Lattice
1.6.2 GGplot2
Problems
References
2 Sampling Plans and Estimates
2.1 Introduction
2.2 Definitions and Standard Terminology
2.3 Non-representative Sampling
2.3.1 Convenience Sampling
2.3.2 Haphazard Sampling
2.3.3 Purposive Sampling
2.4 Representative Sampling
2.4.1 Simple Random Sampling
2.4.2 Systematic Sampling
2.4.3 Stratified Sampling
2.4.4 Cluster Sampling
2.5 Evaluating Estimators Given Different Sampling Plans
2.5.1 Generic Formulation of Sampling Plans
2.5.2 Bias, Standard Error, and Mean Squared Error
2.5.3 Illustration of a Comparison of Sampling Plans
2.5.4 Comparing Sampling Plans Using R
2.6 Estimation of the Population Mean
2.6.1 Simple Random Sampling
2.6.2 Systematic Sampling
2.6.3 Stratified Sampling
2.6.4 Cluster Sampling
2.7 Estimation of the Population Proportion
2.8 Estimation of the Population Variance
2.8.1 Estimation of the MSE
2.9 Conclusions
Problems
References
3 Probability Theory
3.1 Introduction
3.2 Definitions of Probability
3.3 Probability Axioms
3.3.1 Example: Using the Probability Axioms
3.4 Conditional Probability
3.4.1 Example: Using Conditional Probabilities
3.4.2 Computing Probabilities Using R
3.5 Measures of Risk
3.5.1 Risk Difference
3.5.2 Relative Risk
3.5.3 Odds Ratio
3.5.4 Example: Using Risk Measures
3.6 Sampling from Populations: Different Study Designs
3.6.1 Cross-Sectional Study
3.6.2 Cohort Study
3.6.3 Case-Control Study
3.7 Simpson's Paradox
3.8 Conclusion
Problems
References
4 Random Variables and Distributions
4.1 Introduction
4.2 Probability Density Functions
4.2.1 Normal Density Function
4.2.2 Lognormal Density Function
4.2.3 Uniform Density Function
4.2.4 Exponential Density Function
4.3 Distribution Functions and Continuous Random Variables
4.4 Expected Values of Continuous Random Variables
4.5 Distributions of Discrete Random Variables
4.6 Expected Values of Discrete Random Variables
4.7 Well-Known Discrete Distributions
4.7.1 Bernoulli Probability Mass Function
4.7.2 Binomial Probability Mass Function
4.7.3 Poisson Probability Mass Function
4.7.4 Negative Binomial Probability Mass Function
4.7.5 Overview of Moments for Well-Known Discrete Distributions
4.8 Working with Distributions in R
4.8.1 R Built-in Functions
4.8.2 Using Monte-Carlo Methods
4.8.3 Obtaining Draws from Distributions: Inverse Transform Sampling
4.9 Relationships Between Distributions
4.9.1 Binomial–Poisson
4.9.2 Binomial–Normal
4.10 Calculation Rules for Random Variables
4.10.1 Rules for Single Random Variables
4.10.2 Rules for Two Random Variables
4.11 Conclusion
Problems
References
5 Estimation
5.1 Introduction
5.2 From Population Characteristics to Sample Statistics
5.2.1 Population Characteristics
5.2.2 Sample Statistics Under Simple Random Sampling
5.3 Distributions of Sample Statistic Tₙ
5.3.1 Distribution of the Sample Maximum or Minimum
5.3.2 Distribution of the Sample Average X̄
5.3.3 Distribution of the Sample Variance S²
5.3.4 The Central Limit Theorem
5.3.5 Asymptotic Confidence Intervals
5.4 Normally Distributed Populations
5.4.1 Confidence Intervals for Normal Populations
5.4.2 Lognormally Distributed Populations
5.5 Methods of Estimation
5.5.1 Method of Moments
5.5.2 Maximum Likelihood Estimation
Problems
Reference
6 Multiple Random Variables
6.1 Introduction
6.2 Multivariate Distributions
6.2.1 Definition of Independence
6.2.2 Discrete Random Variables
6.2.3 Continuous Random Variables
6.3 Constructing Bivariate Probability Distributions
6.3.1 Using Sums of Random Variables
6.3.2 Using the Farlie-Gumbel-Morgenstern Family of Distributions
6.3.3 Using Mixtures of Probability Distributions
6.3.4 Using the Fréchet Family of Distributions
6.4 Properties of Multivariate Distributions
6.4.1 Expectations
6.4.2 Covariances
6.5 Measures of Association
6.5.1 Pearson's Correlation Coefficient
6.5.2 Kendall's Tau Correlation
6.5.3 Spearman's Rho Correlation
6.5.4 Cohen's Kappa Statistic
6.6 Estimators of Measures of Association
6.6.1 Pearson's Correlation Coefficient
6.6.2 Kendall's Tau Correlation Coefficient
6.6.3 Spearman's Rho Correlation Coefficient
6.6.4 Should We Use Pearson's Rho, Spearman's Rho or Kendall's Tau Correlation?
6.6.5 Cohen's Kappa Statistic
6.6.6 Risk Difference, Relative Risk, and Odds Ratio
6.7 Other Sample Statistics for Association
6.7.1 Nominal Association Statistics
6.7.2 Ordinal Association Statistics
6.7.3 Binary Association Statistics
6.8 Exploring Multiple Variables Using R
6.8.1 Associations Between Continuous Variables
6.8.2 Association Between Binary Variables
6.8.3 Association Between Categorical Variables
6.9 Conclusions
Problems
References
7 Making Decisions in Uncertainty
7.1 Introduction
7.2 Bootstrapping
7.2.1 The Basic Idea Behind the Bootstrap
7.2.2 Applying the Bootstrap: The Non-parametric Bootstrap
7.2.3 Applying the Bootstrap: The Parametric Bootstrap
7.2.4 Applying the Bootstrap: Bootstrapping Massive Datasets
7.2.5 A Critical Discussion of the Bootstrap
7.3 Hypothesis Testing
7.3.1 The One-Sided z-Test for a Single Mean
7.3.2 The Two-Sided z-Test for a Single Mean
7.3.3 Confidence Intervals and Hypothesis Testing
7.3.4 The t-Tests for Means
7.3.5 Non-parametric Tests for Medians
7.3.6 Tests for Equality of Variation from Two Independent Samples
7.3.7 Tests for Independence Between Two Variables
7.3.8 Tests for Normality
7.3.9 Tests for Outliers
7.3.10 Equivalence Testing
7.4 Conclusions
Problems
References
8 Bayesian Statistics
8.1 Introduction
8.2 Bayes' Theorem for Population Parameters
8.2.1 Bayes' Law for Multiple Events
8.2.2 Bayes' Law for Competing Hypotheses
8.2.3 Bayes' Law for Statistical Models
8.2.4 The Fundamentals of Bayesian Data Analysis
8.3 Bayesian Data Analysis by Example
8.3.1 Estimating the Parameter of a Bernoulli Population
8.3.2 Estimating the Parameters of a Normal Population
8.3.3 Bayesian Analysis for Normal Populations Based on Single Observation
8.3.4 Bayesian Analysis for Normal Populations Based on Multiple Observations
8.3.5 Bayesian Analysis for Normal Populations with Unknown Mean and Variance
8.4 Bayesian Decision-Making in Uncertainty
8.4.1 Providing Point Estimates of Parameters
8.4.2 Providing Interval Estimates of the Parameters
8.4.3 Testing Hypotheses
8.5 Challenges Involved in the Bayesian Approach
8.5.1 Choosing a Prior
8.5.2 Bayesian Computation
8.6 Software for Bayesian Analysis
8.6.1 A Simple Bernoulli Model Using Stan
8.7 Bayesian and Frequentist Thinking Compared
8.8 Conclusion
Problems
References

Prof. Dr. Maurits Kaptein works on statistical methods for sequential experimentation. He has extensive experience in research and education in the fields of statistics, machine learning, and research methodology. Maurits works for the Jheronimus Academy of Data Science and for the University of Tilburg. His work has been published in influential journals such as Bayesian Analysis and the Journal of Interactive Marketing.

Prof. Dr. Edwin van den Heuvel works on statistical methods for analyzing cross-sectional and longitudinal data from experimental and observational studies in the domain of health and life sciences. He has taught many different statistics topics to PhD, master's, and bachelor's students from various backgrounds (medicine, engineering, mathematics, etc.). He is a full-time professor of statistics at Eindhoven University of Technology and holds affiliations at other universities. He publishes mostly in influential peer-reviewed statistical, epidemiological, and medical journals.