E-book: Statistical Inference and Machine Learning for Big Data

  • Format: EPUB+DRM
  • Price: 148,19 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has issued this e-book in encrypted form, which means you must install special software to read it. You must also create an Adobe ID (more info here). The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, install Adobe Digital Editions. (This is a free application designed specifically for reading e-books. It should not be confused with Adobe Reader, which is probably already installed on your computer.)

    This e-book cannot be read on an Amazon Kindle.

This book presents a variety of advanced statistical methods at a level suitable for advanced undergraduate and graduate students, as well as for anyone else interested in familiarizing themselves with these important subjects. It illustrates these methods in the context of real-life applications in areas such as genetics, medicine, and environmental problems.

The book begins in Part I by outlining various data types and indicating how they are normally represented graphically and subsequently analyzed. Part II introduces the basic tools of probability and statistics, with special reference to symbolic data analysis; the results most useful and relevant to this book are retained. Part III focuses on the tools of machine learning, while Part IV presents the computational aspects of big data.

This book would serve as a handy desk reference for statistical methods at the undergraduate and graduate level, and would also be useful in courses that aim to provide an overview of modern statistics and its applications.
I Introduction to Big Data
1 Examples of Big Data
1.1 Multivariate Data
1.2 Categorical Data
1.3 Environmental Data
1.4 Genetic Data
1.5 Time Series Data
1.6 Ranking Data
1.7 Social Network Data
1.8 Symbolic Data
1.9 Image Data
II Statistical Inference for Big Data
2 Basic Concepts in Probability
2.1 Pearson System of Distributions
2.2 Modes of Convergence
2.3 Multivariate Central Limit Theorem
2.4 Markov Chains
3 Basic Concepts in Statistics
3.1 Parametric Estimation
3.2 Hypothesis Testing
3.3 Classical Bayesian Statistics
4 Multivariate Methods
4.1 Matrix Algebra
4.2 Multivariate Analysis as a Generalization of Univariate Analysis
4.2.1 The General Linear Model
4.2.2 One-Sample Problem
4.2.3 Two-Sample Problem
4.3 Structure in Multivariate Data Analysis
4.3.1 Principal Component Analysis
4.3.2 Factor Analysis
4.3.3 Canonical Correlation
4.3.4 Linear Discriminant Analysis
4.3.5 Multidimensional Scaling
4.3.6 Copula Methods
5 Nonparametric Statistics
5.1 Goodness-of-Fit Tests
5.2 Linear Rank Statistics
5.3 U Statistics
5.4 Hoeffding's Combinatorial Central Limit Theorem
5.5 Nonparametric Tests
5.5.1 One-Sample Tests of Location
5.5.2 Confidence Interval for the Median
5.5.3 Wilcoxon Signed Rank Test
5.6 Multi-Sample Tests
5.6.1 Two-Sample Tests for Location
5.6.2 Multi-Sample Test for Location
5.6.3 Tests for Dispersion
5.7 Compatibility
5.8 Tests for Ordered Alternatives
5.9 A Unified Theory of Hypothesis Testing
5.9.1 Umbrella Alternatives
5.9.2 Tests for Trend in Proportions
5.10 Randomized Block Designs
5.11 Density Estimation
5.11.1 Univariate Kernel Density Estimation
5.11.2 The Rank Transform
5.11.3 Multivariate Kernel Density Estimation
5.12 Spatial Data Analysis
5.12.1 Spatial Prediction
5.12.2 Point Poisson Kriging of Areal Data
5.13 Efficiency
5.13.1 Pitman Efficiency
5.13.2 Application of Le Cam's Lemmas
5.14 Permutation Methods
6 Exponential Tilting and Its Applications
6.1 Neyman Smooth Tests
6.2 Smooth Models for Discrete Distributions
6.3 Rejection Sampling
6.4 Tweedie's Formula: Univariate Case
6.5 Tweedie's Formula: Multivariate Case
6.6 The Saddlepoint Approximation and Notions of Information
7 Counting Data Analysis
7.1 Inference for Generalized Linear Models
7.2 Inference for Contingency Tables
7.3 Two-Way Ordered Classifications
7.4 Survival Analysis
7.4.1 Kaplan-Meier Estimator
7.4.2 Modeling Survival Data
8 Time Series Methods
8.1 Classical Methods of Analysis
8.2 State Space Modeling
9 Estimating Equations
9.1 Composite Likelihood
9.2 Empirical Likelihood
9.2.1 Application to One-Sample Ranking Problems
9.2.2 Application to Two-Sample Ranking Problems
10 Symbolic Data Analysis
10.1 Introduction
10.2 Some Examples
10.3 Interval Data
10.3.1 Frequency
10.3.2 Sample Mean and Sample Variance
10.3.3 Realization in SODAS
10.4 Multi-nominal Data
10.4.1 Frequency
10.5 Symbolic Regression
10.5.1 Symbolic Regression for Interval Data
10.5.2 Symbolic Regression for Modal Data
10.5.3 Symbolic Regression in SODAS
10.6 Cluster Analysis
10.7 Factor Analysis
10.8 Factorial Discriminant Analysis
10.9 Application to Parkinson's Disease
10.9.1 Data Processing
10.9.2 Result Analysis
10.9.2.1 Viewer
10.9.2.2 Descriptive Statistics
10.9.2.3 Symbolic Regression Analysis
10.9.2.4 Symbolic Clustering
10.9.2.5 Principal Component Analysis
10.9.3 Comparison with Classical Method
10.10 Application to Cardiovascular Disease Analysis
10.10.1 Results of the Analysis
10.10.2 Comparison with the Classical Method
III Machine Learning for Big Data
11 Tools for Machine Learning
11.1 Regression Models
11.2 Simple Linear Regression
11.2.1 Least Squares Method
11.2.2 Statistical Inference on Regression Coefficients
11.2.3 Verifying the Assumptions on the Error Terms
11.3 Multiple Linear Regression
11.3.1 Multiple Linear Regression Model
11.3.2 Normal Equations
11.3.3 Statistical Inference on Regression Coefficients
11.3.4 Model Fit Evaluation
11.4 Regression in Machine Learning
11.4.1 Optimization for Linear Regression in Machine Learning
11.4.1.1 Gradient Descent
11.4.1.2 Feature Standardization
11.4.1.3 Computing Cost on a Test Set
11.5 Classification Models
11.5.1 Logistic Regression
11.5.1.1 Optimization with Maximum Likelihood for Logistic Regression
11.5.1.2 Statistical Inference
11.5.2 Logistic Regression for Binary Classification
11.5.2.1 Kullback-Leibler Divergence
11.5.3 Logistic Regression with Multiple Response Classes
11.5.4 Regularization for Regression Models in Machine Learning
11.5.4.1 Ridge Regression
11.5.4.2 Lasso Regression
11.5.4.3 The Choice of Regularization Method
11.5.5 Support Vector Machines (SVM)
11.5.5.1 Introduction
11.5.5.2 Finding the Optimal Hyperplane
11.5.5.3 SVM for Nonlinearly Separable Data Sets
11.5.5.4 Illustrating SVM
12 Neural Networks
12.1 Feed-Forward Networks
12.1.1 Motivation
12.1.2 Introduction to Neural Networks
12.1.3 Building a Deep Feed-Forward Network
12.1.4 Learning in Deep Networks
12.1.4.1 Quantitative Model
12.1.4.2 Binary Classification Model
12.1.5 Generalization
12.1.5.1 A Machine Learning Approach to Generalization
12.2 Recurrent Neural Networks
12.2.1 Building a Recurrent Neural Network
12.2.2 Learning in Recurrent Networks
12.2.3 Most Common Design Structures of RNNs
12.2.4 Deep RNN
12.2.5 Bidirectional RNN
12.2.6 Long-Term Dependencies and LSTM RNN
12.2.7 Reduction for Exploding Gradients
12.3 Convolutional Neural Networks
12.3.1 Convolution Operator for Arrays
12.3.1.1 Properties of the Convolution Operator
12.3.2 Convolution Layers
12.3.3 Pooling Layers
12.4 Text Analytics
12.4.1 Introduction
12.4.2 General Architecture
IV Computational Methods for Statistical Inference
13 Bayesian Computation Methods
13.1 Data Augmentation Methods
13.2 Metropolis-Hastings Algorithm
13.3 Gibbs Sampling
13.4 EM Algorithm
13.4.1 Application to Ranking
13.4.2 Extension to Several Populations
13.5 Variational Bayesian Methods
13.5.1 Optimization of the Variational Distribution
13.6 Bayesian Nonparametric Methods
13.6.1 Dirichlet Prior
13.6.2 The Poisson-Dirichlet Prior
13.6.3 Simulation of Bayesian Posterior Distributions
13.6.4 Other Applications
Index