
Statistical Foundations of Data Science [Hardback]

Jianqing Fan (Princeton University, New Jersey, USA), Runze Li (Pennsylvania State University, University Park, USA), Cun-Hui Zhang (Rutgers University, Piscataway, USA), Hui Zou (University of Minnesota, USA)
  • Format: Hardback, 774 pages, height x width: 234x156 mm, weight: 1260 g, 100 illustrations, black and white
  • Series: Chapman & Hall/CRC Data Science Series
  • Publication date: 17-Aug-2020
  • Publisher: CRC Press Inc
  • ISBN-10: 1466510846
  • ISBN-13: 9781466510845
  • Hardback
  • Price: 93,79 €*
  • * the price is final, i.e., no further discounts apply
  • Regular price: 132,79 €
  • You save 29%
  • Delivery from the publisher takes approximately 2-4 weeks
  • Free shipping
  • Delivery time 2-4 weeks
Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models and contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theory. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical studies and empirical applications.

The book begins with an introduction to the stylized features of big data and their impact on statistical analysis. It then introduces multiple linear regression and expands the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference and feature screening are also thoroughly addressed. The book further gives a comprehensive account of high-dimensional covariance estimation and of learning latent factors and hidden structures, as well as their applications to statistical estimation, inference, prediction, and machine learning problems. Finally, it provides a thorough introduction to statistical machine learning theory and methods for classification, clustering, and prediction, including CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.
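To give a concrete flavor of the penalized least-squares methods developed in Chapters 3 and 4, the sketch below is an editorial illustration (not code from the book or its companion materials) of the Lasso fitted by cyclic coordinate descent with soft-thresholding, one of the algorithms the book covers. The data, the penalty level lam, and the function names are hypothetical, and the columns of X are assumed to be standardized.

import numpy as np

def soft_threshold(z, lam):
    # Soft-thresholding operator: sign(z) * max(|z| - lam, 0)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate
    # descent, assuming the columns of X have mean 0 and variance 1.
    n, p = X.shape
    beta = np.zeros(p)
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]          # partial residual without coordinate j
            rho = X[:, j] @ resid / n           # univariate least-squares coefficient
            beta[j] = soft_threshold(rho, lam)  # shrink; may set the coefficient to zero
            resid -= X[:, j] * beta[j]
    return beta

# Toy example with a sparse true signal (hypothetical data)
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize columns
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.standard_normal(n)
print(np.round(lasso_cd(X, y, lam=0.1), 2))

With a moderate penalty such as lam=0.1, the estimate should track the three nonzero signals while shrinking most of the remaining coefficients toward, and many of them exactly to, zero; this sparsity behavior is what the book analyzes in detail.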

Reviews

"This book delivers a very comprehensive summary of the development of statistical foundations of data science. The authors no doubt are doing frontier research and have made several crucial contributions to the field. Therefore, the book offers a very good account of the most cutting-edge development. The book is suitable for both master and Ph.D. students in statistics, and also for researchers in both applied and theoretical data science. Researchers can take this book as an index of topics, as it summarizes in brief many significant research articles in an accessible way. Each chapter can be read independently by experienced researchers. It provides a nice cover of key concepts in those topics and researchers can benefit from reading the specific chapters and paragraphs to get a big picture rather than diving into many technical articles. There are altogether 14 chapters. It can serve as a textbook for two semesters. The book also provides handy codes and data sets, which is a great treasure for practitioners." ~Journal of Time Series Analysis

"This textcollaboratively authored by renowned statisticians Fan (Princeton Univ.), Li (Pennsylvania State Univ.), Zhang (Rutgers Univ.), and Zhou (Univ. of Minnesota)laboriously compiles and explains theoretical and methodological achievements in data science and big data analytics. Amid today's flood of coding-based cookbooks for data science, this book is a rare monograph addressing recent advances in mathematical and statistical principles and the methods behind regularized regression, analysis of high-dimensional data, and machine learning. The pinnacle achievement of the book is its comprehensive exploration of sparsity for model selection in statistical regression, considering models such as generalized linear regression, penalized least squares, quantile and robust regression, and survival regression. The authors discuss sparsity not only in terms of various types of penalties but also as an important feature of numerical optimization algorithms, now used in manifold applications including deep learning. The text extensively probes contemporary high-dimensional data modeling methods such as feature screening, covariate regularization, graphical modeling, and principal component and factor analysis. The authors conclude by introducing contemporary statistical machine learning, spanning a range of topics in supervised and unsupervised learning techniques and deep learning. This book is a must-have bookshelf item for those with a thirst for learning about the theoretical rigor of data science." ~Choice Review, S-T. Kim, North Carolina A&T State University, August 2021

Preface xvii
1 Introduction
1(20)
1.1 Rise of Big Data and Dimensionality
1(8)
1.1.1 Biological sciences
2(2)
1.1.2 Health sciences
4(1)
1.1.3 Computer and information sciences
5(2)
1.1.4 Economics and finance
7(2)
1.1.5 Business and program evaluation
9(1)
1.1.6 Earth sciences and astronomy
9(1)
1.2 Impact of Big Data
9(2)
1.3 Impact of Dimensionality
11(7)
1.3.1 Computation
11(1)
1.3.2 Noise accumulation
12(2)
1.3.3 Spurious correlation
14(3)
1.3.4 Statistical theory
17(1)
1.4 Aim of High-dimensional Statistical Learning
18(1)
1.5 What Big Data Can Do
19(1)
1.6 Scope of the Book
19(2)
2 Multiple and Nonparametric Regression
21(34)
2.1 Introduction
21(1)
2.2 Multiple Linear Regression
21(6)
2.2.1 The Gauss-Markov theorem
23(3)
2.2.2 Statistical tests
26(1)
2.3 Weighted Least-Squares
27(2)
2.4 Box-Cox Transformation
29(1)
2.5 Model Building and Basis Expansions
30(7)
2.5.1 Polynomial regression
31(1)
2.5.2 Spline regression
32(3)
2.5.3 Multiple covariates
35(2)
2.6 Ridge Regression
37(5)
2.6.1 Bias-variance tradeoff
37(1)
2.6.2 ℓ2 penalized least squares
38(1)
2.6.3 Bayesian interpretation
38(1)
2.6.4 Ridge regression solution path
39(2)
2.6.5 Kernel ridge regression
41(1)
2.7 Regression in Reproducing Kernel Hilbert Space
42(5)
2.8 Leave-one-out and Generalized Cross-validation
47(2)
2.9 Exercises
49(6)
3 Introduction to Penalized Least-Squares
55(66)
3.1 Classical Variable Selection Criteria
55(4)
3.1.1 Subset selection
55(1)
3.1.2 Relation with penalized regression
56(1)
3.1.3 Selection of regularization parameters
57(2)
3.2 Folded-concave Penalized Least Squares
59(7)
3.2.1 Orthonormal designs
61(1)
3.2.2 Penalty functions
62(1)
3.2.3 Thresholding by SCAD and MCP
63(1)
3.2.4 Risk properties
64(1)
3.2.5 Characterization of folded-concave PLS
65(1)
3.3 Lasso and L1 Regularization
66(15)
3.3.1 Nonnegative garrote
66(2)
3.3.2 Lasso
68(3)
3.3.3 Adaptive Lasso
71(1)
3.3.4 Elastic Net
72(2)
3.3.5 Dantzig selector
74(3)
3.3.6 SLOPE and sorted penalties
77(1)
3.3.7 Concentration inequalities and uniform convergence
78(3)
3.3.8 A brief history of model selection
81(1)
3.4 Bayesian Variable Selection
81(3)
3.4.1 Bayesian view of the PLS
81(2)
3.4.2 A Bayesian framework for selection
83(1)
3.5 Numerical Algorithms
84(15)
3.5.1 Quadratic programs
84(2)
3.5.2 Least angle regression*
86(3)
3.5.3 Local quadratic approximations
89(2)
3.5.4 Local linear algorithm
91(1)
3.5.5 Penalized linear unbiased selection*
92(1)
3.5.6 Cyclic coordinate descent algorithms
93(1)
3.5.7 Iterative shrinkage-thresholding algorithms
94(2)
3.5.8 Projected proximal gradient method
96(1)
3.5.9 ADMM
96(1)
3.5.10 Iterative local adaptive majorization and minimization
97(1)
3.5.11 Other methods and timeline
98(1)
3.6 Regularization Parameters for PLS
99(4)
3.6.1 Degrees of freedom
100(2)
3.6.2 Extension of information criteria
102(1)
3.6.3 Application to PLS estimators
102(1)
3.7 Residual Variance and Refitted Cross-validation
103(3)
3.7.1 Residual variance of Lasso
103(1)
3.7.2 Refitted cross-validation
104(2)
3.8 Extensions to Nonparametric Modeling
106(3)
3.8.1 Structured nonparametric models
106(1)
3.8.2 Group penalty
107(2)
3.9 Applications
109(5)
3.10 Bibliographical Notes
114(1)
3.11 Exercises
115(6)
4 Penalized Least Squares: Properties
121(106)
4.1 Performance Benchmarks
121(18)
4.1.1 Performance measures
122(3)
4.1.2 Impact of model uncertainty
125(1)
4.1.2.1 Bayes lower bounds for orthogonal design
126(4)
4.1.2.2 Minimax lower bounds for general design
130(6)
4.1.3 Performance goals, sparsity and sub-Gaussian noise
136(3)
4.2 Penalized L0 Selection
139(6)
4.3 Lasso and Dantzig Selector
145(38)
4.3.1 Selection consistency
146(4)
4.3.2 Prediction and coefficient estimation errors
150(11)
4.3.3 Model size and least squares after selection
161(6)
4.3.4 Properties of the Dantzig selector
167(8)
4.3.5 Regularity conditions on the design matrix
175(8)
4.4 Properties of Concave PLS
183(23)
4.4.1 Properties of penalty functions
185(5)
4.4.2 Local and oracle solutions
190(5)
4.4.3 Properties of local solutions
195(5)
4.4.4 Global and approximate global solutions
200(6)
4.5 Smaller and Sorted Penalties
206(18)
4.5.1 Sorted concave penalties and their local approximation
207(4)
4.5.2 Approximate PLS with smaller and sorted penalties
211(9)
4.5.3 Properties of LLA and LCA
220(4)
4.6 Bibliographical Notes
224(1)
4.7 Exercises
225(2)
5 Generalized Linear Models and Penalized Likelihood
227(60)
5.1 Generalized Linear Models
227(11)
5.1.1 Exponential family
227(3)
5.1.2 Elements of generalized linear models
230(1)
5.1.3 Maximum likelihood
231(1)
5.1.4 Computing MLE: Iteratively reweighted least squares
232(2)
5.1.5 Deviance and analysis of deviance
234(2)
5.1.6 Residuals
236(2)
5.2 Examples
238(5)
5.2.1 Bernoulli and binomial models
238(3)
5.2.2 Models for count responses
241(2)
5.2.3 Models for nonnegative continuous responses
243(1)
5.2.4 Normal error models
243(1)
5.3 Sparsest Solution in High Confidence Set
243(3)
5.3.1 A general setup
244(1)
5.3.2 Examples
244(1)
5.3.3 Properties
245(1)
5.4 Variable Selection via Penalized Likelihood
246(3)
5.5 Algorithms
249(3)
5.5.1 Local quadratic approximation
249(1)
5.5.2 Local linear approximation
250(1)
5.5.3 Coordinate descent
251(1)
5.5.4 Iterative local adaptive majorization and minimization
252(1)
5.6 Tuning Parameter Selection
252(2)
5.7 An Application
254(2)
5.8 Sampling Properties in Low-dimension
256(8)
5.8.1 Notation and regularity conditions
257(1)
5.8.2 The oracle property
258(2)
5.8.3 Sampling properties with diverging dimensions
260(2)
5.8.4 Asymptotic properties of GIC selectors
262(2)
5.9 Properties under Ultrahigh Dimensions
264(10)
5.9.1 The Lasso penalized estimator and its risk property
264(4)
5.9.2 Strong oracle property
268(5)
5.9.3 Numeric studies
273(1)
5.10 Risk Properties
274(4)
5.11 Bibliographical Notes
278(2)
5.12 Exercises
280(7)
6 Penalized M-estimators
287(34)
6.1 Penalized Quantile Regression
287(7)
6.1.1 Quantile regression
287(2)
6.1.2 Variable selection in quantile regression
289(2)
6.1.3 A fast algorithm for penalized quantile regression
291(3)
6.2 Penalized Composite Quantile Regression
294(3)
6.3 Variable Selection in Robust Regression
297(4)
6.3.1 Robust regression
297(2)
6.3.2 Variable selection in Huber regression
299(2)
6.4 Rank Regression and Its Variable Selection
301(2)
6.4.1 Rank regression
302(1)
6.4.2 Penalized weighted rank regression
302(1)
6.5 Variable Selection for Survival Data
303(5)
6.5.1 Partial likelihood
305(1)
6.5.2 Variable selection via penalized partial likelihood and its properties
306(2)
6.6 Theory of Folded-concave Penalized M-estimator
308(9)
6.6.1 Conditions on penalty and restricted strong convexity
309(1)
6.6.2 Statistical accuracy of penalized M-estimator with folded concave penalties
310(4)
6.6.3 Computational accuracy
314(3)
6.7 Bibliographical Notes
317(2)
6.8 Exercises
319(2)
7 High Dimensional Inference
321(60)
7.1 Inference in Linear Regression
322(8)
7.1.1 Debias of regularized regression estimators
323(2)
7.1.2 Choices of weights
325(2)
7.1.3 Inference for the noise level
327(3)
7.2 Inference in Generalized Linear Models
330(9)
7.2.1 Desparsified Lasso
331(1)
7.2.2 Decorrelated score estimator
332(3)
7.2.3 Test of linear hypotheses
335(2)
7.2.4 Numerical comparison
337(1)
7.2.5 An application
338(1)
7.3 Asymptotic Efficiency*
339(16)
7.3.1 Statistical efficiency and Fisher information
340(5)
7.3.2 Linear regression with random design
345(6)
7.3.3 Partial linear regression
351(4)
7.4 Gaussian Graphical Models
355(13)
7.4.1 Inference via penalized least squares
356(5)
7.4.2 Sample size in regression and graphical models
361(7)
7.5 General Solutions*
368(8)
7.5.1 Local semi-LD decomposition
368(2)
7.5.2 Data swap
370(4)
7.5.3 Gradient approximation
374(2)
7.6 Bibliographical Notes
376(1)
7.7 Exercises
377(4)
8 Feature Screening
381(50)
8.1 Correlation Screening
381(5)
8.1.1 Sure screening property
382(2)
8.1.2 Connection to multiple comparison
384(1)
8.1.3 Iterative SIS
385(1)
8.2 Generalized and Rank Correlation Screening
386(3)
8.3 Feature Screening for Parametric Models
389(6)
8.3.1 Generalized linear models
389(2)
8.3.2 A unified strategy for parametric feature screening
391(3)
8.3.3 Conditional sure independence screening
394(1)
8.4 Nonparametric Screening
395(6)
8.4.1 Additive models
395(1)
8.4.2 Varying coefficient models
396(4)
8.4.3 Heterogeneous nonparametric models
400(1)
8.5 Model-free Feature Screening
401(8)
8.5.1 Sure independent ranking screening procedure
401(2)
8.5.2 Feature screening via distance correlation
403(3)
8.5.3 Feature screening for high-dimensional categorical data
406(3)
8.6 Screening and Selection
409(8)
8.6.1 Feature screening via forward regression
409(1)
8.6.2 Sparse maximum likelihood estimate
410(2)
8.6.3 Feature screening via partial correlation
412(5)
8.7 Refitted Cross-Validation
417(6)
8.7.1 RCV algorithm
417(1)
8.7.2 RCV in linear models
418(2)
8.7.3 RCV in nonparametric regression
420(3)
8.8 An Illustration
423(3)
8.9 Bibliographical Notes
426(2)
8.10 Exercises
428(3)
9 Covariance Regularization and Graphical Models
431(40)
9.1 Basic Facts about Matrices
431(4)
9.2 Sparse Covariance Matrix Estimation
435(8)
9.2.1 Covariance regularization by thresholding and banding
435(3)
9.2.2 Asymptotic properties
438(3)
9.2.3 Nearest positive definite matrices
441(2)
9.3 Robust Covariance Inputs
443(3)
9.4 Sparse Precision Matrix and Graphical Models
446(10)
9.4.1 Gaussian graphical models
446(1)
9.4.2 Penalized likelihood and M-estimation
447(1)
9.4.3 Penalized least-squares
448(3)
9.4.4 CLIME and its adaptive version
451(5)
9.5 Latent Gaussian Graphical Models
456(4)
9.6 Technical Proofs
460(5)
9.6.1 Proof of Theorem 9.1
460(1)
9.6.2 Proof of Theorem 9.3
461(1)
9.6.3 Proof of Theorem 9.4
462(1)
9.6.4 Proof of Theorem 9.6
463(2)
9.7 Bibliographical Notes
465(1)
9.8 Exercises
466(5)
10 Covariance Learning and Factor Models
471(40)
10.1 Principal Component Analysis
471(3)
10.1.1 Introduction to PCA
471(2)
10.1.2 Power method
473(1)
10.2 Factor Models and Structured Covariance Learning
474(9)
10.2.1 Factor model and high-dimensional PCA
475(3)
10.2.2 Extracting latent factors and POET
478(2)
10.2.3 Methods for selecting number of factors
480(3)
10.3 Covariance and Precision Learning with Known Factors
483(5)
10.3.1 Factor model with observable factors
483(2)
10.3.2 Robust initial estimation of covariance matrix
485(3)
10.4 Augmented Factor Models and Projected PCA
488(3)
10.5 Asymptotic Properties
491(4)
10.5.1 Properties for estimating loading matrix
491(2)
10.5.2 Properties for estimating covariance matrices
493(1)
10.5.3 Properties for estimating realized latent factors
494(1)
10.5.4 Properties for estimating idiosyncratic components
495(1)
10.6 Technical Proofs
495(11)
10.6.1 Proof of Theorem 10.1
495(5)
10.6.2 Proof of Theorem 10.2
500(1)
10.6.3 Proof of Theorem 10.3
501(3)
10.6.4 Proof of Theorem 10.4
504(2)
10.7 Bibliographical Notes
506(1)
10.8 Exercises
507(4)
11 Applications of Factor Models and PCA
511(42)
11.1 Factor-adjusted Regularized Model Selection
511(7)
11.1.1 Importance of factor adjustments
512(1)
11.1.2 FarmSelect
513(1)
11.1.3 Application to forecasting bond risk premia
514(2)
11.1.4 Application to neuroblastoma data
516(2)
11.1.5 Asymptotic theory for FarmSelect
518(1)
11.2 Factor-adjusted Robust Multiple Testing
518(10)
11.2.1 False discovery rate control
519(2)
11.2.2 Multiple testing under dependence measurements
521(2)
11.2.3 Power of factor adjustments
523(1)
11.2.4 FarmTest
524(2)
11.2.5 Application to neuroblastoma data
526(2)
11.3 Factor Augmented Regression Methods
528(4)
11.3.1 Principal component regression
528(2)
11.3.2 Augmented principal component regression
530(1)
11.3.3 Application to forecasting bond risk premia
531(1)
11.4 Applications to Statistical Machine Learning
532(16)
11.4.1 Community detection
533(6)
11.4.2 Topic model
539(1)
11.4.3 Matrix completion
540(2)
11.4.4 Item ranking
542(3)
11.4.5 Gaussian mixture models
545(3)
11.5 Bibliographical Notes
548(2)
11.6 Exercises
550(3)
12 Supervised Learning
553(54)
12.1 Model-based Classifiers
553(6)
12.1.1 Linear and quadratic discriminant analysis
553(4)
12.1.2 Logistic regression
557(2)
12.2 Kernel Density Classifiers and Naive Bayes
559(4)
12.3 Nearest Neighbor Classifiers
563(2)
12.4 Classification Trees and Ensemble Classifiers
565(10)
12.4.1 Classification trees
565(2)
12.4.2 Bagging
567(2)
12.4.3 Random forests
569(2)
12.4.4 Boosting
571(4)
12.5 Support Vector Machines
575(6)
12.5.1 The standard support vector machine
575(3)
12.5.2 Generalizations of SVMs
578(3)
12.6 Sparse Classifiers via Penalized Empirical Loss
581(5)
12.6.1 The importance of sparsity under high-dimensionality
581(2)
12.6.2 Sparse support vector machines
583(1)
12.6.3 Sparse large margin classifiers
584(2)
12.7 Sparse Discriminant Analysis
586(11)
12.7.1 Nearest shrunken centroids classifier
588(1)
12.7.2 Features annealed independent rule
589(2)
12.7.3 Selection bias of sparse independence rules
591(1)
12.7.4 Regularized optimal affine discriminant
592(1)
12.7.5 Linear programming discriminant
593(1)
12.7.6 Direct sparse discriminant analysis
594(2)
12.7.7 Solution path equivalence between ROAD and DSDA
596(1)
12.8 Feature Augmentation and Sparse Additive Classifiers
597(5)
12.8.1 Feature augmentation
597(2)
12.8.2 Penalized additive logistic regression
599(1)
12.8.3 Semiparametric sparse discriminant analysis
600(2)
12.9 Bibliographical Notes
602(1)
12.10 Exercises
602(5)
13 Unsupervised Learning
607(36)
13.1 Cluster Analysis
607(10)
13.1.1 K-means clustering
608(1)
13.1.2 Hierarchical clustering
609(2)
13.1.3 Model-based clustering
611(4)
13.1.4 Spectral clustering
615(2)
13.2 Data-driven Choices of the Number of Clusters
617(3)
13.3 Variable Selection in Clustering
620(7)
13.3.1 Sparse clustering
620(2)
13.3.2 Sparse model-based clustering
622(2)
13.3.3 Sparse mixture of experts model
624(3)
13.4 An Introduction to High Dimensional PCA
627(3)
13.4.1 Inconsistency of the regular PCA
627(1)
13.4.2 Consistency under sparse eigenvector model
628(2)
13.5 Sparse Principal Component Analysis
630(9)
13.5.1 Sparse PCA
630(3)
13.5.2 An iterative SVD thresholding approach
633(2)
13.5.3 A penalized matrix decomposition approach
635(1)
13.5.4 A semidefinite programming approach
636(1)
13.5.5 A generalized power method
637(2)
13.6 Bibliographical Notes
639(1)
13.7 Exercises
640(3)
14 An Introduction to Deep Learning
643(40)
14.1 Rise of Deep Learning
644(2)
14.2 Feed-forward Neural Networks
646(4)
14.2.1 Model setup
646(1)
14.2.2 Back-propagation in computational graphs
647(3)
14.3 Popular Models
650(9)
14.3.1 Convolutional neural networks
651(3)
14.3.2 Recurrent neural networks
654(1)
14.3.2.1 Vanilla RNNs
654(1)
14.3.2.2 GRUs and LSTM
655(1)
14.3.2.3 Multilayer RNNs
656(1)
14.3.3 Modules
657(2)
14.4 Deep Unsupervised Learning
659(6)
14.4.1 Autoencoders
659(3)
14.4.2 Generative adversarial networks
662(1)
14.4.2.1 Sampling view of GANs
662(1)
14.4.2.2 Minimum distance view of GANs
663(2)
14.5 Training Deep Neural Nets
665(6)
14.5.1 Stochastic gradient descent
666(1)
14.5.1.1 Mini-batch SGD
666(1)
14.5.1.2 Momentum-based SGD
667(1)
14.5.1.3 SGD with adaptive learning rates
667(1)
14.5.2 Easing numerical instability
668(1)
14.5.2.1 ReLU activation function
668(1)
14.5.2.2 Skip connections
669(1)
14.5.2.3 Batch normalization
669(1)
14.5.3 Regularization techniques
670(1)
14.5.3.1 Weight decay
670(1)
14.5.3.2 Dropout
670(1)
14.5.3.3 Data augmentation
671(1)
14.6 Example: Image Classification
671(2)
14.7 Additional Examples using TensorFlow and R
673(7)
14.8 Bibliographical Notes
680(3)
References 683(48)
Author Index 731(12)
Index 743
The authors are international authorities and leaders on the presented topics. All are fellows of the Institute of Mathematical Statistics and the American Statistical Association.

Jianqing Fan is the Frederick L. Moore Professor at Princeton University. He is a co-editor of the Journal of Business & Economic Statistics and was co-editor of The Annals of Statistics, Probability Theory and Related Fields, and the Journal of Econometrics. His honors include the 2000 COPSS Presidents' Award, election as an AAAS Fellow and a Guggenheim Fellow, the Guy Medal in Silver, the Noether Senior Scholar Award, and election as an Academician of Academia Sinica.

Runze Li is the Eberly Family Chair Professor at Pennsylvania State University and an AAAS Fellow, and was co-editor of The Annals of Statistics.

Cun-Hui Zhang is a Distinguished Professor at Rutgers University and was co-editor of Statistical Science.

Hui Zou is a professor at the University of Minnesota and was an action editor of the Journal of Machine Learning Research.