
E-book: Data Science and Machine Learning: Mathematical and Statistical Methods

  • Format: PDF+DRM
  • Price: €118.30*
  • * The price is final, i.e. no further discounts apply.
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste): not allowed
  • Printing: not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You will also need to create an Adobe ID (more information here). The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), install this free app: PocketBook Reader (iOS / Android).

    To read on a PC or Mac, install Adobe Digital Editions. (This is a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is most likely already installed on your computer.)

    This e-book cannot be read on an Amazon Kindle.

"This textbook is a well-rounded, rigorous, and informative work presenting the mathematics behind modern machine learning techniques. It hits all the right notes: the choice of topics is up-to-date and perfect for a course on data science for mathematics students at the advanced undergraduate or early graduate level. This book fills a sorely-needed gap in the existing literature by not sacrificing depth for breadth, presenting proofs of major theorems and subsequent derivations, as well as providing a copious amount of Python code. I only wish a book like this had been around when I first began my journey!" -Nicholas Hoell, University of Toronto

"This is a well-written book that provides a deeper dive into data-scientific methods than many introductory texts. The writing is clear, and the text logically builds up regularization, classification, and decision trees. Compared to its probable competitors, it carves out a unique niche.

-Adam Loy, Carleton College

Data Science and Machine Learning: Mathematical and Statistical Methods is an accessible yet comprehensive textbook for students who want a better understanding of the mathematics and statistics that underpin the rich variety of ideas and machine learning algorithms in data science.

Key Features:

  • Focuses on mathematical understanding.
  • Presentation is self-contained, accessible, and comprehensive.
  • Extensive list of exercises and worked-out examples.
  • Many concrete algorithms with Python code.
  • Full color throughout.
The Authors

Dirk P. Kroese, PhD, is a Professor of Mathematics and Statistics at The University of Queensland. He has published over 120 articles and five books in a wide range of areas in mathematics, statistics, data science, machine learning, and Monte Carlo methods. He is a pioneer of the well-known Cross-Entropy method, an adaptive Monte Carlo technique that is used around the world to help solve difficult estimation and optimization problems in science, engineering, and finance (see the illustrative sketch after the author biographies).

Zdravko Botev, PhD, is an Australian Mathematical Science Institute Lecturer in Data Science and Machine Learning with an appointment at the University of New South Wales in Sydney, Australia. He is the recipient of the 2018 Christopher Heyde Medal of the Australian Academy of Science for distinguished research in the Mathematical Sciences.

Thomas Taimre, PhD, is a Senior Lecturer of Mathematics and Statistics at The University of Queensland. His research interests range from applied probability and Monte Carlo methods to applied physics and the remarkably universal self-mixing effect in lasers. He has published over 100 articles, holds a patent, and is the coauthor of Handbook of Monte Carlo Methods (Wiley).

Radislav Vaisman, PhD, is a Lecturer of Mathematics and Statistics at The University of Queensland. His research interests lie at the intersection of applied probability, machine learning, and computer science. He has published over 20 articles and two books.
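For technically minded readers, the Cross-Entropy method mentioned in Dirk Kroese's biography (and treated in Section 3.4.2 of the book) works in outline as follows: repeatedly sample candidate solutions, keep the best-scoring "elite" fraction, and refit the sampling distribution to that elite set. The minimal Python sketch below uses an independent Gaussian sampler with illustrative parameter values of our own choosing; it is not code from the book.

    import numpy as np

    def cross_entropy_minimize(f, mu, sigma, n_samples=100, n_elite=10, iters=50):
        """Minimize f by the cross-entropy method with an independent Gaussian sampler."""
        for _ in range(iters):
            # Draw candidate solutions from the current sampling distribution.
            x = np.random.normal(mu, sigma, size=(n_samples, len(mu)))
            # Select the elite subset with the lowest objective values.
            elite = x[np.argsort([f(xi) for xi in x])[:n_elite]]
            # Refit the sampler to the elite samples; in practice the updates
            # are often smoothed to avoid premature shrinkage of sigma.
            mu, sigma = elite.mean(axis=0), elite.std(axis=0)
        return mu

    # Example: minimize a quadratic with minimum at (3, -1).
    f = lambda v: float(np.sum((v - np.array([3.0, -1.0])) ** 2))
    print(cross_entropy_minimize(f, mu=np.zeros(2), sigma=5.0 * np.ones(2)))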

Reviews

    "The first impression when handling and opening this book at a random page is superb. A big format (A4) and heavy weight, because the paper quality is high, along with a spectacular style and large font, much colour and many plots, and blocks of python code enhanced in colour boxes. This makes the book attractive and easy to study...The book is a very well-designed data science course, with mathematical rigor in mind. Key concepts are highlighted in red in the margins, often with links to other parts of the book...This book will be excellent for those that want to build a strong mathematical foundation for their knowledge on the main machine learning techniques, and at the same time get python recipes on how to perform the analyses for worked examples." - Victor Moreno, ISCB News, December 2020 'The way the Python code was written follows the algorithm closely. This is very useful for readers who wish to understand the rationale and flow of the background knowledge. In each chapter, the authors recommend further readings for those who plan to learn advanced topics. Another useful part is that the Python implementation of different statistical learning algorithms is discussed throughout the book. At the end of each chapter, extensive exercises are designed. These exercises can help readers understand the content better. This book would be a good reference for readers who are already experienced with statistical analysis and are looking for theoretical background knowledge of the algorithms.'

    -Yin-Ju Lai and Chuhsing Kate Hsiao, Biometrics, vol 77, issue 4, 2021

Table of Contents

Preface
Notation
1 Importing, Summarizing, and Visualizing Data
  1.1 Introduction
  1.2 Structuring Features According to Type
  1.3 Summary Tables
  1.4 Summary Statistics
  1.5 Visualizing Data
    1.5.1 Plotting Qualitative Variables
    1.5.2 Plotting Quantitative Variables
    1.5.3 Data Visualization in a Bivariate Setting
  Exercises
2 Statistical Learning
  2.1 Introduction
  2.2 Supervised and Unsupervised Learning
  2.3 Training and Test Loss
  2.4 Tradeoffs in Statistical Learning
  2.5 Estimating Risk
    2.5.1 In-Sample Risk
    2.5.2 Cross-Validation
  2.6 Modeling Data
  2.7 Multivariate Normal Models
  2.8 Normal Linear Models
  2.9 Bayesian Learning
  Exercises
3 Monte Carlo Methods
  3.1 Introduction
  3.2 Monte Carlo Sampling
    3.2.1 Generating Random Numbers
    3.2.2 Simulating Random Variables
    3.2.3 Simulating Random Vectors and Processes
    3.2.4 Resampling
    3.2.5 Markov Chain Monte Carlo
  3.3 Monte Carlo Estimation
    3.3.1 Crude Monte Carlo
    3.3.2 Bootstrap Method
    3.3.3 Variance Reduction
  3.4 Monte Carlo for Optimization
    3.4.1 Simulated Annealing
    3.4.2 Cross-Entropy Method
    3.4.3 Splitting for Optimization
    3.4.4 Noisy Optimization
  Exercises
4 Unsupervised Learning
  4.1 Introduction
  4.2 Risk and Loss in Unsupervised Learning
  4.3 Expectation-Maximization (EM) Algorithm
  4.4 Empirical Distribution and Density Estimation
  4.5 Clustering via Mixture Models
    4.5.1 Mixture Models
    4.5.2 EM Algorithm for Mixture Models
  4.6 Clustering via Vector Quantization
    4.6.1 K-Means
    4.6.2 Clustering via Continuous Multiextremal Optimization
  4.7 Hierarchical Clustering
  4.8 Principal Component Analysis (PCA)
    4.8.1 Motivation: Principal Axes of an Ellipsoid
    4.8.2 PCA and Singular Value Decomposition (SVD)
  Exercises
5 Regression
  5.1 Introduction
  5.2 Linear Regression
  5.3 Analysis via Linear Models
    5.3.1 Parameter Estimation
    5.3.2 Model Selection and Prediction
    5.3.3 Cross-Validation and Predictive Residual Sum of Squares
    5.3.4 In-Sample Risk and Akaike Information Criterion
    5.3.5 Categorical Features
    5.3.6 Nested Models
    5.3.7 Coefficient of Determination
  5.4 Inference for Normal Linear Models
    5.4.1 Comparing Two Normal Linear Models
    5.4.2 Confidence and Prediction Intervals
  5.5 Nonlinear Regression Models
  5.6 Linear Models in Python
    5.6.1 Modeling
    5.6.2 Analysis
    5.6.3 Analysis of Variance (ANOVA)
    5.6.4 Confidence and Prediction Intervals
    5.6.5 Model Validation
    5.6.6 Variable Selection
  5.7 Generalized Linear Models
  Exercises
6 Regularization and Kernel Methods
  6.1 Introduction
  6.2 Regularization
  6.3 Reproducing Kernel Hilbert Spaces
  6.4 Construction of Reproducing Kernels
    6.4.1 Reproducing Kernels via Feature Mapping
    6.4.2 Kernels from Characteristic Functions
    6.4.3 Reproducing Kernels Using Orthonormal Features
    6.4.4 Kernels from Kernels
  6.5 Representer Theorem
  6.6 Smoothing Cubic Splines
  6.7 Gaussian Process Regression
  6.8 Kernel PCA
  Exercises
7 Classification
  7.1 Introduction
  7.2 Classification Metrics
  7.3 Classification via Bayes' Rule
  7.4 Linear and Quadratic Discriminant Analysis
  7.5 Logistic Regression and Softmax Classification
  7.6 K-Nearest Neighbors Classification
  7.7 Support Vector Machine
  7.8 Classification with Scikit-Learn
  Exercises
8 Decision Trees and Ensemble Methods
  8.1 Introduction
  8.2 Top-Down Construction of Decision Trees
    8.2.1 Regional Prediction Functions
    8.2.2 Splitting Rules
    8.2.3 Termination Criterion
    8.2.4 Basic Implementation
  8.3 Additional Considerations
    8.3.1 Binary Versus Non-Binary Trees
    8.3.2 Data Preprocessing
    8.3.3 Alternative Splitting Rules
    8.3.4 Categorical Variables
    8.3.5 Missing Values
  8.4 Controlling the Tree Shape
    8.4.1 Cost-Complexity Pruning
    8.4.2 Advantages and Limitations of Decision Trees
  8.5 Bootstrap Aggregation
  8.6 Random Forests
  8.7 Boosting
  Exercises
9 Deep Learning
  9.1 Introduction
  9.2 Feed-Forward Neural Networks
  9.3 Back-Propagation
  9.4 Methods for Training
    9.4.1 Steepest Descent
    9.4.2 Levenberg-Marquardt Method
    9.4.3 Limited-Memory BFGS Method
    9.4.4 Adaptive Gradient Methods
  9.5 Examples in Python
    9.5.1 Simple Polynomial Regression
    9.5.2 Image Classification
  Exercises
A Linear Algebra and Functional Analysis
  A.1 Vector Spaces, Bases, and Matrices
  A.2 Inner Product
  A.3 Complex Vectors and Matrices
  A.4 Orthogonal Projections
  A.5 Eigenvalues and Eigenvectors
    A.5.1 Left- and Right-Eigenvectors
  A.6 Matrix Decompositions
    A.6.1 (P)LU Decomposition
    A.6.2 Woodbury Identity
    A.6.3 Cholesky Decomposition
    A.6.4 QR Decomposition and the Gram-Schmidt Procedure
    A.6.5 Singular Value Decomposition
    A.6.6 Solving Structured Matrix Equations
  A.7 Functional Analysis
  A.8 Fourier Transforms
    A.8.1 Discrete Fourier Transform
    A.8.2 Fast Fourier Transform
B Multivariate Differentiation and Optimization
  B.1 Multivariate Differentiation
    B.1.1 Taylor Expansion
    B.1.2 Chain Rule
  B.2 Optimization Theory
    B.2.1 Convexity and Optimization
    B.2.2 Lagrangian Method
    B.2.3 Duality
  B.3 Numerical Root-Finding and Minimization
    B.3.1 Newton-Like Methods
    B.3.2 Quasi-Newton Methods
    B.3.3 Normal Approximation Method
    B.3.4 Nonlinear Least Squares
  B.4 Constrained Minimization via Penalty Functions
C Probability and Statistics
  C.1 Random Experiments and Probability Spaces
  C.2 Random Variables and Probability Distributions
  C.3 Expectation
  C.4 Joint Distributions
  C.5 Conditioning and Independence
    C.5.1 Conditional Probability
    C.5.2 Independence
    C.5.3 Expectation and Covariance
    C.5.4 Conditional Density and Conditional Expectation
  C.6 Functions of Random Variables
  C.7 Multivariate Normal Distribution
  C.8 Convergence of Random Variables
  C.9 Law of Large Numbers and Central Limit Theorem
  C.10 Markov Chains
  C.11 Statistics
  C.12 Estimation
    C.12.1 Method of Moments
    C.12.2 Maximum Likelihood Method
  C.13 Confidence Intervals
  C.14 Hypothesis Testing
D Python Primer
  D.1 Getting Started
  D.2 Python Objects
  D.3 Types and Operators
  D.4 Functions and Methods
  D.5 Modules
  D.6 Flow Control
  D.7 Iteration
  D.8 Classes
  D.9 Files
  D.10 NumPy
    D.10.1 Creating and Shaping Arrays
    D.10.2 Slicing
    D.10.3 Array Operations
    D.10.4 Random Numbers
  D.11 Matplotlib
    D.11.1 Creating a Basic Plot
  D.12 Pandas
    D.12.1 Series and DataFrame
    D.12.2 Manipulating Data Frames
    D.12.3 Extracting Information
    D.12.4 Plotting
  D.13 Scikit-learn
    D.13.1 Partitioning the Data
    D.13.2 Standardization
    D.13.3 Fitting and Prediction
    D.13.4 Testing the Model
  D.14 System Calls, URL Access, and Speed-Up
Bibliography
Index