Preface  vii

1 Introduction  1
  Why Machine Learning?  1
    Problems Machine Learning Can Solve  2
    Knowing Your Task and Knowing Your Data  4
  Why Python?  5
  scikit-learn  5
    Installing scikit-learn  6
  Essential Libraries and Tools  7
    Jupyter Notebook  7
    NumPy  7
    SciPy  8
    matplotlib  9
    pandas  10
    mglearn  11
  Python 2 Versus Python 3  12
  Versions Used in this Book  12
  A First Application: Classifying Iris Species  13
    Meet the Data  14
    Measuring Success: Training and Testing Data  17
    First Things First: Look at Your Data  19
    Building Your First Model: k-Nearest Neighbors  20
    Making Predictions  22
    Evaluating the Model  22
  Summary and Outlook  23
|
|
2 Supervised Learning  25
  Classification and Regression  25
  Generalization, Overfitting, and Underfitting  26
    Relation of Model Complexity to Dataset Size  29
  Supervised Machine Learning Algorithms  29
    Some Sample Datasets  30
    k-Nearest Neighbors  35
    Linear Models  45
    Naive Bayes Classifiers  68
    Decision Trees  70
    Ensembles of Decision Trees  83
    Kernelized Support Vector Machines  92
    Neural Networks (Deep Learning)  104
  Uncertainty Estimates from Classifiers  119
    The Decision Function  120
    Predicting Probabilities  122
    Uncertainty in Multiclass Classification  124
  Summary and Outlook  127
|
3 Unsupervised Learning and Preprocessing  131
  Types of Unsupervised Learning  131
  Challenges in Unsupervised Learning  132
  Preprocessing and Scaling  132
    Different Kinds of Preprocessing  133
    Applying Data Transformations  134
    Scaling Training and Test Data the Same Way  136
    The Effect of Preprocessing on Supervised Learning  138
  Dimensionality Reduction, Feature Extraction, and Manifold Learning  140
    Principal Component Analysis (PCA)  140
    Non-Negative Matrix Factorization (NMF)  156
    Manifold Learning with t-SNE  163
  Clustering  168
    k-Means Clustering  168
    Agglomerative Clustering  182
    DBSCAN  187
    Comparing and Evaluating Clustering Algorithms  191
    Summary of Clustering Methods  207
  Summary and Outlook  208
|
4 Representing Data and Engineering Features  211
  Categorical Variables  212
    One-Hot-Encoding (Dummy Variables)  213
    Numbers Can Encode Categoricals  218
  Binning, Discretization, Linear Models, and Trees  220
  Interactions and Polynomials  224
  Univariate Nonlinear Transformations  232
  Automatic Feature Selection  236
    Univariate Statistics  236
    Model-Based Feature Selection  238
    Iterative Feature Selection  240
  Utilizing Expert Knowledge  242
  Summary and Outlook  250
|
5 Model Evaluation and Improvement  251
  Cross-Validation  252
    Cross-Validation in scikit-learn  253
    Benefits of Cross-Validation  254
    Stratified k-Fold Cross-Validation and Other Strategies  254
  Grid Search  260
    Simple Grid Search  261
    The Danger of Overfitting the Parameters and the Validation Set  261
    Grid Search with Cross-Validation  263
  Evaluation Metrics and Scoring  275
    Keep the End Goal in Mind  275
    Metrics for Binary Classification  276
    Metrics for Multiclass Classification  296
    Regression Metrics  299
    Using Evaluation Metrics in Model Selection  300
  Summary and Outlook  302
|
6 Algorithm Chains and Pipelines  305
  Parameter Selection with Preprocessing  306
  Building Pipelines  308
    Using Pipelines in Grid Searches  309
  The General Pipeline Interface  312
    Convenient Pipeline Creation with make_pipeline  313
    Accessing Step Attributes  314
    Accessing Attributes in a Grid-Searched Pipeline  315
  Grid-Searching Preprocessing Steps and Model Parameters  317
  Grid-Searching Which Model To Use  319
  Summary and Outlook  320
|
|
7 Working with Text Data  323
  Types of Data Represented as Strings  323
  Example Application: Sentiment Analysis of Movie Reviews  325
  Representing Text Data as a Bag of Words  327
    Applying Bag-of-Words to a Toy Dataset  329
    Bag-of-Words for Movie Reviews  330
  Stopwords  334
  Rescaling the Data with tf-idf  336
  Investigating Model Coefficients  338
  Bag-of-Words with More Than One Word (n-Grams)  339
  Advanced Tokenization, Stemming, and Lemmatization  344
  Topic Modeling and Document Clustering  347
    Latent Dirichlet Allocation  348
  Summary and Outlook  355
|
|
8 Wrapping Up  357
  Approaching a Machine Learning Problem  357
    Humans in the Loop  358
  From Prototype to Production  359
  Testing Production Systems  359
  Building Your Own Estimator  360
  Where to Go from Here  361
    Theory  361
    Other Machine Learning Frameworks and Packages  362
    Ranking, Recommender Systems, and Other Kinds of Learning  363
    Probabilistic Modeling, Inference, and Probabilistic Programming  363
    Neural Networks  364
    Scaling to Larger Datasets  364
    Honing Your Skills  365
  Conclusion  366

Index  367