
Introduction to Machine Learning with Python: A Guide for Data Scientists [Paperback]

4.34/5 (1,066 ratings on Goodreads)
  • Format: Paperback / softback, 392 pages, height x width x thickness: 245x145x20 mm
  • Publication date: 15-Nov-2016
  • Publisher: O'Reilly Media
  • ISBN-10: 1449369413
  • ISBN-13: 9781449369415
  • Paperback
  • Price: 57,45 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 67,59 €
  • You save 15%
  • Free shipping
  • Delivery from the publisher takes approximately 2-4 weeks
Machine learning has become an integral part of many commercial applications and research projects, but this field is not exclusive to large companies with extensive research teams. If you use Python, even as a beginner, this book will teach you practical ways to build your own machine learning solutions. With all the data available today, machine learning applications are limited only by your imagination.

You'll learn the steps necessary to create a successful machine-learning application with Python and the scikit-learn library. Authors Andreas Muller and Sarah Guido focus on the practical aspects of using machine learning algorithms, rather than the math behind them. Familiarity with the NumPy and matplotlib libraries will help you get even more from this book.

With this book, you'll learn:
  • Fundamental concepts and applications of machine learning
  • Advantages and shortcomings of widely used machine learning algorithms
  • How to represent data processed by machine learning, including which data aspects to focus on
  • Advanced methods for model evaluation and parameter tuning
  • The concept of pipelines for chaining models and encapsulating your workflow
  • Methods for working with text data, including text-specific processing techniques
  • Suggestions for improving your machine learning and data science skills
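To give a flavor of the scikit-learn workflow the description refers to, here is a minimal sketch, not an excerpt from the book, loosely mirroring the first application in the table of contents (classifying iris species with k-nearest neighbors): load a dataset, hold out a test set, fit a classifier, and score it on the held-out data.

    # Minimal scikit-learn workflow sketch (illustrative only, not from the book):
    # load the iris dataset, split off a test set, fit a k-nearest-neighbors
    # classifier, and measure accuracy on the held-out data.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, random_state=0)

    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X_train, y_train)
    print("Test set accuracy: {:.2f}".format(knn.score(X_test, y_test)))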
Preface vii
1 Introduction 1
    Why Machine Learning? 1
        Problems Machine Learning Can Solve 2
        Knowing Your Task and Knowing Your Data 4
    Why Python? 5
    Scikit-Learn 5
        Installing scikit-learn 6
    Essential Libraries and Tools 7
        Jupyter Notebook 7
        NumPy 7
        SciPy 8
        Matplotlib 9
        Pandas 10
        Mglearn 11
    Python 2 Versus Python 3 12
    Versions Used in this Book 12
    A First Application: Classifying Iris Species 13
        Meet the Data 14
        Measuring Success: Training and Testing Data 17
        First Things First: Look at Your Data 19
        Building Your First Model: k-Nearest Neighbors 20
        Making Predictions 22
        Evaluating the Model 22
    Summary and Outlook 23
2 Supervised Learning 25
    Classification and Regression 25
    Generalization, Overfitting, and Underfitting 26
    Relation of Model Complexity to Dataset Size 29
    Supervised Machine Learning Algorithms 29
        Some Sample Datasets 30
        k-Nearest Neighbors 35
        Linear Models 45
        Naive Bayes Classifiers 68
        Decision Trees 70
        Ensembles of Decision Trees 83
        Kernelized Support Vector Machines 92
        Neural Networks (Deep Learning) 104
    Uncertainty Estimates from Classifiers 119
        The Decision Function 120
        Predicting Probabilities 122
        Uncertainty in Multiclass Classification 124
    Summary and Outlook 127
3 Unsupervised Learning and Preprocessing 131
    Types of Unsupervised Learning 131
    Challenges in Unsupervised Learning 132
    Preprocessing and Scaling 132
        Different Kinds of Preprocessing 133
        Applying Data Transformations 134
        Scaling Training and Test Data the Same Way 136
        The Effect of Preprocessing on Supervised Learning 138
    Dimensionality Reduction, Feature Extraction, and Manifold Learning 140
        Principal Component Analysis (PCA) 140
        Non-Negative Matrix Factorization (NMF) 156
        Manifold Learning with t-SNE 163
    Clustering 168
        k-Means Clustering 168
        Agglomerative Clustering 182
        DBSCAN 187
        Comparing and Evaluating Clustering Algorithms 191
        Summary of Clustering Methods 207
    Summary and Outlook 208
4 Representing Data and Engineering Features 211
    Categorical Variables 212
        One-Hot-Encoding (Dummy Variables) 213
        Numbers Can Encode Categoricals 218
    Binning, Discretization, Linear Models, and Trees 220
    Interactions and Polynomials 224
    Univariate Nonlinear Transformations 232
    Automatic Feature Selection 236
        Univariate Statistics 236
        Model-Based Feature Selection 238
        Iterative Feature Selection 240
    Utilizing Expert Knowledge 242
    Summary and Outlook 250
5 Model Evaluation and Improvement 251
    Cross-Validation 252
        Cross-Validation in scikit-learn 253
        Benefits of Cross-Validation 254
        Stratified k-Fold Cross-Validation and Other Strategies 254
    Grid Search 260
        Simple Grid Search 261
        The Danger of Overfitting the Parameters and the Validation Set 261
        Grid Search with Cross-Validation 263
    Evaluation Metrics and Scoring 275
        Keep the End Goal in Mind 275
        Metrics for Binary Classification 276
        Metrics for Multiclass Classification 296
        Regression Metrics 299
        Using Evaluation Metrics in Model Selection 300
    Summary and Outlook 302
6 Algorithm Chains and Pipelines 305
    Parameter Selection with Preprocessing 306
    Building Pipelines 308
    Using Pipelines in Grid Searches 309
    The General Pipeline Interface 312
        Convenient Pipeline Creation with make_pipeline 313
        Accessing Step Attributes 314
        Accessing Attributes in a Grid-Searched Pipeline 315
    Grid-Searching Preprocessing Steps and Model Parameters 317
    Grid-Searching Which Model To Use 319
    Summary and Outlook 320
7 Working with Text Data 323
    Types of Data Represented as Strings 323
    Example Application: Sentiment Analysis of Movie Reviews 325
    Representing Text Data as a Bag of Words 327
        Applying Bag-of-Words to a Toy Dataset 329
        Bag-of-Words for Movie Reviews 330
    Stopwords 334
    Rescaling the Data with tf-idf 336
    Investigating Model Coefficients 338
    Bag-of-Words with More Than One Word (n-Grams) 339
    Advanced Tokenization, Stemming, and Lemmatization 344
    Topic Modeling and Document Clustering 347
        Latent Dirichlet Allocation 348
    Summary and Outlook 355
8 Wrapping Up 357
    Approaching a Machine Learning Problem 357
    Humans in the Loop 358
    From Prototype to Production 359
    Testing Production Systems 359
    Building Your Own Estimator 360
    Where to Go from Here 361
        Theory 361
        Other Machine Learning Frameworks and Packages 362
        Ranking, Recommender Systems, and Other Kinds of Learning 363
        Probabilistic Modeling, Inference, and Probabilistic Programming 363
        Neural Networks 364
        Scaling to Larger Datasets 364
        Honing Your Skills 365
    Conclusion 366
Index 367
Andreas Muller received his PhD in machine learning from the University of Bonn. After working as a machine learning researcher on computer vision applications at Amazon for a year, he recently joined the Center for Data Science at New York University. For the last four years he has been a maintainer and one of the core contributors of scikit-learn, a machine learning toolkit widely used in industry and academia, and an author of and contributor to several other widely used machine learning packages. His mission is to create open tools that lower the barrier of entry for machine learning applications, promote reproducible science, and democratize access to high-quality machine learning algorithms.

Sarah Guido is a data scientist who has spent a lot of time working in start-ups. She loves Python, machine learning, large quantities of data, and the tech world. She is an accomplished conference speaker, currently resides in New York City, and attended the University of Michigan for grad school.