
Art of Feature Engineering: Essentials for Machine Learning [Paperback]

  • Format: Paperback / softback, 284 pages, height x width x thickness: 228x152x16 mm, weight: 420 g, Worked examples or Exercises
  • Publication date: 25-Jun-2020
  • Publisher: Cambridge University Press
  • ISBN-10: 1108709389
  • ISBN-13: 9781108709385
  • Paperback
  • Price: 46,80 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 62,40 €
  • You save 25%
  • Delivery of the book from the publisher takes approximately 3-4 weeks
  • Free delivery
  • Order lead time: 2-4 weeks
"When working with a data set, a machine learning engineer might train a model but find that the results are not as good as they need. To get better results, they can try to improve the model or collect more data, but there is another avenue: feature engineering. The feature engineering process can help improve results by modifying the data's features to better capture the nature of the problem. This process is partly an art and partly a palette of tricks and recipes. This practical guide to feature engineering is an essential addition to any data scientist's or machine learning engineer's toolbox, providing new ideas on how to improve the performance of a machine learning solution. Beginning with the basic concepts and techniques of feature engineering, the text builds up to a unique cross-domain approach that spans data on graphs, texts, time series, and images, with fully worked out case studies. Key topics include binning, out-of-fold estimation, feature selection, dimensionality reduction, and encoding variable-length data. The full source code for the case studies is available on a companion website as Python Jupyter notebooks."

This is a guide for data scientists who want to use feature engineering to improve the performance of their machine learning solutions. The book provides a unified view of the field, beginning with basic concepts and techniques, followed by a cross-domain approach to advanced topics, like texts and images, with hands-on case studies.

Reviews

'Pablo Duboue is a true grandmaster of the art and science of feature engineering. His foundational contributions to the creation of IBM Watson were a critical component of its success. Now readers can benefit from his expertise. His book provides deep insights into how to develop, assess, combine, and enhance machine learning features. Of particular interest to advanced practitioners is his discussion of feature engineering and deep learning; there is a pervasive myth in the industry that deep learning and big data have made feature engineering obsolete, but the book explains why that is often incorrect for real-world computing applications and explains the relationship between building effective features and deep neural network architectures. The book engages with countless other basic and advanced topics in the area of machine learning and feature engineering, making it a valuable resource for machine learning practitioners of all levels of experience.' J. William Murdock, IBM

'Feature engineering is the process of identifying, selecting and evaluating input variables to statistical and machine learning models for a given problem. Pablo Duboue's The Art of Feature Engineering introduces the process with rich detail from a practitioner's point of view, and adds new insights through four input data scenarios for the same prediction task. Highly recommended!' Nelson Correa, Andinum Inc.

'TAoFE is a comprehensive handbook - sure to be a hit with data science practitioners. With highly accessible and didactic explanations of complex concepts, the book represents the state-of-the-art, and shows in practical terms how it applies to a wide range of real-world case studies.' Gavin Brown, University of Manchester

'This book provides a large catalogue of feature manipulation techniques along with non-trivial examples to illustrate their applicability and impact on performance. It could be suitable as a textbook for an upper level undergrad or graduate text mining or multimodal data analysis class. Recent graduates starting in field data mining and text analysis will find this a useful text.' Wlodek Zadrozny, University of North Carolina

Other information

A practical guide for data scientists who want to improve the performance of any machine learning solution with feature engineering.
Preface
PART ONE FUNDAMENTALS
1 Introduction
1.1 Feature Engineering
1.2 Evaluation
1.3 Cycles
1.4 Analysis
1.5 Other Processes
1.6 Discussion
1.7 Learning More
2 Features, Combined: Normalization, Discretization and Outliers
2.1 Normalizing Features
2.2 Discretization and Binning
2.3 Descriptive Features
2.4 Dealing with Outliers
2.5 Advanced Techniques
2.6 Learning More
3 Features, Expanded: Computable Features, Imputation and Kernels
3.1 Computable Features
3.2 Imputation
3.3 Decomposing Complex Features
3.4 Kernel-Induced Feature Expansion
3.5 Learning More
4 Features, Reduced: Feature Selection, Dimensionality Reduction and Embeddings
4.1 Feature Selection
4.2 Regularization and Embedded Feature Selection
4.3 Dimensionality Reduction
4.4 Learning More
5 Advanced Topics: Variable-Length Data and Automated Feature Engineering
5.1 Variable-Length Feature Vectors
5.2 Instance-Based Engineering
5.3 Deep Learning and Feature Engineering
5.4 Automated Feature Engineering
5.5 Learning More
PART TWO CASE STUDIES
6 Graph Data
6.1 WikiCities Dataset
6.2 Exploratory Data Analysis (EDA)
6.3 First Feature Set
6.4 Second Feature Set
6.5 Final Feature Sets
6.6 Learning More
7 Timestamped Data
7.1 WikiCities: Historical Features
7.2 Time Lagged Features
7.3 Sliding Windows
7.4 Third Featurization: EMA
7.5 Historical Data as Data Expansion
7.6 Time Series
7.7 Learning More
8 Textual Data
8.1 WikiCities: Text
8.2 Exploratory Data Analysis
8.3 Numeric Tokens Only
8.4 Bag-of-Words
8.5 Stop Words and Morphological Features
8.6 Features in Context
8.7 Skip Bigrams and Feature Hashing
8.8 Dimensionality Reduction and Embeddings
8.9 Closing Remarks
8.10 Learning More
9 Image Data
9.1 WikiCities: Satellite Images
9.2 Exploratory Data Analysis
9.3 Pixels as Features
9.4 Automatic Dataset Expansion
9.5 Descriptive Features: Histograms
9.6 Local Feature Detectors: Corners
9.7 Dimensionality Reduction: HOGs
9.8 Closing Remarks
9.9 Learning More
10 Other Domains: Video, GIS and Preferences
10.1 Video
10.2 Geographical Features
10.3 Preferences
Bibliography
Index
Pablo Duboue is Director of Textualization Software Ltd. and is passionate about improving society through technology. He has a Ph.D. in Computer Science from Columbia University and was part of the IBM Watson team that beat the Jeopardy! champions in 2011. He splits his time between teaching machine learning, doing open research, contributing to free software projects, and consulting for start-ups. He has taught in three different countries and done joint research with more than fifty co-authors. Recent career highlights include a best paper award in the industrial track of the Canadian AI conference and consulting for a start-up acquired by Intel Corp.