Preface |
|
xv | |
Contributors |
|
xvii | |
1 Preliminaries and Overview |
|
1 | (12) |
|
|
|
|
1 | (3) |
|
|
1 | (2) |
|
1.1.2 Feature Engineering |
|
|
3 | (1) |
|
1.1.3 Machine Learning and Data Analytic Tasks |
|
|
3 | (1) |
|
1.2 Overview of the Chapters |
|
|
4 | (3) |
|
|
7 | (8) |
|
1.3.1 Feature Engineering for Specific Data Types |
|
|
8 | (1) |
|
1.3.2 Feature Engineering on Non-Data-Specific Topics |
|
|
9 | (4) |
I Feature Engineering for Various Data Types |
|
13 | (176) |
|
2 Feature Engineering for Text Data |
|
|
15 | (40) |
|
|
|
|
|
16 | (1) |
|
2.2 Overview of Text Representation |
|
|
17 | (1) |
|
|
18 | (1) |
|
2.4 Sequence of Words Representation |
|
|
19 | (2) |
|
2.5 Bag of Words Representation |
|
|
21 | (7) |
|
|
22 | (5) |
|
2.5.2 Beyond Single Words |
|
|
27 | (1) |
|
2.6 Structural Representation of Text |
|
|
28 | (3) |
|
2.6.1 Semantic Structure Features |
|
|
30 | (1) |
|
2.7 Latent Semantic Representation |
|
|
31 | (6) |
|
2.7.1 Latent Semantic Analysis |
|
|
31 | (2) |
|
2.7.2 Probabilistic Latent Semantic Analysis |
|
|
33 | (2) |
|
2.7.3 Latent Dirichlet Allocation |
|
|
35 | (2) |
|
2.8 Explicit Semantic Representation |
|
|
37 | (1) |
|
2.9 Embeddings for Text Representation |
|
|
37 | (5) |
|
2.9.1 Matrix Factorization for Word Embeddings |
|
|
38 | (2) |
|
2.9.2 Neural Networks for Word Embeddings |
|
|
40 | (1) |
|
2.9.3 Document Representations from Word Embeddings |
|
|
41 | (1) |
|
2.10 Context-Sensitive Text Representation |
|
|
42 | (3) |
|
|
45 | (10) |
|
3 Feature Extraction and Learning for Visual Data |
|
|
55 | (32) |
|
|
|
|
3.1 Classical Visual Feature Representations |
|
|
57 | (9) |
|
|
57 | (4) |
|
|
61 | (2) |
|
|
63 | (3) |
|
3.2 Latent Feature Extraction |
|
|
66 | (5) |
|
3.2.1 Principal Component Analysis |
|
|
67 | (1) |
|
3.2.2 Kernel Principal Component Analysis |
|
|
68 | (1) |
|
3.2.3 Multidimensional Scaling |
|
|
69 | (1) |
|
|
69 | (1) |
|
3.2.5 Laplacian Eigenmaps |
|
|
70 | (1) |
|
|
71 | (16) |
|
3.3.1 Convolutional Neural Networks |
|
|
72 | (1) |
|
3.3.1.1 The Dot-Product Layer |
|
|
72 | (1) |
|
3.3.1.2 The Convolution Layer |
|
|
73 | (2) |
|
3.3.2 CNN Architecture Design |
|
|
75 | (1) |
|
3.3.3 Fine-Tuning Off-the-Shelf Neural Networks |
|
|
76 | (3) |
|
3.3.4 Summary and Conclusions |
|
|
79 | (8) |
|
4 Feature-Based Time-Series Analysis |
|
|
87 | (30) |
|
|
|
87 | (5) |
|
4.1.1 The Time-Series Data Type |
|
|
87 | (2) |
|
4.1.2 Time-Series Characterization |
|
|
89 | (1) |
|
4.1.3 Applications of Time-Series Analysis |
|
|
90 | (2) |
|
4.2 Feature-Based Representations of Time Series |
|
|
92 | (3) |
|
|
95 | (7) |
|
4.3.1 Examples of Global Features |
|
|
95 | (3) |
|
4.3.2 Massive Feature Vectors and Highly Comparative Time-Series Analysis |
|
|
98 | (4) |
|
|
102 | (4) |
|
|
102 | (1) |
|
|
103 | (2) |
|
4.4.3 Pattern Dictionaries |
|
|
105 | (1) |
|
4.5 Combining Time-Series Representations |
|
|
106 | (2) |
|
4.6 Feature-Based Forecasting |
|
|
108 | (1) |
|
|
109 | (8) |
|
5 Feature Engineering for Data Streams |
|
|
117 | (28) |
|
|
|
|
|
118 | (1) |
|
|
119 | (2) |
|
5.3 Linear Methods for Streaming Feature Construction |
|
|
121 | (4) |
|
5.3.1 Principal Component Analysis for Data Streams |
|
|
121 | (2) |
|
5.3.2 Linear Discriminant Analysis for Data Streams |
|
|
123 | (2) |
|
5.4 Non-Linear Methods for Streaming Feature Construction |
|
|
125 | (7) |
|
5.4.1 Locally Linear Embedding for Data Streams |
|
|
125 | (1) |
|
5.4.2 Kernel Learning for Data Streams |
|
|
126 | (2) |
|
5.4.3 Neural Networks for Data Streams |
|
|
128 | (4) |
|
|
132 | (1) |
|
5.5 Feature Selection for Data Streams with Streaming Features |
|
|
132 | (3) |
|
5.5.1 The Grafting Algorithm |
|
|
133 | (1) |
|
5.5.2 The Alpha-Investing Algorithm |
|
|
133 | (1) |
|
5.5.3 The Online Streaming Feature Selection Algorithm |
|
|
134 | (1) |
|
5.5.4 Unsupervised Streaming Feature Selection in Social Media |
|
|
135 | (1) |
|
5.6 Feature Selection for Data Streams with Streaming Instances |
|
|
135 | (1) |
|
5.6.1 Online Feature Selection |
|
|
136 | (1) |
|
5.6.2 Unsupervised Feature Selection on Data Streams |
|
|
136 | (1) |
|
5.7 Discussions and Challenges |
|
|
136 | (10) |
|
|
137 | (1) |
|
|
137 | (1) |
|
5.7.3 Heterogeneous Streaming Data |
|
|
137 | (8) |
|
6 Feature Generation and Feature Engineering for Sequences |
|
|
145 | (22) |
|
|
|
|
|
|
146 | (2) |
|
6.2 Basics on Sequence Data and Sequence Patterns |
|
|
148 | (1) |
|
6.3 Approaches to Using Patterns in Sequence Features |
|
|
149 | (1) |
|
6.4 Traditional Pattern-Based Sequence Features |
|
|
150 | (1) |
|
6.5 Mined Sequence Patterns for Use in Sequence Features |
|
|
151 | (10) |
|
6.5.1 Frequent Sequence Patterns |
|
|
152 | (2) |
|
6.5.2 Closed Sequential Patterns |
|
|
154 | (1) |
|
6.5.3 Gap Constraints for Sequence Patterns |
|
|
155 | (1) |
|
6.5.4 Partial Order Patterns |
|
|
156 | (2) |
|
6.5.5 Periodic Sequence Patterns |
|
|
158 | (1) |
|
6.5.6 Distinguishing Sequence Patterns |
|
|
158 | (2) |
|
6.5.7 Pattern Matching for Sequences |
|
|
160 | (1) |
|
6.6 Factors for Selecting Sequence Patterns as Features |
|
|
161 | (1) |
|
6.7 Sequence Features Not Defined by Patterns |
|
|
161 | (1) |
|
|
162 | (1) |
|
|
163 | (4) |
|
7 Feature Generation for Graphs and Networks |
|
|
167 | (22) |
|
|
|
|
|
|
168 | (1) |
|
|
168 | (1) |
|
|
169 | (12) |
|
|
170 | (5) |
|
|
175 | (4) |
|
|
179 | (2) |
|
|
181 | (2) |
|
7.4.1 Multi-Label Classification |
|
|
181 | (1) |
|
|
181 | (1) |
|
|
182 | (1) |
|
|
182 | (1) |
|
7.5 Conclusions and Future Directions |
|
|
183 | (5) |
|
|
188 | (1) |
II General Feature Engineering Techniques |
|
189 | (120) |
|
8 Feature Selection and Evaluation |
|
|
191 | (30) |
|
|
|
|
191 | (1) |
|
8.2 Feature Selection Frameworks |
|
|
192 | (4) |
|
8.2.1 Search-Based Feature Selection Framework |
|
|
193 | (1) |
|
8.2.2 Correlation-Based Feature Selection Framework |
|
|
194 | (2) |
|
8.3 Advanced Topics for Feature Selection |
|
|
196 | (15) |
|
8.3.1 Stable Feature Selection |
|
|
196 | (3) |
|
8.3.2 Sparsity-Based Feature Selection |
|
|
199 | (1) |
|
8.3.3 Multi-Source Feature Selection |
|
|
200 | (3) |
|
8.3.4 Distributed Feature Selection |
|
|
203 | (1) |
|
8.3.5 Multi-View Feature Selection |
|
|
204 | (1) |
|
8.3.6 Multi-Label Feature Selection |
|
|
205 | (1) |
|
8.3.7 Online Feature Selection |
|
|
206 | (2) |
|
8.3.8 Privacy-Preserving Feature Selection |
|
|
208 | (2) |
|
8.3.9 Adversarial Feature Selection |
|
|
210 | (1) |
|
8.4 Future Work and Conclusion |
|
|
211 | (10) |
|
9 Automating Feature Engineering in Supervised Learning |
|
|
221 | (24) |
|
|
|
222 | (3) |
|
9.1.1 Challenges in Performing Feature Engineering |
|
|
224 | (1) |
|
9.2 Terminology and Problem Definition |
|
|
225 | (1) |
|
9.3 A Few Simple Approaches |
|
|
226 | (1) |
|
9.4 Hierarchical Exploration of Feature Transformations |
|
|
227 | (4) |
|
9.4.1 Transformation Graph |
|
|
228 | (1) |
|
9.4.2 Transformation Graph Exploration |
|
|
229 | (2) |
|
9.5 Learning Optimal Traversal Policy |
|
|
231 | (4) |
|
9.5.1 Feature Exploration through Reinforcement Learning |
|
|
233 | (2) |
|
9.6 Finding Effective Features without Model Training |
|
|
235 | (4) |
|
9.6.1 Learning to Predict Useful Transformations |
|
|
237 | (2) |
|
|
239 | (7) |
|
|
239 | (1) |
|
9.7.2 Research Opportunities |
|
|
240 | (1) |
|
|
240 | (5) |
|
10 Pattern-Based Feature Generation |
|
|
245 | (34) |
|
|
|
|
|
|
246 | (1) |
|
|
247 | (4) |
|
|
247 | (1) |
|
10.2.2 Patterns for Non-Transactional Data |
|
|
248 | (3) |
|
10.3 Framework of Pattern-Based Feature Generation |
|
|
251 | (3) |
|
|
251 | (1) |
|
|
252 | (1) |
|
10.3.3 Feature Generation |
|
|
253 | (1) |
|
10.4 Pattern Mining Algorithms |
|
|
254 | (4) |
|
10.4.1 Frequent Pattern Mining |
|
|
254 | (2) |
|
10.4.2 Contrast Pattern Mining |
|
|
256 | (2) |
|
10.5 Pattern Selection Approaches |
|
|
258 | (4) |
|
10.5.1 Post-Processing Pruning |
|
|
258 | (2) |
|
10.5.2 In-Processing Pruning |
|
|
260 | (2) |
|
10.6 Pattern-Based Feature Generation |
|
|
262 | (4) |
|
10.6.1 Unsupervised Mapping Functions |
|
|
262 | (1) |
|
10.6.2 Supervised Mapping Functions |
|
|
263 | (2) |
|
10.6.3 Feature Generation for Sequence Data and Graph Data |
|
|
265 | (1) |
|
10.6.4 Comparison with Similar Techniques |
|
|
265 | (1) |
|
10.7 Pattern-Based Feature Generation for Classification |
|
|
266 | (3) |
|
|
266 | (1) |
|
10.7.2 Direct Classification in the Pattern Space |
|
|
267 | (1) |
|
10.7.3 Indirect Classification in the Pattern Space |
|
|
268 | (1) |
|
10.7.4 Connection with Stacking Technique |
|
|
269 | (1) |
|
10.8 Pattern-Based Feature Generation for Clustering |
|
|
269 | (2) |
|
10.8.1 Clustering in the Pattern Space |
|
|
269 | (1) |
|
10.8.2 Subspace Clustering |
|
|
270 | (1) |
|
|
271 | (8) |
|
11 Deep Learning for Feature Representation |
|
|
279 | (30) |
|
|
|
|
279 | (1) |
|
11.2 Restricted Boltzmann Machine |
|
|
280 | (4) |
|
11.2.1 Deep Belief Networks and Deep Boltzmann Machine |
|
|
281 | (2) |
|
11.2.2 RBM for Real-Valued Data |
|
|
283 | (1) |
|
|
284 | (4) |
|
11.3.1 Sparse Autoencoder |
|
|
286 | (1) |
|
11.3.2 Denoising Autoencoder |
|
|
287 | (1) |
|
11.3.3 Stacked Autoencoder |
|
|
287 | (1) |
|
11.4 Convolutional Neural Networks |
|
|
288 | (3) |
|
11.4.1 Transfer Feature Learning of CNN |
|
|
290 | (1) |
|
11.5 Word Embedding and Recurrent Neural Networks |
|
|
291 | (5) |
|
|
291 | (3) |
|
11.5.2 Recurrent Neural Networks |
|
|
294 | (1) |
|
11.5.3 Gated Recurrent Unit |
|
|
295 | (1) |
|
11.5.4 Long Short-Term Memory |
|
|
296 | (1) |
|
11.6 Generative Adversarial Networks and Variational Autoencoder |
|
|
296 | (3) |
|
11.6.1 Generative Adversarial Networks |
|
|
297 | (1) |
|
11.6.2 Variational Autoencoder |
|
|
298 | (1) |
|
11.7 Discussion and Further Readings |
|
|
299 | (10) |
III Feature Engineering in Special Applications |
|
309 | (86) |
|
12 Feature Engineering for Social Bot Detection |
|
|
311 | (24) |
|
|
|
|
|
|
312 | (1) |
|
12.2 Social Bot Detection |
|
|
312 | (2) |
|
|
313 | (1) |
|
12.2.2 Pairwise Account Comparison |
|
|
313 | (1) |
|
12.2.3 Egocentric Analysis |
|
|
314 | (1) |
|
12.3 Online Bot Detection Framework |
|
|
314 | (11) |
|
12.3.1 Feature Extraction |
|
|
315 | (1) |
|
12.3.1.1 User-Based Features |
|
|
316 | (1) |
|
|
316 | (1) |
|
12.3.1.3 Network Features |
|
|
318 | (1) |
|
12.3.1.4 Content and Language Features |
|
|
318 | (1) |
|
12.3.1.5 Sentiment Features |
|
|
319 | (1) |
|
12.3.1.6 Temporal Features |
|
|
320 | (1) |
|
12.3.2 Possible Directions for Feature Engineering |
|
|
320 | (1) |
|
|
320 | (3) |
|
|
323 | (1) |
|
|
323 | (1) |
|
12.3.4.2 Top Individual Features |
|
|
324 | (1) |
|
|
325 | (9) |
|
|
334 | (1) |
|
13 Feature Generation and Engineering for Software Analytics |
|
|
335 | (24) |
|
|
|
|
336 | (1) |
|
13.2 Features for Defect Prediction |
|
|
337 | (6) |
|
13.2.1 File-Level Defect Prediction |
|
|
337 | (1) |
|
|
338 | (1) |
|
13.2.1.2 Process Features |
|
|
340 | (1) |
|
13.2.2 Just-in-Time Defect Prediction |
|
|
341 | (2) |
|
13.2.3 Prediction Models and Results |
|
|
343 | (1) |
|
13.3 Features for Crash Release Prediction for Apps |
|
|
343 | (5) |
|
13.3.1 Complexity Dimension |
|
|
344 | (1) |
|
|
345 | (1) |
|
|
346 | (1) |
|
13.3.4 Diffusion Dimension |
|
|
346 | (1) |
|
|
347 | (1) |
|
|
347 | (1) |
|
13.3.7 Prediction Models and Results |
|
|
348 | (1) |
|
13.4 Features from Mining Monthly Reports to Predict Developer Turnover |
|
|
348 | (3) |
|
|
349 | (1) |
|
|
349 | (1) |
|
|
350 | (1) |
|
13.4.4 Prediction Models and Results |
|
|
351 | (1) |
|
|
351 | (8) |
|
14 Feature Engineering for Twitter-Based Applications |
|
|
359 | (36) |
|
|
|
|
|
|
|
|
Krishnaprasad Thirunarayan |
|
|
|
359 | (2) |
|
14.2 Data Present in a Tweet |
|
|
361 | (3) |
|
14.2.1 Tweet Text-Related Data |
|
|
362 | (1) |
|
14.2.2 Twitter User-Related Data |
|
|
363 | (1) |
|
|
364 | (1) |
|
14.3 Common Types of Features Used in Twitter-Based Applications |
|
|
364 | (6) |
|
|
365 | (3) |
|
14.3.2 Image and Video Features |
|
|
368 | (1) |
|
14.3.3 Twitter Metadata-Related Features |
|
|
369 | (1) |
|
|
370 | (1) |
|
14.4 Twitter Feature Engineering in Selected Twitter-Based Studies |
|
|
370 | (11) |
|
14.4.1 Twitter User Profile Classification |
|
|
371 | (1) |
|
14.4.2 Assisting Coordination during Crisis Events |
|
|
372 | (3) |
|
14.4.3 Location Extraction from Tweets |
|
|
375 | (2) |
|
14.4.4 Studying the Mental Health Conditions of Depressed Twitter Users |
|
|
377 | (2) |
|
14.4.5 Sentiment and Emotion Analysis on Twitter |
|
|
379 | (2) |
|
14.5 Twitris: A Real-Time Social Media Analysis Platform |
|
|
381 | (2) |
|
|
383 | (1) |
|
|
384 | (11) |
Index |
|
395 | |