E-book: Feature Engineering for Machine Learning and Data Analytics

4.50/5 (4 ratings on Goodreads)

Edited by Guozhu Dong (Wright State University, Ohio, USA) and Huan Liu (Arizona State University, Arizona, USA)

Format: 418 pages
Series: Chapman & Hall/CRC Data Mining and Knowledge Discovery Series
Publication date: 14-Mar-2018
Publisher: CRC Press
Language: English
ISBN-13: 9781351721271

Format: PDF+DRM
Price: 58,49 €*
* This is the final price; no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage: Digital Rights Management (DRM)

The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID. The e-book can be read by one user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Feature engineering plays a vital role in big data analytics. Machine learning and data mining algorithms cannot work without data. Little can be achieved if there are few features to represent the underlying data objects, and the quality of those algorithms' results largely depends on the quality of the available features. Feature Engineering for Machine Learning and Data Analytics provides a comprehensive introduction to feature engineering, including feature generation, feature extraction, feature transformation, feature selection, and feature analysis and evaluation.

The book presents key concepts, methods, examples, and applications, as well as chapters on feature engineering for major data types such as text, images, sequences, time series, graphs, streaming data, software engineering data, Twitter data, and social media data. It also covers generic feature generation approaches and methods for generating tried-and-tested, hand-crafted, domain-specific features.

The first chapter defines the concepts of features and feature engineering, offers an overview of the book, and provides pointers to topics not covered in it. The next six chapters (Part I) are devoted to feature engineering, including feature generation, for specific data types. The subsequent four chapters (Part II) cover generic approaches to feature engineering, namely feature selection, feature-transformation-based feature engineering, pattern-based feature generation and engineering, and deep learning-based feature engineering. The last three chapters (Part III) discuss feature engineering for social bot detection, software analytics, and Twitter-based applications, respectively.

This book can be used as a reference for data analysts, big data scientists, data preprocessing workers, project managers, project developers, prediction modelers, professors, researchers, graduate students, and upper-level undergraduate students. It can also be used as the primary text for courses on feature engineering, or as a supplement for courses on machine learning, data mining, and big data analytics.

Preface
Contributors

1 Preliminaries and Overview
Guozhu Dong, Huan Liu
1.1 Preliminaries
1.1.1 Features
1.1.2 Feature Engineering
1.1.3 Machine Learning and Data Analytic Tasks
1.2 Overview of the Chapters
1.3 Beyond this Book
1.3.1 Feature Engineering for Specific Data Types
1.3.2 Feature Engineering on Non-Data-Specific Topics

I Feature Engineering for Various Data Types

2 Feature Engineering for Text Data
Chase Geigle, Qiaozhu Mei, ChengXiang Zhai
2.1 Introduction
2.2 Overview of Text Representation
2.3 Text as Strings
2.4 Sequence of Words Representation
2.5 Bag of Words Representation
2.5.1 Term Weighting
2.5.2 Beyond Single Words
2.6 Structural Representation of Text
2.6.1 Semantic Structure Features
2.7 Latent Semantic Representation
2.7.1 Latent Semantic Analysis
2.7.2 Probabilistic Latent Semantic Analysis
2.7.3 Latent Dirichlet Allocation
2.8 Explicit Semantic Representation
2.9 Embeddings for Text Representation
2.9.1 Matrix Factorization for Word Embeddings
2.9.2 Neural Networks for Word Embeddings
2.9.3 Document Representations from Word Embeddings
2.10 Context-Sensitive Text Representation
2.11 Summary

3 Feature Extraction and Learning for Visual Data
Parag S. Chandakkar, Ragav Venkatesan, Baoxin Li
3.1 Classical Visual Feature Representations
3.1.1 Color Features
3.1.2 Texture Features
3.1.3 Shape Features
3.2 Latent Feature Extraction
3.2.1 Principal Component Analysis
3.2.2 Kernel Principal Component Analysis
3.2.3 Multidimensional Scaling
3.2.4 Isomap
3.2.5 Laplacian Eigenmaps
3.3 Deep Image Features
3.3.1 Convolutional Neural Networks
3.3.1.1 The Dot-Product Layer
3.3.1.2 The Convolution Layer
3.3.2 CNN Architecture Design
3.3.3 Fine-Tuning Off-the-Shelf Neural Networks
3.3.4 Summary and Conclusions

4 Feature-Based Time-Series Analysis
Ben D. Fulcher
4.1 Introduction
4.1.1 The Time Series Data Type
4.1.2 Time-Series Characterization
4.1.3 Applications of Time-Series Analysis
4.2 Feature-Based Representations of Time Series
4.3 Global Features
4.3.1 Examples of Global Features
4.3.2 Massive Feature Vectors and Highly Comparative Time-Series Analysis
4.4 Subsequence Features
4.4.1 Interval Features
4.4.2 Shapelets
4.4.3 Pattern Dictionaries
4.5 Combining Time-Series Representations
4.6 Feature-Based Forecasting
4.7 Summary and Outlook

5 Feature Engineering for Data Streams
Yao Ma, Jiliang Tang, Charu Aggarwal
5.1 Introduction
5.2 Streaming Settings
5.3 Linear Methods for Streaming Feature Construction
5.3.1 Principal Component Analysis for Data Streams
5.3.2 Linear Discriminant Analysis for Data Streams
5.4 Non-Linear Methods for Streaming Feature Construction
5.4.1 Locally Linear Embedding for Data Streams
5.4.2 Kernel Learning for Data Streams
5.4.3 Neural Networks for Data Streams
5.4.4 Discussion
5.5 Feature Selection for Data Streams with Streaming Features
5.5.1 The Grafting Algorithm
5.5.2 The Alpha-Investing Algorithm
5.5.3 The Online Streaming Feature Selection Algorithm
5.5.4 Unsupervised Streaming Feature Selection in Social Media
5.6 Feature Selection for Data Streams with Streaming Instances
5.6.1 Online Feature Selection
5.6.2 Unsupervised Feature Selection on Data Streams
5.7 Discussions and Challenges
5.7.1 Stability
5.7.2 Number of Features
5.7.3 Heterogeneous Streaming Data

6 Feature Generation and Feature Engineering for Sequences
Guozhu Dong, Lei Duan, Jyrki Nummenmaa, Peng Zhang
6.1 Introduction
6.2 Basics on Sequence Data and Sequence Patterns
6.3 Approaches to Using Patterns in Sequence Features
6.4 Traditional Pattern-Based Sequence Features
6.5 Mined Sequence Patterns for Use in Sequence Features
6.5.1 Frequent Sequence Patterns
6.5.2 Closed Sequential Patterns
6.5.3 Gap Constraints for Sequence Patterns
6.5.4 Partial Order Patterns
6.5.5 Periodic Sequence Patterns
6.5.6 Distinguishing Sequence Patterns
6.5.7 Pattern Matching for Sequences
6.6 Factors for Selecting Sequence Patterns as Features
6.7 Sequence Features Not Defined by Patterns
6.8 Sequence Databases
6.9 Concluding Remarks

7 Feature Generation for Graphs and Networks
Yuan Yao, Hanghang Tong, Feng Xu, Jian Lu
7.1 Introduction
7.2 Feature Types
7.3 Feature Generation
7.3.1 Basic Models
7.3.2 Extensions
7.3.3 Summary
7.4 Feature Usages
7.4.1 Multi-Label Classification
7.4.2 Link Prediction
7.4.3 Anomaly Detection
7.4.4 Visualization
7.5 Conclusions and Future Directions
7.6 Glossary

II General Feature Engineering Techniques

8 Feature Selection and Evaluation
Yun Li, Tao Li
8.1 Introduction
8.2 Feature Selection Frameworks
8.2.1 Search-Based Feature Selection Framework
8.2.2 Correlation-Based Feature Selection Framework
8.3 Advanced Topics for Feature Selection
8.3.1 Stable Feature Selection
8.3.2 Sparsity-Based Feature Selection
8.3.3 Multi-Source Feature Selection
8.3.4 Distributed Feature Selection
8.3.5 Multi-View Feature Selection
8.3.6 Multi-Label Feature Selection
8.3.7 Online Feature Selection
8.3.8 Privacy-Preserving Feature Selection
8.3.9 Adversarial Feature Selection
8.4 Future Work and Conclusion

9 Automating Feature Engineering in Supervised Learning
Udayan Khurana
9.1 Introduction
9.1.1 Challenges in Performing Feature Engineering
9.2 Terminology and Problem Definition
9.3 A Few Simple Approaches
9.4 Hierarchical Exploration of Feature Transformations
9.4.1 Transformation Graph
9.4.2 Transformation Graph Exploration
9.5 Learning Optimal Traversal Policy
9.5.1 Feature Exploration through Reinforcement Learning
9.6 Finding Effective Features without Model Training
9.6.1 Learning to Predict Useful Transformations
9.7 Miscellaneous
9.7.1 Other Related Work
9.7.2 Research Opportunities
9.7.3 Resources

10 Pattern-Based Feature Generation
Yunzhe Jia, James Bailey, Ramamohanarao Kotagiri, Christopher Leckie
10.1 Introduction
10.2 Preliminaries
10.2.1 Data and Patterns
10.2.2 Patterns for Non-Transactional Data
10.3 Framework of Pattern-Based Feature Generation
10.3.1 Pattern Mining
10.3.2 Pattern Selection
10.3.3 Feature Generation
10.4 Pattern Mining Algorithms
10.4.1 Frequent Pattern Mining
10.4.2 Contrast Pattern Mining
10.5 Pattern Selection Approaches
10.5.1 Post-Processing Pruning
10.5.2 In-Processing Pruning
10.6 Pattern-Based Feature Generation
10.6.1 Unsupervised Mapping Functions
10.6.2 Supervised Mapping Functions
10.6.3 Feature Generation for Sequence Data and Graph Data
10.6.4 Comparison with Similar Techniques
10.7 Pattern-Based Feature Generation for Classification
10.7.1 Problem Statement
10.7.2 Direct Classification in the Pattern Space
10.7.3 Indirect Classification in the Pattern Space
10.7.4 Connection with Stacking Technique
10.8 Pattern-Based Feature Generation for Clustering
10.8.1 Clustering in the Pattern Space
10.8.2 Subspace Clustering
10.9 Conclusion

11 Deep Learning for Feature Representation
Suhang Wang, Huan Liu
11.1 Introduction
11.2 Restricted Boltzmann Machine
11.2.1 Deep Belief Networks and Deep Boltzmann Machine
11.2.2 RBM for Real-Valued Data
11.3 AutoEncoder
11.3.1 Sparse Autoencoder
11.3.2 Denoising Autoencoder
11.3.3 Stacked Autoencoder
11.4 Convolutional Neural Networks
11.4.1 Transfer Feature Learning of CNN
11.5 Word Embedding and Recurrent Neural Networks
11.5.1 Word Embedding
11.5.2 Recurrent Neural Networks
11.5.3 Gated Recurrent Unit
11.5.4 Long Short-Term Memory
11.6 Generative Adversarial Networks and Variational Autoencoder
11.6.1 Generative Adversarial Networks
11.6.2 Variational Autoencoder
11.7 Discussion and Further Readings

III Feature Engineering in Special Applications

12 Feature Engineering for Social Bot Detection
Onur Varol, Clayton A. Davis, Filippo Menczer, Alessandro Flammini
12.1 Introduction
12.2 Social Bot Detection
12.2.1 Holistic Approach
12.2.2 Pairwise Account Comparison
12.2.3 Egocentric Analysis
12.3 Online Bot Detection Framework
12.3.1 Feature Extraction
12.3.1.1 User-Based Features
12.3.1.2 Friend Features
12.3.1.3 Network Features
12.3.1.4 Content and Language Features
12.3.1.5 Sentiment Features
12.3.1.6 Temporal Features
12.3.2 Possible Directions for Feature Engineering
12.3.3 Feature Analysis
12.3.4 Feature Selection
12.3.4.1 Feature Classes
12.3.4.2 Top Individual Features
12.4 Conclusions
12.5 Glossary

13 Feature Generation and Engineering for Software Analytics
Xin Xia, David Lo
13.1 Introduction
13.2 Features for Defect Prediction
13.2.1 File-level Defect Prediction
13.2.1.1 Code Features
13.2.1.2 Process Features
13.2.2 Just-in-time Defect Prediction
13.2.3 Prediction Models and Results
13.3 Features for Crash Release Prediction for Apps
13.3.1 Complexity Dimension
13.3.2 Time Dimension
13.3.3 Code Dimension
13.3.4 Diffusion Dimension
13.3.5 Commit Dimension
13.3.6 Text Dimension
13.3.7 Prediction Models and Results
13.4 Features from Mining Monthly Reports to Predict Developer Turnover
13.4.1 Working Hours
13.4.2 Task Report
13.4.3 Project
13.4.4 Prediction Models and Results
13.5 Summary

14 Feature Engineering for Twitter-Based Applications
Sanjaya Wijeratne, Amit Sheth, Shreyansh Bhatt, Lakshika Balasuriya, Hussein S. Al-Olimat, Manas Gaur, Amir Hossein Yazdavar, Krishnaprasad Thirunarayan
14.1 Introduction
14.2 Data Present in a Tweet
14.2.1 Tweet Text-Related Data
14.2.2 Twitter User-Related Data
14.2.3 Other Metadata
14.3 Common Types of Features Used in Twitter-Based Applications
14.3.1 Textual Features
14.3.2 Image and Video Features
14.3.3 Twitter Metadata-Related Features
14.3.4 Network Features
14.4 Twitter Feature Engineering in Selected Twitter-Based Studies
14.4.1 Twitter User Profile Classification
14.4.2 Assisting Coordination during Crisis Events
14.4.3 Location Extraction from Tweets
14.4.4 Studying the Mental Health Conditions of Depressed Twitter Users
14.4.5 Sentiment and Emotion Analysis on Twitter
14.5 Twitris: A Real-Time Social Media Analysis Platform
14.6 Conclusion
14.7 Acknowledgment

Index

Dr. Guozhu Dong is a professor of Computer Science and Engineering at Wright State University. He obtained his Ph.D. in Computer Science from the University of Southern California and his B.S. in Mathematics from Shandong University. Before joining Wright State University, he was a faculty member at Flinders University and then at the University of Melbourne. At Wright State University, he was recognized for Excellence in Research in the College of Engineering and Computer Science. His research interests are in data mining, machine learning, databases, data science, and artificial intelligence. He co-authored a book on sequence data mining and co-edited a book on contrast data mining. He has served on numerous conference program committees.

Dr. Huan Liu is a professor of Computer Science and Engineering at Arizona State University. He obtained his Ph.D. in Computer Science from the University of Southern California and his B.Eng. in Computer Science and Electrical Engineering from Shanghai JiaoTong University. Before he joined ASU, he worked at Telecom Australia Research Labs and was on the faculty at the National University of Singapore. At Arizona State University, he was recognized for excellence in teaching and research in Computer Science and Engineering and received the 2014 President's Award for Innovation. His research interests are in data mining, machine learning, social computing, and artificial intelligence, investigating interdisciplinary problems that arise in many real-world, data-intensive applications with high-dimensional data of disparate forms, such as social media. His well-cited publications include books, book chapters, and encyclopedia entries, as well as conference and journal papers. He is a co-author of Social Media Mining: An Introduction, published by Cambridge University Press. He serves on journal editorial boards and numerous conference program committees, and is a founding organizer of the International Conference Series on Social Computing, Behavioral-Cultural Modeling, and Prediction. He is an IEEE Fellow. More information is available at http://www.public.asu.edu/~huanliu.

Permalink: https://www.kriso.ee/db/97813517212712e.html
