E-book: General Introduction to Data Analytics [Wiley Online]

Tomáš Horváth, André Carvalho, João Moreira

Other formats

Other digital carrier (Price: 110,93 €) - 29-Jun-2018

Format: 352 pages
Publication date: 24-Aug-2018
Publisher: Wiley-Interscience
ISBN-10: 1119296293
ISBN-13: 9781119296294

Other books on the subject:

Data analysis: general

Wiley Online
Price: 108,85 €*
* price that provides access for an unlimited number of concurrent users for an unlimited time

Book homepage: https://onlinelibrary.wiley.com/doi/book/10.1002/9781119296294

Describes the principles and methods of data analysis in an approach that can be understood by readers without specific knowledge of statistics or programming

This book teaches readers without specific knowledge of statistics or programming how to understand and use data analytics. The authors focus on explaining the intuition behind the basic data analytics techniques, using easy-to-use tools to present and illustrate the examples. The book contains four parts. The first part motivates the need for analyzing data. The second part covers visualizing data and finding natural groups in data. The third part is about predicting the unknown; in it the authors discuss classification, regression, and advanced predictive methods. The last part discusses mining the web and covers topics such as information retrieval, social network analysis, working with text, and recommender systems. At the end of parts 2, 3, and 4 there is a project following the CRISP-DM methodology that shows how to develop a project in the area of that part. The intent is that readers can develop their own project with their own dataset or with a dataset from a public repository. This book will be of interest to non-mathematicians, non-statisticians, and non-computer scientists interested in getting an introduction to data science.

Explains the reasoning behind the given data mining techniques
Uses freely available software packages to show readers how to perform data analysis
Expands upon a unique illustrative example throughout all chapters
Contains exercises at the end of each chapter, and larger projects at the end of each part
Supplementary material includes presentation slides available to instructors

A General Introduction to Data Analytics is a text for upper level undergraduates or first year graduate students in areas that are using quantitative methods but outside mathematics and computer science.

Joao Moreira is a professor in the Department of Computer Engineering at the University of Porto, Porto, Portugal. He received his Ph.D. from the University of Porto. Moreira won the Best Paper Award at the 2014 International Conference on Advanced Data Mining and Applications, Guilin, China.

Andre Carvalho is a professor in the Department of Computer Science at the University of Sao Paulo, Brazil. He received his Ph.D. from the University of Kent at Canterbury, United Kingdom. Carvalho is one of the founding and first chief editors of the International Journal of Computational Intelligence and Applications, Imperial College Press and World Scientific.

Tomas Horvath is an assistant professor at Pavol Jozef Safarik University in Kosice, Slovakia. He received his Ph.D. from the Institute of Computer Science in Pavol Jozef Safarik University.

Preface
Acknowledgments
Presentational Conventions
About the Companion Website

Part I Introductory Background

1 What Can We Do With Data?
1.1 Big Data and Data Science
1.2 Big Data Architectures
1.3 Small Data
1.4 What is Data?
1.5 A Short Taxonomy of Data Analytics
1.6 Examples of Data Use
1.6.1 Breast Cancer in Wisconsin
1.6.2 Polish Company Insolvency Data
1.7 A Project on Data Analytics
1.7.1 A Little History on Methodologies for Data Analytics
1.7.2 The KDD Process
1.7.3 The CRISP-DM Methodology
1.8 How this Book is Organized
1.9 Who Should Read this Book

Part II Getting Insights from Data

2 Descriptive Statistics
2.1 Scale Types
2.2 Descriptive Univariate Analysis
2.2.1 Univariate Frequencies
2.2.2 Univariate Data Visualization
2.2.3 Univariate Statistics
2.2.4 Common Univariate Probability Distributions
2.3 Descriptive Bivariate Analysis
2.3.1 Two Quantitative Attributes
2.3.2 Two Qualitative Attributes, at Least one of them Nominal
2.3.3 Two Ordinal Attributes
2.4 Final Remarks
2.5 Exercises

3 Descriptive Multivariate Analysis
3.1 Multivariate Frequencies
3.2 Multivariate Data Visualization
3.3 Multivariate Statistics
3.3.1 Location Multivariate Statistics
3.3.2 Dispersion Multivariate Statistics
3.4 Infographics and Word Clouds
3.4.1 Infographics
3.4.2 Word Clouds
3.5 Final Remarks
3.6 Exercises

4 Data Quality and Preprocessing
4.1 Data Quality
4.1.1 Missing Values
4.1.2 Redundant Data
4.1.3 Inconsistent Data
4.1.4 Noisy Data
4.1.5 Outliers
4.2 Converting to a Different Scale Type
4.2.1 Converting Nominal to Relative
4.2.2 Converting Ordinal to Relative or Absolute
4.2.3 Converting Relative or Absolute to Ordinal or Nominal
4.3 Converting to a Different Scale
4.4 Data Transformation
4.5 Dimensionality Reduction
4.5.1 Attribute Aggregation
4.5.1.1 Principal Component Analysis
4.5.1.2 Independent Component Analysis
4.5.1.3 Multidimensional Scaling
4.5.2 Attribute Selection
4.5.2.1 Filters
4.5.2.2 Wrappers
4.5.2.3 Embedded
4.5.2.4 Search Strategies
4.6 Final Remarks
4.7 Exercises

5 Clustering
5.1 Distance Measures
5.1.1 Differences between Values of Common Attribute Types
5.1.2 Distance Measures for Objects with Quantitative Attributes
5.1.3 Distance Measures for Non-conventional Attributes
5.2 Clustering Validation
5.3 Clustering Techniques
5.3.1 K-means
5.3.1.1 Centroids and Distance Measures
5.3.1.2 How K-means Works
5.3.2 DBSCAN
5.3.3 Agglomerative Hierarchical Clustering Technique
5.3.3.1 Linkage Criterion
5.3.3.2 Dendrograms
5.4 Final Remarks
5.5 Exercises

6 Frequent Pattern Mining
6.1 Frequent Itemsets
6.1.1 Setting the min_sup Threshold
6.1.2 Apriori -- a Join-based Method
6.1.3 Eclat
6.1.4 FP-Growth
6.1.5 Maximal and Closed Frequent Itemsets
6.2 Association Rules
6.3 Behind Support and Confidence
6.3.1 Cross-support Patterns
6.3.2 Lift
6.3.3 Simpson's Paradox
6.4 Other Types of Pattern
6.4.1 Sequential Patterns
6.4.2 Frequent Sequence Mining
6.4.3 Closed and Maximal Sequences
6.5 Final Remarks
6.6 Exercises

7 Cheat Sheet and Project on Descriptive Analytics
7.1 Cheat Sheet of Descriptive Analytics
7.1.1 On Data Summarization
7.1.2 On Clustering
7.1.3 On Frequent Pattern Mining
7.2 Project on Descriptive Analytics
7.2.1 Business Understanding
7.2.2 Data Understanding
7.2.3 Data Preparation
7.2.4 Modeling
7.2.5 Evaluation
7.2.6 Deployment

Part III Predicting the Unknown

8 Regression
8.1 Predictive Performance Estimation
8.1.1 Generalization
8.1.2 Model Validation
8.1.3 Predictive Performance Measures for Regression
8.2 Finding the Parameters of the Model
8.2.1 Linear Regression
8.2.1.1 Empirical Error
8.2.2 The Bias-variance Trade-off
8.2.3 Shrinkage Methods
8.2.3.1 Ridge Regression
8.2.3.2 Lasso Regression
8.2.4 Methods that use Linear Combinations of Attributes
8.2.4.1 Principal Components Regression
8.2.4.2 Partial Least Squares Regression
8.3 Technique and Model Selection
8.4 Final Remarks
8.5 Exercises

9 Classification
9.1 Binary Classification
9.2 Predictive Performance Measures for Classification
9.3 Distance-based Learning Algorithms
9.3.1 K-nearest Neighbor Algorithms
9.3.2 Case-based Reasoning
9.4 Probabilistic Classification Algorithms
9.4.1 Logistic Regression Algorithm
9.4.2 Naive Bayes Algorithm
9.5 Final Remarks
9.6 Exercises

10 Additional Predictive Methods
10.1 Search-based Algorithms
10.1.1 Decision Tree Induction Algorithms
10.1.2 Decision Trees for Regression
10.1.2.1 Model Trees
10.1.2.2 Multivariate Adaptive Regression Splines
10.2 Optimization-based Algorithms
10.2.1 Artificial Neural Networks
10.2.1.1 Backpropagation
10.2.1.2 Deep Networks and Deep Learning Algorithms
10.2.2 Support Vector Machines
10.2.2.1 SVM for Regression
10.3 Final Remarks
10.4 Exercises

11 Advanced Predictive Topics
11.1 Ensemble Learning
11.1.1 Bagging
11.1.2 Random Forests
11.1.3 AdaBoost
11.2 Algorithm Bias
11.3 Non-binary Classification Tasks
11.3.1 One-class Classification
11.3.2 Multi-class Classification
11.3.3 Ranking Classification
11.3.4 Multi-label Classification
11.3.5 Hierarchical Classification
11.4 Advanced Data Preparation Techniques for Prediction
11.4.1 Imbalanced Data Classification
11.4.2 For Incomplete Target Labeling
11.4.2.1 Semi-supervised Learning
11.4.2.2 Active Learning
11.5 Description and Prediction with Supervised Interpretable Techniques
11.6 Exercises

12 Cheat Sheet and Project on Predictive Analytics
12.1 Cheat Sheet on Predictive Analytics
12.2 Project on Predictive Analytics
12.2.1 Business Understanding
12.2.2 Data Understanding
12.2.3 Data Preparation
12.2.4 Modeling
12.2.5 Evaluation
12.2.6 Deployment

Part IV Popular Data Analytics Applications

13 Applications for Text, Web and Social Media
13.1 Working with Texts
13.1.1 Data Acquisition
13.1.2 Feature Extraction
13.1.2.1 Tokenization
13.1.2.2 Stemming
13.1.2.3 Conversion to Structured Data
13.1.2.4 Is the Bag of Words Enough?
13.1.3 Remaining Phases
13.1.4 Trends
13.1.4.1 Sentiment Analysis
13.1.4.2 Web Mining
13.2 Recommender Systems
13.2.1 Feedback
13.2.2 Recommendation Tasks
13.2.3 Recommendation Techniques
13.2.3.1 Knowledge-based Techniques
13.2.3.2 Content-based Techniques
13.2.3.3 Collaborative Filtering Techniques
13.2.4 Final Remarks
13.3 Social Network Analysis
13.3.1 Representing Social Networks
13.3.2 Basic Properties of Nodes
13.3.2.1 Degree
13.3.2.2 Distance
13.3.2.3 Closeness
13.3.2.4 Betweenness
13.3.2.5 Clustering Coefficient
13.3.3 Basic and Structural Properties of Networks
13.3.3.1 Diameter
13.3.3.2 Centralization
13.3.3.3 Cliques
13.3.3.4 Clustering Coefficient
13.3.3.5 Modularity
13.3.4 Trends and Final Remarks
13.4 Exercises

Appendix A Comprehensive Description of the CRISP-DM Methodology
References
Index

João Mendes Moreira, PhD, is an assistant professor in the Faculty of Engineering at the University of Porto, Porto, Portugal and is also a researcher in LIAAD-INESC TEC, Porto, Portugal.

André de Carvalho, PhD, is a full professor in the Institute of Mathematics and Computer Science at the University of São Paulo, Brazil.

Tomáš Horváth, PhD, is an assistant professor at the Faculty of Informatics of the Eötvös Loránd University in Budapest, Hungary, and is also associated with the Faculty of Science at the Pavol Jozef Šafárik University in Košice, Slovakia.

Permalink: https://www.kriso.ee/db/9781119296294_pe.html
