E-book: A General Introduction to Data Analytics [Wiley Online]

  • Format: 352 pages
  • Publication date: 24-Aug-2018
  • Publisher: Wiley-Interscience
  • ISBN-10: 1119296293
  • ISBN-13: 9781119296294
  • Wiley Online
  • Price: 108.85 €*
  • * price that provides access for an unlimited number of simultaneous users for an unlimited time

Describes the principles and methods of data analysis in an approach that can be understood by readers without specific knowledge of statistics or programming

This book teaches readers without specific knowledge of statistics or programming how to understand and use data analytics. The authors focus on explaining the intuition behind the basic data analytics techniques, using easy-to-use tools to present and illustrate the examples. The book has four parts. The first part explains why analyzing data is necessary. The second covers visualizing data and finding natural groups in data. The third is about predicting the unknown: the authors discuss classification, regression, and advanced predictive methods. The last part discusses mining the web and covers topics such as information retrieval, social network analysis, working with text, and feedback in recommender systems. At the end of parts 2, 3, and 4 there is a project following the CRISP-DM methodology that shows how to develop a project in the area of that part. Readers are encouraged to develop their own project with their own dataset or with a dataset from a public repository. This book will be of interest to non-mathematicians, non-statisticians, and non-computer scientists who want an introduction to data science.

  • Explains the reasoning behind the given data mining techniques
  • Uses freely available software packages to show readers how to perform data analysis
  • Expands upon a unique illustrative example throughout all chapters
  • Contains exercises at the end of each chapter, and larger projects at the end of each part
  • Supplementary material includes presentation slides available to instructors

A General Introduction to Data Analytics is a text for upper-level undergraduates or first-year graduate students in fields that use quantitative methods but lie outside mathematics and computer science.

João Moreira is a professor in the Department of Computer Engineering at the University of Porto, Porto, Portugal. He received his Ph.D. from the University of Porto. Moreira won the Best Paper Award at the 2014 International Conference on Advanced Data Mining and Applications, Guilin, China.

André Carvalho is a professor in the Department of Computer Science at the University of São Paulo, Brazil. He received his Ph.D. from the University of Kent at Canterbury, United Kingdom. Carvalho is one of the founders and first chief editors of the International Journal of Computational Intelligence and Applications (Imperial College Press and World Scientific).

Tomáš Horváth is an assistant professor at Pavol Jozef Šafárik University in Košice, Slovakia. He received his Ph.D. from the Institute of Computer Science at Pavol Jozef Šafárik University.
Preface
Acknowledgments
Presentational Conventions
About the Companion Website
Part I Introductory Background
1 What Can We Do With Data?
1.1 Big Data and Data Science
1.2 Big Data Architectures
1.3 Small Data
1.4 What is Data?
1.5 A Short Taxonomy of Data Analytics
1.6 Examples of Data Use
1.6.1 Breast Cancer in Wisconsin
1.6.2 Polish Company Insolvency Data
1.7 A Project on Data Analytics
1.7.1 A Little History on Methodologies for Data Analytics
1.7.2 The KDD Process
1.7.3 The CRISP-DM Methodology
1.8 How this Book is Organized
1.9 Who Should Read this Book
Part II Getting Insights from Data
2 Descriptive Statistics
2.1 Scale Types
2.2 Descriptive Univariate Analysis
2.2.1 Univariate Frequencies
2.2.2 Univariate Data Visualization
2.2.3 Univariate Statistics
2.2.4 Common Univariate Probability Distributions
2.3 Descriptive Bivariate Analysis
2.3.1 Two Quantitative Attributes
2.3.2 Two Qualitative Attributes, at Least One of Them Nominal
2.3.3 Two Ordinal Attributes
2.4 Final Remarks
2.5 Exercises
3 Descriptive Multivariate Analysis
3.1 Multivariate Frequencies
3.2 Multivariate Data Visualization
3.3 Multivariate Statistics
3.3.1 Location Multivariate Statistics
3.3.2 Dispersion Multivariate Statistics
3.4 Infographics and Word Clouds
3.4.1 Infographics
3.4.2 Word Clouds
3.5 Final Remarks
3.6 Exercises
4 Data Quality and Preprocessing
4.1 Data Quality
4.1.1 Missing Values
4.1.2 Redundant Data
4.1.3 Inconsistent Data
4.1.4 Noisy Data
4.1.5 Outliers
4.2 Converting to a Different Scale Type
4.2.1 Converting Nominal to Relative
4.2.2 Converting Ordinal to Relative or Absolute
4.2.3 Converting Relative or Absolute to Ordinal or Nominal
4.3 Converting to a Different Scale
4.4 Data Transformation
4.5 Dimensionality Reduction
4.5.1 Attribute Aggregation
4.5.1.1 Principal Component Analysis
4.5.1.2 Independent Component Analysis
4.5.1.3 Multidimensional Scaling
4.5.2 Attribute Selection
4.5.2.1 Filters
4.5.2.2 Wrappers
4.5.2.3 Embedded
4.5.2.4 Search Strategies
4.6 Final Remarks
4.7 Exercises
5 Clustering
5.1 Distance Measures
5.1.1 Differences between Values of Common Attribute Types
5.1.2 Distance Measures for Objects with Quantitative Attributes
5.1.3 Distance Measures for Non-conventional Attributes
5.2 Clustering Validation
5.3 Clustering Techniques
5.3.1 K-means
5.3.1.1 Centroids and Distance Measures
5.3.1.2 How K-means Works
5.3.2 DBSCAN
5.3.3 Agglomerative Hierarchical Clustering Technique
5.3.3.1 Linkage Criterion
5.3.3.2 Dendrograms
5.4 Final Remarks
5.5 Exercises
6 Frequent Pattern Mining
6.1 Frequent Itemsets
6.1.1 Setting the min_sup Threshold
6.1.2 Apriori -- a Join-based Method
6.1.3 Eclat
6.1.4 FP-Growth
6.1.5 Maximal and Closed Frequent Itemsets
6.2 Association Rules
6.3 Behind Support and Confidence
6.3.1 Cross-support Patterns
6.3.2 Lift
6.3.3 Simpson's Paradox
6.4 Other Types of Pattern
6.4.1 Sequential Patterns
6.4.2 Frequent Sequence Mining
6.4.3 Closed and Maximal Sequences
6.5 Final Remarks
6.6 Exercises
7 Cheat Sheet and Project on Descriptive Analytics
7.1 Cheat Sheet of Descriptive Analytics
7.1.1 On Data Summarization
7.1.2 On Clustering
7.1.3 On Frequent Pattern Mining
7.2 Project on Descriptive Analytics
7.2.1 Business Understanding
7.2.2 Data Understanding
7.2.3 Data Preparation
7.2.4 Modeling
7.2.5 Evaluation
7.2.6 Deployment
Part III Predicting the Unknown
8 Regression
8.1 Predictive Performance Estimation
8.1.1 Generalization
8.1.2 Model Validation
8.1.3 Predictive Performance Measures for Regression
8.2 Finding the Parameters of the Model
8.2.1 Linear Regression
8.2.1.1 Empirical Error
8.2.2 The Bias-variance Trade-off
8.2.3 Shrinkage Methods
8.2.3.1 Ridge Regression
8.2.3.2 Lasso Regression
8.2.4 Methods that use Linear Combinations of Attributes
8.2.4.1 Principal Components Regression
8.2.4.2 Partial Least Squares Regression
8.3 Technique and Model Selection
8.4 Final Remarks
8.5 Exercises
9 Classification
9.1 Binary Classification
9.2 Predictive Performance Measures for Classification
9.3 Distance-based Learning Algorithms
9.3.1 K-nearest Neighbor Algorithms
9.3.2 Case-based Reasoning
9.4 Probabilistic Classification Algorithms
9.4.1 Logistic Regression Algorithm
9.4.2 Naive Bayes Algorithm
9.5 Final Remarks
9.6 Exercises
10 Additional Predictive Methods
10.1 Search-based Algorithms
10.1.1 Decision Tree Induction Algorithms
10.1.2 Decision Trees for Regression
10.1.2.1 Model Trees
10.1.2.2 Multivariate Adaptive Regression Splines
10.2 Optimization-based Algorithms
10.2.1 Artificial Neural Networks
10.2.1.1 Backpropagation
10.2.1.2 Deep Networks and Deep Learning Algorithms
10.2.2 Support Vector Machines
10.2.2.1 SVM for Regression
10.3 Final Remarks
10.4 Exercises
11 Advanced Predictive Topics
11.1 Ensemble Learning
11.1.1 Bagging
11.1.2 Random Forests
11.1.3 AdaBoost
11.2 Algorithm Bias
11.3 Non-binary Classification Tasks
11.3.1 One-class Classification
11.3.2 Multi-class Classification
11.3.3 Ranking Classification
11.3.4 Multi-label Classification
11.3.5 Hierarchical Classification
11.4 Advanced Data Preparation Techniques for Prediction
11.4.1 Imbalanced Data Classification
11.4.2 For Incomplete Target Labeling
11.4.2.1 Semi-supervised Learning
11.4.2.2 Active Learning
11.5 Description and Prediction with Supervised Interpretable Techniques
11.6 Exercises
12 Cheat Sheet and Project on Predictive Analytics
12.1 Cheat Sheet on Predictive Analytics
12.2 Project on Predictive Analytics
12.2.1 Business Understanding
12.2.2 Data Understanding
12.2.3 Data Preparation
12.2.4 Modeling
12.2.5 Evaluation
12.2.6 Deployment
Part IV Popular Data Analytics Applications
13 Applications for Text, Web and Social Media
13.1 Working with Texts
13.1.1 Data Acquisition
13.1.2 Feature Extraction
13.1.2.1 Tokenization
13.1.2.2 Stemming
13.1.2.3 Conversion to Structured Data
13.1.2.4 Is the Bag of Words Enough?
13.1.3 Remaining Phases
13.1.4 Trends
13.1.4.1 Sentiment Analysis
13.1.4.2 Web Mining
13.2 Recommender Systems
13.2.1 Feedback
13.2.2 Recommendation Tasks
13.2.3 Recommendation Techniques
13.2.3.1 Knowledge-based Techniques
13.2.3.2 Content-based Techniques
13.2.3.3 Collaborative Filtering Techniques
13.2.4 Final Remarks
13.3 Social Network Analysis
13.3.1 Representing Social Networks
13.3.2 Basic Properties of Nodes
13.3.2.1 Degree
13.3.2.2 Distance
13.3.2.3 Closeness
13.3.2.4 Betweenness
13.3.2.5 Clustering Coefficient
13.3.3 Basic and Structural Properties of Networks
13.3.3.1 Diameter
13.3.3.2 Centralization
13.3.3.3 Cliques
13.3.3.4 Clustering Coefficient
13.3.3.5 Modularity
13.3.4 Trends and Final Remarks
13.4 Exercises
Appendix A Comprehensive Description of the CRISP-DM Methodology
References
Index
João Mendes Moreira, PhD, is an assistant professor in the Faculty of Engineering at the University of Porto, Porto, Portugal, and is also a researcher at LIAAD-INESC TEC, Porto, Portugal.

André de Carvalho, PhD, is a full professor in the Institute of Mathematics and Computer Science at the University of São Paulo, Brazil.

Tomáš Horváth, PhD, is an assistant professor at the Faculty of Informatics of the Eötvös Loránd University in Budapest, Hungary, and is also associated with the Faculty of Science at the Pavol Jozef Šafárik University in Košice, Slovakia.