Glossary of terms | xii | |
Part I Data Mining Concept | 1 | (30) |
| 3 | (11) |
| 3 | (2) |
| 5 | (3) |
| 6 | (1) |
| 7 | (1) |
1.2.3 Associated Concepts | 7 | (1) |
| 8 | (1) |
1.4 Example Datasets Used in This Book | 8 | (3) |
| 11 | (2) |
1.6 Further Reading and Resources | 13 | (1) |
| 14 | (17) |
2.1 Types of Data Mining Questions | 15 | (4) |
2.1.1 Population and Sample | 15 | (1) |
| 16 | (1) |
2.1.3 Supervised and Unsupervised Methods | 16 | (2) |
2.1.4 Knowledge-Discovery Techniques | 18 | (1) |
| 19 | (1) |
2.3 Business Task: Clarification of the Business Question behind the Problem | 20 | (1) |
2.4 Data: Provision and Processing of the Required Data | 21 | (4) |
2.4.1 Fixing the Analysis Period | 22 | (1) |
2.4.2 Basic Unit of Interest | 23 | (1) |
| 24 | (1) |
2.4.4 Input Variables/Explanatory Variables | 24 | (1) |
2.5 Modelling: Analysis of the Data | 25 | (1) |
2.6 Evaluation and Validation during the Analysis Stage | 25 | (3) |
2.7 Application of Data Mining Results and Learning from the Experience | 28 | (3) |
Part II Data Mining Practicalities | 31 | (142) |
| 33 | (27) |
| 34 | (7) |
3.1.1 Data, Information, Knowledge and Wisdom | 35 | (1) |
3.1.2 Sources and Quality of Data | 36 | (1) |
3.1.3 Measurement Level and Types of Data | 37 | (2) |
3.1.4 Measures of Magnitude and Dispersion | 39 | (2) |
| 41 | (1) |
3.2 Data Partition: Random Samples for Training, Testing and Validation | 41 | (3) |
3.3 Types of Business Information Systems | 44 | (3) |
3.3.1 Operational Systems Supporting Business Processes | 44 | (1) |
3.3.2 Analysis-Based Information Systems | 45 | (1) |
3.3.3 Importance of Information | 45 | (2) |
| 47 | (3) |
| 47 | (1) |
3.4.2 Logical Integration and Homogenisation | 48 | (1) |
| 48 | (1) |
| 48 | (1) |
3.4.5 Using the Data Warehouse | 49 | (1) |
3.5 Three Components of a Data Warehouse: DBMS, DB and DBCS | 50 | (2) |
3.5.1 Database Management System (DBMS) | 51 | (1) |
| 51 | (1) |
3.5.3 Database Communication Systems (DBCS) | 51 | (1) |
| 52 | (2) |
3.6.1 Regularly Filled Data Marts | 53 | (1) |
3.6.2 Comparison between Data Marts and Data Warehouses | 53 | (1) |
3.7 A Typical Example from the Online Marketing Area | 54 | (1) |
| 54 | (4) |
3.8.1 Permanent Data Marts | 54 | (2) |
3.8.2 Data Marts Resulting from Complex Analysis | 56 | (2) |
3.9 Data Mart: Do's and Don'ts | 58 | (2) |
3.9.1 Do's and Don'ts for Processes | 58 | (1) |
3.9.2 Do's and Don'ts for Handling | 58 | (1) |
3.9.3 Do's and Don'ts for Coding/Programming | 59 | (1) |
| 60 | (18) |
4.1 Necessity of Data Preparation | 61 | (1) |
4.2 From Small and Long to Short and Wide | 61 | (4) |
4.3 Transformation of Variables | 65 | (1) |
4.4 Missing Data and Imputation Strategies | 66 | (3) |
| 69 | (1) |
4.6 Dealing with the Vagaries of Data | 70 | (1) |
| 70 | (1) |
4.6.2 Tests for Normality | 70 | (1) |
4.6.3 Data with Totally Different Scales | 70 | (1) |
4.7 Adjusting the Data Distributions | 71 | (1) |
4.7.1 Standardisation and Normalisation | 71 | (1) |
| 71 | (1) |
4.7.3 Box–Cox Transformation | 71 | (1) |
| 72 | (5) |
| 73 | (1) |
4.8.2 Analytical Binning for Nominal Variables | 73 | (1) |
| 73 | (1) |
4.8.4 Binning in Practice | 74 | (3) |
4.9 Timing Considerations | 77 | (1) |
| 77 | (1) |
| 78 | |
| 79 | (1) |
5.2 Basis of Statistical Tests | 80 | (3) |
5.2.1 Hypothesis Tests and P Values | 80 | (2) |
5.2.2 Tolerance Intervals | 82 | (1) |
5.2.3 Standard Errors and Confidence Intervals | 83 | (1) |
| 83 | (2) |
| 83 | (1) |
| 84 | (1) |
5.3.3 Sample Quality and Stability | 84 | (1) |
5.4 Basic Statistics for Pre-analytics | 85 | |
| 85 | (3) |
| 88 | (1) |
5.4.3 Cross Tabulation and Contingency Tables | 89 | (1) |
| 90 | (1) |
5.4.5 Association Measures for Nominal Variables | 91 | (1) |
5.4.6 Examples of Output from Comparative and Cross Tabulation Tests | 92 | (4) |
5.5 Feature Selection/Reduction of Variables | 96 | (3) |
5.5.1 Feature Reduction Using Domain Knowledge | 96 | (1) |
5.5.2 Feature Selection Using Chi-Square | 97 | (1) |
5.5.3 Principal Components Analysis and Factor Analysis | 97 | (1) |
5.5.4 Canonical Correlation, PLS and SEM | 98 | (1) |
| 98 | (1) |
| 98 | (1) |
| 99 | (3) |
| 102 | (59) |
| 104 | (1) |
| 105 | (4) |
6.2.1 Introduction and Process Steps | 105 | (1) |
| 105 | (1) |
6.2.3 Provision and Processing of the Required Data | 106 | (1) |
6.2.4 Analysis of the Data | 107 | (1) |
6.2.5 Evaluation and Validation of the Results (during the Analysis) | 108 | (1) |
6.2.6 Application of the Results | 108 | (1) |
6.3 Multiple Linear Regression for Use When Target is Continuous | 109 | (10) |
6.3.1 Rationale of Multiple Linear Regression Modelling | 109 | (1) |
6.3.2 Regression Coefficients | 110 | (1) |
6.3.3 Assessment of the Quality of the Model | 111 | (2) |
6.3.4 Example of Linear Regression in Practice | 113 | (6) |
6.4 Regression When the Target is Not Continuous | 119 | (10) |
6.4.1 Logistic Regression | 119 | (2) |
6.4.2 Example of Logistic Regression in Practice | 121 | (5) |
6.4.3 Discriminant Analysis | 126 | (2) |
6.4.4 Log-Linear Models and Poisson Regression | 128 | (1) |
| 129 | (8) |
| 129 | (5) |
6.5.2 Selection Procedures of the Relevant Input Variables | 134 | (1) |
| 134 | (1) |
6.5.4 Number of Splits (Branches of the Tree) | 135 | (1) |
| 135 | (1) |
| 135 | (2) |
| 137 | (4) |
6.7 Which Method Produces the Best Model? A Comparison of Regression, Decision Trees and Neural Networks | 141 | (1) |
6.8 Unsupervised Learning | 142 | (6) |
6.8.1 Introduction and Process Steps | 142 | (1) |
| 143 | (1) |
6.8.3 Provision and Processing of the Required Data | 143 | (2) |
6.8.4 Analysis of the Data | 145 | (2) |
6.8.5 Evaluation and Validation of the Results (during the Analysis) | 147 | (1) |
6.8.6 Application of the Results | 148 | (1) |
| 148 | (3) |
| 148 | (1) |
6.9.2 Hierarchical Cluster Analysis | 149 | (1) |
6.9.3 K-Means Method of Cluster Analysis | 150 | (1) |
6.9.4 Example of Cluster Analysis in Practice | 151 | (1) |
6.10 Kohonen Networks and Self-Organising Maps | 151 | (4) |
| 151 | (1) |
6.10.2 Example of SOMs in Practice | 152 | (3) |
6.11 Group Purchase Methods: Association and Sequence Analysis | 155 | (6) |
| 155 | (2) |
6.11.2 Analysis of the Data | 157 | (1) |
6.11.3 Group Purchase Methods | 158 | (1) |
6.11.4 Examples of Group Purchase Methods in Practice | 158 | (3) |
7 Validation and Application | 161 | (12) |
7.1 Introduction to Methods for Validation | 161 | (1) |
| 162 | (2) |
| 164 | (3) |
| 167 | (2) |
7.5 Threshold Analytics and Confusion Matrix | 169 | (1) |
| 170 | (1) |
7.7 Cross-Validation and Robustness | 171 | (1) |
| 172 | (1) |
Part III Data Mining in Action | 173 | (112) |
| 175 | (23) |
8.1 Recipe 1: Response Optimisation: To Find and Address the Right Number of Customers | 176 | (10) |
8.2 Recipe 2: To Find the x% of Customers with the Highest Affinity to an Offer | 186 | (1) |
8.3 Recipe 3: To Find the Right Number of Customers to Ignore | 187 | (3) |
8.4 Recipe 4: To Find the x% of Customers with the Lowest Affinity to an Offer | 190 | (1) |
8.5 Recipe 5: To Find the x% of Customers with the Highest Affinity to Buy | 191 | (1) |
8.6 Recipe 6: To Find the x% of Customers with the Lowest Affinity to Buy | 192 | (1) |
8.7 Recipe 7: To Find the x% of Customers with the Highest Affinity to a Single Purchase | 193 | (1) |
8.8 Recipe 8: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Communication Areas | 194 | (2) |
8.9 Recipe 9: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Insurance Areas | 196 | (2) |
9 Intra-Customer Analysis | 198 | (27) |
9.1 Recipe 10: To Find the Optimal Amount of Single Communication to Activate One Customer | 199 | (1) |
9.2 Recipe 11: To Find the Optimal Communication Mix to Activate One Customer | 200 | (6) |
9.3 Recipe 12: To Find and Describe Homogeneous Groups of Products | 206 | (4) |
9.4 Recipe 13: To Find and Describe Groups of Customers with Homogeneous Usage | 210 | (6) |
9.5 Recipe 14: To Predict the Order Size of Single Products or Product Groups | 216 | (1) |
9.6 Recipe 15: Product Set Combination | 217 | (2) |
9.7 Recipe 16: To Predict the Future Customer Lifetime Value of a Customer | 219 | (6) |
10 Learning from a Small Testing Sample and Prediction | 225 | (19) |
10.1 Recipe 17: To Predict Demographic Signs (Like Sex, Age, Education and Income) | 225 | (11) |
10.2 Recipe 18: To Predict the Potential Customers of a Brand New Product or Service in Your Databases | 236 | (5) |
10.3 Recipe 19: To Understand Operational Features and General Business Forecasting | 241 | (3) |
| 244 | (17) |
11.1 Recipe 20: To Find Customers Who Will Potentially Churn | 244 | (5) |
11.2 Recipe 21: Indirect Churn Based on a Discontinued Contract | 249 | (1) |
11.3 Recipe 22: Social Media Target Group Descriptions | 250 | (4) |
11.4 Recipe 23: Web Monitoring | 254 | (4) |
11.5 Recipe 24: To Predict Who is Likely to Click on a Special Banner | 258 | (3) |
12 Software and Tools: A Quick Guide | 261 | (10) |
12.1 List of Requirements When Choosing a Data Mining Tool | 261 | (4) |
12.2 Introduction to the Idea of Fully Automated Modelling (FAM) | 265 | (1) |
12.2.1 Predictive Behavioural Targeting | 265 | (1) |
12.2.2 Fully Automatic Predictive Targeting and Modelling Real-Time Online Behaviour | 266 | (1) |
| 266 | (1) |
| 267 | (1) |
12.5 FAM Data Flows and Databases | 268 | (1) |
12.6 FAM Modelling Aspects | 269 | (1) |
12.7 FAM Challenges and Critical Success Factors | 270 | (1) |
| 270 | (1) |
| 271 | (14) |
13.1 To Make Use of Official Statistics | 272 | (1) |
13.2 How to Use Simple Maths to Make an Impression | 272 | (3) |
| 272 | (1) |
13.2.2 Absolute and Relative Values | 273 | (1) |
| 273 | (1) |
| 273 | (1) |
13.2.5 Confidence Intervals | 274 | (1) |
| 274 | (1) |
| 274 | (1) |
| 274 | (1) |
13.3 Differences between Statistical Analysis and Data Mining | 275 | (2) |
| 275 | (1) |
13.3.2 Values Missing Because 'Nothing Happened' | 275 | (1) |
| 276 | (1) |
13.3.4 Goodness-of-Fit Tests | 276 | (1) |
| 277 | (1) |
13.4 How to Use Data Mining in Different Industries | 277 | (6) |
| 283 | (2) |
Bibliography | 285 | (11) |
Index | 296 | |