Preface |
|
xv | |
List of Contributors |
|
xix | |
1 Introduction to Analytics |
|
1 | (30) |
|
|
|
|
|
1 | (2) |
|
|
3 | (3) |
|
1.2.1 Data-Centric Analytics |
|
|
3 | (1) |
|
1.2.2 Decision-Centric Analytics |
|
|
4 | (1) |
|
1.2.3 Combining Data- and Decision-Centric Approaches |
|
|
5 | (1) |
|
1.3 Categories of Analytics |
|
|
6 | (10) |
|
1.3.1 Descriptive Analytics |
|
|
7 | (3) |
|
|
7 | (3) |
|
|
10 | (1) |
|
|
10 | (1) |
|
|
10 | (1) |
|
1.3.2 Predictive Analytics |
|
|
10 | (4) |
|
Data Mining and Pattern Recognition |
|
|
11 | (1) |
|
Predictive Modeling, Simulation, and Forecasting |
|
|
11 | (1) |
|
|
12 | (2) |
|
1.3.3 Prescriptive Analytics |
|
|
14 | (2) |
|
1.4 Analytics Within Organizations |
|
|
16 | (7) |
|
|
17 | (4) |
|
1.4.2 Communicating Analytics |
|
|
21 | (1) |
|
1.4.3 Organizational Capability |
|
|
21 | (2) |
|
|
23 | (2) |
|
1.6 The Changing World of Analytics |
|
|
25 | (3) |
|
|
28 | (1) |
|
|
28 | (3) |
2 Getting Started with Analytics |
|
31 | (18) |
|
|
|
31 | (1) |
|
2.2 Five Manageable Tasks |
|
|
32 | (11) |
|
2.2.1 Task 1: Selecting the Target Problem |
|
|
33 | (1) |
|
2.2.2 Task 2: Assemble the Team |
|
|
34 | (2) |
|
|
35 | (1) |
|
|
35 | (1) |
|
|
35 | (1) |
|
|
35 | (1) |
|
|
36 | (1) |
|
|
36 | (1) |
|
2.2.3 Task 3: Prepare the Data |
|
|
36 | (3) |
|
2.2.4 Task 4: Selecting Analytics Tools |
|
|
39 | (3) |
|
Analytical Specificity or Breadth |
|
|
39 | (1) |
|
|
40 | (1) |
|
|
40 | (1) |
|
|
40 | (1) |
|
|
40 | (1) |
|
|
41 | (1) |
|
|
41 | (1) |
|
Sharing and Collaboration |
|
|
41 | (1) |
|
|
42 | (1) |
|
|
43 | (3) |
|
Case 1: Sensor Data and High-Velocity Analytics to Save Operating Costs |
|
|
43 | (1) |
|
Case 2: Social Media and High-Velocity Analytics for Quick Response to Customers |
|
|
44 | (1) |
|
Case 3: Sensor Data and High-Velocity Analytics to Save Maintenance Costs |
|
|
44 | (1) |
|
Case 4: Using Old Data and Analytics to Detect New Fraudulent Claims |
|
|
45 | (1) |
|
Case 5: Using Old and New Data Plus Analytics to Decrease Crime |
|
|
45 | (1) |
|
Case 6: Collecting the Data and Applying the Analytics Is the Business |
|
|
45 | (1) |
|
|
46 | (1) |
|
|
47 | (1) |
|
|
48 | (1) |
3 The Analytics Team |
|
49 | (28) |
|
|
|
49 | (1) |
|
3.2 Skills Necessary for Analytics |
|
|
50 | (7) |
|
3.2.1 More Advanced or Recent Analytical and Data Science Skills |
|
|
51 | (2) |
|
|
53 | (4) |
|
3.3 Managing Analytical Talent |
|
|
57 | (4) |
|
|
58 | (1) |
|
3.3.2 Working with the HR Organization |
|
|
59 | (2) |
|
|
61 | (11) |
|
3.4.1 Goals of a Particular Analytics Organization |
|
|
62 | (1) |
|
3.4.2 Basic Models for Organizing Analytics |
|
|
63 | (2) |
|
3.4.3 Coordination Approaches |
|
|
65 | (5) |
|
Program Management Office |
|
|
66 | (1) |
|
|
67 | (1) |
|
|
67 | (1) |
|
|
67 | (1) |
|
|
67 | (1) |
|
|
67 | (1) |
|
What Model Fits Your Business? |
|
|
68 | (2) |
|
3.4.4 Organizational Structures for Specific Analytics Strategies and Scenarios |
|
|
70 | (1) |
|
3.4.5 Analytical Leadership and the Chief Analytics Officer |
|
|
70 | (2) |
|
3.5 To Where Should Analytical Functions Report? |
|
|
72 | (3) |
|
|
72 | (1) |
|
|
72 | (1) |
|
|
72 | (1) |
|
|
73 | (1) |
|
Marketing or Other Specific Function |
|
|
73 | (1) |
|
|
73 | (1) |
|
3.5.1 Building an Analytical Ecosystem |
|
|
73 | (1) |
|
3.5.2 Developing the Analytical Organization over Time |
|
|
74 | (1) |
|
|
75 | (2) |
4 The Data |
|
77 | (22) |
|
|
|
77 | (1) |
|
|
77 | (9) |
|
|
77 | (3) |
|
|
80 | (6) |
|
|
86 | (7) |
|
|
93 | (4) |
|
4.4.1 Relational Databases |
|
|
93 | (2) |
|
4.4.2 Nonrelational Databases |
|
|
95 | (2) |
|
|
97 | (2) |
5 Solution Methodologies |
|
99 | (56) |
|
|
|
99 | (7) |
|
5.1.1 What Exactly Do We Mean by "Solution," "Problem," and "Methodology?" |
|
|
99 | (2) |
|
5.1.2 It's All About the Problem |
|
|
101 | (1) |
|
5.1.3 Solutions versus Products |
|
|
101 | (2) |
|
5.1.4 How This Chapter Is Organized |
|
|
103 | (2) |
|
5.1.5 The "Descriptive-Predictive-Prescriptive" Analytics Paradigm |
|
|
105 | (1) |
|
5.1.6 The Goals of This Chapter |
|
|
105 | (1) |
|
5.2 Macro-Solution Methodologies for the Analytics Practitioner |
|
|
106 | (10) |
|
5.2.1 The Scientific Research Methodology |
|
|
106 | (3) |
|
5.2.2 The Operations Research Project Methodology |
|
|
109 | (3) |
|
5.2.3 The Cross-Industry Standard Process for Data Mining (CRISP-DM) Methodology |
|
|
112 | (2) |
|
5.2.4 Software Engineering-Related Solution Methodologies |
|
|
114 | (1) |
|
5.2.5 Summary of Macro-Methodologies |
|
|
114 | (2) |
|
5.3 Micro-Solution Methodologies for the Analytics Practitioner |
|
|
116 | (26) |
|
5.3.1 Micro-Solution Methodology Preliminaries |
|
|
116 | (1) |
|
5.3.2 Micro-Solution Methodology Description Framework |
|
|
117 | (2) |
|
5.3.3 Group I: Micro-Solution Methodologies for Exploration and Discovery |
|
|
119 | (8) |
|
Group I: Problems of Interest |
|
|
119 | (1) |
|
|
119 | (1) |
|
Group I: Data Considerations |
|
|
120 | (1) |
|
Group I: Solution Techniques |
|
|
120 | (6) |
|
Group I: Relationship to Macro-Methodologies |
|
|
126 | (1) |
|
|
126 | (1) |
|
5.3.4 Group II: Micro-Solution Methodologies Using Models Where Techniques to Find Solutions Are Independent of Data |
|
|
127 | (10) |
|
Group II: Problems of Interest |
|
|
127 | (1) |
|
Group II: Relevant Models |
|
|
127 | (1) |
|
Group II: Data Considerations |
|
|
128 | (1) |
|
Group II: Solution Techniques |
|
|
128 | (7) |
|
Group II: Relationship to Macro-Methodologies |
|
|
135 | (2) |
|
|
137 | (1) |
|
5.3.5 Group III: Micro-Solution Methodologies Using Models Where Techniques to Find Solutions Are Dependent on Data |
|
|
137 | (4) |
|
Group III: Problems of Interest |
|
|
137 | (1) |
|
Group III: Relevant Models |
|
|
138 | (1) |
|
Group III: Data Considerations |
|
|
138 | (1) |
|
Group III: Solution Techniques |
|
|
139 | (1) |
|
Group III: Relationship to Macro-Methodologies |
|
|
140 | (1) |
|
|
141 | (1) |
|
5.3.6 Micro-Methodology Summary |
|
|
141 | (1) |
|
5.4 General Methodology-Related Considerations |
|
|
142 | (2) |
|
5.4.1 Planning an Analytics Project |
|
|
142 | (1) |
|
5.4.2 Software and Tool Selection |
|
|
142 | (1) |
|
|
143 | (1) |
|
5.4.4 Fields with Related Methodologies |
|
|
144 | (1) |
|
5.5 Summary and Conclusions |
|
|
144 | (5) |
|
5.5.1 "Ding Dong, the Scientific Method Is Dead!" |
|
|
145 | (1) |
|
5.5.2 "Methodology Cramps My Analytics Style" |
|
|
145 | (1) |
|
5.5.3 "There Is Only One Way to Solve This" |
|
|
146 | (2) |
|
5.5.4 Perceived Success Is More Important Than the Right Answer |
|
|
148 | (1) |
|
|
149 | (1) |
|
|
149 | (6) |
6 Modeling |
|
155 | (76) |
|
|
|
|
6.2 When Are Models Appropriate |
|
|
155 | (6) |
|
6.2.1 What Is the Problem with This System? |
|
|
159 | (1) |
|
6.2.2 Is This Problem Important? |
|
|
159 | (1) |
|
6.2.3 How Will This Problem Be Solved Without a New Model? |
|
|
159 | (1) |
|
6.2.4 What Modeling Technique Will Be Used? |
|
|
159 | (1) |
|
6.2.5 How Will We Know When We Have Succeeded? |
|
|
160 | (1) |
|
Who Are the System Operator Stakeholders? |
|
|
160 | (1) |
|
|
161 | (1) |
|
|
161 | (1) |
|
|
161 | (1) |
|
6.3.3 Prescriptive Models |
|
|
161 | (1) |
|
6.4 Models Can Also Be Characterized by Whether They Are Deterministic or Stochastic (Random) |
|
|
161 | (1) |
|
|
162 | (1) |
|
|
163 | (2) |
|
6.7 Probability Perspectives and Subject Matter Experts |
|
|
165 | (1) |
|
6.8 Subject Matter Experts |
|
|
165 | (1) |
|
|
166 | (3) |
|
|
166 | (1) |
|
6.9.2 Descriptive Statistics |
|
|
166 | (1) |
|
6.9.3 Parameter Estimation with a Confidence Interval |
|
|
166 | (1) |
|
|
167 | (2) |
|
6.10 Inferential Statistics |
|
|
169 | (1) |
|
6.11 A Stochastic Process |
|
|
170 | (3) |
|
|
173 | (1) |
|
6.12.1 Static versus Dynamic Simulations |
|
|
174 | (1) |
|
6.13 Mathematical Optimization |
|
|
174 | (1) |
|
|
175 | (1) |
|
6.15 Critical Path Method |
|
|
176 | (2) |
|
6.16 Portfolio Optimization Case Study Solved By a Variety of Methods |
|
|
178 | (3) |
|
|
178 | (1) |
|
|
179 | (1) |
|
6.16.3 Assessing Our Progress |
|
|
179 | (1) |
|
6.16.4 Relaxations and Bounds |
|
|
179 | (1) |
|
6.16.5 Are We Finished Yet? |
|
|
180 | (1) |
|
|
181 | (3) |
|
|
184 | (3) |
|
6.19 Susceptible, Exposed, Infected, Recovered (SEIR) Epidemiology |
|
|
187 | (2) |
|
|
189 | (1) |
|
6.21 Lanchester Models of Warfare |
|
|
189 | (3) |
|
6.22 Hughes' Salvo Model of Combat |
|
|
192 | (1) |
|
|
193 | (2) |
|
6.24 The Principle of Optimality and Dynamic Programming |
|
|
195 | (2) |
|
6.25 Stack-Based Enumeration |
|
|
197 | (3) |
|
|
197 | (2) |
|
|
199 | (1) |
|
6.25.3 Generating Permutations and Combinations |
|
|
199 | (1) |
|
6.26 Traveling Salesman Problem: Another Case Study in Alternate Solution Methods |
|
|
200 | (6) |
|
6.27 Model Documentation, Management, and Performance |
|
|
206 | (9) |
|
|
206 | (1) |
|
6.27.2 Choice of Implementation Language |
|
|
207 | (1) |
|
6.27.3 Supervised versus Automated Models |
|
|
207 | (1) |
|
|
208 | (2) |
|
6.27.5 Sensitivity Analysis |
|
|
210 | (1) |
|
6.27.6 With Different Methods |
|
|
211 | (1) |
|
6.27.7 With Different Variables |
|
|
212 | (1) |
|
|
213 | (1) |
|
|
213 | (1) |
|
|
213 | (1) |
|
|
214 | (1) |
|
|
215 | (2) |
|
|
215 | (1) |
|
|
215 | (1) |
|
6.28.3 Personally Identifiable Information |
|
|
216 | (1) |
|
6.28.4 Protected Critical Infrastructure Information Management System (PCIIMS) |
|
|
216 | (1) |
|
6.28.5 Institutional Review Board (IRB) |
|
|
216 | (1) |
|
6.28.6 Department of Defense and Department of Energy Classification |
|
|
216 | (1) |
|
6.28.7 Law Enforcement Data |
|
|
216 | (1) |
|
6.28.8 Copyright and Trademark |
|
|
216 | (1) |
|
6.28.9 Paraphrased and Plagiarized |
|
|
217 | (1) |
|
6.28.10 Displays of Model Outputs |
|
|
217 | (1) |
|
|
217 | (1) |
|
6.28.12 Multiple Data Evolutions |
|
|
217 | (1) |
|
6.29 Data Interpolation and Extrapolation |
|
|
217 | (1) |
|
6.30 Model Verification and Validation |
|
|
218 | (2) |
|
|
219 | (1) |
|
|
219 | (1) |
|
|
219 | (1) |
|
|
220 | (1) |
|
|
220 | (1) |
|
6.30.6 Data Vintage and Provenance |
|
|
220 | (1) |
|
6.31 Communicate with Stakeholders |
|
|
220 | (7) |
|
|
221 | (1) |
|
|
221 | (1) |
|
6.31.3 Standard Form Model Statement |
|
|
222 | (1) |
|
6.31.4 Persistence and Monotonicity: Examples of Realistic Model Restrictions |
|
|
223 | (1) |
|
6.31.5 Model Solutions Require a Lot of Polish and Refinement Before They Can Directly Influence Policy |
|
|
224 | (2) |
|
6.31.6 Model Obsolescence and Model-Advised Thumb Rules |
|
|
226 | (1) |
|
|
227 | (1) |
|
6.33 Where to Go from Here |
|
|
228 | (1) |
|
|
228 | (1) |
|
|
229 | (2) |
7 Machine Learning |
|
231 | (44) |
|
|
|
|
231 | (1) |
|
7.2 Supervised, Unsupervised, and Reinforcement Learning |
|
|
232 | (3) |
|
7.3 Model Development, Selection, and Deployment for Supervised Learning |
|
|
235 | (8) |
|
7.3.1 Goals and Guiding Principles in Machine Learning |
|
|
235 | (1) |
|
7.3.2 Algorithmic Modeling Overview |
|
|
236 | (1) |
|
7.3.3 Data Acquisition and Cleaning |
|
|
236 | (1) |
|
7.3.4 Feature Engineering |
|
|
237 | (1) |
|
|
238 | (2) |
|
7.3.6 Model Fitting (Training) and Feature Selection |
|
|
240 | (1) |
|
7.3.7 Model (Algorithm) Selection |
|
|
241 | (1) |
|
7.3.8 Model Performance Assessment |
|
|
242 | (1) |
|
7.3.9 Model Implementation |
|
|
242 | (1) |
|
7.4 Model Fitting, Model Error, and the Bias-Variance Trade-Off |
|
|
243 | (4) |
|
7.4.1 Components of (Regression) Model Error |
|
|
243 | (2) |
|
7.4.2 Model Fitting: Balancing Bias and Variance |
|
|
245 | (2) |
|
7.5 Predictive Performance Evaluation |
|
|
247 | (7) |
|
7.5.1 Regression Performance Evaluation |
|
|
248 | (1) |
|
7.5.2 Classification Performance Evaluation |
|
|
249 | (4) |
|
7.5.3 Performance Evaluation for Time-Dependent Data |
|
|
253 | (1) |
|
7.6 An Overview of Supervised Learning Algorithms |
|
|
254 | (13) |
|
7.6.1 k-Nearest Neighbors (KNN) |
|
|
255 | (1) |
|
7.6.2 Extensions to Regression |
|
|
256 | (1) |
|
7.6.3 Classification and Regression Trees |
|
|
257 | (2) |
|
7.6.4 Time Series Forecasting |
|
|
259 | (2) |
|
7.6.5 Support Vector Machines |
|
|
261 | (1) |
|
7.6.6 Artificial Neural Networks |
|
|
262 | (3) |
|
|
265 | (2) |
|
7.7 Unsupervised Learning Algorithms |
|
|
267 | (5) |
|
7.7.1 Kernel Density Estimation |
|
|
267 | (1) |
|
7.7.2 Association Rule Mining |
|
|
268 | (1) |
|
|
269 | (1) |
|
7.7.4 Principal Components Analysis (PCA) |
|
|
270 | (1) |
|
7.7.5 Bag-of-Words and Vector Space Models |
|
|
271 | (1) |
|
|
272 | (1) |
|
|
272 | (1) |
|
|
273 | (2) |
8 Deployment and Life Cycle Management |
|
275 | (36) |
|
|
|
275 | (1) |
|
8.2 The Analytics Methodology: Understanding the Critical Steps in Deployment and Life Cycle Management |
|
|
276 | (27) |
|
8.2.1 CRISP-DM Phase 1: Business Understanding |
|
|
278 | (1) |
|
8.2.2 JTA Domain I, Task 1: Obtain or Receive Problem Statement and Usability Requirements |
|
|
278 | (1) |
|
8.2.3 JTA Domain I, Task 2: Identify Stakeholders |
|
|
279 | (2) |
|
8.2.4 JTA Domain I, Task 3: Determine if the Problem Is Amenable to an Analytics Solution |
|
|
281 | (1) |
|
8.2.5 JTA Domain I, Task 4: Refine the Problem Statement and Delineate Constraints |
|
|
281 | (1) |
|
8.2.6 JTA Domain I, Task 5: Define an Initial Set of Business Benefits |
|
|
281 | (1) |
|
8.2.7 JTA Domain I, Task 6: Obtain Stakeholder Agreement on the Business Statement |
|
|
282 | (1) |
|
8.2.8 JTA Domain II, Task 1: Reformulate the Problem Statement as an Analytics Problem |
|
|
283 | (2) |
|
8.2.9 JTA Domain II, Task 2: Develop a Proposed Set of Drivers and Relationships to Outputs |
|
|
285 | (1) |
|
8.2.10 JTA Domain II, Task 3: State the Set of Assumptions Related to the Problem |
|
|
286 | (1) |
|
8.2.11 JTA Domain II, Task 4: Define the Key Metrics of Success |
|
|
287 | (1) |
|
8.2.12 JTA Domain II, Task 5: Obtain Stakeholder Agreement |
|
|
287 | (1) |
|
8.2.13 CRISP-DM Phases 2 and 3: Data Understanding and Data Preparation |
|
|
288 | (2) |
|
8.2.14 JTA Domain III, Task 1: Identify and Prioritize Data Needs and Sources |
|
|
290 | (1) |
|
8.2.15 JTA Domain III, Task 2: Acquire Data |
|
|
290 | (1) |
|
8.2.16 JTA Domain III, Task 3: Harmonize, Rescale, Clean, and Share Data |
|
|
291 | (1) |
|
8.2.17 JTA Domain III, Task 4: Identify Relationships in the Data |
|
|
292 | (1) |
|
8.2.18 JTA Domain III, Task 5: Document and Report Findings |
|
|
293 | (1) |
|
8.2.19 JTA Domain III, Task 6: Refine the Business and Analytics Problem Statements |
|
|
293 | (1) |
|
8.2.20 CRISP-DM Phase 4: Modeling |
|
|
293 | (1) |
|
8.2.21 CRISP-DM Phase 5: Evaluation |
|
|
294 | (3) |
|
8.2.22 CRISP-DM Phase 6: Deployment |
|
|
297 | (1) |
|
8.2.23 Deployment of the Analytics Model (Up to Delivery) |
|
|
298 | (3) |
|
8.2.24 Post-deployment Activities (Domain VI: Model Life Cycle Management) |
|
|
301 | (2) |
|
8.3 Overarching Issues of Life Cycle Management |
|
|
303 | (8) |
|
|
303 | (2) |
|
|
305 | (2) |
|
|
307 | (1) |
|
|
308 | (3) |
9 The Blossoming Analytics Talent Pool: An Overview of the Analytics Ecosystem |
|
311 | (16) |
|
|
|
|
311 | (1) |
|
9.2 Analytics Industry Ecosystem |
|
|
312 | (13) |
|
9.2.1 Data Generation Infrastructure Providers |
|
|
314 | (1) |
|
9.2.2 Data Management Infrastructure Providers |
|
|
315 | (1) |
|
9.2.3 Data Warehouse Providers |
|
|
316 | (1) |
|
9.2.4 Middleware Providers |
|
|
316 | (1) |
|
9.2.5 Data Service Providers |
|
|
316 | (1) |
|
9.2.6 Analytics-Focused Software Developers |
|
|
317 | (2) |
|
Reporting/Descriptive Analytics |
|
|
317 | (1) |
|
|
318 | (1) |
|
|
318 | (1) |
|
9.2.7 Application Developers: Industry-Specific or General |
|
|
319 | (2) |
|
9.2.8 Analytics Industry Analysts and Influencers |
|
|
321 | (1) |
|
9.2.9 Academic Institutions and Certification Agencies |
|
|
322 | (1) |
|
9.2.10 Regulators and Policy Makers |
|
|
323 | (1) |
|
9.2.11 Analytics User Organizations |
|
|
323 | (2) |
|
|
325 | (1) |
|
|
326 | (1) |
Appendix: Writing and Teaching Analytics with Cases |
|
327 | (28) |
|
Index |
|
355 | |