About This Book | ix | |
About The Author | xiii | |
Chapter 1 Data Management in the Analytics Process | 1 | (6) |
| 1 | (1) |
| 2 | (1) |
Asking Questions That Data Can Help to Answer | 2 | (1) |
| 3 | (1) |
| 3 | (1) |
Combining and Reconciling Multiple Sources | 4 | (1) |
Identifying and Addressing Data Issues | 4 | (1) |
Data Requirements Shaped by Modeling Strategies | 4 | (1) |
| 5 | (1) |
| 5 | (1) |
| 5 | (2) |
Chapter 2 Data Management Foundations | 7 | (8) |
| 7 | (1) |
Matching Form to Function | 8 | (1) |
| 9 | (1) |
Data Types and Modeling Types | 10 | (2) |
| 10 | (1) |
| 10 | (2) |
Basics of Relational Databases | 12 | (1) |
| 13 | (1) |
| 14 | (1) |
Chapter 3 Sources of Data and Their Challenges | 15 | (8) |
| 15 | (1) |
Internal Data in Flat Files | 15 | (1) |
| 16 | (1) |
External Data on the World Wide Web | 16 | (3) |
User-Facing Query Interfaces | 16 | (3) |
| 19 | (1) |
Evolving WWW Data Standards | 19 | (1) |
Ethical and Legal Considerations | 19 | (1) |
| 20 | (1) |
| 21 | (2) |
| 23 | (20) |
| 23 | (1) |
| 23 | (2) |
Common Formats Other than JMP | 25 | (16) |
| 25 | (7) |
| 32 | (7) |
| 39 | (2) |
| 41 | (1) |
| 42 | (1) |
| 42 | (1) |
Chapter 5 Database Queries | 43 | (26) |
| 43 | (1) |
Sample Databases in This Chapter | 44 | (1) |
| 44 | (4) |
Extracting Data from One Table in a Database | 48 | (4) |
| 48 | (1) |
Import a Subset of a Table | 49 | (3) |
Querying a Database from JMP | 52 | (12) |
| 52 | (3) |
An Illustrative Scenario: Bicycle Parts | 55 | (2) |
Designing a Query with Query Builder | 57 | (7) |
Query Builder for SAS Server Data | 64 | (2) |
| 66 | (1) |
| 67 | (2) |
Chapter 6 Importing Data from Websites | 69 | (8) |
| 69 | (1) |
| 70 | (1) |
| 70 | (2) |
Common Issues to Anticipate | 72 | (2) |
| 74 | (1) |
| 75 | (2) |
Chapter 7 Reshaping a Data Table | 77 | (20) |
| 77 | (1) |
What Shape Is a Data Table? | 78 | (1) |
| 78 | (1) |
Reasons for Wide and Long Formats | 79 | (1) |
| 79 | (3) |
| 82 | (1) |
| 83 | (8) |
| 83 | (2) |
Scripting for Reproducibility | 85 | (1) |
| 86 | (4) |
Transposing Rows and Columns | 90 | (1) |
| 91 | (3) |
| 94 | (1) |
| 94 | (3) |
Chapter 8 Joining, Subsetting, and Filtering | 97 | (26) |
| 97 | (1) |
Combining Data from Multiple Tables with Join | 98 | (4) |
Saving Memory with a Virtual Join | 102 | (1) |
Why and How to Select a Subset | 103 | (4) |
A Brief Detour: Creating a New Column from an Existing Column | 104 | (3) |
Row Filters: Global and Local | 107 | (4) |
| 107 | (2) |
| 109 | (1) |
| 110 | (1) |
Combining Rows with Concatenate | 111 | (2) |
| 113 | (8) |
| 113 | (1) |
Olympic Medals and Development Indicators | 114 | (7) |
| 121 | (1) |
| 122 | (1) |
Chapter 9 Data Exploration: Visual and Automated Tools to Detect Problems | 123 | (16) |
| 123 | (1) |
Common Issues to Anticipate | 124 | (1) |
On the Hunt for Dirty Data | 125 | (1) |
| 126 | (1) |
| 126 | (2) |
Multivariate (Correlations and Scatterplot Matrix) | 128 | (2) |
More Tools within the Multivariate Platform | 129 | (1) |
| 129 | (1) |
| 130 | (1) |
| 130 | (1) |
| 130 | (5) |
| 132 | (1) |
| 133 | (1) |
Multivariate Robust Outliers | 133 | (1) |
Multivariate k-Nearest Neighbors Outliers | 134 | (1) |
| 135 | (1) |
| 136 | (1) |
| 137 | (2) |
Chapter 10 Missing Data Strategies | 139 | (16) |
| 139 | (1) |
| 140 | (2) |
| 142 | (1) |
Working with Complete Cases | 142 | (1) |
Analysis with Sampling Weights | 142 | (2) |
| 144 | (9) |
| 144 | (1) |
| 145 | (2) |
Multivariate Normal Imputation | 147 | (2) |
Multivariate SVD Imputation | 149 | (2) |
Special Considerations for Time Series | 151 | (2) |
Conclusion and a Note of Caution | 153 | (1) |
| 153 | (2) |
Chapter 11 Data Preparation for Analysis | 155 | (30) |
| 155 | (1) |
Common Issues and Appropriate Strategies | 156 | (1) |
Distribution of Observations | 157 | (10) |
| 157 | (3) |
| 160 | (2) |
Scale Differences among Model Variables | 162 | (1) |
Too Many Levels of a Categorical Variable | 163 | (4) |
High Dimensionality: Abundance of Columns | 167 | (6) |
Correlated or Redundant Variables | 167 | (1) |
Missing or Sparse Observations across Columns | 168 | (1) |
| 168 | (5) |
| 173 | (6) |
Partitioning into Training, Validation, and Test Sets | 173 | (3) |
Aggregating Rows with Summary Tables | 176 | (1) |
| 177 | (2) |
Date and Time-Related Issues | 179 | (4) |
Formatting Dates and Times | 179 | (1) |
Some Date Functions: Extracting Parts | 180 | (1) |
| 181 | (1) |
Row Functions Especially Useful in Time-Ordered Data | 181 | (1) |
Elapsed Time and Date Arithmetic | 182 | (1) |
| 183 | (1) |
| 183 | (2) |
Chapter 12 Exporting Work to Other Platforms | 185 | (10) |
| 185 | (1) |
Why Export or Exchange Data? | 185 | (1) |
Fit the Method to the Purpose | 186 | (3) |
| 186 | (1) |
| 187 | (1) |
| 188 | (1) |
| 189 | (4) |
| 190 | (2) |
Static Images: Graphics Formats, PowerPoint, and Word | 192 | (1) |
| 193 | (1) |
| 193 | (2) |
Index | 195 |