Foreword |
|
xix | |
Preface |
|
xxi | |
Acknowledgments |
|
xxvii | |
About the Author |
|
xxxi | |
|
|
1 | (90) |
|
1 Pandas Data Frame Basics |
|
|
3 | (22) |
|
|
3 | (1) |
|
1.2 Loading Your First Data Set |
|
|
4 | (3) |
|
1.3 Looking at Columns, Rows, and Cells |
|
|
7 | (11) |
|
|
7 | (1) |
|
|
8 | (4) |
|
|
12 | (6) |
|
1.4 Grouped and Aggregated Calculations |
|
|
18 | (5) |
|
|
19 | (4) |
|
1.4.2 Grouped Frequency Counts |
|
|
23 | (1) |
|
|
23 | (1) |
|
|
24 | (1) |
|
|
25 | (24) |
|
|
25 | (1) |
|
2.2 Creating Your Own Data |
|
|
26 | (2) |
|
|
26 | (1) |
|
2.2.2 Creating a DataFrame |
|
|
27 | (1) |
|
|
28 | (8) |
|
2.3.1 The Series Is ndarray-like |
|
|
30 | (1) |
|
|
30 | (3) |
|
2.3.3 Operations Are Aligned and Vectorized (Broadcasting) |
|
|
33 | (3) |
|
|
36 | (2) |
|
2.4.1 Boolean Subsetting: DataFrames |
|
|
36 | (1) |
|
2.4.2 Operations Are Automatically Aligned and Vectorized (Broadcasting) |
|
|
37 | (1) |
|
2.5 Making Changes to Series and DataFrames |
|
|
38 | (5) |
|
2.5.1 Add Additional Columns |
|
|
38 | (1) |
|
2.5.2 Directly Change a Column |
|
|
39 | (4) |
|
|
43 | (1) |
|
2.6 Exporting and Importing Data |
|
|
43 | (4) |
|
|
43 | (2) |
|
|
45 | (1) |
|
|
46 | (1) |
|
2.6.4 Feather Format to Interface With R |
|
|
47 | (1) |
|
2.6.5 Other Data Output Types |
|
|
47 | (1) |
|
|
47 | (2) |
|
3 Introduction to Plotting |
|
|
49 | (42) |
|
|
49 | (2) |
|
|
51 | (5) |
|
3.3 Statistical Graphics Using matplotlib |
|
|
56 | (5) |
|
|
57 | (1) |
|
|
58 | (1) |
|
|
59 | (2) |
|
|
61 | (22) |
|
|
62 | (3) |
|
|
65 | (8) |
|
|
73 | (10) |
|
|
83 | (3) |
|
|
84 | (1) |
|
|
85 | (1) |
|
|
85 | (1) |
|
|
86 | (1) |
|
|
86 | (1) |
|
3.6 Seaborn Themes and Styles |
|
|
86 | (4) |
|
|
90 | (1) |
|
|
91 | (52) |
|
|
93 | (16) |
|
|
93 | (1) |
|
|
93 | (1) |
|
4.2.1 Combining Data Sets |
|
|
94 | (1) |
|
|
94 | (8) |
|
|
94 | (4) |
|
|
98 | (1) |
|
4.3.3 Concatenation With Different Indices |
|
|
99 | (3) |
|
4.4 Merging Multiple Data Sets |
|
|
102 | (5) |
|
|
104 | (1) |
|
|
105 | (1) |
|
|
105 | (2) |
|
|
107 | (2) |
|
|
109 | (14) |
|
|
109 | (1) |
|
|
109 | (2) |
|
5.3 Where Do Missing Values Come From? |
|
|
111 | (5) |
|
|
111 | (1) |
|
|
112 | (2) |
|
|
114 | (1) |
|
|
114 | (2) |
|
5.4 Working With Missing Data |
|
|
116 | (5) |
|
5.4.1 Find and Count Missing Data |
|
|
116 | (2) |
|
5.4.2 Cleaning Missing Data |
|
|
118 | (2) |
|
5.4.3 Calculations With Missing Data |
|
|
120 | (1) |
|
|
121 | (2) |
|
|
123 | (20) |
|
|
123 | (1) |
|
6.2 Columns Contain Values, Not Variables |
|
|
124 | (4) |
|
6.2.1 Keep One Column Fixed |
|
|
124 | (2) |
|
6.2.2 Keep Multiple Columns Fixed |
|
|
126 | (2) |
|
6.3 Columns Contain Multiple Variables |
|
|
128 | (5) |
|
6.3.1 Split and Add Columns Individually (Simple Method) |
|
|
129 | (2) |
|
6.3.2 Split and Combine in a Single Step (Simple Method) |
|
|
131 | (1) |
|
6.3.3 Split and Combine in a Single Step (More Complicated Method) |
|
|
132 | (1) |
|
6.4 Variables in Both Rows and Columns |
|
|
133 | (1) |
|
6.5 Multiple Observational Units in a Table (Normalization) |
|
|
134 | (3) |
|
6.6 Observational Units Across Multiple Tables |
|
|
137 | (4) |
|
6.6.1 Load Multiple Files Using a Loop |
|
|
139 | (1) |
|
6.6.2 Load Multiple Files Using a List Comprehension |
|
|
140 | (1) |
|
|
141 | (2) |
|
|
143 | (98) |
|
|
145 | (10) |
|
|
145 | (1) |
|
|
145 | (1) |
|
|
146 | (6) |
|
7.3.1 Converting to String Objects |
|
|
146 | (1) |
|
7.3.2 Converting to Numeric Values |
|
|
147 | (5) |
|
|
152 | (1) |
|
7.4.1 Convert to Category |
|
|
152 | (1) |
|
7.4.2 Manipulating Categorical Data |
|
|
153 | (1) |
|
|
153 | (2) |
|
|
155 | (16) |
|
|
155 | (1) |
|
|
155 | (3) |
|
8.2.1 Subsetting and Slicing Strings |
|
|
155 | (2) |
|
8.2.2 Getting the Last Character in a String |
|
|
157 | (1) |
|
|
158 | (2) |
|
|
160 | (1) |
|
|
160 | (1) |
|
|
160 | (1) |
|
|
161 | (3) |
|
8.5.1 Custom String Formatting |
|
|
161 | (1) |
|
8.5.2 Formatting Character Strings |
|
|
162 | (1) |
|
|
162 | (1) |
|
8.5.4 C printf Style Formatting |
|
|
163 | (1) |
|
8.5.5 Formatted Literal Strings in Python 3.6+ |
|
|
163 | (1) |
|
8.6 Regular Expressions (RegEx) |
|
|
164 | (6) |
|
|
164 | (4) |
|
|
168 | (1) |
|
8.6.3 Substituting a Pattern |
|
|
168 | (1) |
|
8.6.4 Compiling a Pattern |
|
|
169 | (1) |
|
|
170 | (1) |
|
|
170 | (1) |
|
|
171 | (18) |
|
|
171 | (1) |
|
|
171 | (1) |
|
|
172 | (5) |
|
9.3.1 Apply Over a Series |
|
|
173 | (1) |
|
9.3.2 Apply Over a DataFrame |
|
|
174 | (3) |
|
9.4 Apply (More Advanced) |
|
|
177 | (5) |
|
9.4.1 Column-wise Operations |
|
|
178 | (2) |
|
9.4.2 Row-wise Operations |
|
|
180 | (2) |
|
|
182 | (3) |
|
|
184 | (1) |
|
|
185 | (1) |
|
|
185 | (2) |
|
|
187 | (2) |
|
10 Groupby Operations: Split-Apply-Combine |
|
|
189 | (24) |
|
|
189 | (1) |
|
|
190 | (7) |
|
10.2.1 Basic One-Variable Grouped Aggregation |
|
|
190 | (1) |
|
10.2.2 Built-in Aggregation Methods |
|
|
191 | (1) |
|
10.2.3 Aggregation Functions |
|
|
192 | (3) |
|
10.2.4 Multiple Functions Simultaneously |
|
|
195 | (1) |
|
10.2.5 Using a dict in agg/aggregate |
|
|
195 | (2) |
|
|
197 | (4) |
|
|
197 | (4) |
|
|
201 | (1) |
|
10.5 The pandas.core.groupby.DataFrameGroupBy Object |
|
|
202 | (5) |
|
|
202 | (1) |
|
10.5.2 Group Calculations Involving Multiple Variables |
|
|
203 | (1) |
|
|
204 | (1) |
|
10.5.4 Iterating Through Groups |
|
|
204 | (2) |
|
|
206 | (1) |
|
10.5.6 Flattening the Results |
|
|
206 | (1) |
|
10.6 Working With a MultiIndex |
|
|
207 | (4) |
|
|
211 | (2) |
|
11 The datetime Data Type |
|
|
213 | (28) |
|
|
213 | (1) |
|
11.2 Python's datetime Object |
|
|
213 | (1) |
|
11.3 Converting to datetime |
|
|
214 | (3) |
|
11.4 Loading Data That Include Dates |
|
|
217 | (1) |
|
11.5 Extracting Date Components |
|
|
217 | (3) |
|
11.6 Date Calculations and Timedeltas |
|
|
220 | (1) |
|
|
221 | (3) |
|
|
224 | (1) |
|
11.9 Subsetting Data Based on Dates |
|
|
225 | (2) |
|
11.9.1 The DatetimeIndex Object |
|
|
225 | (1) |
|
11.9.2 The TimedeltaIndex Object |
|
|
226 | (1) |
|
|
227 | (3) |
|
|
228 | (1) |
|
|
229 | (1) |
|
|
230 | (7) |
|
|
237 | (1) |
|
|
238 | (2) |
|
|
240 | (1) |
|
|
241 | (62) |
|
|
243 | (10) |
|
|
243 | (1) |
|
12.2 Simple Linear Regression |
|
|
243 | (4) |
|
|
243 | (2) |
|
|
245 | (2) |
|
|
247 | (4) |
|
|
247 | (1) |
|
12.3.2 Using statsmodels With Categorical Variables |
|
|
248 | (1) |
|
|
249 | (1) |
|
12.3.4 Using sklearn With Categorical Variables |
|
|
250 | (1) |
|
12.4 Keeping Index Labels From sklearn |
|
|
251 | (1) |
|
|
252 | (1) |
|
13 Generalized Linear Models |
|
|
253 | (12) |
|
|
253 | (1) |
|
|
253 | (4) |
|
|
255 | (1) |
|
|
256 | (1) |
|
|
257 | (3) |
|
|
258 | (1) |
|
13.3.2 Negative Binomial Regression for Overdispersion |
|
|
259 | (1) |
|
13.4 More Generalized Linear Models |
|
|
260 | (1) |
|
|
260 | (4) |
|
13.5.1 Testing the Cox Model Assumptions |
|
|
263 | (1) |
|
|
264 | (1) |
|
|
265 | (14) |
|
|
265 | (1) |
|
|
265 | (5) |
|
|
268 | (2) |
|
14.3 Comparing Multiple Models |
|
|
270 | (5) |
|
14.3.1 Working With Linear Models |
|
|
270 | (3) |
|
14.3.2 Working With GLM Models |
|
|
273 | (2) |
|
14.4 k-Fold Cross-validation |
|
|
275 | (3) |
|
|
278 | (1) |
|
|
279 | (12) |
|
|
279 | (1) |
|
|
279 | (2) |
|
|
281 | (2) |
|
|
283 | (2) |
|
|
285 | (4) |
|
|
287 | (2) |
|
|
289 | (2) |
|
|
291 | (12) |
|
|
291 | (1) |
|
|
291 | (6) |
|
16.2.1 Dimension Reduction With PCA |
|
|
294 | (3) |
|
16.3 Hierarchical Clustering |
|
|
297 | (4) |
|
16.3.1 Complete Clustering |
|
|
298 | (1) |
|
|
298 | (1) |
|
16.3.3 Average Clustering |
|
|
299 | (1) |
|
16.3.4 Centroid Clustering |
|
|
299 | (1) |
|
16.3.5 Manually Setting the Threshold |
|
|
299 | (2) |
|
|
301 | (2) |
|
|
303 | (10) |
|
17 Life Outside of Pandas |
|
|
305 | (4) |
|
17.1 The (Scientific) Computing Stack |
|
|
305 | (1) |
|
|
306 | (1) |
|
|
306 | (1) |
|
17.2.2 Profiling Your Code |
|
|
307 | (1) |
|
17.3 Going Bigger and Faster |
|
|
307 | (2) |
|
18 Toward a Self-Directed Learner |
|
|
309 | (4) |
|
18.1 It's Dangerous to Go Alone! |
|
|
309 | (1) |
|
|
309 | (1) |
|
|
309 | (1) |
|
|
310 | (1) |
|
|
310 | (1) |
|
|
311 | (2) |
|
|
313 | (2) |
|
|
315 | (2) |
|
|
315 | (1) |
|
|
315 | (1) |
|
|
316 | (1) |
|
|
316 | (1) |
|
|
316 | (1) |
|
|
317 | (2) |
|
|
317 | (1) |
|
|
317 | (1) |
|
|
317 | (1) |
|
|
318 | (1) |
|
|
318 | (1) |
|
|
319 | (2) |
|
|
321 | (4) |
|
D.1 Command Line and Text Editor |
|
|
321 | (1) |
|
|
322 | (1) |
|
|
322 | (1) |
|
D.4 Integrated Development Environments (IDEs) |
|
|
322 | (3) |
|
|
325 | (2) |
|
|
327 | (2) |
|
|
329 | (2) |
|
|
330 | (1) |
|
|
331 | (2) |
|
|
333 | (2) |
|
|
335 | (2) |
|
|
337 | (2) |
|
|
339 | (2) |
|
|
341 | (2) |
|
|
343 | (2) |
|
|
345 | (4) |
|
|
347 | (1) |
|
|
347 | (2) |
|
|
347 | (1) |
|
|
348 | (1) |
|
|
349 | (2) |
|
|
351 | (2) |
|
|
353 | (2) |
|
|
355 | (2) |
|
|
357 | (2) |
Index |
|
359 | |