Preface |
|
ix | |
|
I Data Analysis Essentials |
|
|
1 | (164) |
|
|
3 | (29) |
|
1.1 Quantitative Data Analysis and the Humanities |
|
|
3 | (2) |
|
|
5 | (1) |
|
|
6 | (1) |
|
|
7 | (6) |
|
1.4.1 What you should know |
|
|
8 | (4) |
|
|
12 | (1) |
|
|
13 | (1) |
|
1.5 An Exploratory Data Analysis of the United States' Culinary History |
|
|
13 | (1) |
|
1.6 Cooking with Tabular Data |
|
|
14 | (4) |
|
1.7 Taste Trends in Culinary US History |
|
|
18 | (8) |
|
1.8 America's Culinary Melting Pot |
|
|
26 | (4) |
|
|
30 | (2) |
|
Chapter 2 Parsing and Manipulating Structured Data |
|
|
32 | (46) |
|
|
32 | (1) |
|
|
33 | (3) |
|
|
36 | (4) |
|
|
40 | (3) |
|
|
43 | (3) |
|
|
46 | (11) |
|
|
48 | (3) |
|
|
51 | (5) |
|
|
56 | (1) |
|
|
57 | (8) |
|
2.7.1 Retrieving HTML from the web |
|
|
64 | (1) |
|
2.8 Extracting Character Interaction Networks |
|
|
65 | (9) |
|
2.9 Conclusion and Further Reading |
|
|
74 | (4) |
|
Chapter 3 Exploring Texts Using the Vector Space Model |
|
|
78 | (48) |
|
|
78 | (1) |
|
3.2 From Texts to Vectors |
|
|
79 | (11) |
|
|
81 | (9) |
|
|
90 | (21) |
|
3.3.1 Computing distances between documents |
|
|
97 | (10) |
|
|
107 | (4) |
|
|
111 | (2) |
|
3.5 Appendix: Vectorizing Texts with NumPy |
|
|
113 | (13) |
|
3.5.1 Constructing arrays |
|
|
113 | (4) |
|
3.5.2 Indexing and slicing arrays |
|
|
117 | (3) |
|
3.5.3 Aggregating functions |
|
|
120 | (2) |
|
|
122 | (4) |
|
Chapter 4 Processing Tabular Data |
|
|
126 | (39) |
|
4.1 Loading, Inspecting, and Summarizing Tabular Data |
|
|
127 | (9) |
|
4.1.1 Reading tabular data with Pandas |
|
|
130 | (6) |
|
4.2 Mapping Cultural Change |
|
|
136 | (13) |
|
4.2.1 Turnover in naming practices |
|
|
136 | (10) |
|
4.2.2 Visualizing turnovers |
|
|
146 | (3) |
|
4.3 Changing Naming Practices |
|
|
149 | (13) |
|
4.3.1 Increasing name diversity |
|
|
150 | (3) |
|
4.3.2 A bias for names ending in n? |
|
|
153 | (5) |
|
4.3.3 Unisex names in the United States |
|
|
158 | (4) |
|
4.4 Conclusions and Further Reading |
|
|
162 | (3) |
|
II Advanced Data Analysis |
|
|
165 | (158) |
|
Chapter 5 Statistics Essentials: Who Reads Novels? |
|
|
169 | (32) |
|
|
169 | (1) |
|
|
170 | (1) |
|
5.3 Summarizing Location and Dispersion |
|
|
171 | (4) |
|
5.3.1 Data: Novel reading in the United States |
|
|
171 | (4) |
|
|
175 | (4) |
|
|
179 | (9) |
|
5.5.1 Variation in categorical values |
|
|
184 | (4) |
|
5.6 Measuring Association |
|
|
188 | (9) |
|
5.6.1 Measuring association between numbers |
|
|
188 | (4) |
|
5.6.2 Measuring association between categories |
|
|
192 | (3) |
|
|
195 | (2) |
|
|
197 | (1) |
|
|
198 | (3) |
|
Chapter 6 Introduction to Probability |
|
|
201 | (28) |
|
6.1 Uncertainty and Thomas Pynchon |
|
|
202 | (1) |
|
|
203 | (5) |
|
6.2.1 Probability and degree of belief |
|
|
205 | (3) |
|
6.3 Example: Bayes's Rule and Authorship Attribution |
|
|
208 | (17) |
|
6.3.1 Random variables and probability distributions |
|
|
213 | (12) |
|
|
225 | (2) |
|
|
227 | (2) |
|
|
227 | (1) |
|
6.5.2 Fitting a negative binomial distribution |
|
|
228 | (1) |
|
Chapter 7 Narrating with Maps |
|
|
229 | (19) |
|
|
229 | (1) |
|
|
230 | (3) |
|
7.3 Projections and Basemaps |
|
|
233 | (3) |
|
|
236 | (2) |
|
7.5 Mapping the Development of the War |
|
|
238 | (6) |
|
|
244 | (4) |
|
Chapter 8 Stylometry and the Voice of Hildegard |
|
|
248 | (37) |
|
|
248 | (2) |
|
8.2 Authorship Attribution |
|
|
250 | (12) |
|
|
252 | (2) |
|
|
254 | (3) |
|
8.2.3 Computing document distances with Delta |
|
|
257 | (3) |
|
8.2.4 Authorship attribution evaluation |
|
|
260 | (2) |
|
8.3 Hierarchical Agglomerative Clustering |
|
|
262 | (4) |
|
8.4 Principal Component Analysis |
|
|
266 | (14) |
|
|
268 | (3) |
|
8.4.2 The intuition behind PCA |
|
|
271 | (3) |
|
|
274 | (6) |
|
|
280 | (1) |
|
|
280 | (5) |
|
Chapter 9 A Topic Model of United States Supreme Court Opinions, 1900-2000 |
|
|
285 | (38) |
|
|
285 | (2) |
|
9.2 Mixture Models: Artwork Dimensions in the Tate Galleries |
|
|
287 | (7) |
|
9.3 Mixed-Membership Model of Texts |
|
|
294 | (23) |
|
9.3.1 Parameter estimation |
|
|
300 | (4) |
|
9.3.2 Checking an unsupervised model |
|
|
304 | (5) |
|
9.3.3 Modeling different word senses |
|
|
309 | (4) |
|
9.3.4 Exploring trends over time in the Supreme Court |
|
|
313 | (4) |
|
|
317 | (1) |
|
|
318 | (2) |
|
9.6 Appendix: Mapping Between Our Topic Model and Lauderdale and Clark (2014) |
|
|
320 | (3) |
|
Epilogue: Good Enough Practices |
|
|
323 | (2) |
|
Bibliography |
|
|
325 | (8) |
|
Index |
|
|
333 | |