Acknowledgments |
|
xvi | |
Preface |
|
xvii | |
1 Using and abusing data analytics in social science |
|
1 | (21) |
|
|
1 | (2) |
|
1.2 The promise of data analytics for social science |
|
|
3 | (1) |
|
1.2.1 Data analytics in public affairs and public policy |
|
|
3 | (1) |
|
1.2.2 Data analytics in the social sciences |
|
|
3 | (1) |
|
1.2.3 Data analytics in the humanities |
|
|
4 | (1) |
|
1.3 Research design issues in data analytics |
|
|
4 | (6) |
|
1.3.1 Beware the true believer |
|
|
4 | (1) |
|
1.3.2 Pseudo-objectivity in data analytics |
|
|
4 | (1) |
|
1.3.3 The bias of scholarship based on algorithms using big data |
|
|
5 | (3) |
|
1.3.4 The subjectivity of algorithms |
|
|
8 | (1) |
|
1.3.5 Big data and big noise |
|
|
9 | (1) |
|
1.3.6 Limitations of the leading data science dissemination models |
|
|
9 | (1) |
|
1.4 Social and ethical issues in data analytics |
|
|
10 | (9) |
|
1.4.1 Types of ethical issues in data analytics |
|
|
10 | (1) |
|
1.4.2 Bias toward the privileged |
|
|
11 | (1) |
|
|
12 | (1) |
|
1.4.4 Diversity and data analytics |
|
|
13 | (1) |
|
1.4.5 Distortion of democratic processes |
|
|
14 | (1) |
|
1.4.6 Undermining of professional ethics |
|
|
14 | (1) |
|
1.4.7 Privacy, profiling, and surveillance issues |
|
|
15 | (3) |
|
1.4.8 The transparency issue |
|
|
18 | (1) |
|
1.5 Summary: Technology and power |
|
|
19 | (2) |
|
|
21 | (1) |
2 Statistical analytics with R, Part 1 |
|
22 | (69) |
|
Part I: Overview Of Statistical Analysis With R |
|
|
22 | (2) |
|
|
22 | (1) |
|
2.2 Data and packages used in this chapter |
|
|
22 | (2) |
|
|
22 | (1) |
|
|
23 | (1) |
|
Part II: Quick Start On Statistical Analysis With R |
|
|
24 | (9) |
|
2.3 Descriptive statistics |
|
|
24 | (2) |
|
2.4 Linear multiple regression |
|
|
26 | (7) |
|
Part III: Statistical Analysis With R In Detail |
|
|
33 | (58) |
|
|
33 | (3) |
|
2.5.1 One-sample test of means |
|
|
34 | (1) |
|
2.5.2 Means test for two independent samples |
|
|
35 | (1) |
|
2.5.3 Means test for two dependent samples |
|
|
35 | (1) |
|
2.6 Crosstabulation, significance, and association |
|
|
36 | (2) |
|
2.7 Loglinear analysis for categorical variables |
|
|
38 | (1) |
|
2.8 Correlation, correlograms, and scatterplots |
|
|
38 | (5) |
|
2.9 Factor analysis (exploratory) |
|
|
43 | (1) |
|
2.10 Multidimensional scaling |
|
|
44 | (1) |
|
2.11 Reliability analysis |
|
|
44 | (5) |
|
2.11.1 Cronbach's alpha and Guttman's lower bounds |
|
|
46 | (1) |
|
2.11.2 Guttman's lower bounds and Cronbach's alpha |
|
|
46 | (2) |
|
2.11.3 Krippendorff's alpha and Cohen's kappa |
|
|
48 | (1) |
|
|
49 | (11) |
|
2.12.1 Hierarchical cluster analysis |
|
|
50 | (1) |
|
2.12.2 K-means clustering |
|
|
50 | (9) |
|
2.12.3 Nearest neighbor analysis |
|
|
59 | (1) |
|
2.13 Analysis of variance |
|
|
60 | (13) |
|
2.13.1 Data and packages used |
|
|
60 | (1) |
|
2.13.2 GLM univariate: ANOVA |
|
|
61 | (5) |
|
2.13.3 GLM univariate: ANCOVA |
|
|
66 | (1) |
|
2.13.4 GLM multivariate: MANOVA |
|
|
67 | (3) |
|
2.13.5 GLM multivariate: MANCOVA |
|
|
70 | (3) |
|
|
73 | (6) |
|
2.14.1 ROC and AUC analysis |
|
|
77 | (1) |
|
2.14.2 Confusion table and accuracy |
|
|
77 | (2) |
|
2.15 Mediation and moderation |
|
|
79 | (10) |
|
2.16 Chapter 2 command summary |
|
|
89 | (1) |
|
|
89 | (2) |
3 Statistical analytics with R, Part 2 |
|
91 | (45) |
|
Part I: Overview Of Statistical Analytics With R |
|
|
91 | (1) |
|
|
91 | (1) |
|
3.2 Data and packages used in this chapter |
|
|
91 | (1) |
|
|
91 | (1) |
|
|
92 | (1) |
|
Part II: Quick Start On Statistical Analysis Part 2 |
|
|
92 | (9) |
|
3.3 Quick start: Linear regression as a generalized linear model (GZLM) |
|
|
92 | (7) |
|
|
92 | (1) |
|
3.3.2 The linear model in glm() |
|
|
92 | (1) |
|
|
93 | (1) |
|
3.3.4 Fitted value, residuals, and plots |
|
|
94 | (3) |
|
3.3.5 Noncanonical custom links |
|
|
97 | (1) |
|
3.3.6 Multiple comparison tests |
|
|
98 | (1) |
|
3.3.7 Estimated marginal means (EMM) |
|
|
98 | (1) |
|
3.4 Quick start: Testing if multilevel modeling is needed |
|
|
99 | (2) |
|
Part III: Statistical Analysis, Part 2, In Detail |
|
|
101 | (35) |
|
3.5 Generalized linear models (GZLM) |
|
|
101 | (14) |
|
|
101 | (2) |
|
3.5.2 Setup for GZLM models in R |
|
|
103 | (1) |
|
3.5.3 Binary logistic regression example |
|
|
104 | (1) |
|
3.5.4 Gamma regression model |
|
|
105 | (3) |
|
3.5.5 Poisson regression model |
|
|
108 | (5) |
|
3.5.6 Negative binomial regression |
|
|
113 | (2) |
|
3.6 Multilevel modeling (MLM) |
|
|
115 | (4) |
|
|
115 | (1) |
|
|
115 | (1) |
|
3.6.3 The random coefficients model |
|
|
116 | (3) |
|
3.6.4 Likelihood ratio test |
|
|
119 | (1) |
|
3.7 Panel data regression (PDR) |
|
|
119 | (15) |
|
|
119 | (1) |
|
|
120 | (2) |
|
|
122 | (1) |
|
|
123 | (1) |
|
3.7.5 PDR with the plm package |
|
|
124 | (9) |
|
3.7.6 PDR with the panelr package |
|
|
133 | (1) |
|
3.8 Structural equation modeling (SEM) |
|
|
134 | (1) |
|
3.9 Missing data analysis and data imputation |
|
|
134 | (1) |
|
3.10 Chapter 3 command summary |
|
|
134 | (1) |
|
|
134 | (2) |
4 Classification and regression trees in R |
|
136 | (79) |
|
Part I: Overview Of Classification And Regression Trees With R |
|
|
136 | (9) |
|
|
137 | (1) |
|
4.2 Advantages of decision tree analysis |
|
|
137 | (1) |
|
4.3 Limitations of decision tree analysis |
|
|
138 | (1) |
|
4.4 Decision tree terminology |
|
|
139 | (1) |
|
4.5 Steps in decision tree analysis |
|
|
140 | (1) |
|
4.6 Decision tree algorithms |
|
|
140 | (2) |
|
4.7 Random forests and ensemble methods |
|
|
142 | (1) |
|
|
143 | (1) |
|
|
143 | (1) |
|
|
144 | (1) |
|
|
144 | (1) |
|
|
144 | (1) |
|
|
144 | (1) |
|
4.9 Data and packages used in this chapter |
|
|
144 | (1) |
|
|
144 | (1) |
|
|
145 | (1) |
|
Part II: Quick Start - Classification And Regression Trees |
|
|
145 | (7) |
|
4.10 Classification tree example: Survival on the Titanic |
|
|
145 | (4) |
|
4.11 Regression tree example: Correlates of murder |
|
|
149 | (3) |
|
Part III: Classification And Regression Trees, In Detail |
|
|
152 | (63) |
|
|
152 | (1) |
|
|
153 | (5) |
|
|
153 | (2) |
|
4.13.2 Training and validation datasets |
|
|
155 | (1) |
|
4.13.3 Setup for rpart() trees |
|
|
156 | (2) |
|
4.14 Classification trees with the rpart package |
|
|
158 | (31) |
|
4.14.1 The basic rpart classification tree |
|
|
158 | (2) |
|
4.14.2 Printing tree rules |
|
|
160 | (1) |
|
4.14.3 Visualization with prp() and draw.tree() |
|
|
161 | (2) |
|
4.14.4 Visualization with fancyRpartPlot() |
|
|
163 | (1) |
|
4.14.5 Interpreting tree summaries |
|
|
164 | (5) |
|
4.14.6 Listing nodes by country and countries by node |
|
|
169 | (1) |
|
4.14.7 Node distribution plots |
|
|
170 | (1) |
|
4.14.8 Saving predictions and residuals |
|
|
171 | (2) |
|
4.14.9 Cross-validation and pruning |
|
|
173 | (3) |
|
4.14.10 The confusion matrix and model performance metrics |
|
|
176 | (6) |
|
4.14.11 The ROC curve and AUC |
|
|
182 | (2) |
|
|
184 | (2) |
|
|
186 | (1) |
|
4.14.14 Precision vs. recall plot |
|
|
186 | (3) |
|
4.15 Regression trees with the rpart package |
|
|
189 | (23) |
|
|
189 | (1) |
|
4.15.2 Creating an rpart regression tree |
|
|
189 | (3) |
|
4.15.3 Printing tree rules |
|
|
192 | (1) |
|
4.15.4 Visualization with prp() and fancyRpartPlot() |
|
|
192 | (2) |
|
4.15.5 Interpreting tree summaries |
|
|
194 | (3) |
|
|
197 | (1) |
|
4.15.7 Listing nodes by country and countries by node |
|
|
198 | (1) |
|
4.15.8 Saving predictions and residuals |
|
|
199 | (1) |
|
4.15.9 Plotting residuals |
|
|
200 | (1) |
|
4.15.10 Cross-validation and pruning |
|
|
201 | (1) |
|
4.15.11 R-squared for regression trees |
|
|
202 | (3) |
|
4.15.12 MSE for regression trees |
|
|
205 | (1) |
|
4.15.13 The confusion matrix |
|
|
206 | (1) |
|
4.15.14 The ROC curve and AUC |
|
|
206 | (1) |
|
|
206 | (3) |
|
4.15.16 Gains plot with OLS comparison |
|
|
209 | (3) |
|
|
212 | (1) |
|
4.17 The ctree() program for conditional decision trees |
|
|
212 | (1) |
|
4.18 More decision trees programs for R |
|
|
212 | (1) |
|
4.19 Chapter 4 command summary |
|
|
213 | (1) |
|
|
213 | (2) |
5 Random forests |
|
215 | (76) |
|
Part I: Overview Of Random Forests In R |
|
|
215 | (3) |
|
|
215 | (3) |
|
5.1.1 Social science examples of random forest models |
|
|
215 | (1) |
|
5.1.2 Advantages of random forests |
|
|
216 | (1) |
|
5.1.3 Limitations of random forests |
|
|
217 | (1) |
|
|
217 | (1) |
|
Part II: Quick Start - Random Forests |
|
|
218 | (8) |
|
5.2 Classification forest example: Searching for the causes of happiness |
|
|
218 | (3) |
|
5.3 Regression forest example: Why so much crime in my town? |
|
|
221 | (5) |
|
Part III: Random Forests, In Detail |
|
|
226 | (65) |
|
5.4 Classification forests with randomForest() |
|
|
226 | (27) |
|
|
226 | (1) |
|
5.4.2 A basic classification model |
|
|
227 | (3) |
|
5.4.3 Output components of randomForest() objects for classification models |
|
|
230 | (8) |
|
5.4.4 Graphing a randomForest tree? |
|
|
238 | (1) |
|
5.4.5 Comparing randomForest() and rpart() performance |
|
|
239 | (2) |
|
5.4.6 Tuning the random forest model |
|
|
241 | (9) |
|
5.4.7 MDS cluster analysis of the RF classification model |
|
|
250 | (3) |
|
5.5 Regression forests with randomForest() |
|
|
253 | (19) |
|
|
253 | (1) |
|
|
254 | (1) |
|
5.5.3 A basic regression model |
|
|
254 | (2) |
|
5.5.4 Output components for regression forest models |
|
|
256 | (4) |
|
5.5.5 Graphing a randomForest tree? |
|
|
260 | (1) |
|
|
260 | (1) |
|
|
261 | (1) |
|
5.5.8 Comparing randomForest() and rpart() regression models |
|
|
262 | (1) |
|
5.5.9 Tuning the randomForest() regression model |
|
|
263 | (5) |
|
5.5.10 Outliers: Identifying and removing |
|
|
268 | (4) |
|
5.6 The randomForestExplainer package |
|
|
272 | (14) |
|
5.6.1 Setup for the randomForestExplainer package |
|
|
272 | (1) |
|
5.6.2 Minimal depth plots |
|
|
273 | (1) |
|
5.6.3 Multiway variable importance plots |
|
|
274 | (3) |
|
5.6.4 Multiway ranking of variable importance |
|
|
277 | (1) |
|
5.6.5 Comparing randomForest and OLS rankings of predictors |
|
|
278 | (2) |
|
5.6.6 Which importance criteria? |
|
|
280 | (1) |
|
5.6.7 Interaction analysis |
|
|
281 | (5) |
|
5.6.8 The explain_forest() function |
|
|
286 | (1) |
|
|
286 | (1) |
|
5.8 Conditional inference forests |
|
|
287 | (1) |
|
5.9 MDS plots for random forests |
|
|
287 | (1) |
|
5.10 More random forest programs for R |
|
|
287 | (2) |
|
|
289 | (1) |
|
|
289 | (2) |
6 Modeling and machine learning |
|
291 | (64) |
|
Part I: Overview Of Modeling And Machine Learning |
|
|
291 | (6) |
|
|
291 | (6) |
|
6.1.1 Social science examples of modeling and machine learning in R |
|
|
292 | (2) |
|
6.1.2 Advantages of modeling and machine learning in R |
|
|
294 | (1) |
|
6.1.3 Limitations of modeling and machine learning in R |
|
|
294 | (1) |
|
6.1.4 Data, packages, and default directory |
|
|
295 | (2) |
|
Part II: Quick Start - Modeling And Machine Learning |
|
|
297 | (19) |
|
6.2 Example 1: Bayesian modeling of county-level poverty |
|
|
297 | (10) |
|
|
297 | (1) |
|
|
297 | (1) |
|
|
298 | (2) |
|
6.2.4 The Bayes generalized linear model |
|
|
300 | (7) |
|
6.3 Example 2: Predicting diabetes among Pima Indians with mlr3 |
|
|
307 | (9) |
|
|
307 | (1) |
|
|
307 | (1) |
|
|
307 | (2) |
|
6.3.4 The Pima Indian data |
|
|
309 | (7) |
|
Part III: Modeling And Machine Learning In Detail |
|
|
316 | (39) |
|
6.4 Illustrating modeling and machine learning with SVM in caret |
|
|
316 | (4) |
|
|
317 | (1) |
|
6.4.2 SVM algorithms compared to logistic and OLS regression |
|
|
317 | (1) |
|
6.4.3 SVM kernels, types, and parameters |
|
|
318 | (1) |
|
|
319 | (1) |
|
6.4.5 SVM and longitudinal data |
|
|
319 | (1) |
|
6.5 SVM versus OLS regression |
|
|
320 | (1) |
|
6.6 SVM with the caret package: Predicting world literacy rates |
|
|
320 | (6) |
|
|
321 | (1) |
|
6.6.2 Constructing the SVM regression model with caret |
|
|
322 | (1) |
|
6.6.3 Obtaining predicted values and residuals |
|
|
323 | (1) |
|
6.6.4 Model performance metrics |
|
|
323 | (1) |
|
6.6.5 Variable importance |
|
|
324 | (1) |
|
6.6.6 Other output elements |
|
|
324 | (1) |
|
|
325 | (1) |
|
|
326 | (7) |
|
6.7.1 Tuning for the train() command from the caret package |
|
|
327 | (1) |
|
6.7.2 Tuning for the svm() command from the e1071 package |
|
|
328 | (2) |
|
6.7.3 Cross-validating SVM models |
|
|
330 | (1) |
|
6.7.4 Using e1071 in caret rather than the default kernlab package |
|
|
331 | (2) |
|
6.8 SVM classification models: Classifying U.S. Senators |
|
|
333 | (8) |
|
6.8.1 The "senate" example and setup |
|
|
333 | (1) |
|
6.8.2 SVM classification with alternative kernels: Senate example |
|
|
333 | (5) |
|
6.8.3 Tuning the SVM binary classification model |
|
|
338 | (3) |
|
6.9 Gradient boosting machines (GBM) |
|
|
341 | (4) |
|
|
341 | (1) |
|
6.9.2 Setup and example data |
|
|
342 | (1) |
|
6.9.3 Metrics for comparing models |
|
|
343 | (1) |
|
6.9.4 The caret control object |
|
|
343 | (1) |
|
6.9.5 Training the GBM model under caret |
|
|
344 | (1) |
|
6.10 Learning vector quantization (LVQ) |
|
|
345 | (2) |
|
|
345 | (1) |
|
6.10.2 Setup and example data |
|
|
346 | (1) |
|
6.10.3 Metrics for comparing models |
|
|
346 | (1) |
|
6.10.4 The caret control object |
|
|
346 | (1) |
|
6.10.5 Training the LVQ model under caret |
|
|
346 | (1) |
|
|
347 | (2) |
|
|
349 | (3) |
|
6.12.1 Leave-one-out modeling |
|
|
349 | (1) |
|
6.12.2 Recursive feature elimination (RFE) with caret |
|
|
350 | (2) |
|
6.12.3 Other approaches to variable importance |
|
|
352 | (1) |
|
6.13 SVM classification for a multinomial outcome |
|
|
352 | (1) |
|
|
352 | (1) |
|
|
352 | (3) |
7 Neural network models and deep learning |
|
355 | (46) |
|
Part I: Overview Of Neural Network Models And Deep Learning |
|
|
355 | (9) |
|
|
355 | (1) |
|
|
356 | (1) |
|
7.3 Social science examples |
|
|
357 | (1) |
|
7.4 Pros and cons of neural networks |
|
|
358 | (1) |
|
7.5 Artificial neural network (ANN) concepts |
|
|
359 | (5) |
|
|
359 | (3) |
|
7.5.2 R software programs for ANN |
|
|
362 | (1) |
|
7.5.3 Training methods for ANN |
|
|
363 | (1) |
|
7.5.4 Algorithms in neuralnet |
|
|
363 | (1) |
|
|
363 | (1) |
|
|
364 | (1) |
|
Part II: Quick Start - Neural Network Models |
|
|
364 | (11) |
|
7.6 Example 1: Analyzing NYC airline delays |
|
|
364 | (6) |
|
|
364 | (1) |
|
|
364 | (1) |
|
|
364 | (1) |
|
7.6.4 Modeling NYC airline delays |
|
|
365 | (5) |
|
7.7 Example 2: The classic iris classification example |
|
|
370 | (5) |
|
|
370 | (1) |
|
7.7.2 Exploring separation with a violin plot |
|
|
371 | (1) |
|
7.7.3 Normalizing the data |
|
|
371 | (1) |
|
7.7.4 Training the model with nnet in caret |
|
|
372 | (2) |
|
7.7.5 Obtain model predictions |
|
|
374 | (1) |
|
7.7.6 Display the neural model |
|
|
375 | (1) |
|
Part III: Neural Network Models In Detail |
|
|
375 | (26) |
|
7.8 Analyzing Boston crime via the neuralnet package |
|
|
375 | (11) |
|
|
376 | (1) |
|
7.8.2 The linear regression model for unscaled data |
|
|
377 | (2) |
|
7.8.3 The neuralnet model for unscaled data |
|
|
379 | (1) |
|
|
379 | (1) |
|
7.8.5 The linear regression model for scaled data |
|
|
379 | (1) |
|
7.8.6 The neuralnet model for scaled data |
|
|
380 | (1) |
|
7.8.7 Neuralnet results for the training data |
|
|
381 | (1) |
|
7.8.8 Model performance plots |
|
|
382 | (1) |
|
7.8.9 Visualizing the neuralnet model |
|
|
383 | (1) |
|
7.8.10 Variable importance for the neuralnet model |
|
|
384 | (2) |
|
7.9 Analyzing Boston crime via neuralnet under the caret package |
|
|
386 | (1) |
|
7.10 Analyzing Boston crime via nnet in caret |
|
|
386 | (9) |
|
|
387 | (1) |
|
7.10.2 The nnet/caret model of Boston crime |
|
|
388 | (4) |
|
7.10.3 Variable importance for the nnet/caret model |
|
|
392 | (1) |
|
7.10.4 Further tuning the nnet model outside caret |
|
|
393 | (2) |
|
7.11 A classification model of marital status using nnet |
|
|
395 | (5) |
|
|
395 | (2) |
|
7.11.2 The nnet classification model of marital status |
|
|
397 | (3) |
|
7.12 Neural network analysis using "mlr3keras" |
|
|
400 | (1) |
|
|
400 | (1) |
|
|
400 | (1) |
8 Network analysis |
|
401 | (102) |
|
Part I: Overview Of Network Analysis With R |
|
|
401 | (4) |
|
|
401 | (1) |
|
8.2 Data and packages used in this chapter |
|
|
401 | (2) |
|
8.3 Concepts in network analysis |
|
|
403 | (1) |
|
8.4 Getting data into network format |
|
|
404 | (1) |
|
Part II: Quick Start On Network Analysis With R |
|
|
405 | (11) |
|
8.5 Quick start exercise 1: The Medici family network |
|
|
405 | (4) |
|
8.6 Quick start exercise 2: Marvel hero network communities |
|
|
409 | (7) |
|
Part III: Network Analysis With R In Detail |
|
|
416 | (87) |
|
8.7 Interactive network analysis with visNetwork |
|
|
416 | (13) |
|
8.7.1 Undirected networks: Research team management |
|
|
417 | (4) |
|
8.7.2 Clustering by group: Research team grouped by gender |
|
|
421 | (1) |
|
8.7.3 A larger network with navigation and circle layout |
|
|
422 | (3) |
|
8.7.4 Visualizing classification and regression trees: National literacy |
|
|
425 | (1) |
|
8.7.5 A directed network (asymmetrical relationships in a research team) |
|
|
426 | (3) |
|
8.8 Network analysis with igraph |
|
|
429 | (24) |
|
8.8.1 Term adjacency networks: Gubernatorial websites and the covid pandemic |
|
|
429 | (7) |
|
8.8.2 Similarity/distance networks with igraph: Senate interest group ratings |
|
|
436 | (4) |
|
8.8.3 Communities, modularity, and centrality |
|
|
440 | (7) |
|
8.8.4 Similarity network analysis: All senators |
|
|
447 | (6) |
|
8.9 Using intergraph for network conversions |
|
|
453 | (4) |
|
8.10 Network-on-a-map with the diagram and maps packages |
|
|
457 | (5) |
|
8.11 Network analysis with the statnet and network packages |
|
|
462 | (11) |
|
|
462 | (5) |
|
|
467 | (3) |
|
|
470 | (2) |
|
|
472 | (1) |
|
8.12 Clique analysis with sna |
|
|
473 | (8) |
|
8.12.1 A simplified clique analysis |
|
|
473 | (2) |
|
8.12.2 A clique analysis of the DHHS formal network |
|
|
475 | (6) |
|
8.12.3 K-core analysis of the DHHS formal network |
|
|
481 | (1) |
|
8.13 Mapping international trade flow with statnet and intergraph |
|
|
481 | (1) |
|
8.14 Correlation networks with corrr |
|
|
481 | (3) |
|
8.15 Network analysis with tidygraph |
|
|
484 | (10) |
|
|
484 | (1) |
|
8.15.2 A simple tidygraph example |
|
|
484 | (6) |
|
8.15.3 Network conversions with tidygraph |
|
|
490 | (1) |
|
8.15.4 Finding community clusters with tidygraph |
|
|
491 | (3) |
|
|
494 | (6) |
|
8.16.1 Agent-based network modeling with SchellingR |
|
|
494 | (5) |
|
8.16.2 Agent-based network modeling with RSiena |
|
|
499 | (1) |
|
8.16.3 Agent-based network modeling with NetLogoR |
|
|
499 | (1) |
|
|
500 | (1) |
|
|
501 | (1) |
|
|
501 | (2) |
9 Text analytics |
|
503 | (110) |
|
Part I: Overview Of Text Analytics With R |
|
|
503 | (13) |
|
|
503 | (1) |
|
9.2 Data used in this chapter |
|
|
503 | (1) |
|
9.3 Packages used in this chapter |
|
|
504 | (1) |
|
|
505 | (1) |
|
|
505 | (11) |
|
|
505 | (1) |
|
|
505 | (1) |
|
9.5.3 Project Gutenberg archive |
|
|
506 | (3) |
|
9.5.4 Comma-separated values (.csv) files |
|
|
509 | (1) |
|
9.5.5 Text from Word .docx files with the textreadr package |
|
|
509 | (3) |
|
9.5.6 Text from other formats with the readtext package |
|
|
512 | (2) |
|
9.5.7 Text from raw text files |
|
|
514 | (2) |
|
Part II: Quick Start On Text Analytics With R |
|
|
516 | (7) |
|
9.6 Quick start exercise 1: Key word in context (kwic) indexing |
|
|
516 | (2) |
|
9.7 Quick start exercise 2: Word frequencies and histograms |
|
|
518 | (5) |
|
Part III: Text Analytics With R In Detail |
|
|
523 | (90) |
|
|
523 | (8) |
|
|
523 | (1) |
|
9.8.2 Web scraping: The "htm2txt" package |
|
|
524 | (3) |
|
9.8.3 Web scraping: The "rvest" package |
|
|
527 | (4) |
|
9.9 Social media scraping |
|
|
531 | (8) |
|
9.9.1 Analysis of Twitter data: Trump and the New York Times |
|
|
532 | (4) |
|
9.9.2 Social media scraping with twitter |
|
|
536 | (3) |
|
9.10 Leading text formats in R |
|
|
539 | (15) |
|
|
539 | (1) |
|
9.10.2 Formats related to the "tidytext" package |
|
|
540 | (3) |
|
9.10.3 Formats related to the "tm" package |
|
|
543 | (4) |
|
9.10.4 Formats related to the "quanteda" package |
|
|
547 | (5) |
|
9.10.5 Common text file conversions |
|
|
552 | (2) |
|
|
554 | (3) |
|
|
554 | (1) |
|
|
554 | (3) |
|
|
557 | (2) |
|
9.13 Text cleaning and preparation |
|
|
559 | (1) |
|
9.14 Analysis: Multigroup word frequency comparisons |
|
|
559 | (8) |
|
9.14.1 Multigroup analysis in tidytext |
|
|
559 | (4) |
|
9.14.2 Multigroup analysis with quanteda's textstat_keyness() command |
|
|
563 | (3) |
|
9.14.3 Multigroup analysis with textstat_frequency() in quanteda and ggplot2 |
|
|
566 | (1) |
|
9.15 Analysis: Word clouds |
|
|
567 | (5) |
|
9.16 Analysis: Comparison clouds |
|
|
572 | (2) |
|
9.17 Analysis: Word maps and word correlations |
|
|
574 | (13) |
|
9.17.1 Working with the tdm format |
|
|
574 | (1) |
|
9.17.2 Working with the dtm format |
|
|
575 | (1) |
|
9.17.3 Word frequencies and word correlations |
|
|
576 | (1) |
|
9.17.4 Correlation plots of word and document associations |
|
|
577 | (4) |
|
9.17.5 Plotting word stem correlations for word pairs |
|
|
581 | (3) |
|
9.17.6 Word correlation maps |
|
|
584 | (3) |
|
9.18 Analysis: Sentiment analysis |
|
|
587 | (9) |
|
|
587 | (1) |
|
9.18.2 Example: Sentiment analysis of news articles |
|
|
587 | (9) |
|
9.19 Analysis: Topic modeling |
|
|
596 | (14) |
|
|
596 | (1) |
|
9.19.2 Topic analysis example 1: Modeling topic frequency over time |
|
|
597 | (6) |
|
9.19.3 Topic analysis example 2: LDA analysis |
|
|
603 | (7) |
|
9.20 Analysis: Lexical dispersion plots |
|
|
610 | (1) |
|
9.21 Analysis: Bigrams and ngrams |
|
|
611 | (1) |
|
|
612 | (1) |
|
|
612 | (1) |
Appendix 1: Introduction to R and RStudio |
|
613 | (45) |
Appendix 2: Data used in this book |
|
658 | (10) |
References |
|
668 | (10) |
Index |
|
678 | |