E-book: Data Analytics for the Social Sciences: Applications in R

G. David Garson (North Carolina State University, Raleigh, USA)

Format: 704 pages
Publication date: 29-Nov-2021
Publisher: Routledge
ISBN-13: 9781000467161

Format: EPUB+DRM
Price: 110,49 €*
* The price is final, i.e. no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed

Printing: not allowed

Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on mobile devices (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

This book presents a complete exploration of statistical data analysis in R for a wide variety of social science disciplines and quantitative methods courses.

Data Analytics for the Social Sciences is an introductory, graduate-level treatment of data analytics for social science. It features applications in the R language, arguably the fastest-growing and leading statistical tool for researchers.

The book starts with an ethics chapter on the uses and potential abuses of data analytics. Chapters 2 and 3 show how to implement a broad range of statistical procedures in R. Chapters 4 and 5 deal with regression and classification trees and with random forests. Chapter 6 deals with machine learning models and the "caret" package, which makes available to the researcher hundreds of models. Chapter 7 deals with neural network analysis and Chapter 8 with network analysis and visualization of network data. A final chapter treats text analysis, including web scraping, comparative word frequency tables, word clouds, word maps, sentiment analysis, topic analysis, and more. All empirical chapters have two "Quick Start" exercises designed to allow quick immersion in chapter topics, followed by "In Depth" coverage. Data are available for all examples and runnable R code is provided in a "Command Summary". An appendix provides an extended tutorial on R and RStudio. Over 30 online supplements for each chapter provide "books within the book" on a variety of topics, such as agent-based modelling.

Rather than focusing on equations, derivations and proofs, this book emphasises hands-on work: obtaining output for various social science models and interpreting that output. It is suitable for all advanced level undergraduate and postgraduate students learning statistical data analysis.

Acknowledgments
Preface

1 Using and abusing data analytics in social science
1.1 Introduction
1.2 The promise of data analytics for social science
1.2.1 Data analytics in public affairs and public policy
1.2.2 Data analytics in the social sciences
1.2.3 Data analytics in the humanities
1.3 Research design issues in data analytics
1.3.1 Beware the true believer
1.3.2 Pseudo-objectivity in data analytics
1.3.3 The bias of scholarship based on algorithms using big data
1.3.4 The subjectivity of algorithms
1.3.5 Big data and big noise
1.3.6 Limitations of the leading data science dissemination models
1.4 Social and ethical issues in data analytics
1.4.1 Types of ethical issues in data analytics
1.4.2 Bias toward the privileged
1.4.3 Discrimination
1.4.4 Diversity and data analytics
1.4.5 Distortion of democratic processes
1.4.6 Undermining of professional ethics
1.4.7 Privacy, profiling, and surveillance issues
1.4.8 The transparency issue
1.5 Summary: Technology and power
Endnotes

2 Statistical analytics with R, Part 1
Part I: Overview Of Statistical Analysis With R
2.1 Introduction
2.2 Data and packages used in this chapter
2.2.1 Example data
2.2.2 R packages used
Part II: Quick Start On Statistical Analysis With R
2.3 Descriptive statistics
2.4 Linear multiple regression
Part III: Statistical Analysis With R In Detail
2.5 Hypothesis testing
2.5.1 One-sample test of means
2.5.2 Means test for two independent samples
2.5.3 Means test for two dependent samples
2.6 Crosstabulation, significance, and association
2.7 Loglinear analysis for categorical variables
2.8 Correlation, correlograms, and scatterplots
2.9 Factor analysis (exploratory)
2.10 Multidimensional scaling
2.11 Reliability analysis
2.11.1 Cronbach's alpha and Guttman's lower bounds
2.11.2 Guttman's lower bounds and Cronbach's alpha
2.11.3 Krippendorff's alpha and Cohen's kappa
2.12 Cluster analysis
2.12.1 Hierarchical cluster analysis
2.12.2 K-means clustering
2.12.3 Nearest neighbor analysis
2.13 Analysis of variance
2.13.1 Data and packages used
2.13.2 GLM univariate: ANOVA
2.13.3 GLM univariate: ANCOVA
2.13.4 GLM multivariate: MANOVA
2.13.5 GLM multivariate: MANCOVA
2.14 Logistic regression
2.14.1 ROC and AUC analysis
2.14.2 Confusion table and accuracy
2.15 Mediation and moderation
2.16 Chapter 2 command summary
Endnotes

3 Statistical analytics with R, Part 2
Part I: Overview Of Statistical Analytics With R
3.1 Introduction
3.2 Data and packages used in this chapter
3.2.1 Example data
3.2.2 R Packages used
Part II: Quick Start On Statistical Analysis Part 2
3.3 Quick start: Linear regression as a generalized linear modeling (GZLM)
3.3.1 Background to GZLM
3.3.2 The linear model in glm()
3.3.3 GZLM output
3.3.4 Fitted value, residuals, and plots
3.3.5 Noncanonical custom links
3.3.6 Multiple comparison tests
3.3.7 Estimated marginal means (EMM)
3.4 Quick start: Testing if multilevel modeling is needed
Part III: Statistical Analysis, Part 2, In Detail
3.5 Generalized linear models (GZLM)
3.5.1 Introduction
3.5.2 Setup for GZLM models in R
3.5.3 Binary logistic regression example
3.5.4 Gamma regression model
3.5.5 Poisson regression model
3.5.6 Negative binomial regression
3.6 Multilevel modeling (MLM)
3.6.1 Introduction
3.6.2 Setup and data
3.6.3 The random coefficients model
3.6.4 Likelihood ratio test
3.7 Panel data regression (PDR)
3.7.1 Introduction
3.7.2 Types of PDR model
3.7.3 The Hausman test
3.7.4 Setup and data
3.7.5 PDR with the plm package
3.7.6 PDR with the panelr package
3.8 Structural equation modeling (SEM)
3.9 Missing data analysis and data imputation
3.10 Chapter 3 command summary
Endnotes

4 Classification and regression trees in R
Part I: Overview Of Classification And Regression Trees With R
4.1 Introduction
4.2 Advantages of decision tree analysis
4.3 Limitations of decision tree analysis
4.4 Decision tree terminology
4.5 Steps in decision tree analysis
4.6 Decision tree algorithms
4.7 Random forests and ensemble methods
4.8 Software
4.8.1 R language
4.8.2 Stata
4.8.3 SAS
4.8.4 SPSS
4.8.5 Python language
4.9 Data and packages used in this chapter
4.9.1 Example data
4.9.2 R packages used
Part II: Quick Start - Classification And Regression Trees
4.10 Classification tree example: Survival on the Titanic
4.11 Regression tree example: Correlates of murder
Part III: Classification And Regression Trees, In Detail
4.12 Overview
4.13 The rpart() program
4.13.1 Introduction
4.13.2 Training and validation datasets
4.13.3 Setup for rpart() trees
4.14 Classification trees with the rpart package
4.14.1 The basic rpart classification tree
4.14.2 Printing tree rules
4.14.3 Visualization with prp() and draw.tree()
4.14.4 Visualization with fancyRpartPlot()
4.14.5 Interpreting tree summaries
4.14.6 Listing nodes by country and countries by node
4.14.7 Node distribution plots
4.14.8 Saving predictions and residuals
4.14.9 Cross-validation and pruning
4.14.10 The confusion matrix and model performance metrics
4.14.11 The ROC curve and AUC
4.14.12 Lift plots
4.14.13 Gains plots
4.14.14 Precision vs. recall plot
4.15 Regression trees with the rpart package
4.15.1 Setup
4.15.2 Creating an rpart regression tree
4.15.3 Printing tree rules
4.15.4 Visualization with prp() and fancyRpartPlot()
4.15.5 Interpreting tree summaries
4.15.6 The CP table
4.15.7 Listing nodes by country and countries by node
4.15.8 Saving predictions and residuals
4.15.9 Plotting residuals
4.15.10 Cross-validation and pruning
4.15.11 R-squared for regression trees
4.15.12 MSE for regression trees
4.15.13 The confusion matrix
4.15.14 The ROC curve and AUC
4.15.15 Gains plots
4.15.16 Gains plot with OLS comparison
4.16 The tree package
4.17 The ctree() program for conditional decision trees
4.18 More decision trees programs for R
4.19 Chapter 4 command summary
Endnotes

5 Random forests
Part I: Overview Of Random Forests In R
5.1 Introduction
5.1.1 Social science examples of random forest models
5.1.2 Advantages of random forests
5.1.3 Limitations of random forests
5.1.4 Data and packages
Part II: Quick Start - Random Forests
5.2 Classification forest example: Searching for the causes of happiness
5.3 Regression forest example: Why so much crime in my town?
Part III: Random Forests, In Detail
5.4 Classification forests with randomForest()
5.4.1 Setup
5.4.2 A basic classification model
5.4.3 Output components of randomForest() objects for classification models
5.4.4 Graphing a randomForest tree?
5.4.5 Comparing randomForest() and rpart() performance
5.4.6 Tuning the random forest model
5.4.7 MDS cluster analysis of the RF classification model
5.5 Regression forests with randomForest()
5.5.1 Introduction
5.5.2 Setup
5.5.3 A basic regression model
5.5.4 Output components for regression forest models
5.5.5 Graphing a randomForest tree?
5.5.6 MDS plots
5.5.7 Quartile plots
5.5.8 Comparing randomForest() and rpart() regression models
5.5.9 Tuning the randomForest() regression model
5.5.10 Outliers: Identifying and removing
5.6 The randomForestExplainer package
5.6.1 Setup for the randomForestExplainer package
5.6.2 Minimal depth plots
5.6.3 Multiway variable importance plots
5.6.4 Multiway ranking of variable importance
5.6.5 Comparing randomForest and OLS rankings of predictors
5.6.6 Which importance criteria?
5.6.7 Interaction analysis
5.6.8 The explain_forest() function
5.7 Summary
5.8 Conditional inference forests
5.9 MDS plots for random forests
5.10 More random forest programs for R
5.11 Command summary
Endnotes

6 Modeling and machine learning
Part I: Overview Of Modeling And Machine Learning
6.1 Introduction
6.1.1 Social science examples of modeling and machine learning in R
6.1.2 Advantages of modeling and machine learning in R
6.1.3 Limitations of modeling and machine learning in R
6.1.4 Data, packages, and default directory
Part II: Quick Start - Modeling And Machine Learning
6.2 Example 1: Bayesian modeling of county-level poverty
6.2.1 Introduction
6.2.2 Setup
6.2.3 Correlation plot
6.2.4 The Bayes generalized linear model
6.3 Example 2: Predicting diabetes among Pima Indians with mlr3
6.3.1 Introduction
6.3.2 Setup
6.3.3 How mlr3 works
6.3.4 The Pima Indian data
Part III: Modeling And Machine Learning In Detail
6.4 Illustrating modeling and machine learning with SVM in caret
6.4.1 How SVM works
6.4.2 SVM algorithms compared to logistic and OLS regression
6.4.3 SVM kernels, types, and parameters
6.4.4 Tuning SVM models
6.4.5 SVM and longitudinal data
6.5 SVM versus OLS regression
6.6 SVM with the caret package: Predicting world literacy rates
6.6.1 Setup
6.6.2 Constructing the SVM regression model with caret
6.6.3 Obtaining predicted values and residuals
6.6.4 Model performance metrics
6.6.5 Variable importance
6.6.6 Other output elements
6.6.7 SVM plots
6.7 Tuning SVM models
6.7.1 Tuning for the train() command from the caret package
6.7.2 Tuning for the svm() command from the e1071 package
6.7.3 Cross-validating SVM models
6.7.4 Using e1071 in caret rather than the default kern package
6.8 SVM classification models: Classifying U.S. Senators
6.8.1 The "senate" example and setup
6.8.2 SVM classification with alternative kernels: Senate example
6.8.3 Tuning the SVM binary classification model
6.9 Gradient boosting machines (GBM)
6.9.1 Introduction
6.9.2 Setup and example data
6.9.3 Metrics for comparing models
6.9.4 The caret control object
6.9.5 Training the GBM model under caret
6.10 Learning vector quantization (LVQ)
6.10.1 Introduction
6.10.2 Setup and example data
6.10.3 Metrics for comparing models
6.10.4 The caret control object
6.10.5 Training the LVQ model under caret
6.11 Comparing models
6.12 Variable importance
6.12.1 Leave-one-out modeling
6.12.2 Recursive feature elimination (RFE) with caret
6.12.3 Other approaches to variable importance
6.13 SVM classification for a multinomial outcome
6.14 Command summary
Endnotes

7 Neural network models and deep learning
Part I: Overview Of Neural Network Models And Deep Learning
7.1 Overview
7.2 Data and packages
7.3 Social science examples
7.4 Pros and cons of neural networks
7.5 Artificial neural network (ANN) concepts
7.5.1 ANN terms
7.5.2 R software programs for ANN
7.5.3 Training methods for ANN
7.5.4 Algorithms in neuralnet
7.5.5 Algorithms in nnet
7.5.6 Tuning ANN models
Part II: Quick Start - Modeling And Machine Learning
7.6 Example 1: Analyzing NYC airline delays
7.6.1 Introduction
7.6.2 General setup
7.6.3 Data preparation
7.6.4 Modeling NYC airline delays
7.7 Example 2: The classic iris classification example
7.7.1 Setup
7.7.2 Exploring separation with a violin plot
7.7.3 Normalizing the data
7.7.4 Training the model with nnet in caret
7.7.5 Obtain model predictions
7.7.6 Display the neural model
Part III: Neural Network Models In Detail
7.8 Analyzing Boston crime via the neuralnet package
7.8.1 Setup
7.8.2 The linear regression model for unscaled data
7.8.3 The neuralnet model for unscaled data
7.8.4 Scaling the data
7.8.5 The linear regression model for scaled data
7.8.6 The neuralnet model for scaled data
7.8.7 Neuralnet results for the training data
7.8.8 Model performance plots
7.8.9 Visualizing the neuralnet model
7.8.10 Variable importance for the neuralnet model
7.9 Analyzing Boston crime via neuralnet under the caret package
7.10 Analyzing Boston crime via nnet in caret
7.10.1 Setup
7.10.2 The nnet/caret model of Boston crime
7.10.3 Variable importance for the nnet/caret model
7.10.4 Further tuning the nnet model outside caret
7.11 A classification model of marital status using nnet
7.11.1 Setup
7.11.2 The nnet classification model of marital status
7.12 Neural network analysis using "mlr3keras"
7.13 Command summary
Endnotes

8 Network analysis
Part I: Overview Of Network Analysis With R
8.1 Introduction
8.2 Data and packages used in this chapter
8.3 Concepts in network analysis
8.4 Getting data into network format
Part II: Quick Start On Network Analysis With R
8.5 Quick start exercise 1: The Medici family network
8.6 Quick start exercise 2: Marvel hero network communities
Part III: Network Analysis With R In Detail
8.7 Interactive network analysis with visNetwork
8.7.1 Undirected networks: Research team management
8.7.2 Clustering by group: Research team grouped by gender
8.7.3 A larger network with navigation and circle layout
8.7.4 Visualizing classification and regression trees: National literacy
8.7.5 A directed network (asymmetrical relationships in a research team)
8.8 Network analysis with igraph
8.8.1 Term adjacency networks: Gubernatorial websites and the covid pandemic
8.8.2 Similarity/distance networks with igraph: Senate interest group ratings
8.8.3 Communities, modularity, and centrality
8.8.4 Similarity network analysis: All senators
8.9 Using intergraph for network conversions
8.10 Network-on-a-map with the diagram and maps packages
8.11 Network analysis with the statnet and network packages
8.11.1 Introduction
8.11.2 Visualization
8.11.3 Neighborhoods
8.11.4 Cluster analysis
8.12 Clique analysis with sna
8.12.1 A simplified clique analysis
8.12.2 A clique analysis of the DHHS formal network
8.12.3 K-core analysis of the DHHS formal network
8.13 Mapping international trade flow with statnet and Intergraph
8.14 Correlation networks with corrr
8.15 Network analysis with tidygraph
8.15.1 Introduction
8.15.2 A simple tidygraph example
8.15.3 Network conversions with tidygraph
8.15.4 Finding community clusters with tidygraph
8.16 Simulating networks
8.16.1 Agent-based network modeling with SchellingR
8.16.2 Agent-based network modeling with RSiena
8.16.3 Agent-based network modeling with NetLogoR
8.17 Summary
8.18 Command summary
Endnotes

9 Text analytics
Part I: Overview Of Text Analytics With R
9.1 Overview
9.2 Data used in this chapter
9.3 Packages used in this chapter
9.4 What is a corpus?
9.5 Text files
9.5.1 Overview
9.5.2 Archived texts
9.5.3 Project Gutenberg archive
9.5.4 Comma-separated values (.csv) files
9.5.5 Text from Word .docx files with the textreadr package
9.5.6 Text from other formats with the readtext package
9.5.7 Text from raw text files
Part II: Quick Start On Text Analytics With R
9.6 Quick start exercise 1: Key word in context (kwic) indexing
9.7 Quick start exercise 2: Word frequencies and histograms
Part III: Network Analysis With R In Detail
9.8 Web scraping
9.8.1 Overview
9.8.2 Web scraping: The "htm2txt" package
9.8.3 Web scraping: The "rvest" package
9.9 Social media scraping
9.9.1 Analysis of Twitter data: Trump and the New York Times
9.9.2 Social media scraping with twitter
9.10 Leading text formats in R
9.10.1 Overview
9.10.2 Formats related to the "tidytext" package
9.10.3 Formats related to the "tm" package
9.10.4 Formats related to the "quanteda" package
9.10.5 Common text file conversions
9.11 Tokenization
9.11.1 Overview
9.11.2 Word tokenization
9.12 Character encoding
9.13 Text cleaning and preparation
9.14 Analysis: Multigroup word frequency comparisons
9.14.1 Multigroup analysis in tidytext
9.14.2 Multigroup analysis with quanteda's textstat_keyness() command
9.14.3 Multigroup analysis with textstat_frequency() in quanteda and ggplot2
9.15 Analysis: Word clouds
9.16 Analysis: Comparison clouds
9.17 Analysis: Word maps and word correlations
9.17.1 Working with the tdm format
9.17.2 Working with the dtm format
9.17.3 Word frequencies and word correlations
9.17.4 Correlation plots of word and document associations
9.17.5 Plotting word stem correlations for word pairs
9.17.6 Word correlation maps
9.18 Analysis: Sentiment analysis
9.18.1 Overview
9.18.2 Example: Sentiment analysis of news articles
9.19 Analysis: Topic modeling
9.19.1 Overview
9.19.2 Topic analysis example 1: Modeling topic frequency over time
9.19.3 Topic analysis example 2: LDA analysis
9.20 Analysis: Lexical dispersion plots
9.21 Analysis: Bigrams and ngrams
9.22 Command summary
Endnotes

Appendix 1: Introduction to R and RStudio
Appendix 2: Data used in this book
References
Index

G. David Garson teaches advanced research methodology in the School of Public and International Affairs, North Carolina State University, USA. Founder and longtime editor emeritus of the Social Science Computer Review, he is president of Statistical Associates Publishing, which provides free digital texts worldwide. His degrees are from Princeton University (BA, 1965) and Harvard University (PhD, 1969).

Permalink: https://www.kriso.ee/db/97810004671616e.html
