xxiii
Introduction   1 (6)
Acknowledgments   7 (1)
References   7 (2)

Section 1 The New Survey Landscape   9 (122)
1 Why Machines Matter for Survey and Social Science Researchers: Exploring Applications of Machine Learning Methods for Design, Data Collection, and Analysis   11 (52)
11 (2)
1.2 Overview of Machine Learning Methods and Their Evaluation   13 (3)
1.3 Creating Sample Designs and Constructing Sampling Frames Using Machine Learning Methods   16 (7)
1.3.1 Sample Design Creation   16 (2)
1.3.2 Sample Frame Construction   18 (2)
1.3.3 Considerations and Implications for Applying Machine Learning Methods for Creating Sampling Frames and Designs   20 (1)
1.3.3.1 Considerations About Algorithmic Optimization   20 (1)
1.3.3.2 Implications About Machine Learning Model Error   21 (1)
1.3.3.3 Data Type Considerations and Implications About Data Errors   22 (1)
1.4 Questionnaire Design and Evaluation Using Machine Learning Methods   23 (5)
24 (2)
1.4.2 Evaluation and Testing   26 (1)
1.4.3 Instrumentation and Interviewer Training   27 (1)
1.4.4 Alternative Data Sources   28 (1)
1.5 Survey Recruitment and Data Collection Using Machine Learning Methods   28 (5)
1.5.1 Monitoring and Interviewer Falsification   29 (1)
1.5.2 Responsive and Adaptive Designs   29 (4)
1.6 Survey Data Coding and Processing Using Machine Learning Methods   33 (4)
1.6.1 Coding Unstructured Text   33 (2)
1.6.2 Data Validation and Editing   35 (1)
35 (1)
1.6.4 Record Linkage and Duplicate Detection   36 (1)
1.7 Sample Weighting and Survey Adjustments Using Machine Learning Methods   37 (6)
1.7.1 Propensity Score Estimation   37 (4)
41 (2)
1.8 Survey Data Analysis and Estimation Using Machine Learning Methods   43 (4)
1.8.1 Gaining Insights Among Survey Variables   44 (1)
1.8.2 Adapting Machine Learning Methods to the Survey Setting   45 (1)
1.8.3 Leveraging Machine Learning Algorithms for Finite Population Inference   46 (1)
1.9 Discussion and Conclusions   47 (16)
48 (12)
60 (3)

2 The Future Is Now: How Surveys Can Harness Social Media to Address Twenty-first Century Challenges   63 (36)
63 (4)
2.2 New Ways of Thinking About Survey Research   67 (1)
2.3 The Challenge with Sampling People   67 (5)
2.3.1 The Social Media Opportunities   68 (1)
2.3.1.1 Venue-Based, Time-Space Sampling   68 (2)
2.3.1.2 Respondent-Driven Sampling   70 (1)
2.3.2 Outstanding Challenges   71 (1)
2.4 The Challenge with Identifying People   72 (2)
2.4.1 The Social Media Opportunity   73 (1)
2.4.2 Outstanding Challenges   73 (1)
2.5 The Challenge with Reaching People   74 (3)
2.5.1 The Social Media Opportunities   75 (1)
75 (1)
2.5.1.2 Paid Social Media Advertising   76 (1)
2.5.2 Outstanding Challenges   77 (1)
2.6 The Challenge with Persuading People to Participate   77 (4)
2.6.1 The Social Media Opportunities   78 (1)
2.6.1.1 Paid Social Media Advertising   78 (1)
2.6.1.2 Online Influencers   79 (1)
2.6.2 Outstanding Challenges   80 (1)
2.7 The Challenge with Interviewing People   81 (6)
2.7.1 Social Media Opportunities   82 (1)
2.7.1.1 Passive Social Media Data Mining   82 (1)
2.7.1.2 Active Data Collection   83 (1)
2.7.2 Outstanding Challenges   84 (3)
87 (12)
89 (10)

3 Linking Survey Data with Commercial or Administrative Data for Data Quality Assessment   99 (32)
99 (2)
3.2 Thinking About Quality Features of Analytic Data Sources   101 (3)
3.2.1 What Is the Purpose of the Data Linkage?   101 (1)
3.2.2 What Kind of Data Linkage for What Analytic Purpose?   102 (2)
3.3 Data Used in This Chapter   104 (12)
3.3.1 NSECE Household Survey   104 (1)
3.3.2 Proprietary Research Files from Zillow   105 (2)
3.3.3 Linking the NSECE Household Survey with Zillow Proprietary Datafiles   107 (1)
3.3.3.1 Nonuniqueness of Matches   107 (3)
3.3.3.2 Misalignment of Units of Observation   110 (1)
3.3.3.3 Ability to Identify Matches   110 (2)
3.3.3.4 Identifying Matches   112 (2)
3.3.3.5 Implications of the Linking Process for Intended Analyses   114 (2)
3.4 Assessment of Data Quality Using the Linked File   116 (9)
3.4.1 What Variables in the Zillow Datafile Are Most Appropriate for Use in Substantive Analyses Linked to Survey Data?   116 (3)
3.4.2 How Did Different Steps in the Survey Administration Process Contribute to Representativeness of the NSECE Survey Data?   119 (4)
3.4.3 How Well Does the Linked Datafile Represent the Overall NSECE Dataset (Including Unlinked Records)?   123 (2)
125 (6)
127 (2)
129 (2)

Section 2 Total Error and Data Quality   131 (142)

4 Total Error Frameworks for Found Data   133 (30)
133 (1)
4.2 Data Integration and Estimation   134 (4)
135 (2)
4.2.2 The Integration Process   137 (1)
137 (1)
138 (3)
4.4 Errors in Hybrid Estimates   141 (15)
4.4.1 Error-Generating Processes   141 (4)
4.4.2 Components of Bias, Variance, and Mean Squared Error   145 (3)
148 (5)
153 (1)
4.4.4.1 Sample Recruitment Error   153 (3)
4.4.4.2 Data Encoding Error   156 (1)
4.5 Other Error Frameworks   156 (2)
4.6 Summary and Conclusions   158 (5)
160 (3)

5 Measuring the Strength of Attitudes in Social Media Data   163 (30)
163 (2)
165 (9)
165 (1)
5.2.1.1 European Social Survey Data   166 (1)
167 (2)
169 (1)
169 (1)
170 (1)
171 (2)
173 (1)
173 (1)
174 (6)
5.3.1 Overall Comparisons   174 (1)
175 (2)
177 (1)
178 (2)
180 (4)
5.A 2016 German ESS Questions Used in Analysis   184 (2)
5.B Search Terms Used to Identify Topics in Reddit Posts (2016 and 2018)   186 (1)
186 (1)
5.B.2 Interest in Politics   186 (1)
186 (1)
187 (1)
187 (1)
187 (1)
5.C Example of Coding Steps Used to Identify Topics and Assign Sentiment in Reddit Submissions (2016 and 2018)   188 (5)
189 (4)

6 Attention to Campaign Events: Do Twitter and Self-Report Metrics Tell the Same Story?   193 (24)
6.1 What Can Social Media Tell Us About Social Phenomena?   193 (2)
6.2 The Empirical Evidence to Date   195 (1)
6.3 Tweets as Public Attention   196 (1)
197 (1)
198 (6)
6.6 Did Events Peak at the Same Time Across Data Streams?   204 (1)
6.7 Were Event Words Equally Prominent Across Data Streams?   205 (1)
6.8 Were Event Terms Similarly Associated with Particular Candidates?   206 (1)
6.9 Were Event Trends Similar Across Data Streams?   207 (4)
6.10 Unpacking Differences Between Samples   211 (1)
212 (1)
213 (4)

7 Improving Quality of Administrative Data: A Case Study with FBI's National Incident-Based Reporting System Data   217 (28)
217 (3)
220 (2)
7.2.1 Administrative Crime Statistics and the History of NIBRS Data   220 (1)
7.2.2 Construction of the NIBRS Dataset   221 (1)
7.3 Data Quality Improvement Based on the Total Error Framework   222 (12)
7.3.1 Data Quality Assessment Using the Row-Column-Cell Framework   224 (1)
7.3.1.1 Phase I: Evaluating Each Data Table   224 (1)
225 (1)
226 (1)
226 (1)
7.3.1.5 Row-Column-Cell Errors Impacting NIBRS   227 (1)
7.3.1.6 Phase II: Evaluating the Integrated Data   227 (1)
7.3.1.7 Errors in Data Integration Process   227 (1)
7.3.1.8 Coverage Errors Due to Nonreporting Agencies   228 (1)
7.3.1.9 Nonresponse Errors in the Incident Data Table Due to Unreported Incident Reports   229 (1)
7.3.1.10 Invalid, Unknown, and Missing Values Within the Incident Reports   230 (1)
7.3.2 Improving Data Quality via Sampling, Weighting, and Imputation   231 (1)
7.3.2.1 Sample-Based Method to Improve Data Representativeness at the Agency Level   231 (1)
7.3.2.2 Statistical Weighting to Adjust for Coverage Errors at the Agency Level   232 (1)
7.3.2.3 Imputation to Compensate for Unreported Incidents and Missing Values in the Incident Reports   233 (1)
7.4 Utilizing External Data Sources in Improving Data Quality of the Administrative Data   234 (4)
7.4.1 Understanding the External Data Sources   234 (1)
7.4.1.1 Data Quality Assessment of External Data Sources   234 (1)
7.4.1.2 Producing Population Counts at the Agency Level Through Auxiliary Data   235 (1)
7.4.2 Administrative vs. Survey Data for Crime Statistics   236 (2)
7.4.3 A Pilot Study on Crime in the Bakken Region   238 (1)
7.5 Summary and Future Work   239 (6)
241 (4)

8 Performance and Sensitivities of Home Detection on Mobile Phone Data   245 (28)
245 (4)
8.1.1 Mobile Phone Data and Official Statistics   245 (2)
8.1.2 The Home Detection Problem   247 (2)
8.2 Deploying Home Detection Algorithms to a French CDR Dataset   249 (6)
249 (2)
8.2.2 The French Mobile Phone Dataset   251 (1)
8.2.3 Defining Nine Home Detection Algorithms   252 (1)
8.2.4 Different Observation Periods   253 (2)
8.2.5 Summary of Data and Setup   255 (1)
8.3 Assessing Home Detection Performance at Nationwide Scale   255 (3)
256 (1)
8.3.2 Assessing Performance and Sensitivities   256 (1)
8.3.2.1 Correlation with Ground Truth Data   256 (2)
8.3.2.2 Ratio and Spatial Patterns   258 (1)
8.3.2.3 Temporality and Sensitivity   258 (1)
258 (9)
8.4.1 Relations between HDAs' User Counts and Ground Truth   258 (2)
8.4.2 Spatial Patterns of Ratios Between User Counts and Population Counts   260 (1)
8.4.3 Temporality of Correlations   260 (6)
8.4.4 Sensitivity to the Duration of Observation   266 (1)
8.4.5 Sensitivity to Criteria Choice   266 (1)
8.5 Discussion and Conclusion   267 (6)
270 (3)

Section 3 Big Data in Official Statistics   273 (114)

9 Big Data Initiatives in Official Statistics   275 (28)
275 (1)
9.2 Some Characteristics of the Changing Survey Landscape   276 (4)
9.3 Current Strategies to Handle the Changing Survey Landscape   280 (5)
281 (1)
9.3.2 Forming Partnerships   281 (1)
9.3.3 Cooperation Between European NSIs   282 (1)
9.3.4 Creating Big Data Centers   282 (1)
9.3.5 Experimental Statistics   283 (1)
9.3.6 Organizing Hackathons   283 (1)
9.3.7 IT Infrastructure, Tools, and Methods   284 (1)
9.4 The Potential of Big Data and the Use of New Methods in Official Statistics   285 (5)
285 (1)
9.4.1.1 Green Areas in the Swedish City of Lidingö   285 (1)
9.4.1.2 Innovative Companies   285 (1)
9.4.1.3 Coding Commodity Flow Survey   286 (1)
287 (1)
287 (1)
9.4.2.2 Expenditure Surveys   288 (1)
9.4.2.3 Examples of Improving Statistics by Adjusting for Bias   288 (1)
289 (1)
289 (1)
289 (1)
9.4.4.1 Consumer Price Index (CPI)   289 (1)
289 (1)
9.4.4.3 ISCO and NACE Coding at Statistics Finland   290 (1)
290 (3)
293 (2)
9.6.1 Allowing Access to Data   293 (1)
9.6.2 Providing Access to Data   294 (1)
295 (8)
296 (7)

10 Big Data in Official Statistics: A Perspective from Statistics Netherlands   303 (36)
303 (1)
10.2 Big Data and Official Statistics   304 (1)
10.3 Examples of Big Data in Official Statistics   305 (4)
305 (1)
306 (1)
10.3.3 Social Media Messages   307 (1)
308 (1)
10.4 Principles for Assessing the Quality of Big Data Statistics   309 (7)
310 (1)
10.4.2 Models in Official Statistics   311 (1)
10.4.3 Objectivity and Reliability   312 (2)
314 (1)
10.4.5 Some Examples of Quality Assessments of Big Data Statistics   315 (1)
10.5 Integration of Big Data with Other Statistical Sources   316 (9)
10.5.1 Big Data as Auxiliary Data   316 (1)
10.5.2 Size of the Internet Economy   317 (2)
10.5.3 Improving the Consumer Confidence Index   319 (2)
10.5.4 Big Data and the Quality of Gross National Product Estimates   321 (1)
10.5.5 Google Trends for Nowcasting   322 (1)
10.5.6 Multisource Statistics: Combination of Survey and Sensor Data   323 (1)
10.5.7 Combining Administrative and Open Data Sources to Complete Energy Statistics   324 (1)
10.6 Disclosure Control with Big Data   325 (2)
326 (1)
326 (1)
326 (1)
10.7 The Way Ahead: A Chance for Paradigm Fusion   327 (3)
10.7.1 Measurement and Selection Bias   328 (1)
329 (1)
329 (1)
10.7.4 Phenomenon-Oriented Statistics   330 (1)
330 (9)
331 (6)
337 (2)

11 Mining the New Oil for Official Statistics   339 (20)
339 (2)
11.2 Statistical Inference for Binary Variables from Nonprobability Samples   341 (2)
11.3 Integrating Data Source B Subject to Undercoverage Bias   343 (1)
11.4 Integrating Data Sources Subject to Measurement Errors   344 (1)
11.5 Integrating Probability Sample A Subject to Unit Nonresponse   345 (2)
347 (3)
11.7 Examples of Official Statistics Applications   350 (3)
353 (1)
354 (5)
354 (3)
357 (2)

12 Investigating Alternative Data Sources to Reduce Respondent Burden in United States Census Bureau Retail Economic Data Products   359 (28)
359 (3)
12.1.1 Overview of the Economic Directorate   360 (1)
361 (1)
12.1.3 Overview of the Census Bureau Retail Programs   361 (1)
362 (4)
366 (3)
12.3.1 Background on Point-of-Sale Data   366 (2)
368 (1)
369 (12)
12.4.1 Selection of Retailers   370 (1)
12.4.2 National-Level Data   371 (4)
375 (2)
377 (4)
381 (6)
384 (1)
384 (1)
384 (3)

Section 4 Combining Big Data with Survey Statistics: Methods and Applications   387 (148)

13 Effects of Incentives in Smartphone Data Collection   389 (26)
389 (1)
13.2 The Influence of Incentives on Participation   390 (2)
13.3 Institut für Arbeitsmarkt- und Berufsforschung (IAB)-SMART Study Design   392 (6)
13.3.1 Sampling Frame and Sample Restrictions   393 (1)
13.3.2 Invitation and Data Request   394 (3)
13.3.3 Experimental Design for Incentive Study   397 (1)
397 (1)
398 (7)
398 (2)
13.4.2 Number of Initially Activated Data-Sharing Functions   400 (1)
13.4.3 Deactivating Functions   401 (1)
402 (1)
403 (2)
405 (10)
13.5.1 Limitations and Future Research   407 (5)
412 (3)

14 Using Machine Learning Models to Predict Attrition in a Survey Panel   415 (20)
415 (3)
417 (1)
418 (5)
418 (1)
14.2.2 Support Vector Machines   419 (1)
420 (1)
14.2.4 Evaluation Criteria   420 (2)
14.2.4.1 Tuning Parameters   422 (1)
423 (2)
14.3.1 Which Are the Important Predictors?   425 (1)
425 (3)
14.A Questions Used in the Analysis   428 (7)
431 (4)

15 Assessing Community Wellbeing Using Google Street-View and Satellite Imagery   435 (52)
435 (2)
437 (14)
15.2.1 Sampling Units and Frames   437 (1)
438 (1)
15.2.2.1 Study Outcomes from Survey Data   438 (2)
15.2.2.2 Study Predictors from Built Environment Data   440 (7)
15.2.2.3 Study Predictors from Geospatial Imagery   447 (3)
15.2.2.4 Model Development, Testing, and Evaluation   450 (1)
451 (6)
451 (4)
455 (1)
456 (1)
457 (2)
15.A Amazon Mechanical Turk Questionnaire   459 (2)
461 (2)
15.C Descriptive Statistics   463 (6)
15.D Stepwise AIC OLS Regression Models   469 (3)
15.E Generalized Linear Models via Penalized Maximum Likelihood with k-Fold Cross-Validation   472 (5)
15.F Heat Maps: Actual vs. Model-Based Outcomes   477 (10)
485 (2)

16 Nonparametric Bootstrap and Small Area Estimation to Mitigate Bias in Crowdsourced Data: Simulation Study and Application to Perceived Safety   487 (32)
487 (2)
16.2 The Rise of Crowdsourcing and Implications   489 (1)
16.3 Crowdsourcing Data to Analyze Social Phenomena: Limitations   490 (2)
16.3.1 Self-Selection Bias   490 (1)
16.3.2 Unequal Participation   491 (1)
16.3.3 Underrepresentation of Certain Areas and Times   492 (1)
16.3.4 Unreliable Area-Level Direct Estimates and Difficulty to Interpret Results   492 (1)
16.4 Previous Approaches for Reweighting Crowdsourced Data   492 (1)
16.5 A New Approach: Small Area Estimation Under a Nonparametric Bootstrap Estimator   493 (3)
16.5.1 Step 1: Nonparametric Bootstrap   494 (2)
16.5.2 Step 2: Area-Level Model-Based Small Area Estimation   496 (1)
496 (7)
16.6.1 Population Generation   497 (1)
16.6.2 Sample Selection and Simulation Steps   497 (2)
499 (4)
16.7 Case Study: Safety Perceptions in London   503 (8)
16.7.1 The Spatial Study of Safety Perceptions   503 (1)
504 (1)
16.7.2.1 Place Pulse 2.0 Dataset   504 (2)
16.7.2.2 Area-Level Covariates   506 (1)
506 (1)
16.7.3.1 Model Diagnostics and External Validation   506 (4)
16.7.3.2 Mapping Safety Perceptions at Neighborhood Level   510 (1)
16.8 Discussion and Conclusions   511 (8)
513 (6)

17 Using Big Data to Improve Sample Efficiency   519 (16)
17.1 Introduction and Background   519 (4)
17.2 Methods to More Efficiently Sample Unregistered Boat-Owning Households   523 (7)
17.2.1 Model 1: Spatial Boat Density Model   525 (1)
17.2.2 Model 2: Address-Level Boat-Ownership Propensity   526 (4)
530 (3)
533 (2)
534 (1)
534 (1)

Section 5 Combining Big Data with Survey Statistics: Tools   535 (90)

18 Feedback Loop: Using Surveys to Build and Assess Registration-Based Sample Religious Flags for Survey Research   537 (24)
537 (1)
538 (1)
539 (1)
540 (1)
541 (2)
543 (2)
545 (1)
545 (7)
18.9 Considering Systematic Matching Rates   552 (2)
18.10 Discussion and Conclusions   554 (7)
557 (4)

19 Artificial Intelligence and Machine Learning Derived Efficiencies for Large-Scale Survey Estimation Efforts   561 (36)
561 (1)
562 (1)
563 (1)
19.3 Accelerating the MEPS Imputation Processes: Development of Fast-Track MEPS Analytic Files   563 (9)
19.3.1 MEPS Data Files and Variables   566 (1)
19.3.2 Identification of Predictors of Medical Care Sources of Payment   567 (4)
19.3.2.1 Class Variables Used in the Imputation   571 (1)
19.3.3 Weighted Sequential Hot Deck Imputation   572 (1)
19.4 Building the Prototype   572 (3)
19.4.1 Learning from the Data: Results for the 2012 MEPS   573 (2)
19.5 An Artificial Intelligence Approach to Fast-Track MEPS Imputation   575 (13)
19.5.1 Why Artificial Intelligence for Health-Care Cost Prediction   577 (1)
19.5.1.1 Imputation Strategies   578 (2)
19.5.1.2 Testing of Imputation Strategies   580 (1)
580 (1)
19.5.1.4 Raw Data Extraction   581 (1)
19.5.1.5 Attribute Selection   582 (2)
19.5.1.6 Inter-Variable Correlation   584 (1)
19.5.1.7 Multi-Output Random Forest   584 (1)
585 (3)
588 (9)
592 (1)
593 (4)

20 Worldwide Population Estimates for Small Geographic Areas: Can We Do a Better Job?   597 (28)
597 (1)
598 (2)
20.3 Gridded Population Estimates   600 (8)
600 (1)
20.3.2 Basic Gridded Population Models   601 (1)
601 (1)
602 (1)
603 (1)
604 (1)
20.3.7 Challenges, Pros, and Cons of Gridded Population Estimates   605 (3)
20.4 Population Estimates in Surveys   608 (5)
20.4.1 Standard Sampling Strategies   608 (1)
20.4.2 Gridded Population Sampling from 1 km × 1 km Grid Cells   609 (1)
609 (2)
20.4.3 Gridded Population Sampling from 100 m × 100 m Grid Cells   611 (1)
20.4.3.1 GridSample R Package   611 (1)
20.4.3.2 GridSample2.0 and www.GridSample.org   611 (2)
20.4.4 Implementation of Gridded Population Surveys   613 (1)
613 (3)
20.6 Conclusions and Next Steps   616 (9)
617 (1)
617 (8)

Section 6 The Fourth Paradigm, Regulations, Ethics, Privacy   625 (108)

21 Reproducibility in the Era of Big Data: Lessons for Developing Robust Data Management and Data Analysis Procedures   627 (30)
627 (1)
627 (2)
21.3 Challenges Researchers Face in the Era of Big Data and Reproducibility   629 (1)
630 (2)
21.5 Reliability and Validity of Administrative Data   632 (1)
632 (14)
632 (1)
633 (1)
21.6.3 The Administrative Data   634 (1)
21.6.4 The Six Research Fallacies   635 (1)
21.6.4.1 More Data Are Better!   635 (2)
21.6.4.2 Merging Is About Matching by IDs/Getting the Columns to Align   637 (2)
21.6.4.3 Saving Your Syntax Is Enough to Ensure Reproducibility   639 (2)
21.6.4.4 Transparency in Your Process Ensures Transparency in Your Final Product   641 (2)
21.6.4.5 Administrative Data Are Higher Quality Than Self-Reported Data   643 (1)
21.6.4.6 If Relevant Administrative Data Exist, They Will Help Answer Your Research Question   644 (2)
646 (11)
649 (5)
654 (3)

22 Combining Active and Passive Mobile Data Collection: A Survey of Concerns   657 (26)
657 (2)
659 (2)
22.2.1 Concern with Smartphone Data Collection   659 (2)
22.2.2 Differential Concern across Subgroups of Users   661 (1)
661 (5)
662 (1)
662 (1)
662 (1)
663 (1)
663 (1)
664 (2)
666 (4)
670 (3)
673 (1)
22.A.1 Frequency of Smartphone Use   673 (1)
673 (1)
22.A.3 Smartphone Activities   674 (1)
22.A.4 General Privacy Concern   674 (1)
675 (8)
679 (1)
679 (4)

23 Attitudes Toward Data Linkage: Privacy, Ethics, and the Potential for Harm   683 (30)
23.1 Introduction: Big Data and the Federal Statistical System in the United States   683 (1)
684 (6)
23.2.1 Focus Groups 2015 and 2016   685 (4)
23.2.2 Cognitive Interviews   689 (1)
690 (18)
23.3.1 What Do Respondents Say They Expect and Believe About the Federal Government's Stewardship of Data?   690 (1)
690 (5)
695 (2)
23.3.1.3 Trust in Statistics   697 (1)
23.3.2 How Do Expectations and Beliefs About the Federal Government's Stewardship of Data Change or Remain When Asked About Data Linkage or Sharing?   698 (3)
23.3.3 Under What Circumstances Do Respondents Support Sharing or Linking Data?   701 (5)
23.3.4 What Fears and Preoccupations Worry Respondents When Asked About Data Sharing in the Federal Government?   706 (1)
706 (1)
707 (1)
23.4 Discussion: Toward an Ethical Framework   708 (5)
709 (1)
23.4.2 Transparency in Need for Data and Potential Uses of Data   709 (1)
23.4.3 Connecting Data Collections to Benefits   709 (1)
710 (3)

24 Moving Social Science into the Fourth Paradigm: The Data Life Cycle   713 (20)
24.1 Consequences and Reality of the Availability of Big Data and Massive Compute Power for Survey Research and Social Science   717 (1)
717 (1)
718 (1)
718 (1)
718 (1)
24.2 Technical Challenges for Data-Intensive Social Science Research   718 (5)
719 (1)
24.2.2 Uncertainty Characterization and Quantification Or True, Useful, and New Information: Where Is It?   720 (2)
722 (1)
24.3 The Solution: Social Science Researchers Become "Data-Aware"   723 (2)
725 (2)
24.4.1 Acquire/Create/Collect   725 (1)
725 (1)
726 (1)
726 (1)
727 (1)
24.5 Bridge the Gap Between Silos   727 (2)
729 (4)
729 (4)

Index   733