
Data Science for Public Policy 2021 ed. [Hardback]

  • Format: Hardback, XIV + 363 pages, height x width: 279x210 mm, weight: 1289 g, 123 illustrations (111 in color, 12 in black and white)
  • Series: Springer Series in the Data Sciences
  • Publication date: 01-Sep-2021
  • Publisher: Springer Nature Switzerland AG
  • ISBN-10: 3030713512
  • ISBN-13: 9783030713515
  • Price: 62.59 €*
  • * The price is final, i.e. no further discounts apply
  • Regular price: 73.64 €
  • You save 15%
  • Delivery from the publisher takes approximately 2-4 weeks
  • Free delivery
This textbook presents the essential tools and core concepts of data science to public officials, policy analysts, and economists, among others, in order to further their application in the public sector. An expansion of the quantitative economics frameworks presented in policy and business schools, this book emphasizes the process of asking relevant questions to inform public policy. Its techniques and approaches emphasize data-driven practices, beginning with the basic programming paradigms that occupy the majority of an analyst’s time and advancing to the practical applications of statistical learning and machine learning. The text considers two divergent, competing perspectives to support its applications, incorporating techniques from both causal inference and prediction. Additionally, the book includes open-source data as well as live code, written in R and presented in notebook form, which readers can use and modify to practice working with data.
Preface
1 An Introduction
1.1 Why we wrote this book
1.2 What we assume
1.3 How this book is structured
2 The Case for Programming
2.1 Doing visual analytics since the 1780s
2.2 How does programming work?
2.3 Setting up R and RStudio
2.3.1 Installing R
2.3.2 Installing RStudio
2.3.3 DIY: Running your first code snippet
2.4 Making the case for open-source software
3 Elements of Programming
3.1 Data are everywhere
3.2 Data types
3.2.1 numeric
3.2.2 character
3.2.3 logical
3.2.4 factor
3.2.5 date
3.2.6 The class function
3.3 Objects in R
3.4 R's object classes
3.4.1 vector
3.4.2 matrix
3.4.3 data.frame
3.4.4 list
3.4.5 The class function, v2
3.4.6 More classes
3.5 Packages
3.5.1 Base R and the need to extend functionality
3.5.2 Installing packages
3.5.3 Loading packages
3.5.4 Package management and pacman
3.6 Data input/output
3.6.1 Directories
3.6.2 Load functions
3.6.3 Datasets
3.7 Finding help
3.7.1 Help function
3.7.2 Google and online communities
3.8 Beyond this chapter
3.8.1 Best practices
3.8.2 Further study
3.9 DIY: Loading solar energy data from the web
4 Transforming Data
4.1 Importing and assembling data
4.1.1 Loading files
4.2 Manipulating values
4.2.1 Text manipulation functions
4.2.2 Regular Expressions (RegEx)
4.2.3 DIY: Working with PII
4.2.4 Working with dates
4.3 The structure of data
4.3.1 Matrix or data frame?
4.3.2 Array indexes
4.3.3 Subsetting
4.3.4 Sorting and re-ordering
4.3.5 Aggregating data
4.3.6 Reshaping data
4.4 Control structures
4.4.1 If statement
4.4.2 For-loops
4.4.3 While
4.5 Functions
4.6 Beyond this chapter
4.6.1 Best practices
4.6.2 Further study
5 Record Linkage
5.1 Edward Kennedy, Bill de Blasio, and Bayerische Motoren Werke
5.2 How does record linkage work?
5.3 Pre-processing the data
5.4 De-duplication
5.5 Deterministic record linkage
5.6 Comparison functions
5.6.1 Edit distances
5.6.2 Phonetic algorithms
5.6.3 New tricks, same heuristics
5.7 Probabilistic record linkage
5.8 Data privacy
5.9 DIY: Matching people in the UK-UN sanction lists
5.10 Beyond this chapter
5.10.1 Best practices
5.10.2 Further study
6 Exploratory Data Analysis
6.1 Visually detecting patterns
6.2 The gist of EDA
6.3 Visualizing distributions
6.3.1 Skewed variables
6.4 Exploring missing values
6.4.1 Encodings
6.4.2 Missing value functions
6.4.3 Exploring missingness
6.4.4 Treating missingness
6.5 Analyzing time series
6.6 Finding visual correlations
6.6.1 Visual analysis on high-dimensional datasets
6.7 Beyond this chapter
7 Regression Analysis
7.1 Measuring and predicting the preferences of society
7.2 Simple linear regression
7.2.1 Mean squared error
7.2.2 Ordinary least squares
7.2.3 DIY: A simple hedonic model
7.3 Checking for linearity
7.4 Multiple regression
7.4.1 Non-linearities
7.4.2 Discrete variables
7.4.3 Discontinuities
7.4.4 Measures of model fitness
7.4.5 DIY: Choosing between models
7.4.6 DIY: Housing prices over time
7.5 Beyond this chapter
8 Framing Classification
8.1 Playing with fire
8.1.1 FireCast
8.1.2 What's a classifier?
8.2 The basics of classifiers
8.2.1 The anatomy of a classifier
8.2.2 Finding signal in classification contexts
8.2.3 Measuring accuracy
8.3 Logistic regression
8.3.1 The social science workhorse
8.3.2 Telling the story from coefficients
8.3.3 How are coefficients learned?
8.3.4 In practice
8.3.5 DIY: Expanding health care coverage
8.4 Regularized regression
8.4.1 From regularization to interpretation
8.4.2 DIY: Re-visiting health care coverage
8.5 Beyond this chapter
9 Three Quantitative Perspectives
9.1 Descriptive analysis
9.2 Causal inference
9.2.1 Potential outcomes framework
9.2.2 Regression discontinuity
9.2.3 Difference-in-differences
9.3 Prediction
9.3.1 Understanding accuracy
9.3.2 Model validation
9.4 Beyond this chapter
10 Prediction
10.1 The role of algorithms
10.2 Data science pipelines
10.3 K-Nearest Neighbors (k-NN)
10.3.1 Under the hood
10.3.2 DIY: Predicting the extent of storm damage
10.4 Tree-based learning
10.4.1 Classification and Regression Trees (CART)
10.4.2 Random forests
10.4.3 In practice
10.4.4 DIY: Wage prediction with CART and random forests
10.5 An introduction to other algorithms
10.5.1 Gradient boosting
10.5.2 Neural networks
10.6 Beyond this chapter
11 Cluster Analysis
11.1 Things closer together are more related
11.2 Foundational concepts
11.3 k-means
11.3.1 Under the hood
11.3.2 In Practice
11.3.3 DIY: Clustering for economic development
11.4 Hierarchical clustering
11.4.1 Under the hood
11.4.2 In Practice
11.4.3 DIY: Clustering time series
11.5 Beyond this chapter
12 Spatial Data
12.1 Anticipating climate impacts
12.2 Classes of spatial data
12.3 Rasters
12.3.1 Raster files
12.3.2 Rasters and math
12.3.3 DIY: Working with raster math
12.4 Vectors
12.4.1 Vector files
12.4.2 Converting points to spatial objects
12.4.3 Coordinate Reference Systems
12.4.4 DIY: Converting coordinates into point vectors
12.4.5 Reading shapefiles
12.4.6 Spatial joins
12.4.7 DIY: Analyzing spatial relationships
12.5 Beyond this chapter
13 Natural Language
13.1 Transforming text into data
13.1.1 Processing textual data
13.1.2 TF-IDF
13.1.3 Document similarities
13.1.4 DIY: Basic text processing
13.2 Sentiment Analysis
13.2.1 Sentiment lexicons
13.2.2 Calculating sentiment scores
13.2.3 DIY: Scoring text for sentiment
13.3 Topic modeling
13.3.1 A conceptual base
13.3.2 How do topics models work?
13.3.3 DIY: Finding topics in presidential speeches
13.4 Beyond this chapter
13.4.1 Best practices
13.4.2 Further study
14 The Ethics of Data Science
14.1 An emerging debate
14.2 Bias
14.2.1 Sampling bias
14.2.2 Measurement bias
14.2.3 Prejudicial bias
14.3 Fairness
14.3.1 Score-based fairness
14.3.2 Accuracy-based fairness
14.3.3 Other considerations
14.4 Transparency and Interpretability
14.4.1 Interpretability
14.4.2 Explainability
14.5 Privacy
14.5.1 An evolving landscape
14.5.2 Privacy strategies
14.6 Beyond this chapter
15 Developing Data Products
15.1 Meeting people where they are
15.2 Designing for impact
15.2.1 Identify a user need
15.2.2 Size up the situation
15.2.3 Build a lean "V1"
15.2.4 Test and evaluate its impact, then iterate
15.3 Communicating data science projects
15.3.1 Presentations
15.3.2 Written reports
15.4 Reporting dashboards
15.5 Prediction products
15.5.1 Prioritization and targeting lists
15.5.2 Scoring engines
15.6 Continuing to hone your craft
15.7 Where to next?
16 Building Data Teams
16.1 Establishing a baseline
16.2 Operating models
16.2.1 Center of excellence
16.2.2 Hack teams
16.2.3 Consultancy
16.2.4 Matrix organizations
16.3 Identifying roles
16.3.1 The manager
16.3.2 Analytics roles
16.3.3 Data product roles
16.3.4 Titles in the civil service system
16.4 The hiring process
16.4.1 Job postings and application review
16.4.2 Interviews
16.5 Final thoughts
Appendix A: Planning a Data Product
Key Questions
Appendix B: Interview Questions
Getting to know the candidate
Business acumen
Project experience
Whiteboard questions
Statistics
Causal inference
Estimation versus prediction
Machine learning
Model evaluation
Communication and visualization
Programming
Take-home questions
References
Index
Jeffrey C. Chen: Affiliated Researcher, Bennett Institute for Public Policy, University of Cambridge
Edward A. Rubin: Assistant Professor, University of Oregon (Dept. of Economics)
Gary J. Cornwall: Research Economist, U.S. Bureau of Economic Analysis