Preface | xi | |
Acknowledgements | xv | |
| 1 | (9) |
1.1 Concepts and definitions | 2 | (5) |
| 2 | (1) |
1.1.2 Statistical disclosure control | 3 | (1) |
| 3 | (1) |
| 3 | (1) |
| 4 | (3) |
1.2 An approach to Statistical Disclosure Control | 7 | (2) |
1.2.1 Why is confidentiality protection needed? | 7 | (1) |
1.2.2 What are the key characteristics and uses of the data? | 8 | (1) |
1.2.3 What disclosure risks need to be protected against? | 8 | (1) |
1.2.4 Disclosure control methods | 8 | (1) |
| 9 | (1) |
1.3 The chapters of the handbook | 9 | (1) |
2 Ethics, principles, guidelines and regulations - a general background | 10 | (13) |
| 10 | (1) |
2.2 Ethical codes and the new ISI code | 11 | (5) |
2.2.1 ISI Declaration on Professional Ethics | 11 | (1) |
2.2.2 New ISI Declaration on Professional Ethics | 12 | (3) |
2.2.3 European Statistics Code of Practice | 15 | (1) |
2.3 UNECE principles and guidelines | 16 | (3) |
2.3.1 UNECE Principles and Guidelines on Confidentiality Aspects of Data Integration | 18 | (1) |
2.3.2 Future activities on the UNECE principles and guidelines | 19 | (1) |
| 19 | (4) |
2.4.1 Committee on Statistical Confidentiality | 20 | (1) |
2.4.2 European Statistical System Committee | 20 | (3) |
| 23 | (108) |
| 23 | (1) |
| 24 | (12) |
3.2.1 Stage 1: Assess need for confidentiality protection | 24 | (3) |
3.2.2 Stage 2: Key characteristics and use of microdata | 27 | (3) |
3.2.3 Stage 3: Disclosure risk | 30 | (2) |
3.2.4 Stage 4: Disclosure control methods | 32 | (2) |
3.2.5 Stage 5: Implementation | 34 | (2) |
3.3 Definitions of disclosure | 36 | (2) |
3.3.1 Definitions of disclosure scenarios | 37 | (1) |
3.4 Definitions of disclosure risk | 38 | (5) |
3.4.1 Disclosure risk for categorical quasi-identifiers | 39 | (1) |
3.4.2 Notation and assumptions | 40 | (1) |
3.4.3 Disclosure risk for continuous quasi-identifiers | 41 | (2) |
3.5 Estimating re-identification risk | 43 | (8) |
3.5.1 Individual risk based on the sample: Threshold rule | 44 | (1) |
3.5.2 Estimating individual risk using sampling weights | 44 | (3) |
3.5.3 Estimating individual risk by Poisson model | 47 | (1) |
3.5.4 Further models that borrow information from other sources | 48 | (1) |
3.5.5 Estimating per record risk via heuristics | 49 | (1) |
3.5.6 Assessing risk via record linkage | 50 | (1) |
3.6 Non-perturbative microdata masking | 51 | (2) |
| 51 | (1) |
| 52 | (1) |
3.6.3 Top and bottom coding | 53 | (1) |
| 53 | (1) |
3.7 Perturbative microdata masking | 53 | (25) |
3.7.1 Additive noise masking | 54 | (3) |
3.7.2 Multiplicative noise masking | 57 | (3) |
| 60 | (12) |
3.7.4 Data swapping and rank swapping | 72 | (1) |
| 73 | (1) |
| 73 | (1) |
| 74 | (1) |
| 74 | (4) |
| 78 | (1) |
3.8 Synthetic and hybrid data | 78 | (22) |
3.8.1 Fully synthetic data | 79 | (5) |
3.8.2 Partially synthetic data | 84 | (2) |
| 86 | (12) |
3.8.4 Pros and cons of synthetic and hybrid data | 98 | (2) |
3.9 Information loss in microdata | 100 | (10) |
3.9.1 Information loss measures for continuous data | 101 | (7) |
3.9.2 Information loss measures for categorical data | 108 | (2) |
3.10 Release of multiple files from the same microdata set | 110 | (1) |
| 111 | (5) |
| 111 | (2) |
| 113 | (2) |
| 115 | (1) |
| 116 | (15) |
3.12.1 Microdata files at Statistics Netherlands | 116 | (2) |
3.12.2 The European Labour Force Survey microdata for research purposes | 118 | (3) |
3.12.3 The European Structure of Earnings Survey microdata for research purposes | 121 | (7) |
3.12.4 NHIS-linked mortality data public use file, USA | 128 | (2) |
3.12.5 Other real case instances | 130 | (1) |
| 131 | (52) |
| 131 | (7) |
4.1.1 Magnitude tabular data: Basic terminology | 131 | (1) |
4.1.2 Complex tabular data structures: Hierarchical and linked tables | 132 | (2) |
| 134 | (3) |
4.1.4 Protection concepts | 137 | (1) |
4.1.5 Information loss concepts | 137 | (1) |
4.1.6 Implementation: Software, guidelines and case study | 138 | (1) |
4.2 Disclosure risk assessment I: Primary sensitive cells | 138 | (14) |
| 138 | (2) |
| 140 | (12) |
4.3 Disclosure risk assessment II: Secondary risk assessment | 152 | (5) |
4.3.1 Feasibility interval | 152 | (2) |
| 154 | (1) |
4.3.3 Singleton and multi cell disclosure | 155 | (1) |
4.3.4 Risk models for hierarchical and linked tables | 155 | (2) |
4.4 Non-perturbative protection methods | 157 | (6) |
| 157 | (1) |
4.4.2 The concept of cell suppression | 157 | (1) |
4.4.3 Algorithms for secondary cell suppression | 158 | (3) |
4.4.4 Secondary cell suppression in hierarchical and linked tables | 161 | (2) |
4.5 Perturbative protection methods | 163 | (3) |
4.5.1 A pre-tabular method: Multiplicative noise | 165 | (1) |
4.5.2 A post-tabular method: Controlled tabular adjustment | 165 | (1) |
4.6 Information loss measures for tabular data | 166 | (2) |
4.6.1 Cell costs for cell suppression | 166 | (1) |
| 167 | (1) |
4.6.3 Information loss measures to evaluate the outcome of table protection | 167 | (1) |
4.7 Software for tabular data protection | 168 | (5) |
4.7.1 Empirical comparison of cell suppression algorithms | 169 | (4) |
4.8 Guidelines: Setting up an efficient table model systematically | 173 | (5) |
4.8.1 Defining spanning variables | 174 | (1) |
4.8.2 Response variables and mapping rules | 175 | (3) |
| 178 | (5) |
4.9.1 Response variables and mapping rules of the case study | 178 | (1) |
4.9.2 Spanning variables of the case study | 179 | (1) |
4.9.3 Analysing the tables of the case study | 179 | (2) |
4.9.4 Software issues of the case study | 181 | (2) |
| 183 | (25) |
| 183 | (1) |
| 184 | (7) |
5.2.1 Individual attribute disclosure | 185 | (1) |
5.2.2 Group attribute disclosure | 186 | (1) |
5.2.3 Disclosure by differencing | 187 | (3) |
5.2.4 Perception of disclosure risk | 190 | (1) |
| 191 | (2) |
| 191 | (1) |
| 192 | (1) |
| 193 | (1) |
| 193 | (6) |
| 193 | (1) |
5.4.2 ABS cell perturbation | 193 | (1) |
| 194 | (5) |
| 199 | (2) |
| 201 | (3) |
| 201 | (1) |
5.6.2 Optimal, first feasible and RAPID solutions | 202 | (1) |
5.6.3 Protection provided by controlled rounding | 203 | (1) |
| 204 | (4) |
| 204 | (1) |
5.7.2 Australian and New Zealand Censuses | 205 | (3) |
| 208 | (35) |
| 208 | (1) |
6.2 Research data centres | 209 | (1) |
| 209 | (1) |
| 210 | (1) |
| 211 | (1) |
6.6 Guidelines on output checking | 211 | (25) |
| 211 | (1) |
| 212 | (3) |
6.6.3 Rules for output checking | 215 | (9) |
6.6.4 Organisational/procedural aspects of output checking | 224 | (9) |
6.6.5 Researcher training | 233 | (3) |
6.7 Additional issues concerning data access | 236 | (1) |
6.7.1 Examples of disclaimers | 236 | (1) |
| 236 | (1) |
| 237 | (6) |
6.8.1 The US Census Bureau Microdata Analysis System | 237 | (2) |
6.8.2 Remote access at Statistics Netherlands | 239 | (4) |
Glossary | 243 | (18) |
References | 261 | (18) |
Author index | 279 | (3) |
Subject index | 282 | |