Statistical Disclosure Control [Other digital carrier]

  • Format: Other digital carrier, 500 pages, height x width x thickness: 250x150x15 mm, weight: 666 g
  • Publication date: 06-Jul-2012
  • Publisher: John Wiley & Sons Inc
  • ISBN-10: 1118348230
  • ISBN-13: 9781118348239

A reference to answer all your statistical confidentiality questions.

This handbook provides technical guidance on statistical disclosure control and on how to balance the need to provide users with statistical outputs against the need to protect the confidentiality of respondents. Statistical disclosure control is combined with other tools, such as administrative, legal and IT measures, to define a proper data dissemination strategy based on a risk management approach.

The key concepts of statistical disclosure control are presented, along with the methodology and software that can be used to apply various methods of statistical disclosure control. Numerous examples and guidelines are also featured to illustrate the topics covered.

Statistical Disclosure Control:

  • Presents a combination of both theoretical and practical solutions.
  • Introduces all the key concepts and definitions involved in statistical disclosure control.
  • Provides a high-level overview of how to approach problems associated with confidentiality.
  • Provides a broad-ranging review of the methods available to control disclosure.
  • Explains the subtleties of group disclosure control.
  • Features examples throughout the book along with case studies demonstrating how particular methods are used.
  • Discusses microdata, magnitude and frequency tabular data, and remote access issues.
  • Is written by experts within leading National Statistical Institutes.

Official statisticians, academics and market researchers who need to be informed and make decisions on disclosure limitation will benefit from this book. 

Preface
Acknowledgements
1 Introduction
  1.1 Concepts and definitions
    1.1.1 Disclosure
    1.1.2 Statistical disclosure control
    1.1.3 Tabular data
    1.1.4 Microdata
    1.1.5 Risk and utility
  1.2 An approach to Statistical Disclosure Control
    1.2.1 Why is confidentiality protection needed?
    1.2.2 What are the key characteristics and uses of the data?
    1.2.3 What disclosure risks need to be protected against?
    1.2.4 Disclosure control methods
    1.2.5 Implementation
  1.3 The chapters of the handbook
2 Ethics, principles, guidelines and regulations: a general background
  2.1 Introduction
  2.2 Ethical codes and the new ISI code
    2.2.1 ISI Declaration on Professional Ethics
    2.2.2 New ISI Declaration on Professional Ethics
    2.2.3 European Statistics Code of Practice
  2.3 UNECE principles and guidelines
    2.3.1 UNECE Principles and Guidelines on Confidentiality Aspects of Data Integration
    2.3.2 Future activities on the UNECE principles and guidelines
  2.4 Laws
    2.4.1 Committee on Statistical Confidentiality
    2.4.2 European Statistical System Committee
3 Microdata
  3.1 Introduction
  3.2 Microdata concepts
    3.2.1 Stage 1: Assess need for confidentiality protection
    3.2.2 Stage 2: Key characteristics and use of microdata
    3.2.3 Stage 3: Disclosure risk
    3.2.4 Stage 4: Disclosure control methods
    3.2.5 Stage 5: Implementation
  3.3 Definitions of disclosure
    3.3.1 Definitions of disclosure scenarios
  3.4 Definitions of disclosure risk
    3.4.1 Disclosure risk for categorical quasi-identifiers
    3.4.2 Notation and assumptions
    3.4.3 Disclosure risk for continuous quasi-identifiers
  3.5 Estimating re-identification risk
    3.5.1 Individual risk based on the sample: Threshold rule
    3.5.2 Estimating individual risk using sampling weights
    3.5.3 Estimating individual risk by Poisson model
    3.5.4 Further models that borrow information from other sources
    3.5.5 Estimating per record risk via heuristics
    3.5.6 Assessing risk via record linkage
  3.6 Non-perturbative microdata masking
    3.6.1 Sampling
    3.6.2 Global recoding
    3.6.3 Top and bottom coding
    3.6.4 Local suppression
  3.7 Perturbative microdata masking
    3.7.1 Additive noise masking
    3.7.2 Multiplicative noise masking
    3.7.3 Microaggregation
    3.7.4 Data swapping and rank swapping
    3.7.5 Data shuffling
    3.7.6 Rounding
    3.7.7 Re-sampling
    3.7.8 PRAM
    3.7.9 MASSC
  3.8 Synthetic and hybrid data
    3.8.1 Fully synthetic data
    3.8.2 Partially synthetic data
    3.8.3 Hybrid data
    3.8.4 Pros and cons of synthetic and hybrid data
  3.9 Information loss in microdata
    3.9.1 Information loss measures for continuous data
    3.9.2 Information loss measures for categorical data
  3.10 Release of multiple files from the same microdata set
  3.11 Software
    3.11.1 μ-argus
    3.11.2 sdcMicro
    3.11.3 IVEware
  3.12 Case studies
    3.12.1 Microdata files at Statistics Netherlands
    3.12.2 The European Labour Force Survey microdata for research purposes
    3.12.3 The European Structure of Earnings Survey microdata for research purposes
    3.12.4 NHIS-linked mortality data public use file, USA
    3.12.5 Other real case instances
4 Magnitude tabular data
  4.1 Introduction
    4.1.1 Magnitude tabular data: Basic terminology
    4.1.2 Complex tabular data structures: Hierarchical and linked tables
    4.1.3 Risk concepts
    4.1.4 Protection concepts
    4.1.5 Information loss concepts
    4.1.6 Implementation: Software, guidelines and case study
  4.2 Disclosure risk assessment I: Primary sensitive cells
    4.2.1 Intruder scenarios
    4.2.2 Sensitivity rules
  4.3 Disclosure risk assessment II: Secondary risk assessment
    4.3.1 Feasibility interval
    4.3.2 Protection level
    4.3.3 Singleton and multi cell disclosure
    4.3.4 Risk models for hierarchical and linked tables
  4.4 Non-perturbative protection methods
    4.4.1 Global recoding
    4.4.2 The concept of cell suppression
    4.4.3 Algorithms for secondary cell suppression
    4.4.4 Secondary cell suppression in hierarchical and linked tables
  4.5 Perturbative protection methods
    4.5.1 A pre-tabular method: Multiplicative noise
    4.5.2 A post-tabular method: Controlled tabular adjustment
  4.6 Information loss measures for tabular data
    4.6.1 Cell costs for cell suppression
    4.6.2 Cell costs for CTA
    4.6.3 Information loss measures to evaluate the outcome of table protection
  4.7 Software for tabular data protection
    4.7.1 Empirical comparison of cell suppression algorithms
  4.8 Guidelines: Setting up an efficient table model systematically
    4.8.1 Defining spanning variables
    4.8.2 Response variables and mapping rules
  4.9 Case studies
    4.9.1 Response variables and mapping rules of the case study
    4.9.2 Spanning variables of the case study
    4.9.3 Analysing the tables of the case study
    4.9.4 Software issues of the case study
5 Frequency tables
  5.1 Introduction
  5.2 Disclosure risks
    5.2.1 Individual attribute disclosure
    5.2.2 Group attribute disclosure
    5.2.3 Disclosure by differencing
    5.2.4 Perception of disclosure risk
  5.3 Methods
    5.3.1 Pre-tabular
    5.3.2 Table re-design
    5.3.3 Post-tabular
  5.4 Post-tabular methods
    5.4.1 Cell suppression
    5.4.2 ABS cell perturbation
    5.4.3 Rounding
  5.5 Information loss
  5.6 Software
    5.6.1 Introduction
    5.6.2 Optimal, first feasible and RAPID solutions
    5.6.3 Protection provided by controlled rounding
  5.7 Case studies
    5.7.1 UK Census
    5.7.2 Australian and New Zealand Censuses
6 Data access issues
  6.1 Introduction
  6.2 Research data centres
  6.3 Remote execution
  6.4 Remote access
  6.5 Licensing
  6.6 Guidelines on output checking
    6.6.1 Introduction
    6.6.2 General approach
    6.6.3 Rules for output checking
    6.6.4 Organisational/procedural aspects of output checking
    6.6.5 Researcher training
  6.7 Additional issues concerning data access
    6.7.1 Examples of disclaimers
    6.7.2 Output description
  6.8 Case studies
    6.8.1 The US Census Bureau Microdata Analysis System
    6.8.2 Remote access at Statistics Netherlands
Glossary
References
Author index
Subject index