E-book: Methodological Developments in Data Linkage

2.00/5 (2 ratings from Goodreads)

Chris Dibben (University of Edinburgh, UK), Harvey Goldstein (University of Bristol and University College London, UK), Katie Harron (London School of Hygiene and Tropical Medicine, UK)

Other formats

Other digital carrier (Price: 98,58 €) - 05-Feb-2016

Format: EPUB+DRM
Series: Wiley Series in Probability and Statistics
Publication date: 22-Sep-2015
Publisher: John Wiley & Sons Inc
Language: eng
ISBN-13: 9781119072485

Format - EPUB+DRM
Price: 87,62 €*
* The price is final, i.e. no further discounts apply
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorised with the same Adobe ID).

Required software
To read on mobile devices (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

A comprehensive compilation of new developments in data linkage methodology

The increasing availability of large administrative databases has led to a dramatic rise in the use of data linkage, yet the standard texts on linkage are still those which describe the seminal work from the 1950s and 1960s, with some updates. Linkage and analysis of data across sources remain problematic due to a lack of discriminatory and accurate identifiers, missing data and regulatory issues. Recent developments in data linkage methodology have concentrated on bias and analysis of linked data, novel approaches to organising relationships between databases and privacy-preserving linkage.

Methodological Developments in Data Linkage brings together a collection of contributions from members of the international data linkage community, covering cutting-edge methodology in this field. It presents opportunities and challenges provided by linkage of large and often complex datasets, including analysis problems, legal and security aspects, models for data access and the development of novel research areas. New methods for handling uncertainty in analysis of linked data, solutions for anonymised linkage and alternative models for data collection are also discussed.

Key Features:

Presents cutting-edge methods for a topic of increasing importance to a wide range of research areas, with applications to data linkage systems internationally

Covers the essential issues associated with data linkage today

Includes examples based on real data linkage systems, highlighting the opportunities, successes and challenges that the increasing availability of linkage data provides

Novel approach incorporates technical aspects of linkage, management and analysis of linked data

This book will be of core interest to academics, government employees, data holders, data managers, analysts and statisticians who use administrative data. It will also appeal to researchers in a variety of areas, including epidemiology, biostatistics, social statistics, informatics, policy and public health.

Foreword

Contributors

1 Introduction
Katie Harron, Harvey Goldstein, Chris Dibben
1.1 Introduction: data linkage as it exists
1.2 Background and issues
1.3 Data linkage methods
1.3.1 Deterministic linkage
1.3.2 Probabilistic linkage
1.3.3 Data preparation
1.4 Linkage error
1.5 Impact of linkage error on analysis of linked data
1.6 Data linkage: the future

2 Probabilistic linkage
William E. Winkler
2.1 Introduction
2.2 Overview of methods
2.2.1 The Fellegi-Sunter model of record linkage
2.2.2 Learning parameters
2.2.3 Additional methods for matching
2.2.4 An empirical example
2.3 Data preparation
2.3.1 Description of a matching project
2.3.2 Initial file preparation
2.3.3 Name standardisation and parsing
2.3.4 Address standardisation and parsing
2.3.5 Summarising comments on preprocessing
2.4 Advanced methods
2.4.1 Estimating false-match rates without training data
2.4.2 Adjusting analyses for linkage error
2.5 Concluding comments

3 The data linkage environment
Chris Dibben, Mark Elliot, Heather Gowans, Darren Lightfoot
3.1 Introduction
3.2 The data linkage context
3.2.1 Administrative or routine data
3.2.2 The law and the use of administrative (personal) data for research
3.2.3 The identifiability problem in data linkage
3.3 The tools used in the production of functional anonymity through a data linkage environment
3.3.1 Governance, rules and the researcher
3.3.2 Application process, ethics scrutiny and peer review
3.3.3 Shaping 'safe' behaviour: training, sanctions, contracts and licences
3.3.4 'Safe' data analysis environments
3.3.5 Fragmentation: separation of linkage process and temporary linked data
3.4 Models for data access and data linkage
3.4.1 Single centre
3.4.2 Separation of functions: firewalls within single centre
3.4.3 Separation of functions: TTP linkage
3.4.4 Secure multiparty computation
3.5 Four case study data linkage centres
3.5.1 Population Data BC
3.5.2 The Secure Anonymised Information Linkage Databank, United Kingdom
3.5.3 Centre for Data Linkage (Population Health Research Network), Australia
3.5.4 The Centre for Health Record Linkage, Australia
3.6 Conclusion

4 Bias in data linkage studies
Megan Bohensky
4.1 Background
4.2 Description of types of linkage error
4.2.1 Missed matches from missing linkage variables
4.2.2 Missed matches from inconsistent case ascertainment
4.2.3 False matches: Description of cases incorrectly matched
4.3 How linkage error impacts research findings
4.3.1 Results
4.3.2 Assessment of linkage bias
4.4 Discussion
4.4.1 Potential biases in the review process
4.4.2 Recommendations and implications for practice

5 Secondary analysis of linked data
Raymond Chambers, Gunky Kim
5.1 Introduction
5.2 Measurement error issues arising from linkage
5.2.1 Correct links, incorrect links and non-links
5.2.2 Characterising linkage errors
5.2.3 Characterising errors from non-linkage
5.3 Models for different types of linking errors
5.3.1 Linkage errors under binary linking
5.3.2 Linkage errors under multi-linking
5.3.3 Incomplete linking
5.3.4 Modelling the linkage error
5.4 Regression analysis using complete binary-linked data
5.4.1 Linear regression
5.4.2 Logistic regression
5.5 Regression analysis using incomplete binary-linked data
5.5.1 Linear regression using incomplete sample to register linked data
5.6 Regression analysis with multi-linked data
5.6.1 Uncorrelated multi-linking: Complete linkage
5.6.2 Uncorrelated multi-linking: Sample to register linkage
5.6.3 Correlated multi-linkage
5.6.4 Incorporating auxiliary population information
5.7 Conclusion and discussion

6 Record linkage: A missing data problem
Harvey Goldstein, Katie Harron
6.1 Introduction
6.2 Probabilistic Record Linkage (PRL)
6.3 Multiple Imputation (MI)
6.4 Prior-Informed Imputation (PII)
6.4.1 Estimating matching probabilities
6.5 Example 1: Linking electronic healthcare data to estimate trends in bloodstream infection
6.5.1 Methods
6.5.2 Results
6.5.3 Conclusions
6.6 Example 2: Simulated data including non-random linkage error
6.6.1 Methods
6.6.2 Results
6.7 Discussion
6.7.1 Non-random linkage error
6.7.2 Strengths and limitations: Handling linkage error
6.7.3 Implications for data linkers and data users

7 Using graph databases to manage linked data
James M. Farrow
7.1 Summary
7.2 Introduction
7.2.1 Flat approach
7.2.2 Oops, your legacy is showing
7.2.3 Shortcomings
7.3 Graph approach
7.3.1 Overview of graph concepts
7.3.2 Graph queries versus relational queries
7.3.3 Comparison of data in flat database versus graph database
7.3.4 Relaxing the notion of 'truth'
7.3.5 Not a linkage approach per se but a management approach which enables novel linkage approaches
7.3.6 Linkage engine independent
7.3.7 Separates out linkage from cluster identification phase (and clerical review)
7.4 Methodologies
7.4.1 Overview of storage and extraction approach
7.4.2 Overall management of data as collections
7.4.3 Data loading
7.4.4 Identification of equivalence sets and deterministic linkage
7.4.5 Probabilistic linkage
7.4.6 Clerical review
7.4.7 Determining cut-off thresholds
7.4.8 Final cluster extraction
7.4.9 Graph partitioning
7.4.10 Data management/curation
7.4.11 User interface challenges
7.4.12 Final cluster extraction
7.4.13 A typical end-to-end workflow
7.5 Algorithm implementation
7.5.1 Graph traversal
7.5.2 Cluster identification
7.5.3 Partitioning visitor
7.5.4 Encapsulating edge following policies
7.5.5 Graph partitioning
7.5.6 Insertion of review links
7.5.7 How to migrate while preserving current clusters
7.6 New approaches facilitated by graph storage approach
7.6.1 Multiple threshold extraction
7.6.2 Possibility of returning graph to end users
7.6.3 Optimised cluster analysis
7.6.4 Other link types
7.7 Conclusion

8 Large-scale linkage for total populations in official statistics
Owen Abbott, Peter Jones, Martin Ralphs
8.1 Introduction
8.2 Current practice in record linkage for population censuses
8.2.1 Introduction
8.2.2 Case study: the 2011 England and Wales Census assessment of coverage
8.3 Population-level linkage in countries that operate a population register: register-based censuses
8.3.1 Introduction
8.3.2 Case study 1: Finland
8.3.3 Case study 2: The Netherlands Virtual Census
8.3.4 Case study 3: Poland
8.3.5 Case study 4: Germany
8.3.6 Summary
8.4 New challenges in record linkage: the Beyond 2011 Programme
8.4.1 Introduction
8.4.2 Beyond 2011 linking methodology
8.4.3 The anonymisation process in Beyond 2011
8.4.4 Beyond 2011 linkage strategy using pseudonymised data
8.4.5 Linkage quality
8.4.6 Next steps
8.4.7 Conclusion
8.5 Summary

9 Privacy-preserving record linkage
Rainer Schnell
9.1 Introduction
9.2 Chapter outline
9.3 Linking with and without personal identification numbers
9.3.1 Linking using a trusted third party
9.3.2 Linking with encrypted PIDs
9.3.3 Linking with encrypted quasi-identifiers
9.3.4 PPRL in decentralised organisations
9.4 PPRL approaches
9.4.1 Phonetic codes
9.4.2 High-dimensional embeddings
9.4.3 Reference tables
9.4.4 Secure multiparty computations for PPRL
9.4.5 Bloom filter-based PPRL
9.5 PPRL for very large databases: blocking
9.5.1 Blocking for PPRL with Bloom filters
9.5.2 Blocking Bloom filters with MBT
9.5.3 Empirical comparison of blocking techniques for Bloom filters
9.5.4 Current recommendations for linking very large datasets with Bloom filters
9.6 Privacy considerations
9.6.1 Probability of attacks
9.6.2 Kind of attacks
9.6.3 Attacks on Bloom filters
9.7 Hardening Bloom filters
9.7.1 Randomly selected hash values
9.7.2 Random bits
9.7.3 Avoiding padding
9.7.4 Standardising the length of identifiers
9.7.5 Sampling bits for composite Bloom filters
9.7.6 Rehashing
9.7.7 Salting keys with record-specific data
9.7.8 Fake injections
9.7.9 Evaluation of Bloom filter hardening procedures
9.8 Future research
9.9 PPRL research and implementation with national databases

10 Summary
Katie Harron, Chris Dibben, Harvey Goldstein
10.1 Introduction
10.2 Part 1: Data linkage as it exists today
10.3 Part 2: Analysis of linked data
10.3.1 Quality of identifiers
10.3.2 Quality of linkage methods
10.3.3 Quality of evaluation
10.4 Part 3: Data linkage in practice: new developments
10.5 Concluding remarks

References

Index

Editors:

Katie Harron, London School of Hygiene and Tropical Medicine, UK

Harvey Goldstein, University of Bristol and University College London, UK

Chris Dibben, University of Edinburgh, UK