Analysis of Integrated Data [Hardback]

Edited by Li-Chun Zhang (Department of Social Statistics, University of Southampton, UK) and Raymond L. Chambers (University of Wollongong, Australia)

Format: Hardback, 272 pages, height x width: 234x156 mm, weight: 526 g, 55 Tables, black and white; 22 Line drawings, black and white; 22 Illustrations, black and white
Series: Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences
Publication date: 08-May-2019
Publisher: Chapman & Hall/CRC
ISBN-10: 1498727980
ISBN-13: 9781498727983

Other books on this subject:

Psychological methodology - (currently in store: 2 titles)
Probability & statistics - (currently in store: 2 titles)
Economic statistics

Hardback
Price: €151.30
Delivery from the publisher takes approximately 2-4 weeks

Permalink: https://www.kriso.ee/db/9781498727983.html

The advent of "Big Data" has brought with it a rapid diversification of data sources, requiring analysis that accounts for the fact that these data have often been generated and recorded for different reasons. Data integration involves combining data residing in different sources to enable statistical inference, or to generate new statistical data for purposes that cannot be served by each source on its own. This can yield significant gains for scientific as well as commercial investigations.

However, valid analysis of such data should allow for the additional uncertainty due to entity ambiguity, whenever it is not possible to state with certainty that the integrated source is the target population of interest. Analysis of Integrated Data aims to provide a solid theoretical basis for this statistical analysis in three generic settings of entity ambiguity: statistical analysis of linked datasets that may contain linkage errors; datasets created by a data fusion process, where joint statistical information is simulated using the information in marginal data from non-overlapping sources; and estimation of target population size when target units are either partially or erroneously covered in each source.

Covers a range of topics under an overarching perspective of data integration.

Focuses on statistical uncertainty and inference issues arising from entity ambiguity.

Features state-of-the-art methods for analysis of integrated data.

Identifies the important themes that will define future research and teaching in the statistical analysis of integrated data.

Analysis of Integrated Data is aimed primarily at researchers and methodologists interested in statistical methods for data from multiple sources, with a focus on data analysts in the social sciences, and in the public and private sectors.

Preface
Contributors

1 Introduction
Raymond L. Chambers
1.1 Why this book?
1.2 The structure of this book
1.3 Summary
References

2 On secondary analysis of datasets that cannot be linked without errors
Li-Chun Zhang
2.1 Introduction
2.1.1 Related work
2.1.2 Outline of investigation
2.2 The linkage data structure
2.2.1 Definitions
2.2.2 Agreement partition of match space
2.3 On maximum likelihood estimation
2.4 On analysis under the comparison data model
2.4.1 Linear regression under the linkage model
2.4.2 Linear regression under the comparison data model
2.4.3 Comparison data modelling (I)
2.4.4 Comparison data modelling (II)
2.5 On link subset analysis
2.5.1 Non-informative balanced selection
2.5.2 Illustration for the C-PR data
2.6 Concluding remarks
Bibliography

3 Capture-recapture methods in the presence of linkage errors
Loredana Di Consiglio, Tiziana Tuoto, Li-Chun Zhang
3.1 Introduction
3.2 The capture-recapture model: short formalization and notation
3.3 The linkage models and the linkage errors
3.3.1 The Fellegi and Sunter linkage model
3.3.2 Definition and estimation of linkage errors
3.3.3 Bayesian approaches to record linkage
3.4 The DSE in the presence of linkage errors
3.4.1 The Ding and Fienberg estimator
3.4.2 The modified Ding and Fienberg estimator
3.4.3 Some remarks
3.4.4 Examples
3.5 Linkage-error adjustments in the case of multiple lists
3.5.1 Log-linear model-based estimators
3.5.2 An alternative modelling approach
3.5.3 A Bayesian proposal
3.5.4 Examples
3.6 Concluding remarks
Bibliography

4 An overview on uncertainty and estimation in statistical matching
Pier Luigi Conti, Daniela Marella, Mauro Scanu
4.1 Introduction
4.2 Statistical matching problem: notations and technicalities
4.3 The joint distribution of variables not jointly observed: estimation and uncertainty
4.3.1 Matching error
4.3.2 Bounding the matching error via measures of uncertainty
4.4 Statistical matching for complex sample surveys
4.4.1 Technical assumptions on the sample designs
4.4.2 A proposal for choosing a matching distribution
4.4.3 Reliability of the matching distribution
4.4.4 Evaluation of the matching reliability as a hypothesis problem
4.5 Conclusions and pending issues: relationship between the statistical matching problem and ecological inference
Bibliography

5 Auxiliary variable selection in a statistical matching problem
Marcello D'Orazio, Marco Di Zio, Mauro Scanu
5.1 Introduction
5.2 Choice of the matching variables
5.2.1 Traditional methods based on association
5.2.2 Choosing the matching variables by uncertainty reduction
5.2.3 An illustrative example
5.2.4 The penalised uncertainty measure
5.3 Simulations with European Social Survey data
5.4 Conclusions
Bibliography

6 Minimal inference from incomplete 2 × 2-tables
Li-Chun Zhang, Raymond L. Chambers
6.1 Introduction
6.2 Corroboration
6.3 Maximum corroboration set
6.4 High assurance estimation of $$0
6.5 A corroboration test
6.6 Application: missing OCBGT data
Bibliography

7 Dual- and multiple-system estimation with fully and partially observed covariates
Peter G. M. van der Heijden, Paul A. Smith, Joe Whittaker, Maarten Cruyff, Bart F. M. Bakker
7.1 Introduction
7.2 Theory concerning invariant population-size estimates
7.2.1 Terminology and properties
7.2.2 Example
7.2.3 Graphical representation of log-linear models
7.2.4 Three registers
7.3 Applications of invariant population-size estimation
7.3.1 Modelling strategies with active and passive covariates
7.3.2 Working with invariant population-size estimates
7.4 Dealing with partially observed covariates
7.4.1 Framework for population-size estimation with partially observed covariates
7.4.2 Example
7.4.3 Interaction graphs for models with incomplete covariates
7.4.4 Results of model fitting
7.5 Precision and sensitivity
7.5.1 Precision
7.5.2 Sensitivity
7.5.3 Comparison of the EM algorithm with the classical model
7.6 An application when the same variable is measured differently in both registers
7.6.1 Example: Injuries in road accidents in the Netherlands
7.6.2 More detailed breakdown of transport mode in accidents
7.7 Discussion
7.7.1 Alternative approaches
7.7.2 Quality issues
Bibliography

8 Estimating population size in multiple record systems with uncertainty of state identification
Davide Di Cecco
8.1 Introduction
8.2 A latent class model for capture-recapture
8.2.1 Decomposable models
8.2.2 Identifiability
8.2.3 EM algorithm
8.2.4 Fixing parameters
8.2.5 A mixture of different components
8.2.6 Model selection
8.3 Observed heterogeneity of capture probabilities
8.3.1 Use of covariates
8.3.2 Incomplete lists
8.4 Evaluating the interpretation of the latent classes
8.5 A Bayesian approach
8.5.1 MCMC algorithm
8.5.2 Simulations results
Bibliography

9 Log-linear models of erroneous list data
Li-Chun Zhang
9.1 Introduction
9.2 Log-linear models of incomplete contingency tables
9.3 Modelling marginally classified list errors
9.3.1 The models
9.3.2 Maximum likelihood estimation
9.3.3 Estimation based on list-survey data
9.4 Model selection with zero degree of freedom
9.4.1 Latent likelihood ratio criterion
9.4.2 Illustration
9.5 Homelessness data in the Netherlands
9.5.1 Data and previous study
9.5.2 Analysis allowing for erroneous enumeration
Bibliography

10 Sampling design and analysis using geo-referenced data
Danila Filipponi, Federica Piersimoni, Roberto Benedetti, Maria Michela Dickson, Giuseppe Espa, Diego Giuliani
10.1 Introduction
10.2 Geo-referenced data and potential locational errors
10.3 A brief review of spatially balanced sampling methods
10.3.1 Local pivotal methods
10.3.2 Spatially correlated Poisson sampling
10.3.3 Balanced sampling through the cube method
10.3.4 Local cube method
10.4 Spatial sampling for estimation of under-coverage rate
10.5 Business surveys in the presence of locational errors
10.6 Conclusions
Bibliography

Index

Li-Chun Zhang is Professor in Social Statistics at the University of Southampton, UK, Senior Researcher at Statistics Norway, Norway, and Professor in Official Statistics at the University of Oslo, Norway.

Raymond Chambers is Professor of Statistical Methodology at the University of Wollongong, Australia.
