
E-book: Data and Information Quality: Dimensions, Principles and Techniques

  • Format: PDF+DRM
  • Price: €135.23*
  • * The price is final, i.e. no further discounts apply.
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

This book provides a systematic and comparative description of the vast number of research issues related to the quality of data and information. It does so by delivering a sound, integrated and comprehensive overview of the state of the art and future development of data and information quality in databases and information systems. To this end, it presents an extensive description of the techniques that constitute the core of data and information quality research, including record linkage (also called object identification), data integration, error localization and correction, and examines the related techniques in a comprehensive and original methodological framework. Quality dimension definitions and adopted models are also analyzed in detail, and differences between the proposed solutions are highlighted and discussed. Furthermore, while systematically describing data and information quality as an autonomous research area, the book also covers paradigms and influences deriving from other areas, such as probability theory, statistical data analysis, data mining, knowledge representation, and machine learning. Last but not least, the book also highlights very practical solutions, such as methodologies, benchmarks for the most effective techniques, case studies, and examples.

The book has been written primarily for researchers in the fields of databases and information management or in natural sciences who are interested in investigating properties of data and information that have an impact on the quality of experiments, processes and real life. The material presented is also sufficiently self-contained for master's or PhD-level courses, and it covers all the fundamentals and topics without the need for other textbooks. Data and information system administrators and practitioners who deal with systems exposed to data-quality issues, and who therefore need a systematization of the field and practical methods in the area, will also benefit from the combination of concrete practical approaches with sound theoretical formalisms.

Introduction to Information Quality.- Data Quality Dimensions.- Information Quality Dimensions for Maps and Texts.- Data Quality Issues in Linked Open Data.- Quality of Images.- Models for Information Quality.- Activities for Information Quality.- Object Identification.- Recent Advances in Object Identification.- Data Quality Issues in Data Integration Systems.- Information Quality in Use.- Methodologies for Information Quality Assessment and Improvement.- Information Quality in Healthcare.- Quality of Web Data and Quality of Big Data: Open Problems.- References.- Index.

Reviews

This book addresses the dimensions, principles, and techniques to ensure that data and information conform to the necessary quality requirements. Information and communication technology (ICT) professionals who touch in any way upon data and information quality should find this book mandatory reading. Its serious depth and breadth would seem to merit building an advanced course on data and information quality around it, so computer science students would be yet another audience. (David G. Hill, Computing Reviews, computingreviews.com, October, 2016)

1 Introduction to Information Quality
1.1 Introduction
1.2 Why Information Quality Is Relevant
1.2.1 Private Initiatives
1.2.2 Public Initiatives
1.3 Introduction to the Concept of Information Quality
1.4 Information Quality and Information Classifications
1.5 Information Quality and Types of Information Systems
1.6 Main Research Issues and Application Domains
1.6.1 Research Issues in Information Quality
1.6.2 Application Domains in Information Quality
1.6.3 Research Areas Related to Information Quality
1.7 Standardization Efforts in Information Quality
1.8 Summary
2 Data Quality Dimensions
2.1 Introduction
2.2 A Classification Framework for Data and Information Quality Dimensions
2.3 Accuracy Cluster
2.3.1 Structural Accuracy Dimensions
2.3.2 Time-Related Accuracy Dimensions
2.4 Completeness Cluster
2.4.1 Completeness of Relational Data
2.4.2 Completeness of Web Data
2.5 Accessibility Cluster
2.6 Consistency Cluster
2.6.1 Integrity Constraints
2.6.2 Data Edits
2.7 Approaches to the Definition of Data Quality Dimensions
2.7.1 Theoretical Approach
2.7.2 Empirical Approach
2.7.3 Intuitive Approach
2.7.4 A Comparative Analysis of the Dimension Definitions
2.7.5 Trade-Offs Between Dimensions
2.8 Schema Quality Dimensions
2.8.1 Accuracy Cluster
2.8.2 Completeness Cluster
2.8.3 Redundancy Cluster
2.8.4 Readability Cluster
2.9 Summary
3 Information Quality Dimensions for Maps and Texts
3.1 Introduction
3.2 From Data Quality Dimensions to Information Quality Dimensions
3.3 Information Quality in Maps
3.3.1 Conceptual Structure of Maps and Quality Dimensions of Maps
3.3.2 Levels of Abstraction and Quality of Maps
3.4 Information Quality in Semistructured Texts
3.4.1 Accuracy Cluster
3.4.2 Readability Cluster
3.4.3 Consistency Cluster
3.4.4 Other Issues Investigated in the Area of Text Comprehension
3.4.5 Accessibility Cluster
3.4.6 Text Quality in Administrative Documents
3.5 Information Quality in Law Texts
3.5.1 Accuracy Cluster
3.5.2 Redundancy Cluster
3.5.3 Readability Cluster
3.5.4 Accessibility Cluster
3.5.5 Consistency Cluster
3.5.6 Global Quality Index
3.6 Summary
4 Data Quality Issues in Linked Open Data
4.1 Introduction
4.2 Semantic Web Standards and Linked Data
4.2.1 The Web and the Rationale for Linked Data
4.2.2 Semantic Web Standards
4.2.3 Linked Data
4.3 Quality Dimensions in Linked Open Data
4.3.1 Accuracy Cluster
4.3.2 Completeness Cluster
4.3.3 Redundancy Cluster
4.3.4 Readability Cluster
4.3.5 Accessibility Cluster
4.3.6 Consistency Cluster
4.4 Interrelationships Between Dimensions
4.5 Summary
5 Quality of Images
5.1 Introduction
5.2 Image Quality Models and Dimensions
5.3 Image Quality Assessment Approaches
5.3.1 Subjective Approaches to Assessment
5.3.2 Objective Approaches
5.4 Quality Assessment and Image Production Workflow
5.5 Quality Assessment in High-Quality Image Archives
5.6 Video Quality Assessment
5.7 Summary
6 Models for Information Quality
6.1 Introduction
6.2 Extensions of Structured Data Models
6.2.1 Conceptual Models
6.2.2 Logical Models for Data Description
6.2.3 The Polygen Model for Data Manipulation
6.2.4 Data Provenance
6.3 Extensions of Semistructured Data Models
6.4 Management Information System Models
6.4.1 Models for Process Description: The IP-MAP Model
6.4.2 Extensions of IP-MAP
6.4.3 Information Models
6.5 Summary
7 Activities for Information Quality
7.1 Introduction
7.2 Information Quality Activities: Generalities
7.3 Quality Composition
7.3.1 Models and Assumptions
7.3.2 Dimensions
7.3.3 Accuracy
7.3.4 Completeness
7.4 Error Localization and Correction
7.4.1 Localize and Correct Inconsistencies
7.4.2 Incomplete Data
7.4.3 Discovering Outliers
7.5 Summary
8 Object Identification
8.1 Introduction
8.2 Historical Perspective
8.3 Object Identification for Different Data Types
8.4 The High-Level Process for Object Identification
8.5 Details on the Steps for Object Identification
8.5.1 Preprocessing
8.5.2 Search Space Reduction
8.5.3 Distance-Based Comparison Functions
8.5.4 Decision
8.6 Probabilistic Techniques
8.6.1 The Fellegi and Sunter Theory and Extensions
8.6.2 A Cost-Based Probabilistic Technique
8.7 Empirical Techniques
8.7.1 Sorted Neighborhood Method and Extensions
8.7.2 The Priority Queue Algorithm
8.7.3 A Technique for Complex Structured Data: Delphi
8.7.4 XML Duplicate Detection: DogmatiX
8.7.5 Other Empirical Methods
8.8 Knowledge-Based Techniques
8.8.1 Choice Maker
8.8.2 A Rule-Based Approach: Intelliclean
8.8.3 Learning Methods for Decision Rules: Atlas
8.9 Quality Assessment
8.9.1 Qualities and Related Metrics
8.9.2 Search Space Reduction Methods
8.9.3 Comparison Functions
8.9.4 Decision Methods
8.9.5 Results
8.10 Summary
9 Recent Advances in Object Identification
9.1 Introduction
9.2 Quality Assessment
9.2.1 Qualities for Reduction
9.2.2 Qualities for the Comparison and Decision Step
9.2.3 General Analyses and Recommendations
9.2.4 Hints on Frameworks for OID Techniques Evaluation
9.3 Preprocessing
9.4 Search Space Reduction
9.4.1 Introduction to Techniques for Search Space Reduction
9.4.2 Indexing Techniques
9.4.3 Learnable, Adaptive, and Context-Based Reduction Techniques
9.5 Comparison and Decision
9.5.1 Extensions of the Fellegi and Sunter Probabilistic Model
9.5.2 Knowledge in the Comparison Function
9.5.3 Contextual Knowledge in Decision
9.5.4 Other Types of Knowledge in Decision
9.5.5 Incremental Techniques
9.5.6 Multiple Decision Models
9.5.7 Object Identification at Query Time
9.5.8 OID Evolutive Maintenance
9.6 Domain-Specific Object Identification Techniques
9.6.1 Personal Names
9.6.2 Businesses
9.7 Object Identification Techniques for Maps and Images
9.7.1 Map Matching: Location-Based Matching
9.7.2 Map Matching: Location- and Feature-Based Matching
9.7.3 Map and Orthoimage Matching
9.7.4 Digital Gazetteer Data Matching
9.8 Privacy Preserving Object Identification
9.8.1 Privacy Requirements
9.8.2 Matching Techniques
9.8.3 Analysis and Evaluation
9.8.4 Practical Aspects
9.9 Summary
10 Data Quality Issues in Data Integration Systems
10.1 Introduction
10.2 Generalities on Data Integration Systems
10.2.1 Query Processing
10.3 Techniques for Quality-Driven Query Processing
10.3.1 The QP-alg: Quality-Driven Query Planning
10.3.2 DaQuinCIS Query Processing
10.3.3 Fusionplex Query Processing
10.3.4 Comparison of Quality-Driven Query Processing Techniques
10.4 Instance-Level Conflict Resolution
10.4.1 Classification of Instance-Level Conflicts
10.4.2 Overview of Techniques
10.4.3 Comparison of Instance-Level Conflict Resolution Techniques
10.5 Inconsistencies in Data Integration: A Theoretical Perspective
10.5.1 A Formal Framework for Data Integration
10.5.2 The Problem of Inconsistency
10.6 Summary
11 Information Quality in Use
11.1 Introduction
11.2 A Historical Perspective on Information Quality in Business Processes and Decision Making
11.3 Models of Utility and Objective vs. Contextual Metrics
11.4 Cost-Benefit Classifications for Data Quality
11.4.1 Cost Classifications
11.4.2 Benefits Classification
11.5 Methodologies for Cost-Benefit Management of Information Quality
11.6 How to Relate Contextual Quality Metrics with Utility
11.7 Information Quality and Decision Making
11.7.1 Relationships Between Information Quality and Decision Making
11.7.2 Information Quality Usage in the Decision Process
11.7.3 Decision Making and Information Overload
11.7.4 Value-Driven Decision Making
11.8 Summary
12 Methodologies for Information Quality Assessment and Improvement
12.1 Introduction
12.2 Basics on Information Quality Methodologies
12.2.1 Inputs and Outputs
12.2.2 Classification of Methodologies
12.2.3 Comparison Among Information-Driven and Process-Driven Strategies
12.2.4 Basic Common Phases Among Methodologies
12.3 Comparison of Methodologies
12.3.1 Assessment Phase
12.3.2 Improvement Phase
12.3.3 Strategies and Techniques
12.3.4 Comparison of Methodologies: Summary
12.4 Detailed Comparative Analysis of Three General-Purpose Methodologies
12.4.1 The TDQM Methodology
12.4.2 The TIQM
12.4.3 The Istat Methodology
12.5 Assessment Methodologies
12.6 The CDQM
12.6.1 Reconstruct the State of Data
12.6.2 Reconstruct Business Processes
12.6.3 Reconstruct Macroprocesses and Rules
12.6.4 Check Problems with Users
12.6.5 Measure Data Quality
12.6.6 Set New Target IQ Levels
12.6.7 Choose Improvement Activities
12.6.8 Choose Techniques for Data Activities
12.6.9 Find Improvement Processes
12.6.10 Choose the Optimal Improvement Process
12.7 A Case Study in the e-Government Area
12.7.1 Reconstruct the State of Data
12.7.2 Reconstruct Business Processes
12.7.3 Reconstruct Macroprocesses and Rules
12.7.4 Check Problems with Users
12.7.5 Measure Data Quality
12.7.6 Set New Target Data Quality Levels
12.7.7 Choose Improvement Activities
12.7.8 Choose Techniques for Data Activities
12.7.9 Find Improvement Processes
12.7.10 Choose the Optimal Improvement Process
12.8 Extension of CDQM to Heterogeneous Information Types
12.9 Summary
13 Information Quality in Healthcare
13.1 Introduction
13.2 Definitions and Scopes
13.3 Inherent Challenges of Healthcare
13.3.1 Multiple Uses, Users, and Applications
13.4 Health Information Quality Dimensions, Methodologies, and Initiatives
13.5 The Relevance of Information Quality in the Healthcare Domain
13.5.1 Health Information Quality and Its Consequences on Healthcare
13.6 Summary
14 Quality of Web Data and Quality of Big Data: Open Problems
14.1 Introduction
14.2 Two Relevant Paradigms for Web Data Quality: Trustworthiness and Provenance
14.2.1 Trustworthiness
14.2.2 Provenance
14.3 Web Object Identification
14.3.1 Object Identification and Time Variability
14.3.2 Object Identification and Quality
14.4 Quality of Big Data: A Classification of Big Data Sources
14.5 Source-Specific Quality Issues in Sensor Data
14.5.1 Information Quality in Sensors and Sensor Networks
14.5.2 Techniques for Data Cleaning in Sensors and Sensor Networks
14.6 Domain-Specific Quality Issues: Official Statistics
14.6.1 On the Quality of Big Data for Official Statistics
14.6.2 A Case Study
14.7 Summary
Erratum to: Data and Information Quality: Dimensions, Principles and Techniques
References
Index
Carlo Batini has been a full professor of Computer Engineering since 1986, initially at Sapienza Università di Roma and, since 2002, at the University of Milano-Bicocca. His research interests include e-government, information systems and database modeling and design, data and information quality, and service science. From 1995 to 2003 he was a member of the board of directors of the Authority for Information Technology in Public Administration, where he headed several large-scale projects for the modernization of public administration.

Monica Scannapieco has been a researcher at Istat, the Italian National Institute of Statistics, since 2006. She earned a university degree in Computer Engineering with honors and a Ph.D. in Computer Engineering at Sapienza Università di Roma. She is the author of more than 100 papers, mainly on data quality, privacy preservation and data integration, published in leading conferences and journals in databases and information systems. She has been involved in several European research projects on data quality and data integration.