Preface xv
Acknowledgments xvii
1 (6)
1.1 Motivation and Overview 1 (2)
1.2 Data Profiling and Data Mining 3 (1)
4 (2)
1.4 Organization of This Book 6 (1)
7 (4)
2.1 Single-Column Analysis 7 (2)
9 (1)
9 (2)
11 (8)
11 (1)
11 (3)
3.3 Data Types, Patterns, and Domains 14 (1)
15 (1)
3.5 Approximate Statistics 16 (1)
3.6 Summary and Discussion 17 (2)
19 (56)
4.1 Dependency Definitions 19 (5)
4.1.1 Functional Dependencies 21 (1)
4.1.2 Unique Column Combinations 22 (1)
4.1.3 Inclusion Dependencies 23 (1)
4.2 Search Space and Data Structures 24 (7)
4.2.1 Lattices and Search Space Sizes 24 (3)
4.2.2 Position List Indexes and Search Space Validation 27 (2)
29 (1)
30 (1)
4.3 Discovering Unique Column Combinations 31 (8)
32 (2)
34 (1)
35 (2)
37 (1)
38 (1)
4.4 Discovering Functional Dependencies 39 (16)
41 (1)
42 (3)
45 (1)
45 (1)
46 (2)
48 (2)
50 (1)
51 (4)
4.5 Discovering Inclusion Dependencies 55 (20)
4.5.1 SQL-Based IND Validation 57 (3)
60 (1)
61 (1)
62 (2)
64 (2)
66 (2)
68 (1)
69 (1)
70 (1)
71 (1)
72 (3)
5 Relaxed and Other Dependencies 75 (12)
5.1 Relaxing the Extent of a Dependency 75 (3)
5.1.1 Partial Dependencies 76 (1)
5.1.2 Conditional Dependencies 76 (2)
5.2 Relaxing Attribute Comparisons 78 (5)
5.2.1 Metric and Matching Dependencies 78 (3)
5.2.2 Order and Sequential Dependencies 81 (2)
5.3 Approximating the Dependency Discovery 83 (1)
5.4 Generalizing Functional Dependencies 83 (4)
84 (1)
5.4.2 Multivalued Dependencies 84 (3)
87 (6)
87 (1)
88 (1)
89 (1)
90 (1)
91 (2)
7 Profiling Non-Relational Data 93 (4)
93 (1)
94 (1)
94 (1)
95 (1)
96 (1)
97 (6)
97 (2)
99 (4)
9 Data Profiling Challenges 103 (8)
9.1 Functional Challenges 103 (5)
9.1.1 Profiling Dynamic Data 103 (1)
9.1.2 Interactive Profiling 104 (1)
9.1.3 Profiling for Integration 105 (1)
9.1.4 Interpreting Profiling Results 106 (2)
9.2 Non-Functional Challenges 108 (3)
9.2.1 Efficiency and Scalability 108 (1)
9.2.2 Profiling on New Architectures 109 (1)
9.2.3 Benchmarking Profiling Methods 109 (2)
111 (2)
Bibliography 113 (22)
Authors' Biographies 135