|
|
ix | |
Preface |
|
xv | |
Acknowledgments |
|
xvii | |
|
Checking Values of Character Variables |
|
|
|
|
1 | (1) |
|
Using Proc Freq to List Values |
|
|
1 | (1) |
|
Description of the Raw Data File Patients.txt |
|
|
2 | (5) |
|
Using a Data Step to Check for Invalid Values |
|
|
7 | (2) |
|
Describing the Verify, Trim, Missing, and Notdigit Functions |
|
|
9 | (4) |
|
Using Proc Print with a Where Statement to List Invalid Values |
|
|
13 | (2) |
|
Using Formats to Check for Invalid Values |
|
|
15 | (3) |
|
Using Informats to Remove Invalid Values |
|
|
18 | (5) |
|
Checking Values of Numeric Variables |
|
|
|
|
23 | (1) |
|
Using Proc Means, Proc Tabulate, and Proc Univariate to Look for Outliers |
|
|
24 | (10) |
|
Using an ODS Select Statement to List Extreme Values |
|
|
34 | (1) |
|
Using Proc Univariate Options to List More Extreme Observations |
|
|
35 | (2) |
|
Using Proc Univariate to Look for Highest and Lowest Values by Percentage |
|
|
37 | (6) |
|
Using Proc Rank to Look for Highest and Lowest Values by Percentage |
|
|
43 | (4) |
|
Presenting a Program to List the Highest and Lowest Ten Values |
|
|
47 | (3) |
|
Presenting a Macro to List the Highest and Lowest ``n'' Values |
|
|
50 | (2) |
|
Using Proc Print with a Where Statement to List Invalid Data Values |
|
|
52 | (2) |
|
Using a Data Step to Check for Out-of-Range Values |
|
|
54 | (1) |
|
Identifying Invalid Values versus Missing Values |
|
|
55 | (2) |
|
Listing Invalid (Character) Values in the Error Report |
|
|
57 | (3) |
|
Creating a Macro for Range Checking |
|
|
60 | (2) |
|
Checking Ranges for Several Variables |
|
|
62 | (4) |
|
Using Formats to Check for Invalid Values |
|
|
66 | (2) |
|
Using Informats to Filter Invalid Values |
|
|
68 | (3) |
|
Checking a Range Using an Algorithm Based on Standard Deviation |
|
|
71 | (2) |
|
Detecting Outliers Based on a Trimmed Mean and Standard Deviation |
|
|
73 | (3) |
|
Presenting a Macro Based on Trimmed Statistics |
|
|
76 | (4) |
|
Using the Trim Option of Proc Univariate and ODS to Compute Trimmed Statistics |
|
|
80 | (6) |
|
Checking a Range Based on the Interquartile Range |
|
|
86 | (5) |
|
Checking for Missing Values |
|
|
|
|
91 | (1) |
|
|
91 | (2) |
|
Using Proc Means and Proc Freq to Count Missing Values |
|
|
93 | (3) |
|
Using Data Step Approaches to Identify and Count Missing Values |
|
|
96 | (4) |
|
Searching for a Specific Numeric Value |
|
|
100 | (2) |
|
Creating a Macro to Search for Specific Numeric Values |
|
|
102 | (3) |
|
|
|
|
105 | (1) |
|
Checking Ranges for Dates (Using a Data Step) |
|
|
106 | (1) |
|
Checking Ranges for Dates (Using Proc Print) |
|
|
107 | (1) |
|
Checking for Invalid Dates |
|
|
108 | (3) |
|
Working with Dates in Nonstandard Form |
|
|
111 | (2) |
|
Creating a SAS Date When the Day of the Month Is Missing |
|
|
113 | (1) |
|
Suspending Error Checking for Known Invalid Dates |
|
|
114 | (3) |
|
Looking for Duplicates and ``n'' Observations per Subject |
|
|
|
|
117 | (1) |
|
Eliminating Duplicates by Using Proc Sort |
|
|
117 | (6) |
|
Detecting Duplicates by Using Data Step Approaches |
|
|
123 | (3) |
|
Using Proc Freq to Detect Duplicate ID's |
|
|
126 | (3) |
|
Selecting Patients with Duplicate Observations by Using a Macro List and SQL |
|
|
129 | (1) |
|
Identifying Subjects with ``n'' Observations Each (Data Step Approach) |
|
|
130 | (2) |
|
Identifying Subjects with ``n'' Observations Each (Using Proc Freq) |
|
|
132 | (3) |
|
Working with Multiple Files |
|
|
|
|
135 | (1) |
|
Checking for an ID in Each of Two Files |
|
|
135 | (3) |
|
Checking for an ID in Each of ``n'' Files |
|
|
138 | (2) |
|
|
140 | (3) |
|
More Complicated Multi-File Rules |
|
|
143 | (4) |
|
Checking That the Dates Are in the Proper Order |
|
|
147 | (2) |
|
Double Entry and Verification (Proc Compare) |
|
|
|
|
149 | (1) |
|
Conducting a Simple Comparison of Two Data Sets |
|
|
150 | (9) |
|
Using Proc Compare with Two Data Sets That Have an Unequal Number of Observations |
|
|
159 | (2) |
|
Comparing Two Data Sets When Some Variables Are Not in Both Data Sets |
|
|
161 | (4) |
|
Some Proc SQL Solutions to Data Cleaning |
|
|
|
|
165 | (1) |
|
A Quick Review of Proc SQL |
|
|
166 | (1) |
|
Checking for Invalid Character Values |
|
|
166 | (2) |
|
|
168 | (1) |
|
Checking a Range Using an Algorithm Based on the Standard Deviation |
|
|
169 | (1) |
|
Checking for Missing Values |
|
|
170 | (2) |
|
|
172 | (1) |
|
|
173 | (1) |
|
Identifying Subjects with ``n'' Observations Each |
|
|
174 | (1) |
|
Checking for an ID in Each of Two Files |
|
|
174 | (2) |
|
More Complicated Multi-File Rules |
|
|
176 | (5) |
|
|
|
|
181 | (1) |
|
|
181 | (1) |
|
|
182 | (2) |
|
Reviewing the Update Statement |
|
|
184 | (3) |
|
Creating Integrity Constraints and Audit Trails |
|
|
|
Introducing SAS Integrity Constraints |
|
|
187 | (1) |
|
Demonstrating General Integrity Constraints |
|
|
188 | (5) |
|
Deleting an Integrity Constraint Using Proc Datasets |
|
|
193 | (1) |
|
Creating an Audit Trail Data Set |
|
|
193 | (7) |
|
Demonstrating an Integrity Constraint Involving More than One Variable |
|
|
200 | (2) |
|
Demonstrating a Referential Constraint |
|
|
202 | (3) |
|
Attempting to Delete a Primary Key When a Foreign Key Still Exists |
|
|
205 | (2) |
|
Attempting to Add a Name to the Child Data Set |
|
|
207 | (1) |
|
Demonstrating the Cascade Feature of a Referential Constraint |
|
|
208 | (2) |
|
Demonstrating the Set Null Feature of a Referential Constraint |
|
|
210 | (1) |
|
Demonstrating How to Delete a Referential Constraint |
|
|
211 | (2) |
|
DataFlux and dfPower Studio |
|
|
|
|
213 | (2) |
|
|
215 | (2) |
|
Appendix: Listing of Raw Data Files and SAS Programs |
|
|
|
Programs and Raw Data Files Used in This Book |
|
|
217 | (1) |
|
Description of the Raw Data File Patients.txt |
|
|
217 | (1) |
|
Layout for the Data File Patients.txt |
|
|
218 | (1) |
|
Listing of Raw Data File Patients.txt |
|
|
218 | (1) |
|
Program to Create the SAS Data Set Patients |
|
|
219 | (1) |
|
Listing of Raw Data File Patients2.txt |
|
|
220 | (1) |
|
Program to Create the SAS Data Set Patients2 |
|
|
221 | (1) |
|
Program to Create the SAS Data Set AE (Adverse Events) |
|
|
221 | (1) |
|
Program to Create the SAS Data Set LAB_Test |
|
|
222 | (1) |
|
Listings of the Data Cleaning Macros Used in This Book |
|
|
222 | (17) |
Index |
|
239 | |