Cody's Data Cleaning Techniques Using SAS, Second Edition 2nd Revised ed. [Paperback]

  • Format: Paperback / softback, 268 pages, height x width x thickness: 235x191x14 mm, weight: 467 g, 1, black & white illustrations
  • Publication date: 01-May-2008
  • Publisher: SAS Publishing
  • ISBN-10: 1599946599
  • ISBN-13: 9781599946597
Now a retired professor from the Robert Wood Johnson Medical School, Cody is a private consultant and national instructor for SAS, and the author or coauthor of numerous books on SAS. He offers novice and experienced SAS programmers a practical guide to detecting and correcting data errors while learning to apply DATA step programming techniques and SAS procedures. The material has been updated to cover the many new functions in SAS, and includes a new chapter on integrity constraints and audit trails, several macros to make data cleaning tasks easier, and a short description of a SAS product called DataFlux for performing advanced data cleaning techniques such as address standardization and fuzzy matching. Annotation ©2008 Book News, Inc., Portland, OR (booknews.com)

Thoroughly updated for SAS 9, this second edition addresses tasks that nearly every SAS programmer needs to do, that is, making sure that data errors are located and corrected. Written in Ron Cody's signature informal, tutorial style, this book develops and demonstrates data cleaning programs and macros that you can use as written or modify for your own special data cleaning needs. Each topic is developed through specific examples, and every program and macro is explained in detail.

You'll learn how to:
  • find and correct errors in character and numeric values
  • develop programming techniques related to dates and missing values
  • use SQL approaches to data cleaning
  • develop techniques for correcting your data errors
  • use integrity constraints and audit trails to prevent errors from being added to a clean data set

Novice and experienced SAS users will discover ways to detect and correct data errors while learning how to apply DATA step programming techniques and SAS procedures.

SAS Products and Releases:
  • Base SAS: 9.2, 9.1.3, 9.1.2, 9.1, 9.0
  • SAS/STAT: 9.2, 9.1.3, 9.1.2, 9.1, 9.0
Operating Systems: All

List of Programs
Preface
Acknowledgments
Checking Values of Character Variables
  • Introduction
  • Using PROC FREQ to List Values
  • Description of the Raw Data File Patients.txt
  • Using a DATA Step to Check for Invalid Values
  • Describing the VERIFY, TRIM, MISSING, and NOTDIGIT Functions
  • Using PROC PRINT with a WHERE Statement to List Invalid Values
  • Using Formats to Check for Invalid Values
  • Using Informats to Remove Invalid Values
Checking Values of Numeric Variables
  • Introduction
  • Using PROC MEANS, PROC TABULATE, and PROC UNIVARIATE to Look for Outliers
  • Using an ODS SELECT Statement to List Extreme Values
  • Using PROC UNIVARIATE Options to List More Extreme Observations
  • Using PROC UNIVARIATE to Look for Highest and Lowest Values by Percentage
  • Using PROC RANK to Look for Highest and Lowest Values by Percentage
  • Presenting a Program to List the Highest and Lowest Ten Values
  • Presenting a Macro to List the Highest and Lowest "n" Values
  • Using PROC PRINT with a WHERE Statement to List Invalid Data Values
  • Using a DATA Step to Check for Out-of-Range Values
  • Identifying Invalid Values versus Missing Values
  • Listing Invalid (Character) Values in the Error Report
  • Creating a Macro for Range Checking
  • Checking Ranges for Several Variables
  • Using Formats to Check for Invalid Values
  • Using Informats to Filter Invalid Values
  • Checking a Range Using an Algorithm Based on Standard Deviation
  • Detecting Outliers Based on a Trimmed Mean and Standard Deviation
  • Presenting a Macro Based on Trimmed Statistics
  • Using the TRIM Option of PROC UNIVARIATE and ODS to Compute Trimmed Statistics
  • Checking a Range Based on the Interquartile Range
Checking for Missing Values
  • Introduction
  • Inspecting the SAS Log
  • Using PROC MEANS and PROC FREQ to Count Missing Values
  • Using DATA Step Approaches to Identify and Count Missing Values
  • Searching for a Specific Numeric Value
  • Creating a Macro to Search for Specific Numeric Values
Working with Dates
  • Introduction
  • Checking Ranges for Dates (Using a DATA Step)
  • Checking Ranges for Dates (Using PROC PRINT)
  • Checking for Invalid Dates
  • Working with Dates in Nonstandard Form
  • Creating a SAS Date When the Day of the Month Is Missing
  • Suspending Error Checking for Known Invalid Dates
Looking for Duplicates and "n" Observations per Subject
  • Introduction
  • Eliminating Duplicates by Using PROC SORT
  • Detecting Duplicates by Using DATA Step Approaches
  • Using PROC FREQ to Detect Duplicate IDs
  • Selecting Patients with Duplicate Observations by Using a Macro List and SQL
  • Identifying Subjects with "n" Observations Each (DATA Step Approach)
  • Identifying Subjects with "n" Observations Each (Using PROC FREQ)
Working with Multiple Files
  • Introduction
  • Checking for an ID in Each of Two Files
  • Checking for an ID in Each of "n" Files
  • A Macro for ID Checking
  • More Complicated Multi-File Rules
  • Checking That the Dates Are in the Proper Order
Double Entry and Verification (PROC COMPARE)
  • Introduction
  • Conducting a Simple Comparison of Two Data Sets
  • Using PROC COMPARE with Two Data Sets That Have an Unequal Number of Observations
  • Comparing Two Data Sets When Some Variables Are Not in Both Data Sets
Some PROC SQL Solutions to Data Cleaning
  • Introduction
  • A Quick Review of PROC SQL
  • Checking for Invalid Character Values
  • Checking for Outliers
  • Checking a Range Using an Algorithm Based on the Standard Deviation
  • Checking for Missing Values
  • Range Checking for Dates
  • Checking for Duplicates
  • Identifying Subjects with "n" Observations Each
  • Checking for an ID in Each of Two Files
  • More Complicated Multi-File Rules
Correcting Errors
  • Introduction
  • Hardcoding Corrections
  • Describing Named Input
  • Reviewing the UPDATE Statement
Creating Integrity Constraints and Audit Trails
  • Introducing SAS Integrity Constraints
  • Demonstrating General Integrity Constraints
  • Deleting an Integrity Constraint Using PROC DATASETS
  • Creating an Audit Trail Data Set
  • Demonstrating an Integrity Constraint Involving More than One Variable
  • Demonstrating a Referential Constraint
  • Attempting to Delete a Primary Key When a Foreign Key Still Exists
  • Attempting to Add a Name to the Child Data Set
  • Demonstrating the Cascade Feature of a Referential Constraint
  • Demonstrating the Set Null Feature of a Referential Constraint
  • Demonstrating How to Delete a Referential Constraint
DataFlux and dfPower Studio
  • Introduction
  • Examples
Appendix: Listing of Raw Data Files and SAS Programs
  • Programs and Raw Data Files Used in This Book
  • Description of the Raw Data File Patients.txt
  • Layout for the Data File Patients.txt
  • Listing of Raw Data File Patients.txt
  • Program to Create the SAS Data Set Patients
  • Listing of Raw Data File Patients2.txt
  • Program to Create the SAS Data Set Patients2
  • Program to Create the SAS Data Set AE (Adverse Events)
  • Program to Create the SAS Data Set LAB_Test
  • Listings of the Data Cleaning Macros Used in This Book
Index