E-book: Cody's Data Cleaning Techniques Using SAS, Third Edition

4.19/5 (27 ratings from Goodreads)

Ron Cody

Format: 234 pages
Publication date: 15-Mar-2017
Publisher: SAS Institute
Language: English
ISBN-13: 9781635260694

Format: PDF+DRM
Price: 33,92 €*
* The price is final; no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Find errors and clean up data easily using SAS®! Thoroughly updated, Cody's Data Cleaning Techniques Using SAS, Third Edition, addresses a task that nearly every data analyst must perform: making sure that data errors are located and corrected. Written in Ron Cody's signature informal, tutorial style, this book develops and demonstrates data cleaning programs and macros that you can use as written or modify, making your job of data cleaning easier, faster, and more efficient.

Building on both the author's experience gained from teaching a data cleaning course for over 10 years and advances in SAS, this third edition includes four new chapters, covering topics such as the use of Perl regular expressions for checking the format of character values (such as zip codes or email addresses) and how to standardize company names and addresses.

With this book, you will learn how to:
• find and correct errors in character and numeric values
• develop programming techniques related to dates and missing values
• deal with highly skewed data
• develop techniques for correcting your data errors
• use integrity constraints and audit trails to prevent errors from being added to a clean data set

List of Programs
About This Book
About the Author
Acknowledgments
Introduction

Chapter 1 Working with Character Data
  Introduction
  Using PROC FREQ to Detect Character Variable Errors
  Changing the Case of All Character Variables in a Data Set
  A Summary of Some Character Functions (Useful for Data Cleaning)
  UPCASE, LOWCASE, and PROPCASE
  NOTDIGIT, NOTALPHA, and NOTALNUM
  VERIFY
  COMPBL
  COMPRESS
  MISSING
  TRIMN and STRIP
  Checking that a Character Value Conforms to a Pattern
  Using a DATA Step to Detect Character Data Errors
  Using PROC PRINT with a WHERE Statement to Identify Data Errors
  Using Formats to Check for Invalid Values
  Creating Permanent Formats
  Removing Units from a Value
  Removing Non-Printing Characters from a Character Value
  Conclusions

Chapter 2 Using Perl Regular Expressions to Detect Data Errors
  Introduction
  Describing the Syntax of Regular Expressions
  Checking for Valid ZIP Codes and Canadian Postal Codes
  Searching for Invalid Email Addresses
  Verifying Phone Numbers
  Converting All Phone Numbers to a Standard Form
  Developing a Macro to Test Regular Expressions
  Conclusions

Chapter 3 Standardizing Data
  Introduction
  Using Formats to Standardize Company Names
  Creating a Format from a SAS Data Set
  Using TRANWRD and Other Functions to Standardize Addresses
  Using Regular Expressions to Help Standardize Addresses
  Performing a "Fuzzy" Match between Two Files
  Conclusions

Chapter 4 Data Cleaning Techniques for Numeric Data
  Introduction
  Using PROC UNIVARIATE to Examine Numeric Variables
  Describing an ODS Option to List Selected Portions of the Output
  Listing Output Objects Using the Statement TRACE ON
  Using a PROC UNIVARIATE Option to List More Extreme Values
  Presenting a Program to List the 10 Highest and Lowest Values
  Presenting a Macro to List the n Highest and Lowest Values
  Describing Two Programs to List the Highest and Lowest Values by Percentage
  Using PROC UNIVARIATE
  Presenting a Macro to List the Highest and Lowest n% Values
  Using PROC RANK
  Using Pre-Determined Ranges to Check for Possible Data Errors
  Identifying Invalid Values versus Missing Values
  Checking Ranges for Several Variables and Generating a Single Report
  Conclusions

Chapter 5 Automatic Outlier Detection for Numeric Data
  Introduction
  Automatic Outlier Detection (Using Means and Standard Deviations)
  Detecting Outliers Based on a Trimmed Mean and Standard Deviation
  Describing a Program that Uses Trimmed Statistics for Multiple Variables
  Presenting a Macro Based on Trimmed Statistics
  Detecting Outliers Based on the Interquartile Range
  Conclusions

Chapter 6 More Advanced Techniques for Finding Errors in Numeric Data
  Introduction
  Introducing the Banking Data Set
  Running the Auto Outliers Macro on Bank Deposits
  Identifying Outliers Within Each Account
  Using Box Plots to Inspect Suspicious Deposits
  Using Regression Techniques to Identify Possible Errors in the Banking Data
  Using Regression Diagnostics to Identify Outliers
  Conclusions

Chapter 7 Describing Issues Related to Missing and Special Values (Such as 999)
  Introduction
  Inspecting the SAS Log
  Using PROC MEANS and PROC FREQ to Count Missing Values
  Counting Missing Values for Numeric Variables
  Counting Missing Values for Character Variables
  Using DATA Step Approaches to Identify and Count Missing Values
  Locating Patient Numbers for Records Where Patno Is Either Missing or Invalid
  Searching for a Specific Numeric Value
  Creating a Macro to Search for Specific Numeric Values
  Converting Values Such as 999 to a SAS Missing Value
  Conclusions

Chapter 8 Working with SAS Dates
  Introduction
  Changing the Storage Length for SAS Dates
  Checking Ranges for Dates (Using a DATA Step)
  Checking Ranges for Dates (Using PROC PRINT)
  Checking for Invalid Dates
  Working with Dates in Nonstandard Form
  Creating a SAS Date When the Day of the Month Is Missing
  Suspending Error Checking for Known Invalid Dates
  Conclusions

Chapter 9 Looking for Duplicates and Checking Data with Multiple Observations per Subject
  Introduction
  Eliminating Duplicates by Using PROC SORT
  Demonstrating a Possible Problem with the NODUPRECS Option
  Reviewing FIRST. and LAST. Variables
  Detecting Duplicates by Using DATA Step Approaches
  Using PROC FREQ to Detect Duplicate IDs
  Working with Data Sets with More Than One Observation per Subject
  Identifying Subjects with n Observations Each (DATA Step Approach)
  Identifying Subjects with n Observations Each (Using PROC FREQ)
  Conclusions

Chapter 10 Working with Multiple Files
  Introduction
  Checking for an ID in Each of Two Files
  Checking for an ID in Each of n Files
  A Macro for ID Checking
  Conclusions

Chapter 11 Using PROC COMPARE to Perform Data Verification
  Introduction
  Conducting a Simple Comparison of Two Data Files
  Simulating Double Entry Verification Using PROC COMPARE
  Other Features of PROC COMPARE
  Conclusions

Chapter 12 Correcting Errors
  Introduction
  Hard Coding Corrections
  Describing Named Input
  Reviewing the UPDATE Statement
  Using the UPDATE Statement to Correct Errors in the Patients Data Set
  Conclusions

Chapter 13 Creating Integrity Constraints and Audit Trails
  Introduction
  Demonstrating General Integrity Constraints
  Describing PROC APPEND
  Demonstrating How Integrity Constraints Block the Addition of Data Errors
  Adding Your Own Messages to Violations of an Integrity Constraint
  Deleting an Integrity Constraint Using PROC DATASETS
  Creating an Audit Trail Data Set
  Demonstrating an Integrity Constraint Involving More Than One Variable
  Demonstrating a Referential Constraint
  Attempting to Delete a Primary Key When a Foreign Key Still Exists
  Attempting to Add a Name to the Child Data Set
  Demonstrating How to Delete a Referential Constraint
  Demonstrating the CASCADE Feature of a Referential Constraint
  Demonstrating the SET NULL Feature of a Referential Constraint
  Conclusions

Chapter 14 A Summary of Useful Data Cleaning Macros
  Introduction
  A Macro to Test Regular Expressions
  A Macro to List the n Highest and Lowest Values of a Variable
  A Macro to List the n% Highest and Lowest Values of a Variable
  A Macro to Perform Range Checks on Several Variables
  A Macro that Uses Trimmed Statistics to Automatically Search for Outliers
  A Macro to Search a Data Set for Specific Values Such as 999
  A Macro to Check for ID Values in Multiple Data Sets
  Conclusions

Index

Permalink: https://www.kriso.ee/db/97816352606942e.html

Keywords:

Electronic data processing - Data preparation
