Data Cleaning Techniques
Business Knowledge Series course
This course, completely rewritten to match the third edition of the book Cody's Data Cleaning Techniques Using SAS, helps you greatly speed up the process of detecting and correcting errors in both character and numeric data. It also includes sections on standardizing data and on using Perl regular expressions to ensure that character values conform to a specific pattern (such as ZIP codes, phone numbers, and email addresses).
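As a rough illustration of pattern checking, the sketch below flags values that do not look like US ZIP codes. It is written in Python rather than SAS and is not taken from the course materials. The pattern (five digits with an optional four-digit extension) and the names `ZIP_PATTERN`, `is_valid_zip`, and `find_invalid` are assumptions made for this example.

```python
import re

# Assumed pattern: five digits, optionally followed by a hyphen and four digits.
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")

def is_valid_zip(value: str) -> bool:
    """Return True when the whole (trimmed) string matches the assumed pattern."""
    return ZIP_PATTERN.fullmatch(value.strip()) is not None

def find_invalid(values):
    """Return the values that fail the pattern check, in their original order."""
    return [v for v in values if not is_valid_zip(v)]
```

The same idea carries over to phone numbers and email addresses by swapping in a different pattern.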
Although the course concentrates on methods of identifying data errors, it also teaches some programming techniques that might be new to you. For example, by using some of the latest SAS functions, you can convert a phone number in just about any form into a standard form, in only two SAS statements!
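The general idea behind standardizing a phone number can be sketched as follows: strip everything except the digits, then rebuild the value in one fixed layout. This is a Python illustration, not the SAS statements taught in the course. The function name `standardize_phone`, the ten-digit assumption, and the output layout are all choices made for this example.

```python
import string
from typing import Optional

def standardize_phone(raw: str) -> Optional[str]:
    """Reduce a phone number to its digits and rebuild it as (NNN) NNN-NNNN.

    Assumes ten-digit numbers; a leading country code of 1 is dropped.
    Returns None when the digit count does not fit that assumption.
    """
    digits = "".join(ch for ch in raw if ch in string.digits)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
```

Values that come back as `None` are candidates for manual review rather than automatic correction.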
The course teaches several methods of detecting errors in numeric data, including range checking and several methods of automatic outlier detection. There are chapters devoted to data with multiple observations per subject, SAS dates, and projects that include multiple data sets. The class closes with a demonstration of an innovative process that uses integrity constraints and audit trails to detect and programmatically clean dirty data before it even gets into your analysis data set.
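To make the two numeric checks concrete, here is a minimal Python sketch of a range check and of one common automatic outlier rule, the interquartile-range (IQR) fence. The course description does not say which outlier methods it covers, so the IQR rule is an assumption, as are the function names, the 1.5 multiplier, and the treatment of missing values (`None`) as skipped rather than flagged.

```python
import statistics

def out_of_range(values, low, high):
    """Return (index, value) pairs for non-missing values outside [low, high]."""
    return [
        (i, v)
        for i, v in enumerate(values)
        if v is not None and not (low <= v <= high)
    ]

def iqr_outliers(values, k=1.5):
    """Return non-missing values beyond k * IQR from the first or third quartile."""
    data = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in data if v < lower or v > upper]
```

A range check needs limits supplied by someone who knows the data, while the IQR rule derives its limits from the data itself, which is why the two are often used together.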
All students taking this class receive either a printed or a PDF copy of the new Data Cleaning book and are given access to dozens of macros that will greatly speed up the laborious process of cleaning your data.
Learn how to
Who should attend
Before attending this course, participants should have completed the SAS Programming 1 course. Completion of the SAS Programming 2: Data Manipulation Techniques course or a minimum of one year of SAS programming experience is also recommended.
This course addresses Base SAS and SAS/STAT software.
Checking Values of Character Variables