Data Cleaning Techniques
Business Knowledge Series course
There is a newer version of this course. Please see the schedule for the new Data Cleaning Techniques course.
This course teaches you how to detect errors in raw data source files, as well as how to identify and correct errors in character and numeric SAS data. You will learn a variety of techniques for detecting problems in more complex data structures, such as data sets that contain multiple observations per subject or that require entries for a single subject across multiple data sets. Beyond teaching techniques for detecting and fixing data errors, this is also an excellent SAS programming course: novice and veteran SAS programmers alike will pick up new and valuable programming tips and tricks.
Knowledge of the SAS macro language is not a prerequisite, but you will learn to run macros that perform a variety of data cleaning functions. For example, the AUTO_OUTLIERS macro automatically reports on outliers in numeric data using a concept called "trimmed statistics," in which summary statistics are computed after a fraction of the most extreme values are set aside.
The class closes with a demonstration of an innovative process that leverages integrity constraints and audit trails to detect and programmatically clean dirty data before it even gets into your analysis data set. After class, you'll have access to every program and macro used during class, as well as a personal copy of Cody's Data Cleaning Techniques, Second Edition.
Learn how to
Who should attend
Before attending this course, participants should have completed the SAS Programming 1: Essentials course. Completion of the SAS Programming 2: Data Manipulation Techniques course or a minimum of one year of SAS programming experience is also recommended.
This course addresses Base SAS and SAS/STAT software.
Checking Values of Character Variables