In the clinical research world, data accuracy plays a significant role in delivering quality results. Various validation methods are available to confirm data accuracy. Of these, double programming is the most highly recommended and commonly used method to demonstrate a perfect match between production and validation output. PROC COMPARE is one of the SAS® procedures used to compare two data sets and confirm their accuracy. In current practice, whenever a program rerun happens, the programmer must manually review the output file to ensure an exact match. This is tedious, time-consuming, and error-prone because many LST files must be reviewed and manual intervention is required. The proposed approach programmatically validates the PROC COMPARE output from all the programs and generates an HTML file with a Pass/Fail status flag for each output file, along with the following: 1. Data set name, label, and number of variables and observations 2. Number of observations not in both compared and base data sets 3. Number of variables not in both compared and base data sets The status is flagged as Pass whenever the output file meets the following 3N criteria: 1. NOTE: No unequal values were found. All values compared are exactly equal 2. The numbers of observations in the base and compared data sets are equal 3. The numbers of variables in the base and compared data sets are equal The SAS® macro %threeN validates all the output files generated by PROC COMPARE quickly and accurately, reducing the time spent on manual review of PROC COMPARE output files by up to 90%.
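A rough illustration of the kind of check that %threeN automates (a minimal sketch, not the actual macro): route the PROC COMPARE listing to a file, then scan that LST file for the equality note. The file paths and data set names are hypothetical, and only the first of the three 3N criteria is tested here.

    proc printto print='/compare_out/adsl_compare.lst' new;
    run;

    proc compare base=prod.adsl compare=qc.adsl listall;
    run;

    proc printto;   /* restore the default listing destination */
    run;

    /* Flag the listing as Pass if the "exactly equal" note is present */
    data compare_status;
       infile '/compare_out/adsl_compare.lst' truncover end=eof;
       input line $char200.;
       retain pass 0;
       if index(line, 'No unequal values were found') then pass = 1;
       if eof then do;
          status = ifc(pass, 'Pass', 'Fail');
          output;
       end;
       keep status;
    run;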
Amarnath Vijayarangan, Emmes Services Pvt Ltd, India
Common tasks that we need to perform are merging or appending SAS® data sets. During this process, we sometimes get error or warning messages saying that the same fields in different SAS data sets have different lengths or different types. If the problems involve many fields and data sets, we need to spend a lot of time identifying those fields and writing extra SAS code to resolve the issues. The macro in this paper, however, can help you identify the fields that have inconsistent data types or lengths. It also resolves the length issues automatically by finding the maximum length of each field among the current data sets and assigning that length to the field. After the macro runs, it generates an HTML report that lists which fields' lengths have been changed and which fields have inconsistent data types.
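A sketch of the core idea behind such a macro (not the paper's macro itself): look up the maximum defined length of a shared character field in DICTIONARY.COLUMNS and apply it before the data sets are combined. The data set and variable names below are hypothetical.

    proc sql noprint;
       select max(length) into :maxlen trimmed
       from dictionary.columns
       where libname = 'WORK'
         and memname in ('DEMOG_A', 'DEMOG_B')
         and upcase(name) = 'SITE';
    quit;

    data work.combined;
       length site $ &maxlen;   /* longest definition wins, so nothing is truncated */
       set work.demog_a work.demog_b;
    run;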
Ting Sa, Cincinnati Children's Hospital Medical Center
A useful and often overlooked tool released with the SAS® Business Intelligence suite to aid in ETL is SAS® Data Integration Studio. This product gives users the ability to extract, transform, join, and load data from various database management systems (DBMSs), data marts, and other data stores by using a graphical interface and without having to code different credentials for each schema. It enables seamless promotion of code to a production system without the need to alter the code. And it is quite useful for deploying and scheduling jobs by using the schedule manager in SAS® Management Console, because all code created by Data Integration Studio is optimized. Although this tool enables users to create code from scratch, one of its most useful capabilities is that it can take legacy SAS® code and, with minimal alterations, create its data associations and give it all the properties of a job coded from scratch.
Erik Larsen, Independent Consultant
With increasing data needs, it becomes more and more unwieldy to ensure that all scheduled jobs are running successfully and on time. Worse, maintaining your reputation as an information provider becomes a precarious prospect as the likelihood increases that your customers alert you to reporting issues before you are even aware of them yourself. By combining various SAS® capabilities and tying them together with concise visualizations, it is possible to track jobs actively and alert customers of issues before they become a problem. This paper introduces a report tracking framework that helps achieve this goal and improve customer satisfaction. The report tracking starts by obtaining table and job statuses and color-coding them by severity level. Based on the job status, it then goes deeper into the log to search for potential errors or other relevant information to help assess the processes. The last step is to send proactive alerts to users informing them of possible delays or potential data issues.
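A minimal sketch of the log-scanning step described above, assuming the job logs are plain-text files in a known folder (the path and file name are hypothetical): it keeps only ERROR and WARNING lines and summarizes them by severity.

    data work.log_issues;
       length severity $ 7 message $ 300;
       infile '/batch/logs/daily_sales_load.log' truncover;
       input message $char300.;
       if upcase(message) =: 'ERROR' then severity = 'ERROR';
       else if upcase(message) =: 'WARNING' then severity = 'WARNING';
       else delete;
    run;

    proc freq data=work.log_issues;
       tables severity / nocum;
    run;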
Jia Heng, Wyndham Destination Network
The Waze application, purchased by Google in 2013, alerts millions of users about traffic congestion, collisions, construction, and other complexities of the road that can stymie motorists' attempts to get from A to B. From jackknifed rigs to jackalope carcasses, roads can be gnarled by gridlock or littered with obstacles that impede traffic flow and efficiency. Waze algorithms automatically reroute users to more efficient routes based on user-reported events as well as on historical norms that demonstrate typical road conditions. Extract, transform, load (ETL) infrastructures often represent serialized process flows that can mimic highways and that can become similarly snarled by locked data sets, slow processes, and other factors that introduce inefficiency. The LOCKITDOWN SAS® macro, introduced at the Western Users of SAS® Software Conference 2014, detects and prevents data access collisions that occur when two or more SAS processes or users simultaneously attempt to access the same SAS data set. Moreover, the LOCKANDTRACK macro, introduced at the conference in 2015, provides real-time tracking of and historical performance metrics for locked data sets through a unified control table, enabling developers to hone processes in order to optimize efficiency and data throughput. This paper demonstrates the implementation of LOCKANDTRACK and its lock performance metrics to create data-driven, fuzzy logic algorithms that preemptively reroute program flow around inaccessible data sets. Thus, rather than needlessly waiting for a data set to become available or for a process to complete, the software anticipates the wait time based on historical norms, performs other (independent) functions, and returns to the original process when the data set becomes available.
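A generic sketch of the kind of lock test that such macros build on (this is not the LOCKITDOWN or LOCKANDTRACK code): the LOCK statement attempts an exclusive lock and sets the automatic macro variable &SYSLCKRC, so a nonzero value signals that the data set is in use and the program could branch to independent work instead. The data set name is hypothetical.

    %macro check_lock(dsn);
       lock &dsn;
       %if &syslckrc = 0 %then %do;
          %put NOTE: &dsn is available.;
          lock &dsn clear;      /* release the test lock for the real process */
       %end;
       %else %put WARNING: &dsn is locked by another process - rerouting.;
    %mend check_lock;

    %check_lock(mylib.transactions)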
Troy Hughes, Datmesis Analytics
Impala is an open source SQL engine designed to bring real-time, concurrent, ad hoc query capability to Hadoop. SAS/ACCESS® Interface to Impala allows SAS® to take advantage of this exciting technology. This presentation uses examples to show you how to increase your program's performance and troubleshoot problems. We discuss how Impala fits into the Hadoop ecosystem and how it differs from Hive. Learn the differences between the Hadoop and Impala SAS/ACCESS engines.
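A hedged connection example (the host name, schema, and table are placeholders, and the available options depend on your SAS/ACCESS to Impala release and driver configuration):

    libname imp impala server='impalad01.example.com' schema=sales;

    proc sql;
       select count(*) as order_count
       from imp.orders;       /* simple aggregations can be passed down to Impala */
    quit;

    libname imp clear;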
Jeff Bailey, SAS
Organizations are collecting more structured and unstructured data than ever before, and analyzing all of that complex data presents great opportunities and challenges. In this volatile and competitive economy, there has never been a bigger need for proactive and agile strategies to overcome these challenges by applying the analytics directly to the data rather than shuffling data around. Teradata and SAS® have joined forces to revolutionize your business by providing enterprise analytics in a harmonious data management platform, delivering critical strategic insights by applying advanced analytics inside the database or data warehouse where the vast volume of data is fed and resides. This paper highlights how Teradata and SAS strategically integrate SAS in-memory and in-database processing with the Teradata relational database management system (RDBMS) in a single box, giving IT departments a simplified footprint and reporting infrastructure, as well as lower total cost of ownership (TCO), while offering SAS users increased analytic performance by eliminating the shuffling of data over external network connections.
Greg Otto, Teradata Corporation
Tho Nguyen, Teradata
Paul Segal, Teradata
Seven principles frequently manifest themselves in data management projects. These principles improve two aspects: the development process, which enables the programmer to deliver better code faster with fewer issues, and the actual performance of programs and jobs. Using examples from Base SAS®, the SAS® Macro Language, DataFlux®, and SQL, this paper presents seven principles: environments, job control data, test data, improvement, data locality, minimal passes, and indexing. Readers with an intermediate knowledge of the SAS DataFlux Management Platform and/or Base SAS and the SAS Macro Language, as well as SQL, will understand these principles and find ways to use them in their data management projects.
Bob Janka, Modern Analytics
Developers working on a production process need to think carefully about ways to avoid future changes that require change control, so it's always important to make the code dynamic rather than hardcoding items into the code. Even if you are a seasoned programmer, the hardcoded items might not always be apparent. This paper assists in identifying the harder-to-reach hardcoded items and addresses ways to effectively use control tables within the SAS® software tools to deal with sticky areas of coding such as formats, parameters, grouping/hierarchies, and standardization. The paper presents examples of several ways to use the control tables and demonstrates why this usage prevents the need for coding changes. Practical applications are used to illustrate these examples.
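A small illustration of the control-table idea, as it applies to formats: the format is built from a maintained table with CNTLIN=, so adding a category becomes a data change rather than a code change. The table contents below are hypothetical.

    data work.ctl_region;      /* control table, normally maintained outside the program */
       length fmtname $ 8 start $ 2 label $ 20;
       input start $ label $20.;
       fmtname = 'REGFMT';
       type = 'C';
       datalines;
    NE Northeast
    SE Southeast
    MW Midwest
    WE West
    ;
    run;

    proc format cntlin=work.ctl_region;
    run;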
Frank Ferriola, Financial Risk Group
During the course of a clinical trial study, large numbers of new and modified data records are received on an ongoing basis. Providing end users with an approach to continuously review and monitor study data, while enabling them to focus reviews on new or modified (incremental) data records, allows for greater efficiency in identifying potential data issues. In addition, supplying data reviewers with a familiar machine-readable output format (for example, Microsoft Excel) allows for greater flexibility in filtering, highlighting, and retaining reviewer comments. In this paper, we outline an approach using SAS® in a Windows server environment and a shared folder structure to automatically refresh data review listings. Upon each execution, the listings are compared against previously reviewed data to flag new and modified records, as well as to carry forward any reviewer comments made during the previous review. In addition, we highlight the use and capabilities of the SAS® ExcelXP tagset, which enables greater control over data formatting, including management of Microsoft Excel's sometimes undesired automatic formatting. Overall, this approach provides a significantly improved end-user experience above and beyond the more traditional approach of performing cumulative or incremental data reviews using PDF listings.
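A hedged sketch of the ExcelXP output described above (the file path, sheet name, and data set are hypothetical): the tagset writes SpreadsheetML that Excel opens as a workbook, and options such as SHEET_NAME, AUTOFILTER, and FROZEN_HEADERS keep the listing review-friendly.

    ods tagsets.excelxp file='/review/ae_listing.xml' style=minimal
        options(sheet_name='AE Review' autofilter='all' frozen_headers='yes');

    proc report data=work.ae_review nowd;
       columns usubjid aestdtc aeterm review_flag reviewer_comment;
       define review_flag / display 'New or Modified';
    run;

    ods tagsets.excelxp close;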
Victor Lopez, Samumed, LLC
Heli Ghandehari, Samumed, LLC
Bill Knowlton, Samumed, LLC
Christopher Swearingen, Samumed, LLC
As data management professionals, you have to comply with new regulations and controls. One such regulation is the Basel Committee on Banking Supervision's standard BCBS 239. To respond to these new demands, you have to put processes and methods in place to automate metadata collection and analysis, and to provide rigorous documentation around your data flows. You also have to deal with many aspects of data management including data access, data manipulation (ETL and other), data quality, data usage, and data consumption, often from a variety of toolsets that are not necessarily from a single vendor. This paper shows you how to use SAS® technologies to support data governance requirements, including third-party metadata collection and data monitoring. It highlights best practices such as implementing a business glossary and establishing controls for monitoring data. Attend this session to become familiar with the SAS tools used to meet the new requirements and to implement a more managed environment.
Jeff Stander, SAS
With the big data throughput generated by event streams, organizations can respond opportunistically and with low latency. Having the ability to permeate identified patterns of interest throughout the enterprise requires deep integration between event stream processing and foundational enterprise data management applications. This paper describes the innovative ability to consolidate real-time data ingestion with controlled and disciplined universal data access from SAS® and Teradata.
Tho Nguyen, Teradata
Fiona McNeill, SAS
In the fast-changing world of data management, new technologies to handle increasing volumes of (big) data are emerging every day. Many companies struggle to deal with these technologies and, more importantly, to integrate them into their existing data management processes. Moreover, they also want to rely on the knowledge their teams have built in existing products, so implementing change to learn new technologies can be a costly procedure. SAS® is a perfect fit in this situation because it offers a suite of software that can be set up to work with any third-party database by using the corresponding SAS/ACCESS® interface. Indeed, for new database technologies, SAS releases a specific SAS/ACCESS interface, allowing users to develop and migrate SAS solutions almost transparently. Your users need to know only a few techniques in order to combine the power of SAS with a third-party (big) database. This paper helps companies integrate emerging technologies into their current SAS® Data Management platform using SAS/ACCESS. More specifically, the paper focuses on best practices for success in such an implementation. Integrating SAS Data Management with the IBM PureData for Analytics appliance is provided as an example.
Magali Thesias, Deloitte
SAS® programmers spend hours developing DATA step code to clean up their data. Save time and money and enhance your data quality results by leveraging the power of the SAS Quality Knowledge Base (QKB). Knowledge Engineers at SAS have developed advanced context-specific data quality operations and packaged them in callable objects in the QKB. In this session, a Senior Software Development Manager at SAS explains the QKB and shows how to invoke QKB operations in a DATA step and in various SAS® Data Management offerings.
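A hedged example of calling a QKB definition from a DATA step. It assumes a SAS® Data Quality Server installation with an English (United States) QKB; the setup path, input data set, and definition name are illustrative.

    %dqload(dqlocale=(ENUSA), dqsetuploc='/opt/sas/qkb/ci/27');   /* load the locale */

    data work.std_names;
       set work.customers;                               /* hypothetical input           */
       name_std = dqStandardize(name, 'Name', 'ENUSA');  /* QKB standardization definition */
    run;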
Brian Rineer, SAS
SAS® programs can be very I/O intensive. SAS data sets with inappropriate variable attributes can degrade the performance of SAS programs. Using SAS compression offers some relief but does not eliminate the issues caused by inappropriately defined SAS variables. This paper examines the problems that inappropriate SAS variable attributes can cause and introduces a macro to tackle the problem of minimizing the footprint of a SAS data set.
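A sketch of the idea behind such a macro (not the paper's macro itself): measure the longest value actually stored in a character variable and rebuild the data set with a right-sized LENGTH. The data set and variable names are hypothetical.

    proc sql noprint;
       select max(lengthn(comment)) into :need trimmed
       from work.big_table;
    quit;

    data work.big_table_small;
       length comment $ &need;   /* shrink to the longest value actually used */
       set work.big_table;
    run;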
Brian Varney, Experis BI & Analytics Practice
Annotating a blank Case Report Form (blankcrf.pdf, which is required for a new drug application (NDA) submission to the US Food and Drug Administration) by hand is an arduous process that can take many hours of precious time. Once again SAS® comes to the rescue! You can have SAS use the Study Data Tabulation Model (SDTM) specifications data to create all the necessary annotations and place them on the appropriate pages of the blankcrf.pdf. In addition you can dynamically color and adjust the font size of these annotations. This approach uses SAS, Adobe Acrobat's forms definition format (FDF) language, and Adobe Reader to complete the process. In this paper I go through each of the steps needed and explain in detail exactly how to accomplish each task.
Steven Black, Agility Clinical
SAS® Data Management is not a dating application. However, as a data analyst, you do strive to find the best matches for your data. Similar to a dating application, when trying to find matches for your data, you need to specify the criteria that constitute a suitable match. You want to strike a balance between being too stringent with your criteria and under-matching your data and being too loose with your criteria and over-matching your data. This paper highlights various SAS Data Management matching techniques that you can use to strike the right balance and help your data find its perfect match, and as a result improve your data for reporting and analytics purposes.
Mary Kathryn Queen, SAS
The SAS® Scalable Performance Data Server and SAS® Scalable Performance Data Engine are data formats from SAS® that support the creation of analytical base tables with tens of thousands of columns. These analytical base tables are used to support daily predictive analytical routines. Traditionally Storage Area Network (SAN) storage has been and continues to be the primary storage platform for the SAS Scalable Performance Data Server and SAS Scalable Performance Data Engine formats. Due to cost constraints associated with SAN storage, companies have added Hadoop to their environments to help minimize storage costs. In this paper we explore how the SAS Scalable Performance Data Server and SAS Scalable Performance Data Engine leverage the Hadoop Distributed File System.
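A hedged example of pointing the SPD Engine at HDFS (the configuration paths, HDFS directory, and data set names are illustrative, and the exact setup depends on your Hadoop distribution):

    options set=SAS_HADOOP_CONFIG_PATH='/opt/sas/hadoop/conf';
    options set=SAS_HADOOP_JAR_PATH='/opt/sas/hadoop/jars';

    libname spdat spde '/user/sasdemo/spde' hdfshost=default;

    data spdat.abt;          /* the analytical base table lands in HDFS */
       set work.abt;
    run;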
Steven Sober, SAS
This paper provides tips and techniques to speed up the validation process both without and with automation. For validation without automation, it introduces both standard and clever uses of options and statements in the COMPARE procedure that can speed up the validation process. For validation with automation, a macro named %QCDATA is introduced for individual data set validation, and a macro named %QCDIR is introduced for comparison of data sets in two different directories. This section also defines the &SYSINFO automatic macro variable and explains how it can be used to interpret the result of the comparison.
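A short illustration of the &SYSINFO check that such automation builds on: PROC COMPARE sets &SYSINFO to a bit-mapped return code, and a value of 0 means the two data sets match exactly. The data set names are hypothetical, and the test must run before another step resets &SYSINFO.

    proc compare base=prod.ae compare=qc.ae;
    run;

    %macro report_match;
       %if &sysinfo = 0 %then %put NOTE: Data sets match exactly.;
       %else %put WARNING: Mismatch found. SYSINFO=&sysinfo (bit-mapped code).;
    %mend report_match;
    %report_match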
Alice Cheng, Portola Pharmaceuticals
Justina Flavin, Independent Consultant
Michael Wise, Experis BI & Analytics Practice
In the pharmaceutical industry, generating summary tables or listings in PDF format is becoming increasingly popular. You can create various PDF output files with different styles by combining the ODS PDF statement and the REPORT procedure in SAS® software. In order to validate those outputs, you need to import the PDF summary tables or listings into SAS data sets. This paper presents an overview of a utility that reads in an uncompressed PDF file that is generated by SAS, extracts the data, and converts it to SAS data sets.
William Wu, Herodata LLC
Yun Guan, Theravance Biopharm
Working with big data is often time consuming and challenging. The primary goal in programming is to maximize throughput while minimizing the use of computer processing time, real time, and programmers' time. By using the Multiprocessing (MP) CONNECT method on a symmetric multiprocessing (SMP) computer, a programmer can divide a job into independent tasks and execute the tasks as threads in parallel on several processors. This paper demonstrates the development and application of a parallel processing program on a large amount of health-care data.
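A minimal MP CONNECT sketch, assuming SAS/CONNECT is licensed (the library and data set names are hypothetical): two independent summaries run as parallel child sessions on the same SMP machine, and the parent session waits for both before continuing.

    options sascmd='!sascmd';             /* spawn child SAS sessions on this machine */

    signon task1;
    rsubmit task1 wait=no inheritlib=(perm);
       proc summary data=perm.claims_2015 nway;
          class plan_type;  var paid_amount;
          output out=perm.sum_2015 sum=;
       run;
    endrsubmit;

    signon task2;
    rsubmit task2 wait=no inheritlib=(perm);
       proc summary data=perm.claims_2016 nway;
          class plan_type;  var paid_amount;
          output out=perm.sum_2016 sum=;
       run;
    endrsubmit;

    waitfor _all_ task1 task2;            /* block until both tasks finish */
    signoff _all_;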
Shuhua Liang, Kaiser Permanente
Every day, companies all over the world are moving their data into the cloud. While there are many options available, much of this data will wind up in Amazon Redshift. As a SAS® user, you are probably wondering, 'What is the best way to access this data using SAS?' This paper discusses the many ways that you can use SAS/ACCESS® software to get to Amazon Redshift. We compare and contrast the various approaches and help you decide which is best for you. Topics include building a connection, moving data into Amazon Redshift, and applying performance best practices.
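A hedged connection example (the cluster endpoint, credentials, and table are placeholder values, and the exact options depend on your SAS/ACCESS to Amazon Redshift release):

    libname rs redshift
            server='mycluster.abc123.us-east-1.redshift.amazonaws.com'
            port=5439 database=analytics user=myuser password='XXXXXXXX';

    proc sql;
       select count(*) as row_count
       from rs.web_orders;       /* aggregation is passed to Redshift where possible */
    quit;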
Chris DeHart, SAS
Jeff Bailey, SAS
In this paper, a SAS® macro is introduced that can help users find and access their folders and files very easily. You provide a path to the macro and tell it which folders and files you are looking for under that path, and the macro creates an HTML report that lists the matching folders and files. The best part of this HTML report is that it also creates a hyperlink for each folder and file, so that when a user clicks the hyperlink, it directly opens the folder or file. Users can also ask the macro to find certain folders or files by providing part of the folder or file name as the search criterion. The results shown in the report can be sorted in different ways, which further helps users quickly find and access their folders and files.
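A compact sketch of the directory-reading technique such a macro relies on (the path is hypothetical): DOPEN, DNUM, and DREAD list the members of a folder into a data set that could then be reported with hyperlinks.

    data work.folder_contents;
       length member $ 256;
       rc  = filename('dref', '/projects/study01');
       did = dopen('dref');
       if did > 0 then do i = 1 to dnum(did);
          member = dread(did, i);     /* file or subfolder name */
          output;
       end;
       rc = dclose(did);
       keep member;
    run;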
Ting Sa, Cincinnati Children's Hospital Medical Center
When I help users design or debug their SAS® programs, they are sometimes unable to provide relevant SAS data sets because they contain confidential information. Sometimes, confidential data values are intrinsic to their problem, but often the problem could still be identified or resolved with innocuous data values that preserve some of the structure of the confidential data. Or the confidential values are in variables that are unrelated to the problem. While techniques for masking or disguising data exist, they are often complex or proprietary. In this paper, I describe a very simple macro, REVALUE, that can change the values in a SAS data set. REVALUE preserves some of the structure of the original data by ensuring that for a given variable, observations with the same real value have the same replacement value, and if possible, observations with a different real value have a different replacement value. REVALUE enables the user to specify which variables to change and whether to order the replacement values for each variable by the sort order of the real values or by observation order. I discuss the REVALUE macro in detail and provide a copy of the macro.
Bruce Gilsen, Federal Reserve Board
Microsoft Office products play an important role in most enterprises. SAS® is combined with Microsoft Office to assist in decision making in everyday life. Most SAS users have moved from SAS® 9.3, or 32-bit SAS, to SAS® 9.4 for the exciting new features. This paper describes a few things that do not work quite the same in SAS 9.4 as they did in the 32-bit version. It discusses the reasons for the differences and the new approaches that SAS 9.4 provides. The paper focuses on how SAS 9.4 works with Excel and Access. Furthermore, this presentation summarizes methods by LIBNAME engines and by import or export procedures. It also demonstrates how to interactively import or export PC files with SAS 9.4. The issue of incompatible format catalogs is addressed as well.
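A hedged illustration of two of the 64-bit-friendly approaches discussed (the file paths and server name are hypothetical): the XLSX LIBNAME engine reads a workbook directly without Microsoft Office, while the PCFILES engine routes Microsoft Access data through the SAS PC Files Server when bitness differs.

    libname xl xlsx '/data/budget2016.xlsx';        /* SAS 9.4: reads the workbook directly */

    data work.budget;
       set xl.'Sheet1'n;
    run;

    libname xl clear;

    libname acc pcfiles path='/data/tracking.accdb'
            server='pcfsrv.example.com' port=9621;  /* PC Files Server host and port */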
Hong Zhang, Mathematica Policy Research
Specialized access requirements to tap into event streams vary depending on the source of the events. Open-source approaches from Spark, Kafka, Storm, and others can connect event streams to big data lakes, such as Hadoop and other common data management repositories. But a different approach is needed to ensure that latency and throughput are not adversely affected when processing streaming data with different open-source frameworks; that is, they need to scale. This talk distinguishes the advantages of adapters and connectors and shows how SAS® can leverage both Hadoop and YARN technologies to scale while still meeting the needs of streaming data analysis and large, distributed data repositories.
Evan Guarnaccia, SAS
Fiona McNeill, SAS
Steve Sparano, SAS
Customers are under increasing pressure to determine how data, reports, and other assets are interconnected, and to determine the implications of making changes to these objects. Of special interest is performing lineage and impact analysis on data from disparate systems, for example, SAS®, IBM InfoSphere DataStage, Informatica PowerCenter, SAP BusinessObjects, and Cognos. SAS provides utilities to identify relationships between assets that exist within the SAS system, and beginning with the third maintenance release of SAS® 9.4, these utilities support third-party products. This paper provides step-by-step instructions, tips, and techniques for using these utilities to collect metadata from these disparate systems, and to perform lineage and impact analysis.
Liz McIntosh, SAS
Today many analysts in information technology are facing challenges working with large amounts of data. Most analysts are smart enough to write queries using correct table join conditions, text mining, index keys, and hash objects for quick data retrieval. The SAS® system is now able to work with Hadoop data storage to provide efficient processing power to SAS analytics. In this paper we demonstrate differences in data retrieval between the SQL procedure, the DATA step, and the Hadoop environment. In order to test data retrieval time, we used the following code to randomly generate ten million observations with character and numeric variables using the RANUNI function. DO LOOP=1 to 1e7 generates ten million records; however, this code can generate any number of records by changing the exponent. We use the most commonly used functions and procedures to retrieve records from a test data set, and we illustrate real-time and CPU-time processing. All PROC SQL and Hadoop queries are run in SAS® Enterprise Guide® 6.1 to record processing time. This paper includes an overview of Hadoop data architecture and describes how to connect to a Hadoop environment through SAS. It provides sample queries for data retrieval, explains differences between using Hadoop pass-through and the Hadoop LIBNAME engine, and covers the big challenges of real-time and CPU-time comparison using the most commonly used SAS functions and procedures.
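The kind of test-data generator described above (variable names are illustrative): 1e7 iterations yield ten million observations, and changing the exponent scales the test up or down.

    data work.testdata;
       do loop = 1 to 1e7;
          num_var  = ranuni(12345);                        /* numeric variable   */
          char_var = put(ceil(ranuni(6789) * 100), z3.);   /* character variable */
          output;
       end;
    run;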
Anjan Matlapudi, AmeriHealth Caritas Family of Companies
SAS® provides in-database processing technology in the SQL procedure, which allows the SQL explicit pass-through method to push some or all of the work to a database management system (DBMS). This paper focuses on using the SAS SQL explicit pass-through method to transform Teradata table columns into rows. There are two common approaches for transforming table columns into rows. The first approach is to create narrow tables, one for each column that requires transposition, and then use UNION or UNION ALL to append all the tables together. This approach is straightforward but can be quite cumbersome, especially when a large number of columns need to be transposed. The second approach is to use the Teradata TD_UNPIVOT function, which makes the wide-to-long table transposition an easy job. However, TD_UNPIVOT allows you to transpose only columns with the same data type from wide to long. This paper presents a SAS macro solution to the wide-to-long table transposition involving different column data types. Several examples are provided to illustrate the usage of the macro solution. This paper complements the author's SAS paper Performing Efficient Transposes on Large Teradata Tables Using SQL Explicit Pass-Through, in which a solution for the long-to-wide table transposition is discussed. SAS programmers who are working with data stored in an external DBMS and would like to efficiently transpose their data will benefit from this paper.
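A heavily hedged sketch of the explicit pass-through pattern the paper is based on (the connection options, table, and column names are placeholders, and the exact TD_UNPIVOT argument syntax should be confirmed against the Teradata documentation for your release):

    proc sql;
       connect to teradata (server=tdprod user=myuser password='XXXXXXXX');
       create table work.sales_long as
       select * from connection to teradata (
          select * from TD_UNPIVOT(
             ON (select cust_id, jan_sales, feb_sales, mar_sales from sales_wide)
             USING
                VALUE_COLUMNS('sales')
                UNPIVOT_COLUMN('sales_month')
                COLUMN_LIST('jan_sales', 'feb_sales', 'mar_sales')
          ) AS unpvt
       );
       disconnect from teradata;
    quit;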
Tao Cheng, Accenture
In a data warehousing system, change data capture (CDC) plays an important part, not just in making the data warehouse (DWH) aware of a change but also in providing a means of flowing the change to the DWH marts and reporting tables so that we see the current and latest version of the truth. Together with slowly changing dimensions (SCD), this creates a cycle that runs the DWH, provides valuable insight into history, and supports future decision-making. But what if the source has no CDC? It would be an ETL nightmare to identify the exact change and report the absolute truth. If these two processes can be combined into a single process, where one transform does both jobs of identifying the change and applying it to the DWH, then we can save significant processing time and valuable system resources. Hence, I came up with a hybrid SCD-with-CDC approach. My paper focuses on sources that do NOT provide CDC and shows how to perform SCD Type 2 on such records without worrying about data duplication or increased processing time.
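One common way to detect change when the source provides no CDC (a generic sketch, not the author's hybrid transform): compare an MD5 digest of the tracked columns against the digest stored on the current DWH row, and only the rows flagged here ever reach the SCD Type 2 logic. The table and column names are hypothetical.

    data work.changes;
       merge src.customer (in=insrc)
             dwh.dim_customer (in=indwh keep=cust_id current_flag row_hash
                               where=(current_flag='Y'));
       by cust_id;
       new_hash = put(md5(catx('|', name, address)), $hex32.);
       if insrc and not indwh then change_type = 'INSERT';       /* brand-new key     */
       else if insrc and new_hash ne row_hash then change_type = 'UPDATE';
       else delete;                                              /* unchanged: ignore */
    run;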
Vishant Bhat, University of Newcastle
Tony Blanch, SAS Consultant
JavaScript Object Notation (JSON) has quickly become the de facto standard for data transfer on the Internet, due to an increase in both web data and the usage of full-stack JavaScript. JSON has become dominant in the emerging technologies of the web today, such as the Internet of Things and the mobile cloud. JSON offers a lightweight and flexible format for data transfer, and it can be processed directly from JavaScript without the need for an external parser. The SAS® JSON procedure lacks the ability to read in JSON. However, the addition of the GROOVY procedure in SAS® 9.3 allows Groovy (Java-based) code to be executed from within SAS, so JSON data can be read into a SAS data set through XML conversion. This paper demonstrates the method for parsing JSON into data sets with Groovy and the XML LIBNAME engine, all within Base SAS®.
John Kennedy, Mesa Digital
Analyzing massive amounts of big data quickly to get at answers that increase agility (an organization's ability to sense change and respond) and drive time-sensitive business decisions is a competitive differentiator in today's market. Customers gain from the deep understanding of storage architecture provided by EMC combined with the deep expertise in analytics provided by SAS. This session provides an overview of how your mixed analytics SAS® workloads can be transformed on EMC XtremIO, DSSD, and Pivotal solutions. Whether you're working with one of the three primary SAS file systems or SAS® Grid, performance and capacity scale linearly, application latency is significantly reduced, and the complexity of storage tuning is removed from the equation. SAS and EMC have partnered to meet customer challenges and deliver a modern analytic architecture. This unified approach encompasses big data management, analytics discovery, and deployment via end-to-end solutions that solve your big data problems. Learn about best practices from fellow customers and how they deliver new levels of SAS business value.
Big data is often distinguished as encompassing high volume, high velocity, or high variability of data. While big data can signal big business intelligence and big business value, it can also wreak havoc on systems and software ill-prepared for its profundity. Scalability describes the ability of a system or software to adequately meet the needs of additional users or its ability to use additional processors or resources to fulfill those added requirements. Scalability also describes the adequate and efficient response of a system to increased data throughput. Because sorting data is one of the most common and most resource-intensive operations in any software language, inefficiencies or failures caused by big data often are first observed during sorting routines. Much SAS® literature has been dedicated to optimizing big data sorts for efficiency, including minimizing execution time and, to a lesser extent, minimizing resource usage (that is, memory and storage consumption). However, less attention has been paid to implementing big data sorting that is reliable and robust even when confronted with resource limitations. To that end, this text introduces the SAFESORT macro, which facilitates a priori exception-handling routines (which detect environmental and data set attributes that could cause process failure) and post hoc exception-handling routines (which detect actual failed sorting routines). If exception handling is triggered, SAFESORT automatically reroutes program flow from the default sort routine to a less resource-intensive routine, thus sacrificing execution speed for reliability. Moreover, macro modularity enables developers to select their favorite sort procedure and, for data-driven disciples, to build fuzzy logic routines that dynamically select a sort algorithm based on environmental and data set attributes.
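A simplified illustration of the reroute-on-failure idea (this is not the SAFESORT macro itself): try an ordinary sort, and if it fails, fall back to a slower but less resource-hungry TAGSORT. The data set names are hypothetical; a production version would also handle batch syntax-check mode and other failure modes.

    %macro safe_sort(data=, out=, by=);
       proc sort data=&data out=&out;
          by &by;
       run;
       %if &syserr > 4 %then %do;            /* 0 = OK, 4 = warning, >4 = error */
          %put WARNING: Default sort failed (SYSERR=&syserr). Retrying with TAGSORT.;
          proc sort data=&data out=&out tagsort;
             by &by;
          run;
       %end;
    %mend safe_sort;

    %safe_sort(data=work.bigdata, out=work.bigdata_sorted, by=id)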
Troy Hughes, Datmesis Analytics
Sixty percent of organizations will have Hadoop in production by 2016, per a recent TDWI survey, and it has become increasingly challenging to access, cleanse, and move these huge volumes of data. How can data scientists and business analysts clean up their data, manage it where it lives, and overcome the big data skills gap? It all comes down to accelerating the data preparation process. SAS® Data Loader leverages years of expertise in data quality and data integration, a simplified user interface, and the parallel processing power of Hadoop to overcome the skills gap and compress the time taken to prepare data for analytics or visualization. We cover some of the new capabilities of SAS Data Loader for Hadoop including in-memory Spark processing of data quality and master data management functions, faster profiling and unstructured data field extraction, and chaining multiple transforms together for improved productivity.
Matthew Magne, SAS
Scott Gidley, SAS
This paper discusses a set of practical recommendations for optimizing the performance and scalability of your Hadoop system using SAS®. Topics include recommendations gleaned from actual deployments from a variety of implementations and distributions. Techniques cover tips for improving performance and working with complex Hadoop technologies such as Kerberos, techniques for improving efficiency when working with data, methods to better leverage the SAS in Hadoop components, and other recommendations. With this information, you can unlock the power of SAS in your Hadoop system.
Nancy Rausch, SAS
Wilbram Hazejager, SAS
SAS® High-Performance Risk distributes financial risk data and big data portfolios with complex analyses across a networked Hadoop Distributed File System (HDFS) grid to support rapid in-memory queries for hundreds of simultaneous users. This data is extremely complex and must be stored in a proprietary format to guarantee data affinity for rapid access. However, customers still desire the ability to view and process this data directly. This paper demonstrates how to use the HPRISK custom file reader to directly access risk data in Hadoop MapReduce jobs, using the HPDS2 procedure and the LASR procedure.
Mike Whitcher, SAS
Stacey Christian, SAS
Phil Hanna, SAS Institute
Don McAlister, SAS
The Add Health Parent Study is using a new and innovative method to augment our other interview verification strategies. Typical verification strategies include calling respondents to ask questions about their interview, recording pieces of the interaction (CARI, computer-aided recorded interviewing), and analyzing timing data to confirm that each interview was of a reasonable length. Geocoding adds another tool to the toolbox for verifications. By applying street-level geocoding to the address where an interview is reported to have been conducted and comparing that to a latitude/longitude reading captured from a GPS tracking device, we are able to compute the distance between the two points. If that distance is very small and the time stamps are close to each other, then the evidence points to the field interviewer being present at the respondent's address during the interview. For our project, the street-level geocoding of an address is done using SAS® PROC GEOCODE. Our paper describes how to obtain a US address database from the SAS website and how it can be used in PROC GEOCODE. We also briefly compare this technique to using the Google Maps API and Python as an alternative.
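A hedged sketch of the two steps described (the street lookup data location and the variable names are placeholders): PROC GEOCODE assigns coordinates to each interview address, and the GEODIST function measures how far the captured GPS reading fell from that geocoded point.

    proc geocode method=street
                 data=work.interview_addresses      /* address, city, state, zip   */
                 out=work.geocoded
                 lookupstreet=lookup.usm;           /* downloaded US street lookup */
    run;

    data work.verification;
       merge work.geocoded work.gps_readings;       /* both keyed by case_id       */
       by case_id;
       dist_miles = geodist(y, x, gps_lat, gps_lon, 'M');   /* Y = latitude, X = longitude */
    run;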
Chris Carson, RTI International
Lilia Filippenko, RTI International
Mai Nguyen, RTI International
By default, the SAS® hash object permits only entries whose keys, defined in its key portion, are unique. While in certain programming applications this is a rather utile feature, there are also others for which being able to insert and manipulate entries with duplicate keys is imperative. Such an ability, available since SAS® 9.2, was a welcome development: It vastly expanded the functionality of the hash object and eliminated the necessity to work around the distinct-key limitation using custom code. However, nothing comes without a price, and the ability of the hash object to store duplicate-key entries is no exception. In particular, additional hash object methods had to be--and were--developed to handle specific entries sharing the same key. The extra price is that using these methods is surely not quite as straightforward as the simple corresponding operations on distinct-key tables, and the documentation alone is rather poor help for making them work in practice. Rather extensive experimentation and investigative coding are necessary to make that happen. This paper is a result of such an endeavor, and hopefully, it will save those who delve into it a good deal of time and frustration.
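A small, self-contained illustration of the duplicate-key methods in question: with the MULTIDATA:'Y' argument tag, FIND retrieves the first entry for the current key and FIND_NEXT walks through the remaining entries that share it.

    data work.orders;
       input cust $ amount;
       datalines;
    A 10
    A 25
    B 7
    ;
    run;

    data work.totals(keep=cust total);
       if _n_ = 1 then do;
          declare hash h(dataset:'work.orders', multidata:'y');
          h.defineKey('cust');
          h.defineData('amount');
          h.defineDone();
       end;
       set work.orders;
       by cust;
       if first.cust;                /* one pass per distinct key       */
       total = 0;
       rc = h.find();                /* first entry for the current key */
       do while (rc = 0);
          total + amount;
          rc = h.find_next();        /* next entry sharing the same key */
       end;
    run;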
Paul Dorfman, Dorfman Consulting
The latest releases of SAS® Data Integration Studio, SAS® Data Management Studio and SAS® Data Integration Server, SAS® Data Governance, and SAS/ACCESS® software provide a comprehensive and integrated set of capabilities for collecting, transforming, and managing your data. The latest features in the product suite include capabilities for working with data from a wide variety of environments and types, including Hadoop, cloud, RDBMS, files, unstructured data, and streaming, and the ability to perform ETL and ELT transformations in diverse run-time environments, including SAS®, database systems, Hadoop, Spark, SAS® Analytics, cloud, and data virtualization environments. There are also new capabilities for lineage, impact analysis, clustering, and other data governance features, along with enhancements to master data and metadata management support. This paper provides an overview of the latest features of the SAS® Data Management product suite and includes use cases and examples for leveraging product capabilities.
Nancy Rausch, SAS
SAS® Federation Server is a scalable, threaded, multi-user data access server providing seamlessly integrated data from various data sources. When your data becomes too large to move or copy and too sensitive to allow direct access, the powerful set of data virtualization capabilities allows you to effectively and efficiently manage and manipulate data from many sources, without moving or copying the data. This agile data integration framework allows business users the ability to connect to more data, reduce risks, respond faster, and make better business decisions. For technical users, the framework provides central data control and security, reduces complexity, optimizes commodity hardware, promotes rapid prototyping, and increases staff productivity. This paper provides an overview of the latest features of the product and includes use cases and examples for leveraging product capabilities.
Tatyana Petrova, SAS
In the world of big data, real-time processing and event stream processing are becoming the norm. However, not many tools available today can do this type of processing. SAS® Event Stream Processing is designed to process this data. In this paper, we look at using SAS Event Stream Processing to read multiple data sets stored in big data platforms such as Hadoop and Cassandra in real time and to perform transformations on the data, such as joining data sets, filtering data based on preset business rules, and creating new variables as required. We also look at how we can score the data based on a machine learning algorithm. This paper shows you how to use the provided Hadoop Distributed File System (HDFS) publisher and subscriber to read data from and push data to Hadoop. The HDFS adapter is discussed in detail. Finally, we use the Streamviewer to see how data flows through SAS Event Stream Processing.
Krishna Sai Kishore Konudula, Kavi Associates