Running a Validation Process

Sample CDISC SDTM 3.1.3 Driver Program: validate_data.sas

Overview

Each SAS Clinical Standards Toolkit process uses a SAS driver program to set up the program execution flow. The following steps walk through the execution flow of a typical driver program that performs SAS Clinical Standards Toolkit validation, using the CDISC SDTM 3.1.3 sample driver as the example. The CDISC SDTM 3.1.3 validation driver program is in: sample study library directory/cdisc-sdtm-3.1.3-1.6.

Step 1: Define macro variables required by the validation process.

%let _cstStandard=CDISC-SDTM;
%let _cstStandardVersion=3.1.3;
%let _cstVersion=;
%let _cstCTPath=;
%let _cstCTMemname=;
%let _cstCTDescription=;
These macro variables are used as substitution parameters later in the driver program to reduce the number of code changes required.
%cst_setStandardProperties(_cstStandard=CST-FRAMEWORK,_cstSubType=initialize);
Initialize the minimum set of global macro variables used to run any SAS Clinical Standards Toolkit process. This includes the names of work data sets, default locations of files, and metadata used to populate the process Results data set.
Each registered standard should have its own initialize.properties file. For each standard that is included in a specific process, the cst_setStandardProperties macro can be called at this point. Alternatively, type=properties records can be added to the SASReferences data set, in which case the properties are processed when the cstutil_allocatesasreferences macro is called. The SDTM validate_data.sas driver program follows this latter approach.
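As an illustration of the first approach, the call below is a minimal sketch of initializing the properties of the standard being validated directly in the driver. It assumes that cst_setStandardProperties accepts a _cstStandardVersion parameter alongside the two parameters shown in this section. Confirm the signature in the Macro API documentation before relying on it.

```sas
* Sketch only. Initialize the properties of the standard being validated. ;
* The _cstStandardVersion parameter is an assumption and is not shown in this section. ;
%cst_setStandardProperties(_cstStandard=&_cstStandard,
                           _cstStandardVersion=&_cstStandardVersion,
                           _cstSubType=initialize);
```

The sample SDTM driver does not make this call, because it supplies the same information through type=properties records in SASReferences.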
%cst_getRegisteredStandards(_cstOutputDS=work._cstStandards);
data _null_;
  set work._cstStandards (where=(standard="CST-FRAMEWORK"));
  call symputx('_cstVersion',strip(productrevision));
run;
Get the list of registered standards to determine the version of the SAS Clinical Standards Toolkit.
* Set Controlled Terminology version for this process  *;
%cst_getstandardsubtypes(_cstStandard=CDISC-TERMINOLOGY,_cstOutputDS=work._cstStdSubTypes);
data _null_;
  set work._cstStdSubTypes (where=(standardversion="&_cstStandard" and isstandarddefault='Y'));
  * User can override CT version of interest by specifying a different where clause:         *;
  * Example: (where=(standardversion="&_cstStandard" and standardsubtypeversion='201104'))   *;
  call symputx('_cstCTPath',path);
  call symputx('_cstCTMemname',memname);
  call symputx('_cstCTDescription',description);
run;


proc datasets lib=work nolist;
  delete _cstStandards _cstStdSubTypes;
quit;
Choose the default controlled terminology that is associated with the _cstStandard and _cstStandardVersion. Clean up work files.
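Because the macro variables were initialized to empty values at the top of Step 1, a blank value at this point suggests that the corresponding WHERE clause matched no records. A quick way to check, sketched here with ordinary %PUT statements, is to write the values to the SAS log:

```sas
* Write the Step 1 macro variables to the SAS log for review. ;
%put NOTE: _cstVersion=&_cstVersion;
%put NOTE: _cstCTPath=&_cstCTPath;
%put NOTE: _cstCTMemname=&_cstCTMemname;
%put NOTE: _cstCTDescription=&_cstCTDescription;
```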
*********************************************************************************************;
* The following data step sets (at a minimum) the studyrootpath and studyoutputpath.  These *;
* are used to make the driver programs portable across platforms and allow the code to be   *;
* run with minimal modification. These macro variables by default point to locations within *;
* the cstSampleLibrary, set during install but modifiable thereafter.  The cstSampleLibrary *;
* is assumed to allow write operations by this driver module.                               *;
*********************************************************************************************;

%cstutil_setcstsroot;
data _null_;
  call symput('studyRootPath',cats("&_cstSRoot",
              "/cdisc-sdtm-3.1.3-&_cstVersion/sascstdemodata"));
  call symput('studyOutputPath',cats("&_cstSRoot",
              "/cdisc-sdtm-3.1.3-&_cstVersion/sascstdemodata"));
run;
Note: &_cstSRoot is set by the call to cstutil_setcstsroot to the location of the cstSampleLibrary that was defined during the product installation.
%let workPath=%sysfunc(pathname(work));
The workPath value provides the path to the Work directory, which is referenced in the path column of the sample study SASReferences data set. Setting workPath is not required.
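Since the later steps depend on studyRootPath and studyOutputPath, it can help to confirm that both locations exist before continuing. The following is a minimal sketch that uses the Base SAS FILEEXIST function. The macro name check_study_paths is hypothetical and is not part of the toolkit, and the sketch assumes that the paths contain no commas or unbalanced parentheses.

```sas
/* Hypothetical helper macro, not part of the toolkit. */
%macro check_study_paths;
  /* FILEEXIST returns 1 when the physical file or directory exists. */
  %if %sysfunc(fileexist(&studyRootPath)) %then %do;
    %put NOTE: studyRootPath exists: &studyRootPath;
  %end;
  %else %do;
    %put ERROR: studyRootPath was not found: &studyRootPath;
  %end;
  %if %sysfunc(fileexist(&studyOutputPath)) %then %do;
    %put NOTE: studyOutputPath exists: &studyOutputPath;
  %end;
  %else %do;
    %put ERROR: studyOutputPath was not found: &studyOutputPath;
  %end;
%mend check_study_paths;

%check_study_paths;
```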

Step 2: Build and populate the SASReferences data set

%let _cstSetupSrc=SASREFERENCES;

*****************************************************************************************;
* One strategy to defining the required library and file metadata for a CST process     *;
*  is to optionally build SASReferences in the WORK library.  An example of how to do   *;
*  this follows.                                                                        *;
*                                                                                       *;
* The call to cstutil_processsetup below tells CST how SASReferences will be provided   *;
*  and referenced.  If SASReferences is built in work, the call to cstutil_processsetup *;
*  may, assuming all defaults, be as simple as  %cstutil_processsetup()                 *;
*****************************************************************************************;


*****************************************************************************************;
* Build the SASReferences data set                                                       *;
* column order:  standard, standardversion, type, subtype, sasref, reftype, iotype,      *;
*                filetype, allowoverwrite, relpathprefix, path, order, memname, comment  *;
* note that &_cstGRoot points to the Global Library root directory                       *;
* path and memname are not required for Global Library references - defaults will be used*;
******************************************************************************************;
%cst_createdsfromtemplate(_cstStandard=CST-FRAMEWORK, _cstType=control,_cstSubType=reference, 
                          _cstOutputDS=work.sasreferences);
proc sql;
  insert into work.sasreferences
  values ("CST-FRAMEWORK" "1.2" "messages" "" "messages" "libref"  "input"  "dataset"  
          "N"  "" "" 1 "" "")
  values ("&_cstStandard" "&_cstStandardVersion" "control" "validation" "cntl_v" "libref"
          "input" "dataset" "N" "" "&studyRootPath/control" . "validation_control.sas7bdat" "")
  [etc.]
  ;
quit;
The cst_createdsfromtemplate macro initializes the SASReferences data set that is required for SDTM validation. The SASReferences data set defines the location and name of each input metadata source, input data source, and output file that is created by the validation process, including the Validation Control data set. The Validation Control data set contains the set of checks to include in the validation process. The sample validate_data.sas driver program sets the path of the Validation Control data set to &studyRootPath/control and sets the name to validation_control.sas7bdat. Based on the code executed in step 1, this is the path:
sample study library directory/cdisc-sdtm-3.1.3-1.6/sascstdemodata/control/validation_control.sas7bdat.
For an explanation of the purpose and content of each SASReferences file, see SASReferences File. For a fully initialized SASReferences data set for SDTM validation, see Sample SASReferences File for CDISC SDTM Validation.
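Before moving on to process setup, it can be useful to review what was actually inserted into work.sasreferences. The following sketch prints a subset of the columns named in the column-order comment above. It uses only PROC PRINT and assumes that the data set was built in the Work library as shown.

```sas
* List the key columns of the SASReferences data set built in Step 2. ;
proc print data=work.sasreferences noobs;
  var standard standardversion type subtype sasref reftype path memname;
run;
```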

Step 3: Call the cstutil_processsetup macro.

The cstutil_processsetup macro completes process setup. It ensures that all SAS librefs and filerefs are allocated; all system options, macro autocall paths, and format search paths are set; and that all global macro variables that are required by the process have been appropriately initialized.
Note: For more information about the cstutil_processsetup macro, see the SAS Clinical Data Standards Toolkit: Macro API Documentation.
The validate_data.sas driver calls the macro without arguments, which accepts the default value of every macro parameter:
%cstutil_processsetup();
The cstutil_processsetup macro parameter values tell the process where to find the SASReferences data set.
*********************************************************************;
* Set global macro variables for the location of the sasreferences  *;
* file (overrides default properties initialized above)             *;
*********************************************************************;

%let _cstSASRefsName=&_cstSASReferencesName;
%let _cstSASRefsLoc=&_cstSASReferencesLocation;
The final setup step for the cstutil_processsetup macro is a call to the cstutil_allocatesasreferences utility macro. The SASReferences data set is now interpreted by the SAS Clinical Standards Toolkit. These actions complete the process:
  1. The cst_insertstandardsasrefs macro is called to insert paths into any records that are missing path information. The information is captured from the StandardSASReferences data set for each standard. For more information about how this works, see Inserting Information from Registered Standards into a SASReferences File.
  2. Multiple calls to the cstutilvalidatesasreferences macro are made to perform internal validation on the SASReferences data set.
    The validation performed by the cstutilvalidatesasreferences macro is described in Assessing Structural Integrity and Content.
  3. All filerefs and librefs are allocated. (This action is contingent on the _cstReallocateSASRefs property or global macro variable value.)
  4. Any property files are passed to the cst_setproperties macro to create global macro variables.
  5. The format search path is set if any type=fmtsearch records are found. This is based on the order specified.
  6. The autocall path is set if any type=autocall records are found. This is based on the order specified.
  7. A Messages data set is created to contain records from each referenced standard. This data set is based on the _cstMessages and _cstMessageOrder properties or global macro variable values. This data set is used for the duration of the process to add fully resolved messages to the Results data set.
At this point, all libraries should be allocated, all paths and global macros should be set, and the global status macro variable _cst_rc should be set to 0. The process is ready to proceed.
CAUTION:
The SASReferences data set is key to the process, and any errors in it will cause the process to fail.
Because every subsequent step depends on the SASReferences data set, process setup is a common point of failure. For tips on debugging problems with the SASReferences data set, see Special Topic: Debugging a Validation Process and Assessing Structural Integrity and Content.
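One way to catch a setup failure early is to test the global status macro variable _cst_rc immediately after the call to cstutil_processsetup. The following is a minimal sketch. The macro name check_setup_rc is hypothetical and is not part of the toolkit, and the sketch assumes that _cst_rc already exists because process setup has run.

```sas
/* Hypothetical helper macro, not part of the toolkit. */
%macro check_setup_rc;
  /* A nonzero _cst_rc after process setup indicates a problem. */
  %if &_cst_rc ne 0 %then %do;
    %put ERROR: Process setup ended with _cst_rc=&_cst_rc.. Review the SASReferences data set.;
  %end;
  %else %do;
    %put NOTE: Process setup completed with _cst_rc=0.;
  %end;
%mend check_setup_rc;

%check_setup_rc;
```

Wrapping the test in a macro keeps the %IF logic valid in SAS releases that do not allow %IF in open code.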

Step 4: Run validation tasks.

* Run the standard-specific validation macro. ;
%sdtm_validate;
The sdtm_validate macro performs these tasks:
  1. The macro looks up the Validation Control data set reference from SASReferences.
  2. The macro re-sorts the Validation Control data set based on the _cstCheckSortOrder property or global macro variable value. This step is optional.
  3. Metadata about the validation process, such as the standard/version, key files referenced, and process datetimes, is added to the process Results data set.
  4. For each check in the Validation Control data set with a checkstatus > 0, this macro calls the check macro specified in the Validation Control codesource field. It passes all of the check metadata to the check macro.
  5. After all of the checks are run, these events happen:
    • The results are saved to the file specified in SASReferences (type=results, subtype=validationresults).
    • Any process results are summarized in the Metrics data set if specified.
    • The metrics are saved to the file specified in SASReferences (type=results, subtype=validationmetrics).
    • Various SAS Work files are cleaned up if needed.
For tips on debugging if unexpected errors occur, see Special Topic: Debugging a Validation Process.
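After sdtm_validate finishes, a quick tabulation of the saved Results data set gives a first impression of the outcome. This sketch assumes that the results were persisted as results.validation_results, as in the sample study SASReferences. Substitute the libref and member name from your own type=results record.

```sas
* Count Results records by check and result flag. ;
* The MISSING option keeps records that have no checkid value. ;
proc freq data=results.validation_results;
  tables checkid*resultflag / list missing;
run;
```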

Step 5: Clean up the session.

* Clean up the SAS Clinical Standards Toolkit process 
files, macro variables and macros.;
%cstutil_cleanupcstsession(
    _cstClearCompiledMacros=0,
    _cstClearLibRefs=0,
    _cstResetSASAutos=0,
    _cstResetCmpLib=0,
    _cstResetFmtSearch=0,
    _cstResetSASOptions=1,
    _cstDeleteFiles=1,
    _cstDeleteGlobalMacroVars=0);
This step is optional, and it is unnecessary with batch processing. You should not clean up prematurely or aggressively if additional SAS Clinical Standards Toolkit processes are to be run in the same interactive SAS session.
Note: For more information about the cstutil_cleanupcstsession macro, see the SAS Clinical Data Standards Toolkit: Macro API Documentation.

Validation Results and Metrics

For SAS Clinical Standards Toolkit validation processes, the primary products of each validation process are the Results data set and the Metrics data set. These data sets itemize and summarize the findings of the validation process.
Example of a Validation Results Data Set (#1) summarizes a sample validation process. Here are a few facts about the sample validation process:
  1. The validation process was run on CDISC SDTM 3.1.3 source data.
  2. It referenced a Validation Control data set that contained metadata for four checks.
  3. It included SASReferences records to persist the results as results.validation_results and results.validation_metrics.
    Note: In these displays, some rows have been hidden to reduce redundant examples.
    [Display: Example of a Validation Results Data Set (#1)]
    [Display: Example of a Validation Results Data Set (#2)]
    Comments about the Validation Results Data Sets in Displays 6.9 and 6.10
    Lines     Comment
    1, 6, 7   Informational notes about processing the properties files.
    2         Informational note saying that the creation of work.sasreferences was successful.
    3         Informational note from cstutil_processsetup that informs you of the location of the SASReferences data set.
    4-5       Informational notes that inform you that the process SASReferences data set passed internal validation using the cstutilvalidatesasreferences macro called from two different macros.
    8-17      Informational summary that provides internal documentation about the process.
    18-19     Checks SDTM0101 and SDTM0130 ran without error.
    20        An error was detected in the SRCDATA.RS domain. The keyvalues column identifies the problem RS record, and the actual column reports the values that are in error.
    21-22     Check SDTM0451 performs a terminology lookup for the AELLT column in SRCDATA.AE using the ctref.meddra data set. The ctref SAS libref was defined in the SASReferences type=referencecterm record pointing to the SAS library containing the MedDRA data set. The keyvalues column identifies the problem AE record, and the actual column reports that the AELLT value in error was blank.
    [Display: Example of a Validation Metrics Data Set]
    Comments about the Validation Metrics Data Set
    Lines     Comment
    1-2       In check SDTM0101, 70 subjects and 5 date columns for each DM subject were evaluated.
    3         Check SDTM0101 took one second to run using cstcheck_column.
    10        Check SDTM0451 evaluated the AELLT column for each of the 106 SRCDATA.AE records.
    12        A summary metric of unique check invocations.
    13        A summary metric of the number of checks that failed to run. (This metric counts distinct checkid and resultseq combinations in the Results data set where resultflag=-1.)
    14-18     Summary metric counts of the number of records, by type of metric, in the Results data set.
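The "checks that failed to run" metric can be reproduced directly from the Results data set, which is a useful cross-check when the Metrics data set is not requested. The following sketch applies the definition given above (distinct checkid and resultseq combinations where resultflag=-1). It assumes that the results were saved as results.validation_results. The column name checks_not_run and the alias notrun are illustrative only.

```sas
* Count distinct check invocations that did not run. ;
proc sql;
  select count(*) as checks_not_run
  from (select distinct checkid, resultseq
        from results.validation_results
        where resultflag = -1) as notrun;
quit;
```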
Here are some general observations:
  • The absence of a value in the results.checkid field can be used as an indicator of whether messaging has been set up. If the checkid field is nonmissing in a Results record, then messaging related to a specific validation check is available.
  • A resultseq value > 1 indicates a repeat invocation of a specific validation check. There should be differences in the Validation Control metadata for the specific validation check.
  • The seqno field is intended to be a record (message) counter in each specific check invocation. Generally, this value starts with 1 on the first record, and increments by 1 until the last record for each checkid and resultseq combination. One exception is with the Validation Control column reportAll=N. This signals the code to not write a record to the Results data set for each record in error. However, seqno continues to increment in this case, resulting in a gap in seqno values, with the last seqno approximating the total number of records in error.
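The seqno behavior described in the last observation can be inspected with a short query. This sketch lists check invocations whose highest seqno exceeds the number of records written, which is the pattern expected when reportAll=N suppresses individual records. It assumes that the results were saved as results.validation_results. The output column names are illustrative only.

```sas
* Find check invocations with gaps in seqno values. ;
proc sql;
  select checkid,
         resultseq,
         count(*)   as records_written,
         max(seqno) as last_seqno
  from results.validation_results
  where checkid is not missing
  group by checkid, resultseq
  having max(seqno) > count(*);
quit;
```

An empty result means that every check invocation wrote one record per seqno value.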
A set of sample validation reports is available to summarize the SAS Clinical Standards Toolkit validation process results and metrics. For more information, see Reporting.