Running a Validation Process

Sample CDISC SDTM 3.1.2 Driver Program: validate_data.sas

Overview

Each SAS Clinical Standards Toolkit process uses a SAS driver module to set up the program execution flow. The following steps show the execution flow in a typical SAS driver module to perform the SAS Clinical Standards Toolkit validation. For example, in a SAS 9.3 deployment, the CDISC SDTM 3.1.2 validation driver module is in: !sasroot/../../SASClinicalStandardsToolkitSDTM312/1.4/sample/cdisc-sdtm-3.1.2/sascstdemodata/programs/validate_data.sas

Step 1: Define the locations of the study data and the metadata.

/* There are several ways to define the study data and metadata 
locations. These include (but are not limited to):  
    - Pre-allocation of libraries through some user-defined set-up mechanism
    - Definition within a user-defined driver program such as this one
    - Full explicit definition within a work sasreferences control data set
    - Use of a global macro variable referenced within each sasreferences file
 
This driver program illustrates use of the last mechanism, setting the 
global macro variables studyRootPath and studyOutputPath, which are referenced
within the sample study sasreferences data set path column.
 
Note this example is dependent on the SAS version and installation folder structure. */
data _null_;
 select("&sysver");
  when("9.1")
  do;
   call symput('studyRootPath','!sasroot/../SASClinicalStandardsToolkitSDTM312/1.4/sample/cdisc-sdtm-3.1.2/sascstdemodata');
   call symput('studyOutputPath','!sasroot/../SASClinicalStandardsToolkitSDTM312/1.4/sample/cdisc-sdtm-3.1.2/sascstdemodata');
  end;
  otherwise do;
   call symput('studyRootPath','!sasroot/../../SASClinicalStandardsToolkitSDTM312/1.4/sample/cdisc-sdtm-3.1.2/sascstdemodata');
   call symput('studyOutputPath','!sasroot/../../SASClinicalStandardsToolkitSDTM312/1.4/sample/cdisc-sdtm-3.1.2/sascstdemodata');
  end;
 end;
run;
The &studyRootPath and &studyOutputPath variables facilitate code standardization and portability. They are not required.
%let workPath=%sysfunc(pathname(work));
The workPath value provides the path for the Work directory. This directory is referenced within the sample study SASReferences data set path column. It is not required.
* Note the number of calls should match the unique 
studyOutputPath subdirectories in sasreferences  *;
%****let studyOutputPath=users/myname/mystudy; *<--- example user override *;
%****cstutil_createsubdir(_cstSubDir=results); *<--- example user override *;
The SAS Clinical Standards Toolkit processes normally create one or more output files. These files might reside in the Work directory or point to some external location. The &studyRootPath variable points to read-only locations in the !sasroot folder hierarchy. The &studyOutputPath variable points to writable locations for process output, often in the !sasroot folder hierarchy. UNIX users (or any users) might need to reset &studyOutputPath to a write-enabled location because the !sasroot directories are typically write-protected. For these users, calls to the %cstutil_createsubdir macro create any subdirectories under the Work path that are expected by SASReferences records, and set &studyOutputPath to the Work path.
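For a write-protected installation, the redirection described above might look like the following minimal sketch. It uses only the Base SAS functions FILEEXIST and DCREATE as a stand-in for the %cstutil_createsubdir macro and is not that macro's implementation. The results subdirectory name is taken from the commented example above; any other subdirectory that SASReferences expects would need its own step.

```sas
/* Sketch only: point process output at the Work directory, which is writable */
%let workPath=%sysfunc(pathname(work));
%let studyOutputPath=&workPath;

/* Create the results subdirectory if it does not already exist.            */
/* DCREATE returns the full path of the new directory, or blank on failure. */
data _null_;
  length target newdir $512;
  target = cats(symget('studyOutputPath'), '/results');
  if fileexist(target) then
    put 'NOTE: Output directory already exists: ' target;
  else do;
    newdir = dcreate('results', symget('studyOutputPath'));
    if missing(newdir) then
      put 'ERROR: Could not create ' target;
    else
      put 'NOTE: Created output directory ' newdir;
  end;
run;
```

Because the Work directory is deleted when the SAS session ends, output that must persist would need a permanent write-enabled location instead.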
%let _cstSetupSrc=SASREFERENCES;
%let _cstStandard=CDISC-SDTM;
%let _cstStandardVersion=3.1.2;
These convenience macro variables are used primarily for reporting purposes later in the validation process.

Step 2: Set the SAS Clinical Standards Toolkit framework properties and the global macro variables.

* Set properties provided as part of the CST-FRAMEWORK standard. ;
%cst_setstandardproperties(
_cstStandard=CST-FRAMEWORK,_cstSubType=initialize);
%cst_createds(_cstStandard=CST-FRAMEWORK, 
_cstType=control,_cstSubType=reference, 
_cstOutputDS=work.sasreferences);
Each registered standard should have its own initialize.properties file. For each standard that is included in a specific process, the %cst_setstandardproperties macro can be called at this point. Alternatively, type=properties records can be added to the SASReferences data set, in which case the properties are processed when the %cstutil_allocatesasreferences macro is called. This latter approach is followed in the SDTM validate_data.sas driver module.

Step 3: Set the version of the controlled terminology.

The following code selects the version of the controlled terminology required by the validation process. In this example, the standard is CDISC-Terminology. The %cst_getstandardsubtypes macro builds a Work data set named work._cstStdSubTypes, which contains all records for the CDISC-Terminology standard. A DATA _NULL_ step then subsets these records to the one whose standardversion value matches &_cstStandard (in this case, CDISC-SDTM) and whose isstandarddefault value is Y. Values from this single observation are used to create the macro variables &_cstCTRoot and &_cstCTDescription. You can override the version of the controlled terminology by setting &_cstCTRoot to any directory. (This is shown in the commented line.)
* Set Controlled Terminology version for this process  *;
%cst_getstandardsubtypes(_cstStandard=CDISC-TERMINOLOGY,_cstOutputDS=work._cstStdSubTypes);
data _null_;
set work._cstStdSubTypes (where=(standardversion="&_cstStandard" and isstandarddefault='Y'));
call symputx('_cstCTRoot',cats(lowcase(standardversion),'/',
lowcase(standardsubtypeversion)));
call symputx('_cstCTDescription',description);
run;
%**let _cstCTRoot=cdisc-sdtm/201003;  * <----- User can override 
CT version of interest  *;  
The &_cstCTRoot macro variable is used to set the controlled terminology directory in the work.sasreferences data set in the next step.
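To see how the DATA _NULL_ step above derives &_cstCTRoot, the following sketch runs the same logic against a small mock data set. The data set name work.mock_subtypes and all of its rows are hypothetical stand-ins for work._cstStdSubTypes; only the column names come from the driver code.

```sas
%let _cstStandard=CDISC-SDTM;

/* Hypothetical stand-in for work._cstStdSubTypes (rows invented for illustration) */
data work.mock_subtypes;
  infile datalines dsd truncover;
  length standardversion $20 standardsubtypeversion $20
         isstandarddefault $1 description $80;
  input standardversion standardsubtypeversion isstandarddefault description;
  datalines;
CDISC-SDTM,200810,N,Mock older terminology package
CDISC-SDTM,201003,Y,Mock default terminology package
OTHER-STANDARD,201001,Y,Mock package for another standard
;
run;

/* Same subsetting logic as the driver: keep the default record for &_cstStandard */
data _null_;
  set work.mock_subtypes
      (where=(standardversion="&_cstStandard" and isstandarddefault='Y'));
  call symputx('_cstCTRoot',
               cats(lowcase(standardversion), '/', lowcase(standardsubtypeversion)));
  call symputx('_cstCTDescription', description);
run;

%put NOTE: _cstCTRoot=&_cstCTRoot (&_cstCTDescription);
```

With these mock rows, &_cstCTRoot resolves to cdisc-sdtm/201003, which has the same form as the value in the commented override line.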

Step 4: Create an empty work.sasreferences data set to be populated in the validation process, and build the work.sasreferences data set.

Create work.sasreferences.
%cst_createds(_cstStandard=CST-FRAMEWORK, _cstType=control,
_cstSubType=reference, _cstOutputDS=work.sasreferences);
The validate_data.sas module initializes the SASReferences data set that is required for SDTM validation. The SASReferences data set defines the location and name of the Validation Control data set. The Validation Control data set contains the set of checks to be included in the validation process. The sample validate_data.sas driver program sets the path of the Validation Control data set to &studyRootPath/control and the name to validation_control.sas7bdat. In SAS 9.3, this translates to !sasroot/../../SASClinicalStandardsToolkitSDTM312/1.4/sample/cdisc-sdtm-3.1.2/sascstdemodata/control/validation_control.sas7bdat. For an explanation of the purpose and content of each SASReferences file, see SASReferences File. For a fully initialized SASReferences data set for SDTM validation, see Sample SASReferences File for CDISC SDTM Validation.

Step 5: Call the %cstutil_processsetup macro.

The %cstutil_processsetup macro completes process setup. It ensures that all SAS librefs and filerefs are allocated; all system options, macro autocall paths, and format search paths are set; and that all global macro variables that are required by the process have been appropriately initialized.
The %cstutil_processsetup macro uses these parameters:
_cstSASReferencesSource
This parameter specifies the source on which the initial setup is based. Valid values are SASREFERENCES (default) or RESULTS. If RESULTS is specified, then no other parameters are required, and setup responsibility is passed to the %cstutil_reportsetup macro. The Results data set name must be passed to %cstutil_reportsetup as libref.memname.
_cstSASReferencesLocation
This parameter specifies the folder location of the SASReferences data set. (The default value is the path to the Work library.)
_cstSASReferencesName
This parameter specifies the name of the SASReferences data set. (The default value is SASREFERENCES.)
In the validate_data.sas driver, the macro is called with no arguments, which accepts the parameter defaults listed above:
%cstutil_processsetup();
The %cstutil_processsetup macro parameter values tell the process where to find the SASReferences data set.
*********************************************************************;
* Set global macro variables for the location of the sasreferences  *;
* file (overrides default properties initialized above)             *;
*********************************************************************;

%let _cstSASRefsName=&_cstSASReferencesName;
%let _cstSASRefsLoc=&_cstSASReferencesLocation;
The final setup step for the %cstutil_processsetup macro is a call to the %cstutil_allocatesasreferences utility macro. The SASReferences data set is now interpreted by the SAS Clinical Standards Toolkit. These actions complete the process:
  1. The %cst_insertstandardsasrefs macro is called to insert paths into any records that are missing path information. The information is captured from the StandardSASReferences data set for each standard. For more information about how this works, see Inserting Information from Registered Standards into a SASReferences File.
  2. The %cstutil_checkds macro is called to perform internal validation on the SASReferences data set updated in the %cst_insertstandardsasrefs macro.
  3. All filerefs and librefs are allocated. (This action is contingent on the _cstReallocateSASRefs property or global macro variable value.)
  4. Any property files are passed to the %cst_setproperties macro to create global macro variables.
  5. The format search path is set if any type=fmtsearch records are found. This is based on the order specified.
  6. The autocall path is set if any type=autocall records are found. This is based on the order specified.
  7. A Messages data set is created to contain records from each referenced standard. This data set is based on the _cstMessages and _cstMessageOrder properties or global macro variable values. This data set is used for the duration of the process to add fully resolved messages to the Results data set.
At this point, all libraries should be allocated, all paths and global macros should be set, and the global status macro variable _cst_rc should be set to 0. The process is ready to proceed.
CAUTION:
The SASReferences data set is key to the process, and any errors in it cause the process to fail. This makes setup a common failure point. For tips on debugging problems with the SASReferences data set, see Special Topic: Debugging a Validation Process.
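Because _cst_rc is expected to be 0 after a successful setup, a driver can test that variable before running any checks. The following is a minimal sketch; the macro name check_cst_setup is hypothetical and is not part of the toolkit.

```sas
/* Hypothetical helper: report whether process setup left _cst_rc at 0 */
%macro check_cst_setup;
  %if not %symexist(_cst_rc) %then
    %put WARNING: _cst_rc is not defined. Process setup has not been run.;
  %else %if &_cst_rc ne 0 %then
    %put ERROR: Process setup ended with _cst_rc=&_cst_rc.. Review the SASReferences data set.;
  %else
    %put NOTE: Process setup completed with _cst_rc=0.;
%mend check_cst_setup;

%check_cst_setup
```

Placing a call like this between %cstutil_processsetup and %sdtm_validate makes a setup problem visible in the log before any validation checks start.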

Step 6: Run validation tasks.

* Run the standard-specific validation macro. ;
%sdtm_validate;
The %sdtm_validate macro performs these tasks:
  1. The macro looks up the Validation Control data set reference from SASReferences.
  2. The macro re-sorts the Validation Control data set based on the _cstCheckSortOrder property or global macro variable value. This step is optional.
  3. For each check in the Validation Control data set, this macro calls the check macro specified in the Validation Control codesource field. It passes all of the check metadata to the check macro.
  4. After all of the checks are run, these events happen:
    • The results are saved to the file specified in SASReferences (type=results, subtype=validationresults).
    • Any process results are summarized in the Metrics data set if specified.
    • The metrics are saved to the file specified in SASReferences (type=results, subtype=validationmetrics).
    • Various SAS Work files are cleaned up if needed.
For tips on debugging if unexpected errors occur, see Special Topic: Debugging a Validation Process.

Step 7: Clean up the session.

* Clean up the SAS Clinical Standards Toolkit process 
files, macro variables and macros.;
%*cstutil_cleanupcstsession(
     _cstClearCompiledMacros=0
    ,_cstClearLibRefs=0
    ,_cstResetSASAutos=0
    ,_cstResetFmtSearch=0
    ,_cstResetSASOptions=1
    ,_cstDeleteFiles=1
    ,_cstDeleteGlobalMacroVars=0);
This step is optional, and it is unnecessary with batch processing. You should not clean up prematurely or aggressively if additional SAS Clinical Standards Toolkit processes are to be run in the same interactive SAS session.

Parameter Details

This table summarizes what the SAS Clinical Standards Toolkit attempts to do when each of the %cstutil_cleanupcstsession macro parameters is enabled.
Parameter Details for the %cstutil_cleanupcstsession Macro

Macro Parameter | Action Attempted
_cstClearCompiledMacros | Delete all macros from the work.sasmacr catalog.
_cstResetSASAutos | Reset the SASAutos path based on the value of the macro variable _cstInitSASAutos. This macro variable is typically set in the driver module to capture the SASAutos value at the start of the SAS Clinical Standards Toolkit process (before calling %cstutil_allocatesasreferences). This parameter is ignored if _cstInitSASAutos does not exist.
_cstClearLibRefs | Clear all filerefs and librefs included in SASReferences, except any autocall filerefs.
_cstResetFmtSearch | Reset the fmtsearch path based on the fmtsearch value at the start of the SAS Clinical Standards Toolkit process. This macro parameter is ignored if the work._cstsessionoptions data set does not exist. To support this functionality, this data set is created in the %cstutil_processsetup macro before calling the %cstutil_allocatesasreferences macro.
_cstResetSASOptions | Reset all SAS options back to their status at the start of the SAS Clinical Standards Toolkit process. This macro parameter is ignored if the work._cstsessionoptions data set does not exist. To support this functionality, this data set is created in the %cstutil_processsetup macro before calling the %cstutil_allocatesasreferences macro.
_cstDeleteFiles | Delete files if the global macro variable _cstDebug=0. The files deleted are &_cstsasrefs, &_cstmessages, and work._cstsessionoptions.
_cstDeleteGlobalMacroVars | Call %symdel for all macro variables found in sashelp.vmacro (where=(lowcase(name) =:"_cst" and scope="GLOBAL")).
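The reset parameters depend on state that is captured before setup. The following sketch shows how a driver might capture the autocall path in _cstInitSASAutos at the start of a process, using the Base SAS GETOPTION function. The OPTIONS statement at the end only illustrates the idea of restoring the captured value; it is an assumption for illustration and is not the toolkit's cleanup code.

```sas
/* Capture the autocall path before %cstutil_allocatesasreferences can change it */
%let _cstInitSASAutos=%sysfunc(getoption(sasautos));
%put NOTE: Initial SASAUTOS value: &_cstInitSASAutos;

/* ... the process runs here and might alter the autocall path ... */

/* Illustration only: restore the captured value at the end of the session */
options sasautos=&_cstInitSASAutos;
```

If _cstInitSASAutos is never set, the _cstResetSASAutos parameter has nothing to restore, which is why the table notes that the parameter is ignored in that case.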

Validation Results and Metrics

For SAS Clinical Standards Toolkit validation processes, the primary products of each validation process are the Results data set and the Metrics data set. These data sets itemize and summarize the findings of the validation process.
Example of a Validation Results Data Set (#1) summarizes a sample validation process. Here are a few facts about the sample validation process:
  1. The validation process was run on CDISC SDTM 3.1.1 source data.
  2. It referenced a Validation Control data set that contained metadata for four checks.
  3. It included SASReferences records to persist the results as results.validation_results and results.validation_metrics.
Note: In these displays, some rows have been hidden to reduce redundant examples.
[Display: Example of a Validation Results Data Set (#1)]
[Display: Example of a Validation Results Data Set (#2)]
Comments about the Validation Results Data Sets in Displays 6.9 and 6.10

Lines | Comment
1, 4, 5 | Informational notes about processing the properties files.
2 | Informational note saying that the creation of work.sasreferences was successful.
3 | Informational note from cstutil_processsetup that informs you of the location of the SASReferences data set.
6-14 | Informational summary that provides internal documentation about the process.
15-16 | Check SDTM0011 detected an error. SRCDATA.SUPPAE exists in the source metadata (source_tables), but does not exist in the reference metadata (reference_tables). This is a metadata-only check that runs against the source_columns metadata (WORK._CSTSRCCOLUMNMETADATA). A warning message is displayed informing you that the check processing was incomplete.
17 | Informational note for check SDTM0218 that is produced by the check macro cstcheck_notincodelist. The note is about the availability of fmtsearch format catalogs.
18 | Check SDTM0218 completed successfully. No errors were detected.
35 | Check SDTM0804 completed successfully. No problems were found in the comparison of SRCDATA.DS with SRCDATA.SV.
53 | In check SDTM0804, one SRCDATA.SU record was found that had invalid VISIT and VISITNUM values, which are relative to records in the SV domain. The actual value in error is listed in the actual column. The keyvalues column identifies the specific record in error.
89-90 | Check SDTM0851 completed successfully. No errors were detected. Two records (1 and 2 in the resultseq column) are listed because the check was run twice, once for each of the two checksources that have a record in the Validation Control data set.
Note: In this display, some rows have been hidden to reduce redundant examples.
[Display: Example of a Validation Metrics Data Set]
Comments about the Validation Metrics Data Set

Lines | Comment
1 | In check SDTM0011, 466 columns were evaluated.
2 | Check SDTM0011 took one second to run using cstcheck_metamismatch.
5-13 | Check SDTM0218 ran against eight domains. Record counts were provided for each domain. The check took two seconds to run using cstcheck_notincodelist.
23-32 | Check SDTM0804 ran against nine domains. Record counts were provided for each domain. The check took two seconds to run using cstcheck_comparedomains.
43-44 | Check SDTM0851 evaluated 28 records in the SRCDATA.CO domain. The check took one second to run using cstcheck_recmismatch.
47 | A summary metric of unique check invocations. Example of a Validation Metrics Data Set does not itemize all eight checks.
48 | A summary metric of the number of checks that failed to run. (These metrics are defined as distinct checkid and resultseq combinations in the Results data set where resultflag=-1.)
49-53 | Summary metric counts of the number of records, by type of metric, in the Results data set.
Note: In Example of a Validation Results Data Set (#1) and Example of a Validation Results Data Set (#2), some records in the validation Results data set have been deleted for brevity. This creates an inconsistency with the metrics listed in Example of a Validation Metrics Data Set.
Here are some general observations:
  • The absence of a value in the results.checkid field can be used as an indicator of whether messaging has been set up. If the checkid field is nonmissing in a Results record, then messaging related to a specific validation check is available.
  • A resultseq value > 1 indicates a repeat invocation of a specific validation check. There should be differences in the Validation Control metadata for the specific validation check.
  • The seqno field is intended to be a record (message) counter in each specific check invocation. Generally, this value starts with 1 on the first record, and increments by 1 until the last record for each checkid and resultseq combination. One exception is with the Validation Control column reportAll=N. This signals the code to not write a record to the Results data set for each record in error. However, seqno continues to increment in this case, resulting in a gap in seqno values, with the last seqno approximating the total number of records in error.
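The summary metrics described above can be approximated from a Results data set with a short query. The following sketch uses a mock data set, work.mock_results, whose rows are invented for illustration and which carries only the checkid, resultseq, seqno, and resultflag columns. It counts unique check invocations as distinct checkid and resultseq combinations, and counts invocations that failed to run as those with resultflag=-1. It is not the toolkit's own metrics code.

```sas
/* Mock Results data set; the first row has a missing checkid (a setup note) */
data work.mock_results;
  infile datalines dsd truncover;
  length checkid $8 resultseq seqno resultflag 8;
  input checkid resultseq seqno resultflag;
  datalines;
,1,1,0
SDTM0011,1,1,1
SDTM0011,1,2,-1
SDTM0218,1,1,0
SDTM0804,1,1,0
SDTM0804,1,2,1
SDTM0851,1,1,0
SDTM0851,2,1,0
;
run;

proc sql noprint;
  /* Unique check invocations: distinct checkid and resultseq combinations */
  select count(*) into :nInvocations
    from (select distinct checkid, resultseq
            from work.mock_results
           where checkid is not missing);
  /* Invocations that failed to run: any record with resultflag = -1 */
  select count(*) into :nNotRun
    from (select distinct checkid, resultseq
            from work.mock_results
           where checkid is not missing and resultflag = -1);
quit;

/* Reassign to strip the leading blanks that INTO leaves in numeric values */
%let nInvocations=&nInvocations;
%let nNotRun=&nNotRun;
%put NOTE: &nInvocations check invocations, &nNotRun did not run.;
```

With these mock rows, the log reports 5 check invocations, 1 of which did not run. The second SDTM0851 row has resultseq=2, so it counts as a separate invocation of the same check.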
A set of sample validation reports is available to summarize the SAS Clinical Standards Toolkit validation process results and metrics. For more information, see Reporting.

Sample CDISC CRT-DDS 1.0 Driver Program: validate_crtdds_data.sas

The SAS Clinical Standards Toolkit validation of the SAS representation of the CDISC CRT-DDS standard follows the same pattern used for CDISC SDTM validation. A sample driver module, validate_crtdds_data.sas, is provided to perform process setup steps and to call the %crtdds_validate macro. For a more complete description of the validation of the SAS representation of the CDISC CRT-DDS standard, see XML-Based Standards. This chapter describes the use of the validate_crtdds_data driver module.