Building a Validation Process

Building a SAS Clinical Standards Toolkit validation process is similar to building any other SAS Clinical Standards Toolkit process. The differences are that the validation process uses different inputs and outputs, as defined in the SASReferences data set; it calls a standard-specific validate macro; and its output can include an optional Metrics data set.

SASReferences Customizations

A SAS Clinical Standards Toolkit validation process requires that you specify a reference standard with which the source data and metadata can be compared. The following three records, specific to the standard and standardversion of interest, should be included in the SASReferences data set:
Display: Defining the Reference Standard in the SASReferences Data Set
The empty path field signals that the path and memname information should be derived from the StandardSASReferences data set associated with the standard and standardversion. Including the referencecontrol and referencemetadata records is unique to validation processes in the SAS Clinical Standards Toolkit.
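The following SAS sketch shows how the three reference-standard records might be built as DATA step rows. Only the referencecontrol and referencemetadata types, the refcntl libref, and the empty path come from this section. The column names (standard, standardversion, type, subtype, sasref, reftype, iotype, filetype, path, memname), the refmeta libref, and the subtype values are assumptions modeled on the sample CDISC SDTM 3.1.1 study; verify them against your own SASReferences template.

```sas
/* Hypothetical sketch: the three reference-standard records for CDISC SDTM 3.1.1.   */
/* Column names, the refmeta libref, and the subtype values are assumptions; verify   */
/* them against your own SASReferences template before appending these records.       */
data work.refstd_records;
  length standard $20 standardversion $20 type $40 subtype $40 sasref $8
         reftype $8 iotype $10 filetype $40 path $2048 memname $48;
  standard="CDISC-SDTM";
  standardversion="3.1.1";
  reftype="libref";
  iotype="input";
  filetype="dataset";
  /* Empty path and memname: derived from StandardSASReferences */
  path="";
  memname="";
  type="referencecontrol";  subtype="validation"; sasref="refcntl"; output;
  type="referencemetadata"; subtype="table";      sasref="refmeta"; output;
  type="referencemetadata"; subtype="column";     sasref="refmeta"; output;
run;
```

If the column names match, these rows could then be appended to a study SASReferences data set with PROC APPEND.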
SAS Clinical Standards Toolkit validation can also include references to the following files and locations:
  1. A validation-specific properties file.
    Display: Defining the Validation-Specific Properties File in the SASReferences Data Set
    The Validation.Properties file sets process global macro variables that are specific to validation, such as those that control metrics. For a complete discussion of these properties, see Validation.Properties. For information about the derived global macro variables, see Overview. The Validation.Properties file is required to support SAS Clinical Standards Toolkit validation.
    For CDISC CRT-DDS, validation properties have been included in the standard-specific Initialize.Properties file. Validation properties do not need to be separately referenced in SASReferences.
  2. The output location of any process-generated Metrics data set.
    Display: Defining the Metrics Output Location in the SASReferences Data Set
    The Metrics data set provides a summary of the validation process, including error counts, processing time, and denominators for specific checks. For a complete discussion of validation metrics, see Validation Metrics and Validation Results and Metrics. For information about the global macro variables that govern metrics output, see Overview. The Metrics data set is typically written to the same location as the validation Results data set, a location that is common to all SAS Clinical Standards Toolkit processes.
  3. The location of any libraries containing controlled terminology, format catalogs, and coding dictionary data sets.
    Display: Defining Controlled Terminology in the SASReferences Data Set
    The type=fmtsearch records enable you to specify multiple format catalogs (for example, company-wide, compound, group-level, and study-level). Order in the format search path is set by the order field. The type=referencecterm record enables you to specify one or more lookup data sets (such as dictionary lookups like LOINC and MedDRA). These lookup data sets do not need to conform to a specific structure, and they do not need to be in a structure that can be read into a SAS format. Customized code (typically in the Validation Master codelogic field) is required to join domain data with each associated lookup data set.
  4. The location of the run-time Validation Control data set.
    Display: Defining the Run-Time Validation Control Location in the SASReferences Data Set
The Validation Control data set is required and discussed in the following section.
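To illustrate how the order field of the type=fmtsearch records shapes the format search path, the following sketch derives a search path string from three hypothetical records. The librefs corpfmt, cmpfmt, and studyfmt are invented for illustration, and the sketch assumes that lower order values are searched first. The toolkit builds the actual format search path from SASReferences itself; this code only mimics the effect of the order field.

```sas
/* Hypothetical type=fmtsearch records; librefs and order values are illustrative only */
data work.fmt_records;
  length type $40 sasref $8 memname $48 order 8;
  type="fmtsearch";
  memname="formats";
  sasref="corpfmt";  order=3; output;   /* company-wide formats */
  sasref="cmpfmt";   order=2; output;   /* compound-level formats */
  sasref="studyfmt"; order=1; output;   /* study-level formats */
run;

/* Sort by the order field (assumption: lower values are searched first) */
proc sort data=work.fmt_records out=work.fmt_sorted;
  by order;
run;

/* Concatenate libref.catalog entries into one space-separated search path */
data _null_;
  set work.fmt_sorted end=last;
  length fmtpath $500;
  retain fmtpath;
  fmtpath=catx(" ", fmtpath, cats(sasref, ".", memname));
  if last then call symputx("fmtpath", fmtpath);
run;

%put NOTE: Derived format search path: &fmtpath;
```

A string in this form is what the FMTSEARCH= system option accepts, for example options fmtsearch=(&fmtpath work library);.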

Validation Control: Specification of Run-Time Checks

Each SAS Clinical Standards Toolkit validation process requires you to specify the validation checks to be run. This is accomplished by cloning, subsetting, or building a set of validation checks based on the Validation Master data set. (See Validation Check of Metadata: Validation Master.) The SAS Clinical Standards Toolkit assumes that each Validation Control data set is structurally equivalent to the Validation Master data set.
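Because the toolkit assumes structural equivalence, it can be worth confirming that a derived Validation Control data set has the same columns, types, and lengths as the Validation Master data set. The following sketch compares the two with PROC CONTENTS and PROC SQL. It assumes that the refcntl and control librefs used in the sample code later in this section are already allocated; the macro variable name _nDiffs is hypothetical.

```sas
/* Capture column name, type (1=numeric, 2=character), and length for each data set */
proc contents data=refcntl.validation_master noprint
              out=work._vm_cols (keep=name type length);
run;
proc contents data=control.validation_control noprint
              out=work._vc_cols (keep=name type length);
run;

/* Count columns that are missing from either data set or differ in type or length */
proc sql noprint;
  select count(*) into :_nDiffs
    from work._vm_cols as m
         full join work._vc_cols as c
      on upcase(m.name)=upcase(c.name)
    where m.name is missing or c.name is missing
       or m.type ne c.type or m.length ne c.length;
quit;

%put NOTE: Column differences between Validation Master and Validation Control: &_nDiffs;
```

A value of 0 indicates that the two data sets have matching columns; any other value points to a structural mismatch to resolve before running the validation process.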
A sample CDISC SDTM 3.1.1 Validation Control data set is deployed to the following SAS 9.1.3 directory. (The deployed location for SAS 9.2 is different, but similar.)
!sasroot/../SASClinicalStandardsToolkitSDTM311/1.3/sample/cdisc-sdtm-3.1.1/sascstdemodata/control
By default, the Validation Control data set name is validation_control.sas7bdat.
As a required input to a validation process, the Validation Control data set must be referenced in the run-time SASReferences file. The following display shows how the SASReferences file and the Validation Control data set are defined in the sample CDISC SDTM 3.1.1 SASReferences data set:
Display: Defining Validation Control and SASReferences Data Set Locations
The &studyRootPath value is assumed to have been set to !sasroot/../SASClinicalStandardsToolkitSDTM311/1.3/sample/cdisc-sdtm-3.1.1/sascstdemodata.
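The following sketch shows how the two control records might be expressed as DATA step rows. The studyRootPath value, the cntl_v libref, and the validation_control member name come from the sample study described in this section. The column names, the subtype values (reference, validation), the iotype values, the cntl_s libref, the sasreferences member name, and the control subfolder for the SASReferences data set are assumptions to verify against your own SASReferences data set.

```sas
/* Root of the sample study (value taken from the text above) */
%let studyRootPath=!sasroot/../SASClinicalStandardsToolkitSDTM311/1.3/sample/cdisc-sdtm-3.1.1/sascstdemodata;

/* Hypothetical sketch of the two control records; column names and the subtype, */
/* iotype, cntl_s, and sasreferences values are assumptions.                      */
data work.control_records;
  length standard $20 standardversion $20 type $40 subtype $40 sasref $8
         reftype $8 iotype $10 filetype $40 path $2048 memname $48;
  standard="CDISC-SDTM";
  standardversion="3.1.1";
  type="control";
  reftype="libref";
  filetype="dataset";
  path="&studyRootPath/control";
  subtype="reference";  sasref="cntl_s"; iotype="both";  memname="sasreferences";      output;
  subtype="validation"; sasref="cntl_v"; iotype="input"; memname="validation_control"; output;
run;
```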
The following table provides examples of how to create a Validation Control data set from the Validation Master data set. The sample code assumes that it is submitted in a session where the required libraries have been allocated and the format search and autocall paths have been set.
Sample Code to Create Validation Control Data Set
Check Subset
Sample Code
All checks provided with the SAS Clinical Standards Toolkit.
data control.validation_control;
  set refcntl.validation_master;
run;
Structural checks (metadata-only checks that do not require access to the domain data).
data control.validation_control;
  set refcntl.validation_master (where=(upcase(checktype)="METADATA"));
run;
Content checks (checks that require access to the domain data).
data control.validation_control;
  set refcntl.validation_master (where=(upcase(checktype) ne "METADATA"));
run;
Checks with a production status.
data control.validation_control;
  set refcntl.validation_master (where=(checkstatus>0));
run;
WebSDM checks (CDISC SDTM only).
data control.validation_control;
  set refcntl.validation_master (where=(upcase(checksource)="WEBSDM"));
run;
Sampling of checks, one for each check macro.
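/* Group checks by check macro (codesource), lowest checkid first */
proc sort data=refcntl.validation_master out=work.control;
  by codesource checkid;
run;
/* Keep only the first check for each check macro */
data work.control;
  set work.control;
  by codesource;
  if first.codesource;
run;
/* Restore checkid order in the final Validation Control data set */
proc sort data=work.control out=control.validation_control (label="Check sampler");
  by checkid;
run;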
CDISC SDTM 3.1.1 checks (including checks with a standardversion of "***", which apply to all versions of the standard).
data control.validation_control;
  set refcntl.validation_master (where=(standardVersion = "3.1.1" or standardVersion = "***"));
run;
All codelist-related checks (checks that use the cstcheck_notincodelist macro).
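data control.validation_control;
  /* The check macro name is stored in codesource; checksource holds the origin of the check, such as WebSDM */
  set refcntl.validation_master (where=(upcase(codesource)="CSTCHECK_NOTINCODELIST"));
run;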
All checks applicable to a specific domain.
%macro buildcheckdomainlist (_cstCheckDS=,_cstOutputDS=work._cstcheckdomains);
  %local check _cstOldCheckID _cstCheckSeqCount _cstCheckCnt;
  %let _cstOldCheckID=;
  %let _cstCheckSeqCount=0;
  %* Count the checks in the input data set *;
  data _null_;
    if 0 then set &_cstCheckDS nobs=_numobs;
    call symputx('_cstCheckCnt',_numobs);
    stop;
  run;
  %* Create an empty output data set with the required structure *;
  data &_cstOutputDS;
    attrib checkid format=$8. label="Validation check identifier"
           table format=$32. label="Table Name"
           standardversion format=$20. label="Standard version"
           checksource format=$40. label="Source of check"
           resultseq format=8. label="Unique invocation of check";
    stop;
  run;
  %* Process one check record per iteration *;
  %do check=1 %to &_cstCheckCnt;
    data _null_;
      set &_cstCheckDS (keep=checkid standardversion checksource tablescope columnscope usesourcemetadata
                        firstObs=&check);
      call symputx('_cstCheckID',checkid);
      call symputx('_cstStandardVersion',standardversion);
      call symputx('_cstChecksource',checksource);
      call symputx('_cstTableScope',tablescope);
      call symputx('_cstColumnScope',columnscope);
      call symputx('_cstUseSourceMetadata',usesourcemetadata);
      stop;
    run;
    %* Number repeated invocations of the same checkid sequentially *;
    %if &_cstCheckID=&_cstOldCheckID %then %do;
      %let _cstCheckSeqCount=%eval(&_cstCheckSeqCount+1);
    %end;
    %else %let _cstCheckSeqCount=1;
    %* Call macro to interpret tableScope and columnScope to build work._cstcolumnmetadata for each check *;
    %* _cstDomSubOverride=Y parameter allows us to also look at check records with unequal sublist lengths *;
    %cstutil_buildcollist(_cstFormatType=DATASET,_cstDomSubOverride=Y);
    proc sql noprint;
      create table work._csttempds as
      select distinct table, "&_cstCheckID" as checkid length=8,
             &_cstCheckSeqCount as resultseq,
             "&_cstStandardVersion" as standardversion length=20,
             "&_cstChecksource" as checksource length=40
      from work._cstcolumnmetadata;
    quit;
    proc append base=&_cstOutputDS data=work._csttempds force;
    run;
    %let _cstOldCheckID=&_cstCheckID;
    %* Clear the contents of work._csttempds before the next iteration, in case of problems *;
    data work._csttempds;
      set work._csttempds;
      if _n_=1 then stop;
    run;
  %end;
%mend;
%* Run this only once per stable reference validation_master - it takes a while... ;
%buildcheckdomainlist(_cstCheckDS=refcntl.validation_master);
%* The libname for validation_control is assigned in sasreferences. In the sample study it is cntl_v. This might need to be changed either in this macro or the call to it.;
%macro subsetdomainlist(_cstInputDS=work._cstcheckdomains,_cstOutputDS=cntl_v.validation_control,
                        _cstDomain=);
  %* Inner join drops unmatched rows, and DISTINCT removes duplicates caused by multiple matches *;
  proc sql noprint;
    create table &_cstOutputDS as
    select distinct vm.*
      from refcntl.validation_master vm
           inner join &_cstInputDS dom
        on vm.checkid=dom.checkid and vm.standardversion=dom.standardversion and
           vm.checksource=dom.checksource
      where dom.table="&_cstDomain";
  quit;
%mend;
%* Example call: subset validation data set just to those checks for the specified domain ;
%* Still returns all records for a checkid/standardversion/checksource if any one of them matches the domain ;
%subsetdomainlist(_cstDomain=AE);
Generally, the SAS Clinical Standards Toolkit processes validation checks in the order in which they appear in the Validation Control data set. Each validation process honors the _cstCheckSortOrder validation property; if this property is not set, the data set order is used. You can therefore control check order in either of two ways: sort the checks in any user-defined order when you derive the Validation Control data set, or set _cstCheckSortOrder to sort the Validation Control data set at run time by any of its columns.
Best Practice Recommendation: Prioritizing checks can help users identify problems early in the process, or let earlier checks serve as prerequisites for the checks that follow.
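As one example of a user-defined order, the following sketch places metadata checks ahead of content checks so that structural problems surface first. Because the toolkit expects the Validation Control data set to be structurally equivalent to the Validation Master data set, the temporary sort key (_priority, a hypothetical name) is dropped from the output. The sketch assumes the refcntl and control librefs used in the earlier sample code.

```sas
/* Add a temporary sort key: 0 for metadata checks, 1 for all other checks */
data work._ctl;
  set refcntl.validation_master;
  _priority=(upcase(checktype) ne "METADATA");
run;

/* Sort by priority, then checkid, and drop the temporary key from the output */
proc sort data=work._ctl out=control.validation_control (drop=_priority);
  by _priority checkid;
run;
```

Deriving the order this way fixes it in the data set itself, so the run does not depend on how _cstCheckSortOrder is set.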

Setting Properties for the Validation Process

Across all standards, the set of properties that are available for a validation process is extensive. (For the full list of properties, see Overview.) However, only a few properties are modified on a regular basis. These include:
  • _cstSASRefsLoc, which points to another location for the SASReferences file.
  • _cstSASRefsName, which points to another SASReferences filename.
  • _cstSASRefs, which points to a specific libref.sasreferences file to use. (This file is typically in Work.)
  • _cstSubjectColumns, which resets the columns that identify a subject.
  • _cstReallocateSASRefs, which reallocates SAS librefs and filerefs in the same SAS session, typically when changing studies or standards.
  • _cstFMTLibraries, which modifies the format search path built from SASReferences. This change is most often used to add a reference to a Work format catalog.
  • _cstCheckSortOrder, which provides a set of Validation Control columns by which to re-sort the check processing order.
  • _cstMetrics, set to 1 to enable metrics calculations and reporting.
  • _cstDebug, which turns debugging on or off for the session.
  • _cstDebugOptions, which alters the SAS options when debugging.
These changes should be made before the process setup begins (as changes to the properties file), or after the process setup ends (as a series of %let statements in the code stream).
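For example, overriding properties after process setup might look like the following sketch. The _cstMetrics=1 value follows the property description above; the 0/1 convention for _cstDebug is an assumption.

```sas
/* ...process setup code runs here (not shown)... */

/* Override selected properties after setup; values are illustrative assumptions */
%let _cstMetrics=1;   /* enable metrics calculations and reporting */
%let _cstDebug=0;     /* assumed: 0 = debugging off */

/* Confirm the current values in the SAS log */
%put NOTE: _cstMetrics=&_cstMetrics _cstDebug=&_cstDebug;
```

If the same overrides are needed in every run, the recommendation that follows favors placing them in a properties file instead.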
Best Practice Recommendation: Centralizing property changes in properties files, rather than distributing them in code segments, offers advantages for debugging and documenting processes. Properties are translated to global macro variables by calls to the cst_setstandardproperties or cst_setproperties framework utility macros during process setup. They are reported in the SAS log, and are generally documented in the process SASReferences file.