Special Topic: How the SAS Clinical Standards Toolkit Interprets Validation Check Metadata

Overview

Four Validation Master metadata fields are key to how the SAS Clinical Standards Toolkit processes source data and source metadata: usesourcemetadata, tablescope, columnscope, and codelogic.
The SAS Clinical Standards Toolkit uses usesourcemetadata to point to the correct metadata. If usesourcemetadata is set to Y, then the source metadata (source_tables and source_columns) is used to derive the domains and columns to be evaluated for compliance with the standard. If usesourcemetadata is set to N, then the reference metadata (reference_tables and reference_columns) is used instead.
The SAS Clinical Standards Toolkit uses the tablescope and columnscope values to build the work._csttablemetadata and work._cstcolumnmetadata data sets. Based on the values of these fields, the SAS Clinical Standards Toolkit creates a subset of the source metadata or reference metadata that satisfies both tablescope and columnscope. That is, the subset contains only the columns specified in columnscope that also exist in the tables specified in tablescope.
For those checks that use codelogic, the SAS Clinical Standards Toolkit builds local macro variables to communicate the tablescope and columnscope settings to the code. For example, each domain is passed to the code as &_cstDSName, and each column is passed as &_cstColumn.
The code logic is then run. If the check code logic is a statement (codetype=1 or 3), then _cstError=1 is generally set. If the check code logic is a DATA step or PROC SQL code segment (codetype=2 or 4), then work._cstproblems is generally created.
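The subsetting described above can be pictured with a small stand-alone sketch. It is not the toolkit's own code: the data set names work.demo_source_columns and work.demo_columnmetadata are hypothetical, the table and column variable names are assumed to resemble those in source_columns, and the ** wildcard is treated here as a two-character domain prefix, which is an assumption. The sketch mimics tablescope=_ALL_-TS with columnscope=**SEQ.

```sas
/* Toy stand-in for source_columns: one row per table/column pair.   */
/* Data set and variable names are illustrative assumptions.         */
data work.demo_source_columns;
  length table $8 column $8;
  input table $ column $;
  datalines;
AE AESEQ
AE AETERM
DM USUBJID
TS TSSEQ
TS TSPARMCD
VS VSSEQ
VS VSTESTCD
;
run;

/* Hypothetical resolution of tablescope=_ALL_-TS, columnscope=**SEQ: */
/* keep columns named <two-character prefix>SEQ and exclude any       */
/* column that belongs to the TS table.                               */
data work.demo_columnmetadata;
  set work.demo_source_columns;
  if table ne 'TS' and length(column)=5 and substr(column,3)='SEQ';
run;

proc print data=work.demo_columnmetadata noobs;
run;
```

With this toy input, only AE.AESEQ and VS.VSSEQ remain: TSSEQ matches the column pattern but is removed by the table exclusion, and the remaining columns do not match **SEQ.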

Case Study 1: CDISC SDTM Check SDTM0604

This case study determines whether the sequence numbers (**SEQ) used in various domains increment consecutively, beginning at 1, for each USUBJID.
There are specific values to assign to usesourcemetadata, tablescope, and columnscope to set up a proper test of sequence numbers. First, you want to include the domains you actually have (that is, source data and metadata). So, set usesourcemetadata to Y. Next, you want to test all domains that contain sequence numbers. So, set tablescope to _ALL_. Because each domain uses a domain-specific name for sequence number, set columnscope to "**SEQ".
This is the code logic for CDISC SDTM check SDTM0604:
%let _cstLastKey=%kscan(%quote(&_cstSubjectKeys),-1,",");
data work._cstproblems (drop=count);
 set &_cstDSName (keep=&_cstDSKeys &_cstColumn);
 by &_cstDSKeys;
 if first.&_cstLastKey then count=1;
 else count+1;
 if &_cstcolumn ne count then output;
run; 
These five macro variables are used in this code. They are representative of variables set in many of the check macros before calling code logic. See each validation check macro for local macro variables available to code logic.
  • _cstDSName is the name of the domain, as set in the calling code module.
  • _cstSubjectKeys is the set of keys that define a subject. It is set once as a global macro variable in a standard-specific properties file. For CDISC-SDTM, the value of _cstSubjectKeys is set to STUDYID USUBJID by default.
  • _cstDSKeys contains the data set keys for _cstDSName. Keys are derived from the table metadata for that domain (source_tables.keys).
  • _cstLastKey is the last subject key. In the CDISC SDTM case, the value is USUBJID.
  • _cstColumn is the column of interest (sequence number). This variable is specific to the _cstDSName domain.
With these Validation Master settings, any record whose sequence number does not match the running record count within the subject is added to work._cstproblems.
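To see the counter logic in isolation, the following minimal sketch runs the DATA step against a toy domain. The data set work.ae and its values are invented for illustration, the macro variables are set by hand instead of by a check macro, the key list is shortened to the subject keys, and _cstLastKey is assigned directly instead of being derived from _cstSubjectKeys.

```sas
/* Toy AE domain, already sorted by its keys. Subject 002 skips AESEQ=2. */
data work.ae;
  length STUDYID $8 USUBJID $8;
  input STUDYID $ USUBJID $ AESEQ;
  datalines;
S1 001 1
S1 001 2
S1 002 1
S1 002 3
;
run;

/* Values that a check macro would normally supply (set by hand here). */
%let _cstDSName=work.ae;
%let _cstDSKeys=STUDYID USUBJID;   /* simplified key list (assumption) */
%let _cstColumn=AESEQ;
%let _cstLastKey=USUBJID;          /* assigned directly, not derived   */

/* Same counter logic as the check: restart at 1 for each subject and  */
/* write out any record whose sequence number differs from the count.  */
data work._cstproblems (drop=count);
  set &_cstDSName (keep=&_cstDSKeys &_cstColumn);
  by &_cstDSKeys;
  if first.&_cstLastKey then count=1;
  else count+1;
  if &_cstColumn ne count then output;
run;

proc print data=work._cstproblems noobs;
run;
```

In this toy run, only the last record (USUBJID 002, AESEQ 3) is written to work._cstproblems, because the counter expects 2 at that point.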
However, the Validation Master check data set actually contains two records for CDISC SDTM check SDTM0604, and the tablescope and columnscope settings of each record differ from those described above. The reason is that the CDISC SDTM TS (Trial Summary) domain does not contain the subject key USUBJID. The previous code logic does run against the TS domain without failing, but the SAS log indicates a problem (NOTE: Variable first.USUBJID is uninitialized.). The two records in the Validation Master check data set offer a better solution:
Multiple Validation Check Invocations for a Specific CheckID

Record 1
  checkid:     SDTM0604
  tablescope:  _ALL_-TS
  columnscope: **SEQ
  code logic:
    %let _cstLastKey=%kscan(%quote(&_cstSubjectKeys),-1,",");
    data work._cstproblems (drop=count);
     set &_cstDSName (keep=&_cstDSKeys &_cstColumn);
     by &_cstDSKeys;
     if first.&_cstLastKey then count=1;
     else count+1;
     if &_cstcolumn ne count then output;
    run;

Record 2
  checkid:     SDTM0604
  tablescope:  TS
  columnscope: TSSEQ
  code logic:
    data work._cstproblems;
     set &_cstDSName (keep=&_cstDSKeys &_cstColumn);
     if &_cstcolumn ne _n_ then output;
    run;

Case Study 2: CDISC SDTM 3.1.1 Check SDTM0623

This case study determines whether the values for standard units (**STRESU) are consistent within each test code (**TESTCD) across all records in the CDISC SDTM findings domains.
You want to include the domains you actually have (that is, source data and metadata). So, set usesourcemetadata to Y. Next, you want to test all findings domains, which typically contain these two domain columns (**STRESU and **TESTCD). So, you might want to set tablescope to CLASS:FINDINGS. Because you want to compare two columns in each domain, set columnscope to [**TESTCD][**STRESU]. (For more information about tablescope and columnscope syntax, see Column Descriptions of the Validation Master Data Set.)
Here is the code logic for CDISC SDTM check SDTM0623:
data work._cstunique;
  set work._cstunique;
  by &_cstColumn1 &_cstColumn2;
  if first.&_cstColumn1=0 or last.&_cstColumn1=0 then _checkError=1;
run;
proc sort data=&_cstDSName out=&_cstclds;
  by &_cstColumn1 &_cstColumn2;
run;
data work._cstuniqueerrors;
  merge work._cstunique (where=(_checkerror=1) in=un)
        &_cstclds (in=ds);
  by &_cstColumn1 &_cstColumn2;
  if un and ds and first.&_cstColumn2;
run;
This case study shows how the SAS Clinical Standards Toolkit uses local macro variables for column comparisons. The columnscope syntax [**TESTCD][**STRESU] tells the SAS Clinical Standards Toolkit to create two sublists. The first sublist is for all TESTCD columns, and the second is for all STRESU columns. These are referenced as &_cstColumn1 and &_cstColumn2 in code logic, respectively.
In this case, the validation check macro that calls and interprets code logic output (cstcheck_notunique) reports all work._cstuniqueerrors records as failing this instance of CDISC SDTM check SDTM0623.
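The behavior of this code logic for a single domain can be illustrated with a toy example. This is a sketch under stated assumptions, not the toolkit's processing: the work.vs data and its values are invented, the macro variables are set by hand, the value of _cstclds is an arbitrary work data set name, and the PROC SORT NODUPKEY step that creates work._cstunique is an assumed stand-in for whatever the check macro does before it calls the code logic.

```sas
/* Toy VS domain: HEIGHT is reported in two different standard units. */
data work.vs;
  length VSTESTCD $8 VSSTRESU $8;
  input VSTESTCD $ VSSTRESU $;
  datalines;
HEIGHT cm
HEIGHT cm
HEIGHT in
WEIGHT kg
WEIGHT kg
;
run;

/* Values that a check macro would normally supply (set by hand here). */
%let _cstDSName=work.vs;
%let _cstColumn1=VSTESTCD;
%let _cstColumn2=VSSTRESU;
%let _cstclds=work._cstclds;       /* arbitrary name (assumption) */

/* Assumed preparation step: one row per distinct column pair. */
proc sort data=&_cstDSName (keep=&_cstColumn1 &_cstColumn2)
          out=work._cstunique nodupkey;
  by &_cstColumn1 &_cstColumn2;
run;

/* Code logic: flag test codes that have more than one unit value. */
data work._cstunique;
  set work._cstunique;
  by &_cstColumn1 &_cstColumn2;
  if first.&_cstColumn1=0 or last.&_cstColumn1=0 then _checkError=1;
run;
proc sort data=&_cstDSName out=&_cstclds;
  by &_cstColumn1 &_cstColumn2;
run;
/* Keep the first source record of each flagged test code/unit pair. */
data work._cstuniqueerrors;
  merge work._cstunique (where=(_checkerror=1) in=un)
        &_cstclds (in=ds);
  by &_cstColumn1 &_cstColumn2;
  if un and ds and first.&_cstColumn2;
run;
```

In this toy run, work._cstuniqueerrors contains two records, HEIGHT/cm and HEIGHT/in, because HEIGHT is the only test code with more than one unit. WEIGHT is not reported.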
As configured above, however, the check does not run correctly. The following paragraphs show how to solve the problem. The generated Results data set contains this excerpt:
Example of a Results Data Set Excerpt for Check SDTM0623
The actual and resultdetails values give clues about the problem. The SAS Clinical Standards Toolkit resolves the columnscope sublist [**TESTCD] to five columns. It resolves the sublist [**STRESU] to four columns. The SAS Clinical Standards Toolkit column comparisons require sublists of equal length so that valid comparisons can be made. There appears to be a findings domain that has TESTCD, but not STRESU. In this case, the domain IE does not have the column IESTRESU. Attempting to compare IETESTCD with LBSTRESU is not the intention.
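A mismatch of this kind can be spotted before a check is run by looking for tables in which one sublist pattern resolves to a column and the other does not. The following sketch is one hypothetical way to do that. The data set work.demo_source_columns and its contents are invented, the table and column variable names are assumed to resemble those in source_columns, and ** is again treated as a two-character prefix.

```sas
/* Toy column metadata for five findings domains, sorted by table. */
data work.demo_source_columns;
  length table $8 column $8;
  input table $ column $;
  datalines;
EG EGTESTCD
EG EGSTRESU
IE IETESTCD
LB LBTESTCD
LB LBSTRESU
SC SCTESTCD
SC SCSTRESU
VS VSTESTCD
VS VSSTRESU
;
run;

/* One output row for each table in which only one of the two       */
/* sublist patterns (**TESTCD, **STRESU) resolves to a column.      */
data work.sublist_mismatch;
  set work.demo_source_columns;
  by table;
  retain has_testcd has_stresu;
  if first.table then do;
    has_testcd=0;
    has_stresu=0;
  end;
  if length(column)=8 and substr(column,3)='TESTCD' then has_testcd=1;
  if length(column)=8 and substr(column,3)='STRESU' then has_stresu=1;
  if last.table and has_testcd ne has_stresu then output;
  keep table has_testcd has_stresu;
run;

proc print data=work.sublist_mismatch noobs;
run;
```

With this toy metadata, the only row written is IE (has_testcd=1, has_stresu=0), which corresponds to the five-column and four-column sublists described above.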
Tablescope and columnscope syntax supports wildcarding and addition and subtraction operators. However, this flexible functionality is not required. You can submit explicit table and column references. CDISC SDTM check SDTM0623 could be defined in the Validation Master data set as shown here:
tablescope    columnscope
EG            [EGTESTCD][EGSTRESU]
LB            [LBTESTCD][LBSTRESU]
SC            [SCTESTCD][SCSTRESU]
VS            [VSTESTCD][VSSTRESU]
Consider this alternative definition for the check:
tablescope           columnscope
CLASS:FINDINGS-IE    [**TESTCD][**STRESU]
Both of the above definitions run correctly, but neither yet matches the record metadata for SDTM0623 in the SAS Validation Master data set:
tablescope              columnscope
CLASS:FINDINGS-LB-IE    [**TESTCD][**STRESU]
LB is excluded from tablescope because CDISC SDTM check SDTM0631 is a specific test of these LB domain columns (the Validation Master checksource and sourceid fields show SDTM0631 to be an implementation of the WebSDM check IR5006). SDTM0623 is simply a generalization of SDTM0631 to include all findings domains, so there is no reason to test LB a second time.