Special Topic: A Round-Trip Exercise Involving the CDISC SDTM and CDISC CRT-DDS Standards

Overview

The typical SAS Clinical Standards Toolkit workflow in support of the CDISC standards includes the definition and validation of SDTM submission data and the creation and validation of a define.xml file based on the SDTM domain data. This exercise demonstrates how you can read a define.xml file to extract the data and metadata for the purposes of re-creating the original source SDTM study. Re-creating the original source study has value as a stand-alone exercise, either to extract a new SDTM study from a define.xml file or to create a new SDTM study using information in a define.xml file as a template.
As a round-trip exercise, this task validates the performance of the %CRTDDS_WRITE and %CRTDDS_READ macros and allows a comparison of original and re-created SDTM metadata and data. This display details the high-level workflow for this exercise.
Figure: Round-Trip Process (diagram of the flow of the round-trip XML process)

The Workflow

These steps describe the workflow in more detail. The first five steps describe the derivation of the CDISC CRT-DDS 1.0 define.xml file.
Note: Steps 1 to 6 can be used with CDISC Define-XML 2.0. However, steps 7 to 9 have not been implemented in the SAS Clinical Standards Toolkit for Define-XML 2.0.
  1. Access a study that contains valid CDISC SDTM data and metadata. This is a study that contains domain data (AE, DM, CO, and so on) and the SAS Clinical Standards Toolkit metadata about that SDTM study, such as source_tables and source_columns. The SAS Clinical Standards Toolkit also includes XSL style sheets, XMLMap files, and any metadata that is provided by SAS during the SAS Clinical Standards Toolkit installation.
  2. Use the set of sample driver programs that are provided in the SAS Clinical Standards Toolkit to define the input and output files for each process task and to invoke the macros that support each standard-specific task. The driver programs are designed to run with the sample studies, but can be modified as needed. New custom drivers can be created and used.
  3. Submit the create_crtdds_fromsdtm.sas driver program to access the %CRTDDS_SDTMTODEFINE macro, and create the 39 data sets that comprise the SAS representation of the CRT-DDS model. These 39 output data sets are written to the sample study library directory/cdisc-crtdds-1.0-1.7/data directory.
  4. Validate the CRT-DDS data sets by submitting the validate_crtdds_data.sas driver program. This step is optional.
  5. Create the define.xml file by submitting the create_crtdds_define.sas driver program. This driver program generates the define.xml file from the 39 CRT-DDS data sets that were created in step 3. It calls the %CSTUTILXMLVALIDATE macro to validate the XML file structure. The define.xml file is written to the sample study library directory/cdisc-crtdds-1.0-1.7/sourcexml directory.
    At this point, a valid define.xml file has been created from the SAS representation of the CRT-DDS model. In the next steps, the SDTM data and metadata are re-created using the XML read process.
  6. Submit the create_sascrtdds_fromxml.sas driver program. This driver program reads the define.xml file created in step 5, and generates the SAS representation of the CRT-DDS model using the %CRTDDS_READ macro. The data sets created in this step should match the data sets created in step 3. These data sets are written to the sample study library directory/cdisc-crtdds-1.0-1.7/deriveddata directory. This driver program also generates the source_tables and source_columns data sets in the sample study library directory/cdisc-crtdds-1.0-1.7/derivedmetadata directory. By specifying new target folder locations (deriveddata and derivedmetadata), the data sets can be validated against the data sets that were created or referenced in step 3.
  7. SDTM domain data sets are created based on a reachable set of SAS transport files that are specified in the define.xml file. Submit the create_sasdata_fromxpt.sas SDTM driver program. For SDTM 3.1.3, the program is in the sample study library directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata/programs directory. This driver program accesses the %SDTMUTIL_CREATESASDATAFROMXPT macro to generate the SDTM domain data sets from the SAS transport files. Creation of the SAS transport files is not performed by the SAS Clinical Standards Toolkit. These files would have been produced as a prerequisite to the generation of the define.xml file as a part of the Electronic Common Technical Document preparation process. The %SDTMUTIL_CREATESASDATAFROMXPT macro assumes that the SAS transport files are reachable from a folder relative to the location of the referenced define.xml file. In the create_sasdata_fromxpt.sas SDTM driver program, the XPT files are read from the sample study library directory/cdisc-crtdds-1.0-1.7/transport directory. The generated data sets are written to the sample study library directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata/derived/data directory. At this point, the SDTM domain data sets should contain the same information as the original domain data sets that were accessed at the beginning of this process. By specifying a new target folder location, the SDTM data sets can be validated against those referenced in steps 1 and 3.
  8. Source metadata that describes the SDTM domains and columns is derived using information contained in the CRT-DDS data sets derived in step 6. Submit the create_sourcemetadata.sas SDTM driver program. For SDTM 3.1.3, it is installed in the sample study library directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata/programs directory. In this exercise, this driver program calls the %SDTMUTIL_CREATESRCMETAFROMCRTDDS macro, which uses a library of SAS data sets that capture define.xml metadata (typically derived using the %CRTDDS_READ macro). The output of this step is a set of SDTM metadata in the source_tables, source_columns, and source_study data sets. These data sets are written to the sample study library directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata/derived/metadata directory. At this point, the SDTM metadata should contain the same information as the original metadata that was accessed at the beginning of this process. By specifying a new target folder location, the SDTM metadata data sets can be validated against those referenced in steps 1 and 3.
  9. SAS formats that support SDTM controlled terminology are derived using information contained in the CRT-DDS data sets that were derived in step 6. Submit the create_formatsfromcrtdds.sas SDTM driver program. For SDTM 3.1.3, this program is installed in the sample study library directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata/programs directory. The driver program accesses the %SDTMUTIL_CREATEFORMATSFROMCRTDDS macro and generates the controlled terminology SAS format catalog based on codelists specified in the define.xml file. The derived SAS format catalog is written to the sample study library directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata/derived/formats directory. These formats should match those formats that were referenced by the SDTM columns at the beginning of this process. By specifying a new target folder location, the SAS format catalog can be validated against the catalog referenced in steps 1 and 3.
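Step 7 depends on reading SAS transport files into SAS data sets. The following is a minimal sketch of that general technique using the XPORT engine. It is not the implementation of the %SDTMUTIL_CREATESASDATAFROMXPT macro, and every name in it (xptdir, xptout, xptin, dm_from_xpt, and the demo values) is a hypothetical stand-in chosen for illustration.

```sas
/* Illustrative only: this is NOT the %SDTMUTIL_CREATESASDATAFROMXPT macro. */
/* All names below are hypothetical stand-ins.                              */

/* Use the WORK library location as a scratch folder for the demo file. */
%let xptdir = %sysfunc(pathname(work));

/* A tiny stand-in for an SDTM DM domain data set. */
data work.dm;
  length STUDYID $12 USUBJID $20 SEX $1;
  STUDYID = "DEMO01"; USUBJID = "DEMO01-001"; SEX = "F"; AGE = 34; output;
  STUDYID = "DEMO01"; USUBJID = "DEMO01-002"; SEX = "M"; AGE = 41; output;
run;

/* Write the data set to a SAS transport (XPT) file. */
libname xptout xport "&xptdir/dm.xpt";
proc copy in=work out=xptout memtype=data;
  select dm;
run;
libname xptout clear;

/* Read the transport file back into a regular SAS data set. */
libname xptin xport "&xptdir/dm.xpt" access=readonly;
data work.dm_from_xpt;
  set xptin.dm;
run;
libname xptin clear;
```

In the exercise itself, the transport files already exist in the transport directory named in step 7, so only the read half of this sketch corresponds to what the driver program does.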
Once the round-trip exercise is complete, data derived from the process should match the original data. Some metadata might not match exactly (particularly any date and time fields that collect real-time information). Differences can be detected by running PROC COMPARE on any of the derived data and metadata data sets against the original data and metadata data sets.
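A comparison of this kind might look like the following sketch. The two WORK data sets are hypothetical stand-ins for an original and a re-created data set. In practice, you would assign librefs to the original and derived folders and name the corresponding data sets in the BASE= and COMPARE= options.

```sas
/* Hypothetical stand-ins for an original and a re-created data set. */
data work.dm_original;
  length USUBJID $20 SEX $1;
  USUBJID = "DEMO01-001"; SEX = "F"; AGE = 34; output;
  USUBJID = "DEMO01-002"; SEX = "M"; AGE = 41; output;
run;

data work.dm_derived;
  set work.dm_original;
run;

/* LISTALL reports variables and observations found in only one data set. */
proc compare base=work.dm_original compare=work.dm_derived listall;
run;

/* SYSINFO holds the PROC COMPARE return code and must be captured      */
/* immediately after the step. A value of 0 means no differences found. */
%let compare_rc = &sysinfo;
%put NOTE: PROC COMPARE return code = &compare_rc;
```

Capturing the return code in a macro variable makes it possible to check several data sets in sequence without reading each comparison report in full.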

Running Multiple Driver Programs

CAUTION:
When running multiple driver programs, be aware that the SAS Clinical Standards Toolkit uses autocall macro libraries to contain and reference standard-specific code libraries.
Once the autocall path is set and one or more macros have been used in an autocall macro library, deallocation or reallocation of the autocall file reference cannot occur unless the autocall path is reset to exclude the specific file reference.
This becomes a problem with repeated calls to %CSTUTIL_PROCESSSETUP or %CSTUTIL_ALLOCATESASREFERENCES in the same SAS session. You might receive SAS errors, such as this one, unless you submit some specific SAS code:
ERROR - At least one file associated with fileref SDTMAUTO is still in use.
ERROR - Error in the FILENAME statement.
If you call %CSTUTIL_PROCESSSETUP or %CSTUTIL_ALLOCATESASREFERENCES more than once in the same SAS session, by default the SAS Clinical Standards Toolkit does not attempt to reallocate SAS librefs and filerefs. Records are written to the process results data set noting (for example):
SAS libref from SASref=refmeta sasreferences record not allocated
Generally, if you are resubmitting the same process code again without changing the &_cststandard or &_cststandardversion global macro variables and you do not have references to different data or metadata libraries, there are no consequences. However, if you are attempting to change the standard or standard version in the same SAS session or you are attempting to reference different studies, code libraries, or terminology libraries, you must use the following code between each code submission:
%let _cstReallocateSASRefs=1;
%include "&_cstGRoot/standards/cst-framework-1.7/programs/resetautocallpath.sas";
In the driver programs provided with the SAS Clinical Standards Toolkit, the previous code is commented out so that it is not submitted at run time.
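As an illustration of where the reset code fits, the following sketch runs two of the SDTM driver programs from this exercise in one SAS session. It assumes that the SAS Clinical Standards Toolkit is installed and that &_cstGRoot has already been set by the toolkit. The studyLib macro variable and its value are hypothetical placeholders for the sample study library directory. Whether the reset is strictly required between these two drivers depends on whether they reference different libraries. It is shown here only to illustrate placement.

```sas
/* Hypothetical placeholder: replace with your sample study library directory. */
%let studyLib = /path/to/sample/study/library;

/* First submission: the step 7 driver program. */
%include "&studyLib/cdisc-sdtm-3.1.3-1.7/sascstdemodata/programs/create_sasdata_fromxpt.sas";

/* Reset between submissions so that librefs and filerefs are reallocated. */
/* Assumes &_cstGRoot was defined by the toolkit during the first run.     */
%let _cstReallocateSASRefs=1;
%include "&_cstGRoot/standards/cst-framework-1.7/programs/resetautocallpath.sas";

/* Second submission: the step 8 driver program. */
%include "&studyLib/cdisc-sdtm-3.1.3-1.7/sascstdemodata/programs/create_sourcemetadata.sas";
```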