Writing XML Files

Overview

Support for CDISC XML-based standards, such as CDISC CRT-DDS (define.xml) and CDISC ODM, includes the ability to render these files in SAS data set format and the ability to create model-specific XML files from a SAS data set representation of those standards.
In the SAS Clinical Standards Toolkit, you can create a CDISC CRT-DDS 1.0 define.xml file that references a CDISC SDTM 3.1.1 or 3.1.2 study. You can also create a CDISC ODM 1.3.0 XML file.
The next section outlines the basic workflow for the creation of model-specific XML files.

Basic Workflow

Here is the basic workflow for writing XML files:
  1. Build the SAS representation of a given XML-based standard by referencing an existing set of data and metadata about a clinical study, or by creating data and metadata about a new clinical study in the standard-specific SAS format.
  2. (Optional) Validate the SAS representation of the XML-based standard (for example, check foreign key relationships and the conformance of values to a set of expected values).
  3. Create a standardized intermediate cubeXML file using the data and metadata contained in the SAS representation of the standard.
  4. (Build and) reference a set of valid XSL style sheets for each target data set (such as ItemDefs.xsl).
  5. Use the SAS DATA step component JavaObj to read the cubeXML file and apply the XSL style sheets to create the target standard-specific XML file.
  6. (Optional) Validate the structure and syntax of the XML file that was created.

Creating the CDISC CRT-DDS 1.0 define.xml File

The SAS Clinical Standards Toolkit provides four key macros that support creation of the define.xml file. The macros are listed in the order in which they are executed:
  1. The crtdds_sdtmtodefine macro creates the 39 tables for the SAS representation of the CRT-DDS files from SDTM metadata. This macro, using SDTM table and column metadata as its source, populates a subset of 12 CRT-DDS data sets.
    Note: This macro replaces the crtdds_sdtm311todefine10 macro, which will be deprecated in future releases of the SAS Clinical Standards Toolkit.
  2. The crtdds_validate macro submits a set of validation checks based on what is defined in the Validation Control data set to validate the referenced SAS representation of the CRT-DDS files.
  3. The crtdds_write macro creates the define.xml file from the SAS representation of the CRT-DDS files.
  4. The crtdds_xmlvalidate macro validates that the XML file is syntactically correct. This macro is important if you customize the define.xml file outside of the workflow. For example, if you edit the define.xml file to add links for annotated CRF pages, this macro validates the syntax.
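Taken together, the four macros form one sequence. The sketch below strings the calls together in that order. It assumes that the toolkit session and the SASReferences data set have already been set up (as the sample driver programs do) and that crtdds_validate and crtdds_xmlvalidate can run with their default parameter values; only the crtdds_sdtmtodefine and crtdds_write parameters come from this section. In the samples, these steps are split across three driver programs rather than run together.

```sas
/* Assumption: the toolkit environment and the SASReferences data set are */
/* already set up by the calling driver program, so each macro can find   */
/* its inputs and outputs (sampdata and srcdata librefs included).        */

/* 1. Build the SAS representation of CRT-DDS from SDTM metadata. */
%crtdds_sdtmtodefine(
  _cstOutLib=srcdata,
  _cstSourceTables=sampdata.source_tables,
  _cstSourceColumns=sampdata.source_columns,
  _cstSourceStudy=sampdata.source_study
);

/* 2. Run the checks selected in the Validation Control data set. */
/*    Assumption: default parameter values are sufficient.        */
%crtdds_validate;

/* 3. Write define.xml (plus a display style sheet) from the SAS data sets. */
%crtdds_write(_cstCreateDisplayStyleSheet=1);

/* 4. Check the XML syntax of the generated define.xml file.  */
/*    Assumption: default parameter values are sufficient.    */
%crtdds_xmlvalidate;
```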
These macros are called by driver programs that are responsible for properly setting up each SAS Clinical Standards Toolkit process to perform a specific task. The SAS Clinical Standards Toolkit CDISC CRT-DDS standard provides three sample driver programs related to the creation of the define.xml file.
Here is the purpose of each of these driver programs:
  • The create_crtdds_from_sdtm.sas driver program sets up the required metadata and SASReferences data set for the sample study. It runs the crtdds_sdtmtodefine macro, which creates the SAS representation of the CRT-DDS define data sets from the sample study SDTM data sets.
  • The validate_crtdds_data.sas driver program validates the SAS representation of the CRT-DDS define data sets based on the selected CRT-DDS validation checks. This driver program can be run multiple times until data validation has been reconciled.
  • The create_crtdds_define.sas driver program runs the crtdds_write and crtdds_xmlvalidate macros to create the define.xml file and validate its XML syntax. This driver program replaces the create_crtdds10_define311.sas driver program.
These driver programs are examples that are provided with the SAS Clinical Standards Toolkit. You can use these driver programs or create your own. The names of the driver programs are not significant; what matters is their content, which demonstrates how the various SAS Clinical Standards Toolkit framework macros are used to generate the required metadata files.

Sample Driver Program: create_crtdds_from_sdtm.sas

Overview

The create_crtdds_from_sdtm.sas driver program sets up the required environment variables and library references to invoke the crtdds_sdtmtodefine macro. This macro extracts data from the SDTM 3.1.1 or 3.1.2 metadata files. (For more information about the source_tables and source_columns data sets, see Source Metadata.) Depending on the available source information, the macro attempts to convert the information into the 39 tables that represent the SAS interpretation of the CDISC CRT-DDS 1.0 model. All 39 data sets are created, but only the data sets for which data is available are populated. The other tables contain zero observations.
These parameters must be set before submitting the crtdds_sdtmtodefine macro:
Parameters for the crtdds_sdtmtodefine Macro
Parameter | Required | Description
_cstOutLib | Yes | Identifies the library reference (libref) where the tables are created.
_cstSourceTables | Yes | A data set that contains the SDTM metadata for the domains to be included in the CRT-DDS file.
_cstSourceColumns | Yes | A data set that contains the SDTM metadata for the domain columns to be included in the CRT-DDS file.
_cstSourceStudy | Yes | A data set that contains the SDTM metadata for the studies to be included in the CRT-DDS file.
Here is an example of a call to the crtdds_sdtmtodefine macro:
%crtdds_sdtmtodefine(
_cstOutLib=srcdata,
_cstSourceTables=sampdata.source_tables,
_cstSourceColumns=sampdata.source_columns,
_cstSourceStudy=sampdata.source_study
);
In this example, _cstOutLib is set to srcdata, so all of the CRT-DDS tables are written to the SAS Srcdata library. The _cstSourceTables, _cstSourceColumns, and _cstSourceStudy parameters point to the source_tables, source_columns, and source_study data sets in the Sampdata library (sampdata.source_tables, sampdata.source_columns, and sampdata.source_study).
The create_crtdds_from_sdtm.sas driver program is provided with SAS and is ready to run on any of the SDTM sample studies. The driver program can be run interactively or in batch. To run the program interactively, start a SAS session, load the driver program into the SAS editor, and submit it.
The driver program is located in:
!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.4/sample/cdisc-crtdds-1.0/programs

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are three input file references and one output reference that are key to successful completion of the create_crtdds_from_sdtm.sas driver program. Key Components of the SASReferences Data Set lists these files and data sets, and they are discussed in separate sections. In the sample create_crtdds_from_sdtm.sas driver program, these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=!sasroot/../../SASClinicalStandardsToolkitSDTM312/&_cstVersion/sample/cdisc-sdtm-3.1.2/sascstdemodata
&studyOutputPath=!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/&_cstVersion/sample/cdisc-crtdds-1.0
Key Components of the SASReferences Data Set
Input or Output | Metadata Type | SAS LIBNAME or Fileref to Use | Reference Type | Path | Name of File
Input | sourcemetadata | sampdata | libref | &studyRootPath/metadata | source_tables.sas7bdat
Input | sourcemetadata | sampdata | libref | &studyRootPath/metadata | source_columns.sas7bdat
Input | sourcemetadata | sampdata | libref | &studyRootPath/metadata | source_study.sas7bdat
Output | sourcedata | srcdata | libref | &studyOutputPath/data | (none)
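For interactive exploration outside a driver program, it can help to allocate the same two libraries directly. The statements below are a sketch that mirrors the rows of the table. They assume toolkit version 1.4 (the version used in the paths elsewhere in this section) and the default sample installation under !sasroot, and they use a hypothetical macro variable, ver, so that &_cstVersion is left alone. In a driver program, the librefs come from the SASReferences data set, so these statements are not needed there.

```sas
/* Assumption: toolkit version 1.4 and the default sample installation. */
/* VER is a hypothetical helper variable; &_cstVersion is not touched.  */
%let ver=1.4;
%let studyRootPath=!sasroot/../../SASClinicalStandardsToolkitSDTM312/&ver/sample/cdisc-sdtm-3.1.2/sascstdemodata;
%let studyOutputPath=!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/&ver/sample/cdisc-crtdds-1.0;

/* Input: source_tables, source_columns, and source_study (read only). */
libname sampdata "&studyRootPath/metadata" access=readonly;

/* Output: the 39 CRT-DDS data sets are written here. */
libname srcdata "&studyOutputPath/data";
```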

Process Inputs

The sourcemetadata type refers to the two data sets that contain the SDTM domain metadata, source_tables and source_columns. Both data sets are stored in the same library. Because the sample create_crtdds_from_sdtm.sas driver program provided with the SAS Clinical Standards Toolkit references a source CDISC SDTM 3.1.2 study, the source_tables data set contains SDTM 3.1.2 metadata about each standard domain defined in the CDISC SDTM 3.1.2 Implementation Guide, plus any customizations that you have added. The source_columns data set contains similar metadata at the column level. This source metadata is read from this location:
!sasroot/../../SASClinicalStandardsToolkitSDTM312/1.4/sample/cdisc-sdtm-3.1.2/sascstdemodata/metadata
This location is represented in the driver program by the Srcmeta library name.
A source study data set (source_study) is required by this macro. These variables are required in this data set:
Variables Required in the Source Study Data Set (source_study)
Variable* | Required | Description
StudyName | Yes | Name of the study. This value is used to populate the srcdata.study.studyname column.
DefineDocumentName | Yes | Name of the define document being created. This value is used to populate the srcdata.definedocument.description and srcdata.definedocument.id columns.
SASref | Yes | Reference that ties the study name to the corresponding domains that are associated with this study in the source_tables and source_columns data sets.
ProtocolName | Yes | Name of the protocol for the study. This value is used to populate the srcdata.study.protocolname column.
StudyDescription | Yes | Description of the study. This value is used to populate the srcdata.study.studydescription column. Note: You cannot use commas, semicolons, or quotation marks in the description.
*All variables must be non-blank.
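The rules in this table (every variable non-blank, and no commas, semicolons, or quotation marks in StudyDescription) are easy to check before the macro runs. The sketch below builds a one-row source_study in the WORK library with hypothetical values and then flags any row that breaks those rules. The variable lengths are assumptions, not the toolkit's template; in practice you would point the check at sampdata.source_study.

```sas
/* Hypothetical values; variable lengths are assumptions, not the toolkit template. */
data work.source_study;
  length sasref $8 studyname definedocumentname protocolname $40 studydescription $200;
  sasref             = 'SRCDATA';
  studyname          = 'STUDY01';
  definedocumentname = 'STUDY01_DEFINE';
  protocolname       = 'STUDY01-PROTOCOL';
  studydescription   = 'Hypothetical study used to illustrate the checks';
run;

/* Write one record per rule violation; an empty result means all checks passed. */
data work.source_study_issues;
  set work.source_study;
  length issue $100;
  array req{*} $ studyname definedocumentname sasref protocolname studydescription;
  do i = 1 to dim(req);
    if missing(req{i}) then do;
      issue = catx(' ', vname(req{i}), 'is blank');
      output;
    end;
  end;
  /* ',;"''' is the character list: comma, semicolon, double quote, single quote. */
  if indexc(studydescription, ',;"''') > 0 then do;
    issue = 'StudyDescription contains a comma, semicolon, or quotation mark';
    output;
  end;
  keep sasref studyname issue;
run;
```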
Multiple studies can be referenced in the source_study, source_tables, and source_columns data sets. Give each study a different SASref value; that value links the study to its domains and columns across the three data sets.
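Because SASref is the only link between a study and its metadata, a quick cross-check can catch a study whose SASref matches no rows in source_tables, or column metadata whose SASref matches no study. The sketch below assumes that source_tables and source_columns each carry a character SASREF column (as the description of SASref implies) and uses the sampdata libref from the sample driver program.

```sas
/* Assumption: SASREF is a character column in all three data sets. */
proc sql;
  title 'Studies whose SASref has no matching rows in source_tables';
  select s.sasref, s.studyname
    from sampdata.source_study as s
    where s.sasref not in (select t.sasref from sampdata.source_tables as t);

  title 'SASref values in source_columns with no matching study';
  select distinct c.sasref
    from sampdata.source_columns as c
    where c.sasref not in (select s.sasref from sampdata.source_study as s);
quit;
title;
```

An empty report for both queries means every study has at least one domain and every column row belongs to a known study.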

Process Outputs

The sourcedata type is the library where the metadata files are created. These metadata files are the data sets that constitute the SAS representation of the CDISC CRT-DDS 1.0 standard. The create_crtdds_from_sdtm.sas driver program creates 39 data sets. Most of these data sets have zero observations because there is no default SDTM metadata source. In the SAS Clinical Standards Toolkit sample study, these data sets are written to the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.4/sample/cdisc-crtdds-1.0/data directory. This location is represented in the driver program by the srcdata library name.
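Because only some of the 39 data sets are populated, it can be useful to see at a glance which ones received observations. This sketch queries the DICTIONARY.TABLES view in PROC SQL and assumes that the srcdata libref is allocated to the output library, as it is in the sample driver program.

```sas
/* List each data set in SRCDATA with its observation count, largest first, */
/* so the populated tables stand out from the empty ones.                   */
proc sql;
  title 'CRT-DDS data sets in SRCDATA';
  select memname, nobs
    from dictionary.tables
    where libname='SRCDATA' and memtype='DATA'
    order by nobs desc, memname;

  /* Summary: total data sets versus data sets with at least one observation. */
  select count(*) as n_datasets,
         sum(nobs > 0) as n_populated
    from dictionary.tables
    where libname='SRCDATA' and memtype='DATA';
quit;
title;
```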

Process Results

When the driver program finishes running, the sdtmtodefine_results data set is created. This data set contains informational, warning, and any error messages that were generated by the submitted driver program.
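To review the messages, a frequency table and a filtered listing are often enough. This is a sketch under two assumptions: that &_cstResultsDS (the global macro variable described with the _cstResultsOverrideDS parameter later in this section) names the Results data set you want to inspect, and that the data set has a character column named RESULTSEVERITY holding values such as Info, Warning, Error, and Fatal Error. Both the column name and its values are assumptions here.

```sas
/* Assumption: &_cstResultsDS names the Results data set, and that data set */
/* has a character column RESULTSEVERITY (column name assumed).             */
proc freq data=&_cstResultsDS;
  tables resultseverity / nocum;
run;

/* List only the records that need attention. */
proc print data=&_cstResultsDS noobs;
  where upcase(resultseverity) in ('WARNING' 'ERROR' 'FATAL ERROR');
run;
```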
Example of a Partial Results Data Set from CRT-DDS Sample Study

Sample Driver Program: create_crtdds_define.sas

Overview

The create_crtdds_define.sas driver program sets up the required environment variables and library references to invoke the crtdds_write macro. This macro reads the 39 data sets that comprise the SAS representation of the CDISC CRT-DDS 1.0 model and converts that information to the required define.xml structure. If source metadata or data are missing, then the corresponding empty elements and attributes are not created in the define.xml file. The inputs and outputs are specified in the SASReferences data set.
This table lists the optional parameters that can be set when submitting the macro.
Parameters for the crtdds_write.sas Macro
Parameter | Required | Description
_cstCreateDisplayStyleSheet | Optional | Specifies whether the macro creates a style sheet in the same directory as the output XML file. If the value is 1, then the macro looks in the provided SASReferences file for a record with a type of referencexml and a subtype of stylesheet, and then uses that file. If the value is 0, then the macro does not create the XSL, even if one is specified in the SASReferences file. The default setting is 1.
_cstOutputEncoding | Optional | XML encoding to use for the CRT-DDS file that is created. By default, UTF-8 is used.
_cstHeaderComment | Optional | A short comment added at the top of the CRT-DDS file. If no comment is provided, then a default comment is used. The default comment notes that the file was produced by the SAS Clinical Standards Toolkit.
_cstResultsOverrideDS | Optional | Designates [LIBNAME.]member as the name of the Results data set. If this parameter is omitted (default setting), then the Results data set specified by the &_cstResultsDS global macro variable is used.
_cstLogLevel | Optional | Specifies the level of error reporting. Valid values are Info, Warning, Error, and Fatal Error. The default setting is Info.
Here is an example of a call to the crtdds_write macro:
%crtdds_write(_cstCreateDisplayStyleSheet=1,
                    _cstOutputEncoding=UTF-16,
                    _cstResultsOverrideDS=&_cstResultsDS);
In this example, a default style sheet is generated in the same directory as the XML output based on the information in the SASReferences data set. XML encoding is set to UTF-16, and process results are written to the default &_cstResultsDS data set.
Here is the call to the macro from the sample create_crtdds_define.sas driver program:
%crtdds_write(_cstCreateDisplayStyleSheet=1);
This call creates a display style sheet and uses the default values for the remaining parameters.
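The two calls above set only some of the optional parameters. As a sketch, the call below sets all five documented parameters at once; the values are hypothetical. It assumes that _cstLogLevel accepts the values exactly as they are listed in the parameter table, and it wraps the header comment in %str() so that a comment containing a comma would not be read as a parameter separator.

```sas
/* Hypothetical values for illustration; every parameter here is optional. */
%crtdds_write(
  _cstCreateDisplayStyleSheet=0,                              /* no XSL copy     */
  _cstOutputEncoding=UTF-8,                                   /* the default     */
  _cstHeaderComment=%str(Draft define.xml for internal review),
  _cstResultsOverrideDS=work.write_results,                   /* [LIBNAME.]member */
  _cstLogLevel=Warning                                        /* assumed spelling */
);
```

Writing results to work.write_results keeps this run's messages separate from the data set named by &_cstResultsDS.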
The create_crtdds_define.sas driver program is ready to run on any of the CDISC SDTM sample studies. The driver program can be run interactively or in batch.
The driver program is located in:
!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.4/sample/cdisc-crtdds-1.0/programs
Multiple tasks can be executed in any SAS Clinical Standards Toolkit driver program. The create_crtdds_define.sas driver program calls both the crtdds_write macro to create the define.xml file, and the crtdds_xmlvalidate macro to validate the syntax of the generated define.xml file. For more information about the crtdds_xmlvalidate macro, see Validation of XML-Based Standards.

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are two input file references and three output references that are key to successful completion of the create_crtdds_define.sas driver program. Key Components of the SASReferences Data Set lists these files and data sets, and they are discussed in separate sections. In the sample create_crtdds_define.sas driver program, these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/&_cstVersion/sample/cdisc-crtdds-1.0
&studyOutputPath=!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/&_cstVersion/sample/cdisc-crtdds-1.0
Key Components of the SASReferences Data Set
Input or Output | Metadata Type | LIBNAME or Fileref to Use | Reference Type | Path | Name of File
Input | control | control | libref | &workpath | sasreferences.sas7bdat
Input | sourcedata | srcdata | libref | &studyRootPath/data | (none)
Output | referencexml | xslt01 | filename | &studyOutputPath/sourcexml | (none)
Output | results | results | libref | &studyOutputPath/results | write_results.sas7bdat
Output | externalxml | extxml | filename | &studyOutputPath/sourcexml | define.xml
Input | referencexml | odmmap | fileref | &studyRootPath/referencexml | define.map

Process Inputs

The control libref points to the path stored in the &workpath macro variable, which illustrates a technique for documenting the derivation of the SASReferences data set in the SAS Work library. The driver program initializes the &workpath macro variable with this SAS code:
%let workPath=%sysfunc(pathname(work));
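The sketch below shows what that line resolves to and how a second libref can share the WORK path. The CONTROL LIBNAME statement is a hypothetical illustration of the control row in the table (in the driver, the framework handles the allocation); it makes work.sasreferences visible as control.sasreferences. PATHNAME and LIBREF are standard Base SAS functions.

```sas
/* Capture the physical path of the current session's WORK library. */
%let workPath=%sysfunc(pathname(work));
%put NOTE: WORK library path is &workPath;

/* Hypothetical illustration: a second libref, CONTROL, on the same path, */
/* so that work.sasreferences is also reachable as control.sasreferences. */
libname control "&workPath";
```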
The sourcedata type is the library that contains the 39 data sets that might have been populated by the create_crtdds_from_sdtm.sas driver program. These data sets constitute the SAS representation of the CDISC CRT-DDS 1.0 standard. In the SAS Clinical Standards Toolkit sample study, these data sets are read from the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.4/sample/cdisc-crtdds-1.0/data directory. This location is represented in the driver program by the Srcdata library name.

Process Outputs

The externalxml type refers to the define.xml file. The driver program accesses this file through the extxml fileref, and the file is written to the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.4/sample/cdisc-crtdds-1.0/sourcexml directory.
The referencexml type can serve as either an input or an output file reference. Because the path and filename are not provided, the crtdds_write macro interprets the _cstCreateDisplayStyleSheet=1 parameter as an instruction to use the default style sheet that the SAS Clinical Standards Toolkit provides in the Global Library. If a path and filename had been provided, the referencexml type would serve as an output file reference, and the crtdds_write macro would copy the default style sheet from the Global Library to the specified path and filename.
The results type refers to the write_results data set that documents the results of the define.xml creation process. In the SAS Clinical Standards Toolkit CDISC CRT-DDS folder hierarchy, this information is written to the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.4/sample/cdisc-crtdds-1.0/results directory.
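The style sheet decision described above can be pictured as a lookup on the SASReferences data set. The sketch below mimics it on a small, hypothetical subset of SASReferences columns (type, subtype, sasref, reftype, iotype, path, memname); a real SASReferences data set has more columns and should be built from the toolkit's own template. It is an illustration of the documented behavior, not the crtdds_write implementation.

```sas
/* Hypothetical subset of SASReferences; column set and values are illustrative. */
data work.sasrefs_subset;
  length type subtype sasref $20 reftype iotype $8 path $200 memname $48;
  type='referencexml'; subtype='stylesheet'; sasref='xslt01';
  reftype='filename'; iotype='output'; path=''; memname='';
  output;
  type='externalxml'; subtype=''; sasref='extxml';
  reftype='filename'; iotype='output'; path='/study/sourcexml'; memname='define.xml';
  output;
run;

/* Mimic the lookup made when _cstCreateDisplayStyleSheet=1. */
data work.stylesheet_decision;
  set work.sasrefs_subset;
  where lowcase(type)='referencexml' and lowcase(subtype)='stylesheet';
  length decision $80;
  if not missing(path) and not missing(memname) then
    decision = catx(' ', 'Copy the default style sheet to', catx('/', path, memname));
  else decision = 'Use the default style sheet from the Global Library';
  keep sasref decision;
run;
```

With the path and file name left blank, as in the sample, the sketch selects the Global Library default.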

Process Results

Inclusion of the results record (row) in the SASReferences data set signals that the process results are to be copied to a write_results data set located in the specified SAS library.
Example of a Partial Results Data Set from the CRT-DDS Sample Study

Creating the CDISC ODM 1.3.0 XML File

The SAS Clinical Standards Toolkit provides three key macros that support creation of the ODM XML file. The macros are listed in the order in which they are executed:
  1. The odm_validate macro submits a set of validation checks based on what is defined in the Validation Control data set to validate the referenced SAS representation of each ODM XML file.
  2. The odm_write macro creates the ODM XML file from the SAS representation of the ODM files and validates that the XML file is syntactically correct.
  3. The odm_xmlvalidate macro validates that the XML file is syntactically correct. This macro is important if you customize the ODM XML file outside of the workflow.
These macros are called by driver programs that are responsible for properly setting up each SAS Clinical Standards Toolkit process to perform a specific task. The SAS Clinical Standards Toolkit CDISC ODM standard provides two sample driver programs to support creation of XML files. Here is the purpose of each of these driver programs:
  1. The validate_odm_data.sas driver program validates the SAS representation of the ODM data sets based on the selected ODM validation checks. This driver program can be run multiple times until data validation has been reconciled.
  2. The create_odmxml.sas driver program calls the odm_write macro to create the XML file. This driver program creates and validates the syntax for the XML file.
These driver programs are examples that are provided with the SAS Clinical Standards Toolkit. You can use these driver programs or create your own. The names of the driver programs are not significant; what matters is their content, which demonstrates how the various SAS Clinical Standards Toolkit framework macros are used to generate the required metadata files.

Sample Driver Program: create_odmxml.sas

Overview

The create_odmxml.sas driver program sets up the required environment variables and library references to invoke the odm_write macro. This macro reads the 66 data sets that comprise the default SAS representation of the CDISC ODM 1.3.0 model and converts that information to the required ODM XML structure. If source metadata or data are missing, then the corresponding empty elements and attributes are not created in the ODM XML file. The inputs and outputs are specified in the SASReferences data set.
This table lists the optional parameters that can be set when submitting the macro.
Parameters for the odm_write.sas Macro
Parameter | Required | Description
_cstCreateDisplayStyleSheet | Optional | Specifies whether the macro creates a style sheet in the same directory as the output XML file. If the value is 1, then the macro looks in the provided SASReferences file for a record with a type of referencexml and a subtype of stylesheet, and then uses that file. If the value is 0, then the macro does not create the XSL, even if one is specified in the SASReferences file. The default setting is 0.
_cstOutputEncoding | Optional | XML encoding to use for the ODM XML file that is created. By default, UTF-8 is used.
_cstHeaderComment | Optional | A short comment added at the top of the ODM XML file. If no comment is provided, then a default comment is used. The default comment notes that the file was produced by the SAS Clinical Standards Toolkit.
_cstResultsOverrideDS | Optional | Designates [LIBNAME.]member as the name of the Results data set. If this parameter is omitted (default setting), then the Results data set specified by the &_cstResultsDS global macro variable is used.
_cstLogLevel | Optional | Specifies the level of error reporting. Valid values are Info, Warning, Error, and Fatal Error. The default setting is Info.
Here is an example of a call to the odm_write macro:
%odm_write(_cstOutputEncoding=UTF-16, _cstResultsOverrideDS=&_cstResultsDS);
In this example, no default style sheet is generated for the XML output, XML encoding is set to UTF-16, and process results are written to the default &_cstResultsDS data set.
This is the call to the macro from the sample create_odmxml.sas driver program, using default values for all parameters:
%odm_write();
The create_odmxml.sas driver program is ready to run on the sample CDISC ODM data provided with the SAS Clinical Standards Toolkit.
The driver program is located in:
!sasroot/../../SASClinicalStandardsToolkitODM130/1.4/sample/cdisc-odm-1.3.0/programs

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, one input file reference and two output references are key to successful completion of the create_odmxml.sas driver program. Key Components of the SASReferences Data Set lists these files and data sets, and they are discussed in separate sections. In the sample create_odmxml.sas driver program, these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=!sasroot/../../SASClinicalStandardsToolkitODM130/&_cstVersion/sample/cdisc-odm-1.3.0
&studyOutputPath=!sasroot/../../SASClinicalStandardsToolkitODM130/&_cstVersion/sample/cdisc-odm-1.3.0
Key Components of the SASReferences Data Set
Input or Output | Metadata Type | SAS LIBNAME or Fileref to Use | Reference Type | Path | Name of File
Input | sourcedata | srcdata | libref | &studyRootPath/data | (none)
Output | results | results | libref | &studyOutputPath/results | write_results.sas7bdat
Output | externalxml | extxml | filename | &studyOutputPath/sourcexml | odm_sample_out.xml

Process Inputs

The sourcedata type is the library that contains the default 66 data sets that comprise the SAS representation of an ODM XML file. These data sets might have been populated by a previous odm_read task, or you might have processes in place that build these files from some set of source files. In the SAS Clinical Standards Toolkit sample data, these data sets are read from the !sasroot/../../SASClinicalStandardsToolkitODM130/1.4/sample/cdisc-odm-1.3.0/data directory. This location is represented in the driver program by the Srcdata library name.

Process Outputs

The externalxml type refers to the ODM XML file that the process creates. The driver program accesses this file through the extxml fileref, and the file is written to the !sasroot/../../SASClinicalStandardsToolkitODM130/1.4/sample/cdisc-odm-1.3.0/sourcexml directory.
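After the driver program finishes, a short DATA step can confirm that the file behind the extxml fileref was actually written. This is a minimal sketch that assumes the extxml fileref is still assigned in the session; it uses only the Base SAS FEXIST and PATHNAME functions.

```sas
/* Assumption: the EXTXML fileref is still assigned after odm_write runs. */
data _null_;
  length p $300;
  p = pathname('extxml');          /* physical path behind the fileref */
  if fexist('extxml') then put 'NOTE: ODM XML file written to ' p;
  else put 'WARNING: No file found for fileref EXTXML.';
run;
```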
Note: Unlike CDISC CRT-DDS, CDISC does not supply a default style sheet for ODM, and none is provided as a part of the SAS Clinical Standards Toolkit. However, if you want to supply your own style sheet, the odm_write macro provides the _cstCreateDisplayStyleSheet parameter, which uses the information that you provide in the referencexml record of the SASReferences file.
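If you do supply your own ODM style sheet, only the call changes. The sketch below assumes that the SASReferences data set already contains a record with a type of referencexml and a subtype of stylesheet that points to your style sheet; the header comment text is hypothetical.

```sas
/* Assumption: SASReferences has a referencexml/stylesheet record that */
/* points to a user-supplied ODM display style sheet.                  */
%odm_write(
  _cstCreateDisplayStyleSheet=1,
  _cstHeaderComment=%str(ODM export with a study-specific display style sheet)
);
```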
The results type refers to the write_results data set that documents the results of the ODM XML creation process. In the SAS Clinical Standards Toolkit CDISC ODM folder hierarchy, this information is written to this location:
!sasroot/../../SASClinicalStandardsToolkitODM130/1.4/sample/cdisc-odm-1.3.0/results

Process Results

Inclusion of the results record (row) in the SASReferences data set signals that the process results are to be copied to a write_results data set located in the specified SAS library.
Example of a Partial Results Data Set from the ODM Sample Data Hierarchy