Reading XML Files

Overview

Support of CDISC XML-based standards, such as CDISC Define-XML 2.0, CDISC CRT-DDS (define.xml), and CDISC ODM, includes the ability to read XML files into SAS data set format. In the SAS Clinical Standards Toolkit, you can read these types of files:
  • a CDISC CRT-DDS 1.0 define.xml file
  • a CDISC Define-XML 2.0 define.xml file (including Analysis Results Metadata 1.0)
  • a CDISC ODM 1.3.0 or CDISC ODM 1.3.1 XML file
  • the Controlled Terminology files as they are published by the NCI in ODM XML format

Basic Workflow

Here is the basic workflow for reading XML files:
  1. Determine the existence of a valid XML file.
  2. Use valid XSL style sheets for each target data set (such as ItemDefs.xsl).
  3. Use the SAS DATA step component JavaObj to create a standardized intermediate cubeXML file using the XSL style sheets.
  4. Read the standardized cubeXML file using the SAS XML LIBNAME engine and XMLMAP processing.
This basic workflow is used by all XML-based standards that are supported by the SAS Clinical Standards Toolkit.
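The four steps above can be sketched outside SAS as follows. This is a minimal Python illustration under stated assumptions, not the toolkit's implementation: the Python standard library has no XSLT processor, so a hand-written function (flatten_itemdefs, a hypothetical name) stands in for the ItemDefs.xsl transformation, and the cube layout (/LIBRARY/ItemDefs/...) is assumed from the XPath values in the odm.map listing in this section. The sample ODM content is invented for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"  # namespace used by the sample ODM file

# Invented, condensed ODM content used only for this illustration
SAMPLE_ODM = f"""<?xml version="1.0"?>
<ODM xmlns="{ODM_NS}" FileOID="Study1234" ODMVersion="1.3">
  <Study OID="1234">
    <MetaDataVersion OID="MDV000" Name="Demo">
      <ItemDef OID="ID.AESEV" Name="Severity" DataType="text" Length="1"/>
      <ItemDef OID="ID.AEENYR" Name="Stop Year" DataType="integer" Length="4"/>
    </MetaDataVersion>
  </Study>
</ODM>
"""

def flatten_itemdefs(odm_path, cube_path):
    """Stand-in for the XSL step: write one <ItemDefs> row per ItemDef."""
    odm_root = ET.parse(odm_path).getroot()
    library = ET.Element("LIBRARY")
    for item in odm_root.iter(f"{{{ODM_NS}}}ItemDef"):
        row = ET.SubElement(library, "ItemDefs")
        for attr in ("OID", "Name", "DataType", "Length"):
            ET.SubElement(row, attr).text = item.get(attr, "")
    ET.ElementTree(library).write(cube_path, encoding="utf-8", xml_declaration=True)

def read_cube(cube_path, table):
    """Stand-in for the XML LIBNAME engine: return the rows as dictionaries."""
    root = ET.parse(cube_path).getroot()
    return [{col.tag: (col.text or "") for col in row} for row in root.findall(table)]

def read_odm_itemdefs(odm_path):
    # Step 1: determine that the XML file exists.
    if not os.path.isfile(odm_path):
        raise FileNotFoundError(odm_path)
    # Steps 2-3: transform the source into an intermediate cube file.
    fd, cube_path = tempfile.mkstemp(prefix="_cub", suffix=".xml")
    os.close(fd)
    try:
        flatten_itemdefs(odm_path, cube_path)
        # Step 4: read the standardized cube file into rows.
        return read_cube(cube_path, "ItemDefs")
    finally:
        os.remove(cube_path)

def demo():
    """Write the sample ODM text to a temporary file and run the workflow."""
    fd, odm_path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_ODM)
    try:
        return read_odm_itemdefs(odm_path)
    finally:
        os.remove(odm_path)

if __name__ == "__main__":
    for row in demo():
        print(row)
```

The intermediate file is what lets a single generic reader handle every target table: only the transformation step needs to know the structure of the source standard.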

Reading CDISC ODM XML Files: %ODM_READ Macro

Note: The process for reading ODM XML files is the same for all ODM versions that are supported by the SAS Clinical Standards Toolkit. The process is explained using ODM version 1.3.0.
To read an ODM XML file, a specialized macro named %ODM_READ is available in the ODM 1.3.0 standards macro folder. This folder is located here:
global standards library directory/standards/cdisc-odm-1.3.0-1.7/macros
This macro is referenced from the create_sasodm_fromxml.sas driver program (described more fully below).
File references and other metadata that are required by the macro are set as global macro variable values. Currently, these global macro variable values are set through the framework initialization properties and the CDISC ODM 1.3.0 initialization properties. Throughout the processing of the %ODM_READ macro, the Results data set contains all framework-specific and ODM 1.3.0-specific messages that are generated during run time.
Based on file references defined in the SASReferences data set, the %ODM_READ macro accesses the ODM XML file.
Here is a partial listing of a sample ODM XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<ODM 
  xmlns="http://www.cdisc.org/ns/odm/v1.3"
  FileOID="Study1234" 
  ODMVersion="1.3" 
  FileType="Snapshot" 
  CreationDateTime="2004-07-28T12:34:13-06:00" 
  SourceSystem="ss00"
  AsOfDateTime="2004-07-29T12:34:13-06:00"
  Granularity="SingleSite"
  Description="Study to determine existence of ischemic stroke"
  Archival="Yes" 
  PriorFileOID="Study-4321"
  Originator="SAS Institute"
  SourceSystemVersion="Version 0.0.0"
  Id="DSSignature123">
  <Study OID="1234">
    <GlobalVariables>
      <StudyName>1234</StudyName>
      <StudyDescription>1234 Data Definition</StudyDescription>
      <ProtocolName>1234</ProtocolName>
    </GlobalVariables>
    <BasicDefinitions>
      <MeasurementUnit OID="MeasurementUnits.OID.MMHG" Name="MMHG">
        <Symbol>
          <TranslatedText xml:lang="en">mmHG</TranslatedText>
          <TranslatedText xml:lang="fr-CA">mmHG</TranslatedText>
        </Symbol>
      </MeasurementUnit>
      <MeasurementUnit OID="MeasurementUnits.OID.YRS" Name="YEARS">
        <Symbol>
          <TranslatedText xml:lang="de">Jahren</TranslatedText>
          <TranslatedText xml:lang="en">Years of age</TranslatedText>
          <TranslatedText xml:lang="fr-CA">Ans</TranslatedText>
        </Symbol>
      </MeasurementUnit>
    </BasicDefinitions>
    <MetaDataVersion OID="CDISC.SDTM.3.1.0"
      Name="Study 1234, Data Definitions" 
      Description="Study 1234, Data Definitions">
      <Include StudyOID="1234" MetaDataVersionOID="MDV000">
      </Include>
      <Protocol>
        <Description>
After the %ODM_READ macro confirms that the ODM XML file exists, a call is made to the SAS DATA step component JavaObj. JavaObj processing converts the ODM XML file into the cubeXML file through transformations using XSL files and processes. The cubeXML file is created in the Work library. The name of the cubeXML file is _cubnnnn.xml, where nnnn is a randomly generated number. The cubeXML file is accessed using the SAS XML LIBNAME engine and XMLMAP processing. A default XMLMAP file is stored in the sample ODM 1.3.0 study folder hierarchy under /referencexml as odm.map. The odm.map file is required to process the cubeXML file. If it does not exist, then the %ODM_READ macro attempts to create one using the ODM reference metadata.
Here is a partial listing of the odm.map file.
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP name="ODM130" version="1.2">

<TABLE name="ItemDefs">
   <TABLE-PATH syntax="XPath">/LIBRARY/ItemDefs</TABLE-PATH>
   <TABLE-DESCRIPTION>Item metadata</TABLE-DESCRIPTION>

   <COLUMN name="OID">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/OID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Unique identifier for this item</DESCRIPTION>
     <LENGTH>64</LENGTH>
   </COLUMN>
   <COLUMN name="Name">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/Name</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Item (variable) name</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>
   <COLUMN name="DataType">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/DataType</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Item (variable) data type (text, integer, float)</DESCRIPTION>
     <LENGTH>18</LENGTH>
   </COLUMN>
   <COLUMN name="Length">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/Length</PATH>
     <TYPE>numeric</TYPE>
     <DATATYPE>numeric</DATATYPE>
     <DESCRIPTION>Item (variable) length</DESCRIPTION>
     <LENGTH>8</LENGTH>
   </COLUMN>
When the cubeXML file is processed, each of the 66 data sets (such as ItemDefs) that are included in the SAS representation of the CDISC ODM 1.3.0 model is derived.
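To make the role of the map file concrete, the following sketch applies XMLMap-style column definitions to a small cube document. It is written in Python with the standard library only and is not the SAS XML LIBNAME engine: it interprets only the TABLE-PATH, COLUMN, PATH, and TYPE elements that appear in the listing above, the cube content is invented for the example, and the function names (relative_path, read_table) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the map listing: two columns of the ItemDefs table
XMLMAP = """<SXLEMAP name="ODM130" version="1.2">
<TABLE name="ItemDefs">
   <TABLE-PATH syntax="XPath">/LIBRARY/ItemDefs</TABLE-PATH>
   <COLUMN name="OID">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/OID</PATH>
     <TYPE>character</TYPE>
   </COLUMN>
   <COLUMN name="Length">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/Length</PATH>
     <TYPE>numeric</TYPE>
   </COLUMN>
</TABLE>
</SXLEMAP>"""

# Invented cube content; the layout is assumed from the XPath values in the map
CUBE = """<LIBRARY>
  <ItemDefs><OID>ID.AESEV</OID><Length>1</Length></ItemDefs>
  <ItemDefs><OID>ID.AESTDT</OID><Length></Length></ItemDefs>
</LIBRARY>"""

def relative_path(absolute, base):
    """Turn '/LIBRARY/ItemDefs/OID' into 'OID' relative to '/LIBRARY/ItemDefs'."""
    return absolute[len(base):].lstrip("/")

def read_table(map_text, cube_text, table_name):
    """Return one dictionary per row of the named table, typed per the map."""
    sxlemap = ET.fromstring(map_text)
    cube = ET.fromstring(cube_text)
    table = next(t for t in sxlemap.findall("TABLE") if t.get("name") == table_name)
    table_path = table.findtext("TABLE-PATH").strip()
    # ElementTree paths are relative to the root element, so drop the first step
    row_path = "/".join(table_path.strip("/").split("/")[1:])
    rows = []
    for row in cube.findall(row_path):
        record = {}
        for column in table.findall("COLUMN"):
            column_path = relative_path(column.findtext("PATH").strip(), table_path)
            text = (row.findtext(column_path) or "").strip()
            if column.findtext("TYPE").strip() == "numeric":
                # Empty numeric cells become None (comparable to a missing value)
                record[column.get("name")] = float(text) if text else None
            else:
                record[column.get("name")] = text
        rows.append(record)
    return rows

if __name__ == "__main__":
    print(read_table(XMLMAP, CUBE, "ItemDefs"))
```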
Note: For more information about the %ODM_READ macro, see the SAS Clinical Standards Toolkit: Macro API Documentation.
By default, if a null-parameter %ODM_READ macro call is made, source metadata files and SAS format catalogs for each language found in the clitemdecodetranslatedtext data set are created after the SAS data sets representing the ODM XML metadata and data content are derived. The target location of the derived metadata files is defined in the SASReferences data set. The target location of any derived SAS format catalogs is the SAS Work library unless defined in the SASReferences data set.

Sample Driver Program: create_sasodm_fromxml.sas

Overview

Each primary SAS Clinical Standards Toolkit task, such as reading CDISC ODM XML files, is guided by a sample driver program that is provided with the SAS Clinical Standards Toolkit. For reading ODM XML files, this program is create_sasodm_fromxml.sas.
The driver program is located here:
sample study library directory/cdisc-odm-1.3.0-1.7/programs

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are two input file references and five output data set references that are key to the successful completion of the driver program. Key Components of the SASReferences Data Set for the create_sasodm_fromxml.sas Driver Program lists these files and data sets, and they are discussed in separate sections. In the sample create_sasodm_fromxml.sas driver program, these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=&_cstSRoot/cdisc-odm-&_cstStandardVersion.-&_cstVersion
&studyOutputPath=&_cstSRoot/cdisc-odm-&_cstStandardVersion.-&_cstVersion
Key Components of the SASReferences Data Set for the create_sasodm_fromxml.sas Driver Program
Metadata Type  | SAS LIBNAME or Fileref to Use | Reference Type | Path                              | Name of File
Input
externalxml    | odmxml                        | fileref        | &studyRootPath/sourcexml          | odm_sample.xml
referencexml   | odmmap                        | fileref        | &studyRootPath/referencexml       | odm.map
Output
sourcedata     | srcdata                       | libref         | &studyOutputPath/derived/data     | *.*
sourcemetadata | srcmeta                       | libref         | &studyOutputPath/derived/metadata | source_tables.sas7bdat
sourcemetadata | srcmeta                       | libref         | &studyOutputPath/derived/metadata | source_columns.sas7bdat
targetdata     | trgdata                       | libref         | &studyOutputPath/derived/formats  |
results        | results                       | libref         | &studyOutputPath/results          | read_results.sas7bdat

Process Inputs

The externalxml type refers to the ODM XML file to read. The filename reference odmxml is defined in the SASReferences data set. This filename reference is used in the submitted SAS code when referring to the ODM XML file.
The referencexml type refers to the SAS map file that is used to generate the SAS data sets that represent the ODM file metadata and content. The filename reference odmmap is defined in the SASReferences data set. This filename reference is used in the submitted SAS code when referring to the SAS map file. If a path and filename for the map file are not specified, a temporary map file is created as part of the %ODM_READ macro processing.

Process Outputs

When the driver program finishes running, the read_results data set is created in the Results library. This data set contains informational, warning, and error messages that were generated by the driver program.
The following display shows an example of the contents of a Results data set that was created while reading the sample ODM XML file that was provided with the SAS Clinical Standards Toolkit:
Example of a Partial Results Data Set Created by the create_sasodm_fromxml.sas Driver Program
The %ODM_READ macro creates the source_tables and source_columns data sets in the Srcmeta library. These data sets contain the table and column metadata for each of the SAS data sets that is derived from the ODM XML file.
Example of Partial Source_Tables Data Set Derived from the %ODM_READ Macro
Example of Partial Source_Columns Data Set Derived from the %ODM_READ Macro
The Srcdata library contains the SAS data sets that represent the ODM file metadata and content. By default, the %ODM_READ macro creates 66 unique data sets in the SAS Clinical Standards Toolkit for ODM 1.3.0. Some of these data sets might be empty if no associated content was derived from the ODM XML file. There is a one-to-one correspondence between the tables listed in the Srcdata library and the tables contained in the source_tables metadata file in the Srcmeta library.
Example of Partial Srcdata Library Derived from the %ODM_READ Macro

Extracting Clinical Data and Reference Data from the SAS Representation of an ODM XML File: %ODM_EXTRACTDOMAINDATA Macro

As the primary interchange format for CDISC, ODM XML is a common format for electronic data capture (EDC) data management views of clinical data. This format often does not closely approximate submission (SDTM) and analysis (ADaM) data structures unless the EDC views have been built using the CDISC CDASH standard. From a SAS perspective, you might want to extract clinical data from an ODM XML file to serve as source data for transformations that derive SDTM domain data sets.
The %ODM_EXTRACTDOMAINDATA macro supports extracting clinical data or reference data from the SAS data sets that were created by the %ODM_READ macro.
The %ODM_EXTRACTDOMAINDATA macro makes the following assumptions:
  • An ODM XML file is available that contains sufficient metadata and content for extractable clinical data and reference data.
  • A full SAS representation of an ODM XML file is available (for example, the %ODM_READ macro has been run against the XML file).
  • The SAS representation of an ODM XML file contains both metadata and data.
    By default, the driver assumes all source data files reside in the sample derived folder or the data folder that is typically populated by running the %ODM_READ macro. However, the source data files and the source metadata files can be in different folders.
  • Any codelists defined in the ODM XML file and associated with extracted data set columns are available as part of the output of the %ODM_READ macro.
ODM integer and float data types are converted to SAS numeric data. All other ODM data types are converted to SAS character data. If an integer or float data value cannot be converted, a warning appears in the SAS log and Results data set.
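The conversion rule can be illustrated with a short Python sketch. It is not the macro's code: the function name convert_value is hypothetical, a Python logging warning stands in for the message written to the SAS log and the Results data set, and returning None for an unconvertible or empty value is an assumption (the text above states only that a warning is issued).

```python
import logging

logger = logging.getLogger("odm_extract_sketch")

# ODM data types that map to SAS numeric data, per the rule described above
NUMERIC_ODM_TYPES = {"integer", "float"}

def convert_value(value, odm_data_type):
    """Return a float for ODM integer/float items, otherwise the original text."""
    if odm_data_type not in NUMERIC_ODM_TYPES:
        return value  # all other ODM data types stay character data
    if value is None or value.strip() == "":
        return None  # assumption: treat empty input like a missing value
    try:
        return float(value)
    except ValueError:
        # Stand-in for the warning in the SAS log and Results data set
        logger.warning("Cannot convert %r to numeric (%s)", value, odm_data_type)
        return None  # assumption: unconvertible values become missing

if __name__ == "__main__":
    print(convert_value("06", "integer"))
    print(convert_value("1999-06-10", "date"))
    print(convert_value("abc", "float"))
```

Note that under this rule an ODM date item such as AESTDT remains character data; deriving a SAS date value from it would be a separate, later transformation.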
Here is a partial listing of the metadata in a sample ODM XML file:
<ItemGroupDef OID="ItemGroupDefs.OID.AE" Repeating="Yes" 
    SASDatasetName="AE" Name="Adverse Events" Domain="AE" 
    Comment="Some adverse events from this trial">
    <ItemRef ItemOID="ID.TAREA"    OrderNumber="1"  Mandatory="No" />
    <ItemRef ItemOID="ID.PNO"      OrderNumber="2"  Mandatory="No" />
    <ItemRef ItemOID="ID.SCTRY"    OrderNumber="3"  Mandatory="No" />
    <ItemRef ItemOID="ID.F_STATUS" OrderNumber="4"  Mandatory="No" />
    <ItemRef ItemOID="ID.LINE_NO"  OrderNumber="5"  Mandatory="No" />
    <ItemRef ItemOID="ID.AETERM"   OrderNumber="6"  Mandatory="No" />
    <ItemRef ItemOID="ID.AESTMON"  OrderNumber="7"  Mandatory="No" />
    <ItemRef ItemOID="ID.AESTDAY"  OrderNumber="8"  Mandatory="No" />
    <ItemRef ItemOID="ID.AESTYR"   OrderNumber="9"  Mandatory="No" />
    <ItemRef ItemOID="ID.AESTDT"   OrderNumber="10" Mandatory="No" />
    <ItemRef ItemOID="ID.AEENMON"  OrderNumber="11" Mandatory="No" />
    <ItemRef ItemOID="ID.AEENDAY"  OrderNumber="12" Mandatory="No" />
    <ItemRef ItemOID="ID.AEENYR"   OrderNumber="13" Mandatory="No" />
    <ItemRef ItemOID="ID.AEENDT"   OrderNumber="14" Mandatory="No" />
    <ItemRef ItemOID="ID.AESEV"    OrderNumber="15" Mandatory="No" />
    <ItemRef ItemOID="ID.AEREL"    OrderNumber="16" Mandatory="No" />
    <ItemRef ItemOID="ID.AEOUT"    OrderNumber="17" Mandatory="No" />
    <ItemRef ItemOID="ID.AEACTTRT" OrderNumber="18" Mandatory="No" />
    <ItemRef ItemOID="ID.AECONTRT" OrderNumber="19" Mandatory="No" />
</ItemGroupDef>
...
<ItemDef OID="ID.AESTDT" SASFieldName="AESTDT"
    Name="Derived Start Date" DataType="date"/>
<ItemDef OID="ID.AEENMON" SASFieldName="AEENMON"
    Name="Stop Month - Enter Two Digits 01-12" DataType="integer" Length="2" />
<ItemDef OID="ID.AEENDAY" SASFieldName="AEENDAY"
    Name="Stop Day - Enter Two Digits 01-31" DataType="integer" Length="2" />
<ItemDef OID="ID.AEENYR" SASFieldName="AEENYR"
    Name="Stop Year - Enter Four Digit Year" DataType="integer" Length="4" />
<ItemDef OID="ID.AEENDT" SASFieldName="AEENDT"
    Name="Derived Stop Date" DataType="date"/>
<ItemDef OID="ID.AESEV" SASFieldName="AESEV"
    Name="Severity" DataType="text" Length="1">
    <CodeListRef CodeListOID="CL.$AESEV" />
</ItemDef>
<ItemDef OID="ID.AEREL" SASFieldName="AEREL"
    Name="Relationship to study drug" DataType="text" Length="1">
    <CodeListRef CodeListOID="CL.$AEREL" />
</ItemDef>
Here is a partial listing of the data in the same sample ODM XML file:
<ClinicalData StudyOID="Study.OID" MetaDataVersionOID="MetaDataVersion.OID.1">
<SubjectData SubjectKey="S001P011" TransactionType="Insert">
    <StudyEventData StudyEventOID="StudyEventDefs.OID.6.AdverseEvent"
          StudyEventRepeatKey="1">
        <FormData FormOID="FormDefs.OID.AE" FormRepeatKey="1">
        <ItemGroupData ItemGroupOID="ItemGroupDefs.OID.AE"
              ItemGroupRepeatKey="1">
            <ItemData ItemOID="ID.TAREA" Value="ONC" />
            <ItemData ItemOID="ID.PNO" Value="143-02" />
            <ItemData ItemOID="ID.SCTRY" Value="USA" />
            <ItemData ItemOID="ID.F_STATUS" Value="V" />
            <ItemData ItemOID="ID.LINE_NO" Value="1" />
            <ItemData ItemOID="ID.AETERM" Value="HEADACHE" />
            <ItemData ItemOID="ID.AESTMON" Value="06" />
            <ItemData ItemOID="ID.AESTDAY" Value="10" />
            <ItemData ItemOID="ID.AESTYR" Value="1999" />
            <ItemData ItemOID="ID.AESTDT" Value="1999-06-10" />
            <ItemData ItemOID="ID.AEENMON" Value="06" />
            <ItemData ItemOID="ID.AEENDAY" Value="14" />
            <ItemData ItemOID="ID.AEENYR" Value="1999" />
            <ItemData ItemOID="ID.AEENDT" Value="1999-06-14" />
            <ItemData ItemOID="ID.AESEV" Value="1" />
            <ItemData ItemOID="ID.AEREL" Value="0" />
            <ItemData ItemOID="ID.AEOUT" Value="1" />
            <ItemData ItemOID="ID.AEACTTRT" Value="0" />
            <ItemData ItemOID="ID.AECONTRT" Value="1" />
        </ItemGroupData>
The %ODM_EXTRACTDOMAINDATA macro creates the data set shown in AE SAS Data Set (Unformatted) Created by the %ODM_EXTRACTDOMAINDATA Macro and AE SAS Data Set (Formatted) Created by the %ODM_EXTRACTDOMAINDATA Macro. The first 12 columns in this data set are the data set keys. The macro parameter _cstODMMinimumKeyset determines whether these keys are part of the extracted data set.
AE SAS Data Set (Unformatted) Created by the %ODM_EXTRACTDOMAINDATA Macro
AE SAS Data Set (Formatted) Created by the %ODM_EXTRACTDOMAINDATA Macro
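The idea behind the extraction, pivoting the ItemData elements of one ItemGroupData element into a single row whose columns are named by the SASFieldName attribute, can be sketched in Python. This is an illustration under stated assumptions rather than the macro's logic: the sample XML is a condensed, namespace-free excerpt of the listings above, only SubjectKey is carried as a key column (the 12 key columns are not reproduced), and extract_item_group is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Condensed excerpt of the metadata and data listings (namespace omitted)
SAMPLE = """<ODM>
  <ItemDef OID="ID.AETERM" SASFieldName="AETERM" Name="Term" DataType="text"/>
  <ItemDef OID="ID.AEENYR" SASFieldName="AEENYR" Name="Stop Year" DataType="integer"/>
  <ItemDef OID="ID.AESEV" SASFieldName="AESEV" Name="Severity" DataType="text"/>
  <ClinicalData StudyOID="Study.OID" MetaDataVersionOID="MetaDataVersion.OID.1">
    <SubjectData SubjectKey="S001P011">
      <StudyEventData StudyEventOID="StudyEventDefs.OID.6.AdverseEvent" StudyEventRepeatKey="1">
        <FormData FormOID="FormDefs.OID.AE" FormRepeatKey="1">
          <ItemGroupData ItemGroupOID="ItemGroupDefs.OID.AE" ItemGroupRepeatKey="1">
            <ItemData ItemOID="ID.AETERM" Value="HEADACHE"/>
            <ItemData ItemOID="ID.AEENYR" Value="1999"/>
            <ItemData ItemOID="ID.AESEV" Value="1"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>"""

def extract_item_group(odm_text, item_group_oid):
    """Return one dictionary per ItemGroupData element with the given OID."""
    root = ET.fromstring(odm_text)
    # ItemOID -> (column name, ODM data type), taken from the ItemDef metadata
    columns = {d.get("OID"): (d.get("SASFieldName"), d.get("DataType"))
               for d in root.iter("ItemDef")}
    rows = []
    for subject in root.iter("SubjectData"):
        for group in subject.iter("ItemGroupData"):
            if group.get("ItemGroupOID") != item_group_oid:
                continue
            row = {"SubjectKey": subject.get("SubjectKey")}
            for item in group.findall("ItemData"):
                name, data_type = columns[item.get("ItemOID")]
                value = item.get("Value")
                # integer and float items become numeric; everything else stays text
                row[name] = float(value) if data_type in ("integer", "float") else value
            rows.append(row)
    return rows

if __name__ == "__main__":
    print(extract_item_group(SAMPLE, "ItemGroupDefs.OID.AE"))
```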
The %ODM_EXTRACTDOMAINDATA macro has this signature:
%macro odm_extractdomaindata(
  _cstSourceMetadata=,
  _cstSourceData=,
  _cstIsReferenceData=No,
  _cstSelectAttribute=Name,
  _cstSelectAttributeValue=,
  _cstLang=en,
  _cstMaxLabelLength=256,
  _cstAttachFormats=Yes,
  _cstODMMinimumKeyset=No,
  _cstOutputLibrary=,
  _cstOutputDS=
  );

Here are the parameters:
  • _cstSourceMetadata and _cstSourceData contain the SAS libref for the SAS ODM metadata representation data.
    If this is not specified, the macro looks for type=sourcedata in SASReferences. If this is not provided, the data set source is assumed to be in the SAS Work library.
  • _cstIsReferenceData indicates whether the data to extract is reference data or clinical data. Examples of reference data are laboratory reference ranges or trial design data.
  • _cstSelectAttribute contains the ItemGroup attribute that identifies which ItemGroup to extract. Valid values are OID, Name, SASDatasetName, and Domain.
  • _cstSelectAttributeValue contains the value of the attribute defined by _cstSelectAttribute that identifies the ItemGroup to extract.
  • _cstLang specifies a language identifier for the language tag attribute (xml:lang) in the ODM TranslatedText elements.
  • _cstMaxLabelLength determines the maximum length of the labels to be created.
    If this is not provided, 256 is assumed.
  • _cstAttachFormats determines whether formats are attached to the data set variables. Formats are attached if the value is ‘Yes’.
  • _cstODMMinimumKeyset determines the creation of data set keys. If this is not provided, ‘No’ is assumed.
  • _cstOutputLibrary defines the SAS library where the extracted data sets are written.
    If this is not specified, the macro looks for type=targetdata in SASReferences. If this is not provided, the data sets are written to the SAS Work library.
  • _cstOutputDS contains the name of the extracted data set.
    If this is an invalid SAS data set name, an error is generated. If the data set name is not provided, the macro looks for type=targetdata in SASReferences.
Two sample driver programs for ODM 1.3.0 are provided with the SAS Clinical Standards Toolkit to demonstrate the use of the %ODM_EXTRACTDOMAINDATA macro:
sample study library directory/cdisc-odm-1.3.0-1.7/programs/extract_domaindata_all.sas
sample study library directory/cdisc-odm-1.3.0-1.7/programs/extract_domaindata.sas
Two sample driver programs for ODM 1.3.1 are provided with the SAS Clinical Standards Toolkit to demonstrate the use of the %ODM_EXTRACTDOMAINDATA macro:
sample study library directory/cdisc-odm-1.3.1-1.7/programs/extract_domaindata_all.sas
sample study library directory/cdisc-odm-1.3.1-1.7/programs/extract_domaindata.sas
The extract_domaindata_all.sas sample driver programs demonstrate how all data sets can be extracted at once. The following shows a code fragment:
filename incCode CATALOG "work._cstCode.domains.source" lrecl=255;

data _null_;
 set srcdata.itemgroupdefs(keep=OID Name IsReferenceData SASDatasetName Domain);
  file incCode;
  length macrocall $400 _cstOutputName $100;

  _cstOutputName=SASDatasetName;
  * If we have to use the Name, Only use letters and digits;
  if missing(_cstOutputName) then _cstOutputName=cats(compress(Name, , 'adk'));
  * If first character a digit, prepend an underscore;
  if anydigit(_cstOutputName)=1 then _cstOutputName=cats('_', _cstOutputName);
  * Cut long names;
  if length(_cstOutputName) > 32 then _cstOutputName=substr(_cstOutputName, 1, 32);

  macrocall=cats('%odm_extractdomaindata(_cstSelectAttribute=OID',
                                      ', _cstSelectAttributeValue=', OID,
                                      ', _cstIsReferenceData=', IsReferenceData,
                                      ', _cstMaxLabelLength=256',
                                      ', _cstAttachFormats=Yes',
                                      ', _cstODMMinimumKeyset=No',
                                      ', _cstLang=en',
                                      ', _cstOutputDS=', _cstOutputName, ');');
  put macrocall;
run;

%include incCode;
filename incCode clear;
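The rules that the DATA step applies to derive a valid output data set name can be restated compactly. The following Python sketch mirrors the intent expressed by the comments in the fragment (use SASDatasetName if present, otherwise keep only the letters and digits of Name, prepend an underscore if the first character is a digit, and cut the result to 32 characters). The function name derive_output_name is hypothetical, and the sketch assumes ASCII names.

```python
import re

def derive_output_name(sas_dataset_name, name):
    """Derive a data set name of at most 32 characters from ItemGroupDef attributes."""
    output_name = (sas_dataset_name or "").strip()
    if not output_name:
        # Fall back to Name, keeping only letters and digits
        output_name = re.sub(r"[^A-Za-z0-9]", "", name or "")
    if re.match(r"[0-9]", output_name):
        # A data set name cannot start with a digit, so prepend an underscore
        output_name = "_" + output_name
    # Cut long names
    return output_name[:32]

if __name__ == "__main__":
    print(derive_output_name("AE", "Adverse Events"))
    print(derive_output_name("", "Adverse Events"))
    print(derive_output_name(None, "12-Lead ECG"))
```

As in the SAS fragment, the underscore is added before the name is cut, so the result never exceeds 32 characters.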

Reading CDISC ODM Controlled Terminology XML Files: %CT_READ Macro

To read an ODM controlled terminology XML file as published quarterly by NCI, a specialized macro named %CT_READ is available in the CDISC controlled terminology 1.0 standards macros folder. This folder is located here:
global standards library directory/standards/cdisc-ct-1.0-1.7/macros
This macro is referenced from the create_sasct_fromxml.sas driver program. For more information, see Sample Driver Program: create_sasct_fromxml.sas.
File references and other metadata that are required by the macro are set as global macro variable values. These global macro variable values are set through the framework initialization properties and the CDISC controlled terminology 1.0 initialization properties. Throughout the processing of the %CT_READ macro, the Results data set contains all framework-specific messages and CDISC controlled terminology 1.0-specific messages that were generated during run time.
Based on file references defined in the SASReferences data set, the %CT_READ macro accesses the ODM controlled terminology XML file.
The following display shows a partial listing of a sample ODM controlled terminology XML file:
Partial Listing of a Sample ODM Controlled Terminology XML File
After the %CT_READ macro confirms that the ODM controlled terminology XML file exists, a call is made to the SAS DATA step component JavaObj. JavaObj processing converts the ODM controlled terminology XML file into a cubeXML file through transformations using XSL files and processes.
The cubeXML file is created in the SAS Work library. The name of the cubeXML file is _cubnnnn.xml, where nnnn is a randomly generated number.
The cubeXML file is accessed using the SAS XML LIBNAME engine and XMLMap processing. A default XMLMap file is stored in the sample CDISC controlled terminology 1.0 study folder hierarchy (referencexml/odm.map). An odm.map file is required to process the cubeXML file. If it does not exist, the %CT_READ macro attempts to create one using the CDISC controlled terminology reference metadata.
Here is a partial listing of the odm.map file.
<?xml version="1.0" encoding="UTF-8"?>
<SXLEMAP name="CT100" version="1.2">

<TABLE name="CodeLists">
   <TABLE-PATH syntax="XPath">/LIBRARY/CodeLists</TABLE-PATH>
   <TABLE-DESCRIPTION>Codelist metadata</TABLE-DESCRIPTION>

   <COLUMN name="OID">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/OID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Unique identifier for this codelist</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>
   <COLUMN name="Name">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/Name</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>CodeList name</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>
   <COLUMN name="DataType">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/DataType</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>CodeList item value data type (integer | float | text | string)</DESCRIPTION>
     <LENGTH>7</LENGTH>
   </COLUMN>
   <COLUMN name="SASFormatName">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/SASFormatName</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>SAS format name</DESCRIPTION>
     <LENGTH>8</LENGTH>
   </COLUMN>
   <COLUMN name="ExtCodeID">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/ExtCodeID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Unique numeric code randomly generated by NCI Thesaurus (NCIt)</DESCRIPTION>
     <LENGTH>64</LENGTH>
   </COLUMN>
   <COLUMN name="CodeListExtensible">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/CodeListExtensible</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Defines if controlled terms may be added to the codelist (Yes | No)</DESCRIPTION>
     <LENGTH>3</LENGTH>
   </COLUMN>
   <COLUMN name="CDISCSubmissionValue">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/CDISCSubmissionValue</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Specific value expected for submissions</DESCRIPTION>
     <LENGTH>512</LENGTH>
   </COLUMN>
When the cubeXML file is processed, each of the 15 data sets (such as CodeLists) that are included in the SAS representation of the CDISC controlled terminology model is derived. One input parameter can be specified in the call to the %CT_READ macro. The parameter offers the option to create source metadata files.
Note: For more information about the %CT_READ macro, see the SAS Clinical Standards Toolkit: Macro API Documentation.
By default, if a %CT_READ macro call is made with null parameters, source metadata is derived. The target location of the derived metadata files is defined in the SASReferences data set.

Sample Driver Program: create_sasct_fromxml.sas

Overview

Each primary SAS Clinical Standards Toolkit task, such as reading CDISC ODM controlled terminology XML files, is guided by a sample driver program that is provided with the SAS Clinical Standards Toolkit. For reading ODM controlled terminology XML files, this driver program is create_sasct_fromxml.sas.
This driver program is located here:
sample study library directory/cdisc-ct-1.0-1.7/programs

The SASReferences Data Set

As part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. The SASReferences data set references the input files that are needed (such as the ODM controlled terminology XML file), the librefs and filenames to use, and the names and locations of the data sets to create. The SASReferences data set can be modified to point to study-specific files.
For more information, see SASReferences File.
In the SASReferences data set, there are two input file references and five output data set references that are key to the successful completion of the driver program. Key Components of the SASReferences Data Set for the create_sasct_fromxml.sas Driver Program lists these files and data sets. In the sample create_sasct_fromxml.sas driver program, these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=sample study library directory/cdisc-ct-1.0-1.7
&studyOutputPath=sample study library directory/cdisc-ct-1.0-1.7
Key Components of the SASReferences Data Set for the create_sasct_fromxml.sas Driver Program
Metadata Type | SAS LIBNAME or Fileref to Use | Reference Type | Path                                 | Name of File
Input
externalxml   | crtxml                        | fileref        | &studyRootPath/sourcexml/sdtm/201212 | sdtm_terminology.xml
referencexml  | ctmap                         | fileref        | &studyRootPath/referencexml          | ct-1.0.0.map
Output
sourcedata    | srcdata                       | libref         | &studyOutputPath/data/sdtm/201212    | *.*
results       | results                       | libref         | &studyOutputPath/results             | read_results_sdtm_2012.sas7bdat

Process Inputs

The externalxml type refers to the ODM controlled terminology XML file to read. The filename reference crtxml is defined in the SASReferences data set. This filename reference is used in the submitted SAS code when referring to the ODM controlled terminology XML file.
The referencexml type refers to the SAS map file that is used to generate the SAS data sets that represent the ODM file metadata and content. The filename reference ctmap is defined in the SASReferences data set. This filename reference is used in the submitted SAS code when referring to the SAS map file. If a path and filename for the map file are not specified, a temporary map file is created as part of the %CT_READ macro processing.

Process Outputs

When the driver program finishes running, the read_results_sdtm_201212 data set is created in the Results library. This data set contains informational messages, warnings, and error messages that were generated by the driver program.
The following display shows an example of the contents of a Results data set that was created while reading the sample ODM controlled terminology XML file (as released by NCI) that is provided with the SAS Clinical Standards Toolkit:
Example of a Partial Results Data Set Created by the create_sasct_fromxml.sas Driver Program
The Srcdata library contains the SAS data sets that represent the ODM controlled terminology XML file metadata and content. By default, the %CT_READ macro creates 15 unique data sets in the SAS Clinical Standards Toolkit. Some of these data sets might be empty if no associated content was derived from the ODM controlled terminology XML file. There is a one-to-one correspondence between the tables listed in the Srcdata library and the tables contained in the source_tables metadata file in the Srcmeta library.
Example of Partial Srcdata Library Derived from the %CT_READ Macro

Creating a Format Catalog and a Controlled Terminology Data Set from the SAS Representation of a CDISC ODM Controlled Terminology XML File: %CT_CREATEFORMATS Macro

To use the NCI CDISC controlled terminology in a SAS Clinical Standards Toolkit process, the SAS data sets created by the %CT_READ macro must be converted to a SAS format catalog. To enable SAS Clinical Data Integration to import controlled terminology, the SAS data set representation created by the %CT_READ macro must be combined into one SAS data set.
The following display shows an example of controlled terminology in ODM XML (the Action Taken with Study Treatment codelist):
Example of Controlled Terminology in ODM XML
The following display shows the data set created by the %CT_CREATEFORMATS macro:
Partial cterms SAS Data Set Created by the %CT_CREATEFORMATS Macro
The following display shows that the %CT_CREATEFORMATS macro uses the data set to create the $ACN SAS format:
$ACN SAS Format Created by the %CT_CREATEFORMATS Macro
The %CT_CREATEFORMATS macro has this signature:
%macro ct_createformats(
  _cstLang=en,                /* Language tag in TranslatedText to use      */
  _cstCreateCatalog=1,        /* Create format catalog                      */
  _cstKillCatFirst=0,         /* Empty catalog first                        */
  _cstUseExpression=,         /* Expression to create the SAS format name   */
  _cstAppendChar=F,           /* Letter to append in case SAS format name
                                 ends with digit                            */
  _cstDeleteEmptyColumns=1,   /* Delete columns in output data set that are   
                                 completely missing                         */
  _cstTrimCharacterData=1     /* Truncate character data in output data set
                                 to the minimum value needed.               */
  );
The %CT_CREATEFORMATS macro attempts to map the CodeList/nciodm:CDISCSubmissionValue in the codelist variable to the fmtname variable. The fmtname variable value must contain a valid SAS format name. The %CT_CREATEFORMATS macro uses the following steps to create a valid SAS format name:
  1. Apply a user-defined expression to create the fmtname variable.
  2. If the value of fmtname is empty, use the CodeList/SASFormatName attribute (typically empty in NCI EVS ODM XML files).
  3. If the value of fmtname is empty, use the CodeList/nciodm:CDISCSubmissionValue value in the codelist variable.
  4. If the value of fmtname ends with a digit, add the character specified by the _cstAppendChar macro parameter (default=F).
After these steps, the value of the fmtname variable is validated against the following regular expression:
'm/^(?=.{1,32}$)([\$a-zA-Z_][a-zA-Z0-9_]*[a-zA-Z_])$/'
If the value of the fmtname variable fails validation, it is not a valid SAS format name. In that case, the value is set to missing, and the codelist is not used to create a SAS format.
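The naming steps and the validation described above can be sketched outside SAS. The following Python sketch is illustrative only and is not the macro's implementation. The function name derive_fmtname and its parameters are hypothetical, the user-defined expression is modeled as an optional callable, and the pattern is the one quoted above with the Perl-style m/.../ delimiters removed.

```python
import re

# Pattern quoted in the text, without the Perl-style m/.../ delimiters.
FMTNAME_PATTERN = re.compile(
    r"^(?=.{1,32}$)([\$a-zA-Z_][a-zA-Z0-9_]*[a-zA-Z_])$"
)


def derive_fmtname(submission_value, sas_format_name="", expression=None,
                   append_char="F"):
    """Return a candidate SAS format name, or None if validation fails.

    All names here are hypothetical; the function only mirrors the
    documented steps.
    """
    # Step 1: apply the user-defined expression, if one was supplied.
    fmtname = expression(submission_value) if expression else ""
    # Step 2: fall back to the CodeList/SASFormatName attribute.
    if not fmtname:
        fmtname = sas_format_name
    # Step 3: fall back to the CDISCSubmissionValue itself.
    if not fmtname:
        fmtname = submission_value
    # Step 4: if the name ends with a digit, append the configured letter.
    if fmtname and fmtname[-1] in "0123456789":
        fmtname += append_char
    # Validation: a failing name is treated as missing (None in this sketch).
    return fmtname if FMTNAME_PATTERN.match(fmtname) else None
```

Under these assumptions, derive_fmtname("ACN") returns ACN, a hypothetical submission value such as QSCAT01 becomes QSCAT01F, and a 40-character name returns None. As quoted, the pattern requires at least two characters, so a one-character name also fails validation.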
Two sample driver programs are provided with the SAS Clinical Standards Toolkit to demonstrate the use of the %CT_CREATEFORMATS macro:
sample study library directory/cdisc-ct-1.0-1.7/programs/create_ctformats.sas
sample study library directory/cdisc-ct-1.0-1.7/programs/create_ctformats_qs.sas
Both of these sample driver programs demonstrate how CDISCSubmissionValue can be mapped to a valid SAS format name.

Reading CDISC CRT-DDS 1.0 or Define-XML 2.0 define.xml Files: %CRTDDS_READ and %DEFINE_READ Macros

The process for reading CDISC CRT-DDS 1.0 and CDISC Define-XML 2.0 define.xml files is similar to reading CDISC ODM XML files.
Note: This section demonstrates reading CDISC CRT-DDS 1.0 define.xml files as an example. The CDISC Define-XML 2.0 process is similar, but uses the %DEFINE_READ macro instead of the %CRTDDS_READ macro.
The SAS Clinical Standards Toolkit supports reading a define.xml file and translating the file metadata into a SAS representation of the CDISC CRT-DDS model. To read the define.xml file, a specialized macro named %CRTDDS_READ is available in the CRT-DDS 1.0 standards macros folder. This folder is located in global standards library directory/standards/cdisc-crtdds-1.0-1.7/macros.
This macro is referenced from the create_sascrtdds_fromxml.sas driver program. There are no input parameters in the call to the %CRTDDS_READ macro.
File references and other metadata that are required by the macro are set as global macro variables. These global macro variables are set through the framework initialization properties and the CDISC CRT-DDS 1.0 initialization properties. Throughout the processing of the %CRTDDS_READ macro, the Results data set contains all framework-specific messages and CRT-DDS 1.0-specific messages that were generated during run time.
Based on file references defined in the SASReferences data set, the %CRTDDS_READ macro accesses the define.xml file.
Here is a partial listing of a sample define.xml file.
<ODM xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:def="http://www.cdisc.org/ns/def/v1.0" 
   xmlns="http://www.cdisc.org/ns/odm/v1.2" FileOID="1" 
   CreationDateTime="2011-07-13T17:15:43-04:00"
   AsOfDateTime="2011-07-13T17:12:42" 
   Description="define1" FileType="Snapshot" Id="define1"
   ODMVersion="1.0">
<Study OID="1">
  <GlobalVariables>
    <StudyName>study1</StudyName>
    <StudyDescription>first study</StudyDescription>
    <ProtocolName>Protocol abc</ProtocolName>
  </GlobalVariables>
  <MetaDataVersion OID="1" Name="CDISC-SDTM 3.1.2" 
                   Description="CDISC-SDTM 3.1.2"
                   def:DefineVersion="1.0.0" 
                   def:StandardName="CDISC SDTM"
                   def:StandardVersion="3.1.2">
   <ItemGroupDef 
     OID="AE1" Name="AE" Repeating="Yes"
     IsReferenceData="No" 
     SASDatasetName="AE" Domain="AE"
     Purpose="Tabulation" def:Label="Adverse Events" 
     def:Class="Events" 
     def:Structure="One record per adverse event per subject"
     def:DomainKeys="STUDYID USUBJID AEDECOD AESTDTC" 
     def:ArchiveLocationID="AE1">
      <ItemRef ItemOID="COL1" Mandatory="Yes"
        OrderNumber="1" KeySequence="1" Role="Identifier"/>
      <ItemRef ItemOID="COL2" Mandatory="Yes"
        OrderNumber="2" Role="Identifier"/>
      <ItemRef ItemOID="COL3" Mandatory="Yes"
        OrderNumber="3" KeySequence="2" Role="Identifier"/>
      <ItemRef ItemOID="COL4" Mandatory="Yes"
        OrderNumber="4" Role="Identifier"/>
      <ItemRef ItemOID="COL5" Mandatory="No"
        OrderNumber="5" Role="Identifier"/>
      <ItemRef ItemOID="COL6" Mandatory="No"
        OrderNumber="6" Role="Identifier"/>
      <ItemRef ItemOID="COL7" Mandatory="No"
        OrderNumber="7" Role="Identifier"/>
After the %CRTDDS_READ macro confirms that the define.xml file exists, a call is made to the SAS DATA step component JavaObj. JavaObj processing converts the define.xml file into a cubeXML file through transformations using XSL files and processes.
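To make the idea of flattening define.xml into table-shaped records concrete, here is a minimal Python sketch that uses xml.etree.ElementTree. It does not use the toolkit's XSL and JavaObj mechanism and does not produce a cubeXML file. The excerpt is a trimmed copy of the sample listing with closing tags added so that it is well formed, and the function name and the output keys (such as ItemGroupOID) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces taken from the sample define.xml listing.
NS = {
    "odm": "http://www.cdisc.org/ns/odm/v1.2",
    "def": "http://www.cdisc.org/ns/def/v1.0",
}

# Trimmed excerpt of the sample listing; closing tags were added.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.2"
     xmlns:def="http://www.cdisc.org/ns/def/v1.0" FileOID="1">
  <Study OID="1">
    <MetaDataVersion OID="1" Name="CDISC-SDTM 3.1.2">
      <ItemGroupDef OID="AE1" Name="AE" Repeating="Yes"
                    def:Label="Adverse Events">
        <ItemRef ItemOID="COL1" Mandatory="Yes"
                 OrderNumber="1" KeySequence="1"/>
        <ItemRef ItemOID="COL2" Mandatory="Yes" OrderNumber="2"/>
      </ItemGroupDef>
    </MetaDataVersion>
  </Study>
</ODM>"""


def item_refs(xml_text):
    """Flatten ItemRef elements into one dictionary per row."""
    root = ET.fromstring(xml_text)
    rows = []
    for group in root.iterfind(".//odm:ItemGroupDef", NS):
        for ref in group.iterfind("odm:ItemRef", NS):
            rows.append({
                "ItemGroupOID": group.get("OID"),       # key of the parent
                "ItemOID": ref.get("ItemOID"),
                "Mandatory": ref.get("Mandatory"),
                "OrderNumber": int(ref.get("OrderNumber")),
                "KeySequence": ref.get("KeySequence"),  # None when absent
            })
    return rows
```

Each ItemRef becomes one record that carries the OID of its parent ItemGroupDef, which is the same parent-key idea that the foreign key columns in the SAS representation express.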
The cubeXML file is created in the Work library. The name of the cubeXML file is _cubnnnn.xml, where nnnn is a randomly generated number.
The cubeXML file is accessed using the SAS XML LIBNAME engine and XMLMap processing. A default XMLMap file is stored in the sample CRT-DDS 1.0 study folder hierarchy (referencexml/define.map). The define.map file is required to process the cubeXML file. If it does not exist, the %CRTDDS_READ macro attempts to create one using the CRT-DDS reference metadata.
Here is a partial listing of the define.map file.
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP version="1.2">

<TABLE name="AnnotatedCRFs">
   <TABLE-PATH syntax="XPath">/LIBRARY/AnnotatedCRFs</TABLE-PATH>
   <TABLE-DESCRIPTION>Annotated CRF metadata</TABLE-DESCRIPTION>

   <COLUMN name="DocumentRef">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/DocumentRef</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>The referenced Annotated CRF document</DESCRIPTION>
     <LENGTH>2000</LENGTH>
   </COLUMN>
   <COLUMN name="leafID">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/leafID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>The unique ID of the referenced Annotated CRF</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>
   <COLUMN name="FK_MetaDataVersion">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/FK_MetaDataVersion</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Foreign key: MetaDataVersion.OID</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>

</TABLE>
Processing of the cubeXML file results in the derivation of the data sets (such as ItemDefs) currently included in the SAS representation of the CDISC CRT-DDS model.
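The structure of an XMLMap file can be inspected with a few lines of code. The following Python sketch is an illustration that assumes a trimmed copy of the define.map excerpt (the XML declaration and some child elements are omitted). It lists each table with its column names, types, and lengths. The function name describe_map is hypothetical, and the sketch does not reproduce how the SAS XML LIBNAME engine applies the map.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the define.map excerpt.
MAP_TEXT = """<SXLEMAP version="1.2">
<TABLE name="AnnotatedCRFs">
   <TABLE-PATH syntax="XPath">/LIBRARY/AnnotatedCRFs</TABLE-PATH>
   <COLUMN name="DocumentRef">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/DocumentRef</PATH>
     <TYPE>character</TYPE>
     <LENGTH>2000</LENGTH>
   </COLUMN>
   <COLUMN name="leafID">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/leafID</PATH>
     <TYPE>character</TYPE>
     <LENGTH>128</LENGTH>
   </COLUMN>
</TABLE>
</SXLEMAP>"""


def describe_map(map_text):
    """Return {table name: [(column name, type, length), ...]}."""
    root = ET.fromstring(map_text)
    tables = {}
    for table in root.iterfind("TABLE"):
        tables[table.get("name")] = [
            (col.get("name"), col.findtext("TYPE"),
             int(col.findtext("LENGTH")))
            for col in table.iterfind("COLUMN")
        ]
    return tables
```

A summary like this is a quick way to confirm which data sets and columns a map file is expected to yield before the cubeXML file is read.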
The final step in the %CRTDDS_READ macro is the derivation of table and column metadata that describe the data sets in the SAS representation of the define.xml file. The macro then creates the source_tables and source_columns data sets. The tables that are listed in the source_tables data set are created and copied to the output library that is defined in the SASReferences data set.

Sample Driver Program: create_sascrtdds_fromxml.sas and create_sasdefine_fromxml.sas

Overview

Each primary SAS Clinical Standards Toolkit task, such as reading CDISC CRT-DDS 1.0 or CDISC Define-XML 2.0 XML files, is guided by a sample driver program that is provided with the SAS Clinical Standards Toolkit.
Note: CDISC CRT-DDS 1.0 is discussed in this section. The process is similar for CDISC Define-XML 2.0.
The create_sascrtdds_fromxml.sas driver program is used to read define.xml files.
The driver program is located here:
sample study library directory/cdisc-crtdds-1.0-1.7/programs

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are two input file references and four output data set references that are key to the successful completion of the driver program. The table Key Components of the SASReferences Data Set for the create_sascrtdds_fromxml.sas Driver Program lists these files and data sets, and they are discussed in the sections that follow. In the sample create_sascrtdds_fromxml.sas driver program, these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=&_cstSRoot/cdisc-crtdds-1.0-&_cstVersion
&studyOutputPath=&_cstSRoot/cdisc-crtdds-1.0-&_cstVersion
Key Components of the SASReferences Data Set for the create_sascrtdds_fromxml.sas Driver Program

Metadata Type   SAS LIBNAME or Fileref to Use  Reference Type  Path                              Name of File
Input
externalxml     crtxml                         fileref         &studyRootPath/sourcexml          define.xml
referencexml    crtmap                         fileref         &studyRootPath/referencexml       define.map
Output
sourcedata      srcdata                        libref          &studyOutputPath/deriveddata      *.*
sourcemetadata  srcmeta                        libref          &studyOutputPath/derivedmetadata  source_tables.sas7bdat
sourcemetadata  srcmeta                        libref          &studyOutputPath/derivedmetadata  source_columns.sas7bdat
sourcemetadata  srcmeta                        libref          &studyOutputPath/derivedmetadata  source_study.sas7bdat
results         results                        libref          &studyOutputPath/results          read_results.sas7bdat
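The paths in this table contain macro variable references that resolve in more than one step (&studyRootPath refers to &_cstSRoot and &_cstVersion). The following Python sketch illustrates that resolution outside SAS. It only imitates simple &name substitution and is not the SAS macro processor. The function name resolve is hypothetical, and the values used for _cstSRoot and _cstVersion in any call are assumptions to be replaced with the values from your installation.

```python
import re

# Rows transcribed from the table:
# (metadata type, SAS libref or fileref, reference type, path, file name)
SASREFERENCES = [
    ("externalxml", "crtxml", "fileref",
     "&studyRootPath/sourcexml", "define.xml"),
    ("referencexml", "crtmap", "fileref",
     "&studyRootPath/referencexml", "define.map"),
    ("sourcedata", "srcdata", "libref",
     "&studyOutputPath/deriveddata", "*.*"),
    ("sourcemetadata", "srcmeta", "libref",
     "&studyOutputPath/derivedmetadata", "source_tables.sas7bdat"),
    ("sourcemetadata", "srcmeta", "libref",
     "&studyOutputPath/derivedmetadata", "source_columns.sas7bdat"),
    ("sourcemetadata", "srcmeta", "libref",
     "&studyOutputPath/derivedmetadata", "source_study.sas7bdat"),
    ("results", "results", "libref",
     "&studyOutputPath/results", "read_results.sas7bdat"),
]

# Matches &name with an optional trailing period as delimiter.
_REFERENCE = re.compile(r"&(\w+)\.?")


def resolve(text, macro_vars, max_passes=10):
    """Substitute &name references until no known reference remains."""
    lookup = {name.lower(): value for name, value in macro_vars.items()}

    def substitute(match):
        # Unknown names are left untouched.
        return lookup.get(match.group(1).lower(), match.group(0))

    for _ in range(max_passes):
        resolved = _REFERENCE.sub(substitute, text)
        if resolved == text:
            break
        text = resolved
    return text
```

For example, with _cstSRoot set to a hypothetical /cst/sample and _cstVersion set to 1.7, the path &studyRootPath/sourcexml resolves to /cst/sample/cdisc-crtdds-1.0-1.7/sourcexml.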

Process Inputs

The externalxml type refers to the define.xml file to read. The filename reference crtxml is defined in the SASReferences data set. This filename reference is used in the submitted SAS code when referring to the define.xml file.
The referencexml type refers to the SAS map file that is used to generate the SAS data sets that represent the define.xml file metadata and content. The filename reference crtmap is defined in the SASReferences data set. This filename reference is used in the submitted SAS code when referring to the SAS map file. If a path and filename for the map file are not specified, a temporary map file is created as part of the %CRTDDS_READ processing.

Process Outputs

The sourcedata type is the library where the metadata files are created. These metadata files are the data sets that comprise the CRT-DDS information.
The sourcemetadata type refers to two data sets that are created from the cubeXML file: source_tables and source_columns. The source_tables data set contains metadata about each table that is derived by the %CRTDDS_READ macro. The source_columns data set contains similar metadata at the column level. Both data sets are written to the Srcmeta library. The sourcemetadata type also refers to the source_study data set, which is created in the Srcmeta library and contains study metadata.
The results type refers to the Results data set that contains information from running the %CRTDDS_READ macro. This information is written to the read_results data set in the Results library.

Process Results

When the driver program finishes running, the read_results data set is created in the Results library. This data set contains informational, warning, and error messages that were generated by the driver program.
The following display shows an example of the contents of a Results data set in the CRT-DDS sample study:
Example of a Partial Results Data Set Created by the create_sascrtdds_fromxml.sas Driver Program
The %CRTDDS_READ macro creates the source_tables and source_columns data sets in the Srcmeta library. These data sets contain the table and column metadata for the SAS representation of CRT-DDS that is derived from the define.xml file. The Srcmeta library corresponds to the location specified in SASReferences (&studyOutputPath/derivedmetadata).
Example of Partial Source_Tables Data Set Derived from the %CRTDDS_READ Macro
Example of Partial Source_Columns Data Set Derived from the %CRTDDS_READ Macro
The Srcdata library contains the driver program-generated tables that comprise the SAS representation of the CRT-DDS model. There is a one-to-one correspondence between the tables listed in the Srcdata library and the tables contained in the source_tables metadata file in the Srcmeta library. The Srcdata library corresponds to the location specified in SASReferences (&studyOutputPath/deriveddata).
Example of Partial Srcdata Library Derived from the %CRTDDS_READ Macro
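The one-to-one correspondence between the Srcdata library and the source_tables metadata can be checked once both lists of names are available. The following Python sketch assumes that the names have already been exported from SAS as plain lists. The function name compare_tables and the keys of its result are hypothetical.

```python
def compare_tables(srcdata_members, source_tables_names):
    """Compare data set names in a library with names listed in metadata.

    Both arguments are plain lists of names. The comparison ignores case
    because SAS data set names are not case-sensitive.
    """
    in_library = {name.upper() for name in srcdata_members}
    in_metadata = {name.upper() for name in source_tables_names}
    return {
        "missing_from_library": sorted(in_metadata - in_library),
        "missing_from_metadata": sorted(in_library - in_metadata),
    }
```

Two empty lists in the result indicate that every table in the library is described in the metadata and that every described table exists in the library.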
When running the driver programs against non-sample data, you must populate the SASReferences data set in the driver program with the proper values. For an explanation of the SASReferences data set, see SASReferences File.