Reading XML Files

Overview

Support of CDISC XML-based standards, such as CDISC CRT-DDS (define.xml) and CDISC ODM, includes the ability to read XML files into SAS data set format. In the SAS Clinical Standards Toolkit, you can read these types of files:
  • a CDISC CRT-DDS 1.0 define.xml file that references a CDISC SDTM study (version 3.1.1, 3.1.2, or 3.1.3) or an ADaM 2.1 study
  • a CDISC ODM 1.3.0 or CDISC ODM 1.3.1 XML file
  • the controlled terminology files as they are published by the NCI in ODM XML format

Basic Workflow

Here is the basic workflow for reading XML files:
  1. Determine the existence of a valid XML file.
  2. Use valid XSL style sheets for each target data set (such as ItemDefs.xsl).
  3. Use the SAS DATA step component JavaObj to create a standardized intermediate cubeXML file using the XSL style sheets.
  4. Read the standardized cubeXML file using the SAS XML LIBNAME engine and XMLMAP processing.
This basic workflow is used by all XML-based standards that are supported by the SAS Clinical Standards Toolkit.
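Step 4 of the workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's internal code: the file paths, the cubelib libref, and the choice of the XMLV2 engine nickname are hypothetical, and the ItemDefs table name is taken from the style sheet example in step 2.

```sas
/* Hypothetical locations: substitute the paths used at your site. */
filename cubexml "/tmp/_cub1234.xml";           /* standardized cubeXML file */
filename odmmap  "/tmp/referencexml/odm.map";   /* XMLMAP file               */

/* Read the cubeXML file through the XML LIBNAME engine, using the map   */
/* to define the tables and columns. XMLV2 is an assumed engine choice.  */
libname cubelib xmlv2 xmlfileref=cubexml xmlmap=odmmap access=readonly;

/* Copy one mapped table into the Work library as a SAS data set. */
data work.itemdefs;
  set cubelib.ItemDefs;
run;

libname cubelib clear;
filename cubexml clear;
filename odmmap clear;
```

In practice the toolkit macros perform this step for you, so a sketch like this is mainly useful for inspecting an intermediate file by hand.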

Reading CDISC ODM XML Files: odm_read Macro

Note: The process for reading ODM XML files is the same for all ODM versions that are supported by the SAS Clinical Standards Toolkit. The process is explained using ODM version 1.3.0.
To read an ODM XML file, use the specialized odm_read macro, which is available in the ODM 1.3.0 standards macro folder. This folder is located here:
global standards library directory/standards/cdisc-odm-1.3.0-1.5/macros
This macro is referenced from the create_sasodm_fromxml.sas driver program (described more fully below).
File references and other metadata that are required by the macro are set as global macro variable values. Currently, these global macro variable values are set through the framework initialization properties and the CDISC ODM 1.3.0 initialization properties. Throughout the processing of the odm_read macro, the Results data set contains all framework and ODM 1.3.0 specific messages generated during run time.
Based on file references defined in the SASReferences data set, the odm_read macro accesses the ODM XML file.
Here is a partial listing of a sample ODM XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<ODM 
  xmlns="http://www.cdisc.org/ns/odm/v1.3"
  FileOID="Study1234" 
  ODMVersion="1.3" 
  FileType="Snapshot" 
  CreationDateTime="2004-07-28T12:34:13-06:00" 
  SourceSystem="ss00"
  AsOfDateTime="2004-07-29T12:34:13-06:00"
  Granularity="SingleSite"
  Description="Study to determine existence of ischemic stroke"
  Archival="Yes" 
  PriorFileOID="Study-4321"
  Originator="SAS Institute"
  SourceSystemVersion="Version 0.0.0"
  Id="DSSignature123">
  <Study OID="1234">
    <GlobalVariables>
      <StudyName>1234</StudyName>
      <StudyDescription>1234 Data Definition</StudyDescription>
      <ProtocolName>1234</ProtocolName> 
    </GlobalVariables>
    <BasicDefinitions>
      <MeasurementUnit OID="MeasurementUnits.OID.MMHG" Name="MMHG">
        <Symbol>
          <TranslatedText xml:lang="en">mmHG</TranslatedText>
          <TranslatedText xml:lang="fr-CA">mmHG</TranslatedText>
        </Symbol>
      </MeasurementUnit>
      <MeasurementUnit OID="MeasurementUnits.OID.YRS" Name="YEARS">
        <Symbol>
          <TranslatedText xml:lang="de">Jahren</TranslatedText>
          <TranslatedText xml:lang="en">Years of age</TranslatedText>
          <TranslatedText xml:lang="fr-CA">Ans</TranslatedText>
        </Symbol>
      </MeasurementUnit>
    </BasicDefinitions>
    <MetaDataVersion OID="CDISC.SDTM.3.1.0"
      Name="Study 1234, Data Definitions" 
      Description="Study 1234, Data Definitions">
      <Include StudyOID="1234" MetaDataVersionOID="MDV000">
      </Include>
      <Protocol>
        <Description>
After the odm_read macro confirms that the ODM XML file exists, a call is made to the SAS DATA step component JavaObj. JavaObj processing converts the ODM XML file into the cubeXML file through transformations using XSL files and processes. The cubeXML file is created in the Work library. The name of the cubeXML file is _cubnnnn.xml, where nnnn is a randomly generated number. The cubeXML file is accessed using the SAS XML LIBNAME engine and XMLMAP processing. A default XMLMAP file is stored in the sample ODM 1.3.0 study folder hierarchy under /referencexml as odm.map. The odm.map file is required to process the cubeXML file. If it does not exist, then the odm_read macro attempts to create one using the ODM reference metadata.
Here is a partial listing of the odm.map file.
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP name="ODM130" version="1.2">

<TABLE name="ItemDefs">
   <TABLE-PATH syntax="XPath">/LIBRARY/ItemDefs</TABLE-PATH>
   <TABLE-DESCRIPTION>Item metadata</TABLE-DESCRIPTION>

   <COLUMN name="OID">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/OID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Unique identifier for this item</DESCRIPTION>
     <LENGTH>64</LENGTH>
   </COLUMN>
   <COLUMN name="Name">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/Name</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Item (variable) name</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>
   <COLUMN name="DataType">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/DataType</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Item (variable) data type (text, integer, float)</DESCRIPTION>
     <LENGTH>18</LENGTH>
   </COLUMN>
   <COLUMN name="Length">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/Length</PATH>
     <TYPE>numeric</TYPE>
     <DATATYPE>numeric</DATATYPE>
     <DESCRIPTION>Item (variable) length</DESCRIPTION>
     <LENGTH>8</LENGTH>
   </COLUMN>
When the cubeXML file is processed, each of the 66 data sets (such as ItemDefs) that are included in the SAS representation of the CDISC ODM 1.3.0 model is derived.
A number of input parameters can be specified in the call to the odm_read macro. These parameters offer the options of building source metadata files and SAS format catalogs for codelist translated text. These parameters are itemized in this table.
odm_read Macro Parameters
Parameter | Description
_cstBuildSrcMetadata | Create the source metadata files (for example, source_tables and source_columns) as a part of the Read operation. Default=Y (yes), otherwise leave blank. This parameter is optional.
_cstBuildFmtCat | Build format catalog(s), representing language-specific codelist TranslatedText, as a part of the Read operation. Default=Y (yes), otherwise leave blank. This parameter is optional.
_cstFmtLib | Where catalog(s) are written. This parameter is optional. If not specified, defaults first to the value derived from SASReferences, then to Work.
_cstReplaceFmtCat | Indicates that an existing format catalog by the same name in _cstFmtLib is replaced. This parameter is optional. Values: N | Y. Default behavior: Y (overwrite existing catalog).
_cstFmtCatPrefix | The prefix to use for catalog names. This parameter is optional. If not specified, the default is <standard mnemonic>FmtCat (such as ODMFmtCat). This default produces an English format catalog name of ODMFmtCat_en.
_cstFmtCatLang | If specified, create a format catalog ONLY for the specified language. This parameter is optional. Example: _cstFmtCatLang=en. If no records exist for the specified language, an empty catalog is created.
_cstFmtCatLangOption | The action to take when no language tag is provided in the XML. This parameter is optional. Values: Ignore | English | Use_cstFmtCatLang. If Ignore, records are ignored (but reported in the SAS log). If English, records are added to the English catalog (default). If Use_cstFmtCatLang, records are added to the language catalog specified in the _cstFmtCatLang parameter.
By default, if a null-parameter %odm_read() macro call is made, source metadata files and SAS format catalogs for each language found in the clitemdecodetranslatedtext data set are created after the SAS data sets representing the ODM XML metadata and data content are derived. The target location of the derived metadata files is defined in the SASReferences data set. The target location of any derived SAS format catalogs is the SAS Work library unless defined in the SASReferences data set.
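As an illustration of the parameters listed for odm_read, here is a hedged sketch of a call that overrides some defaults. It assumes that the toolkit framework and the ODM 1.3.0 standard have already been initialized and that a SASReferences data set has been processed, as the sample driver program does. The parameter values shown are examples only.

```sas
/* Build the source metadata and an English-only format catalog.     */
/* Codelist text without a language tag is added to the English      */
/* catalog, and an existing catalog of the same name is overwritten. */
%odm_read(
  _cstBuildSrcMetadata=Y,
  _cstBuildFmtCat=Y,
  _cstReplaceFmtCat=Y,
  _cstFmtCatLang=en,
  _cstFmtCatLangOption=English
);
```

Because _cstFmtLib is not specified in this sketch, the catalog location falls back to the value derived from SASReferences and then to the Work library.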

Sample Driver Program: create_sasodm_fromxml.sas

Overview

Each primary SAS Clinical Standards Toolkit task, such as reading CDISC ODM XML files, is guided by a sample driver program that is provided by SAS. For reading ODM XML files, this module is create_sasodm_fromxml.sas.
The driver program is located at:
sample study library directory/cdisc-odm-1.3.0-1.5/programs

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are two input file references and five output references that are key to the successful completion of the driver program. Key Components of the SASReferences Data Set for the create_sasodm_fromxml.sas Macro lists these files and data sets, and they are discussed in separate sections. In the sample create_sasodm_fromxml.sas driver module, these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=&_cstSRoot/cdisc-odm-&_cstStandardVersion.-&_cstVersion
&studyOutputPath=&_cstSRoot/cdisc-odm-&_cstStandardVersion.-&_cstVersion
Key Components of the SASReferences Data Set for the create_sasodm_fromxml.sas Macro
Metadata Type | SAS LIBNAME or Fileref to Use | Reference Type | Path | Name of File
Input
externalxml | odmxml | fileref | &studyRootPath/sourcexml | odm_sample.xml
referencexml | odmmap | fileref | &studyRootPath/referencexml | odm.map
Output
sourcedata | srcdata | libref | &studyOutputPath/derived/data | *.*
sourcemetadata | srcmeta | libref | &studyOutputPath/derived/metadata | source_tables.sas7bdat
sourcemetadata | srcmeta | libref | &studyOutputPath/derived/metadata | source_columns.sas7bdat
targetdata | trgdata | libref | &studyOutputPath/derived/formats |
results | results | libref | &studyOutputPath/results | read_results.sas7bdat

Process Inputs

The metadata type externalxml refers to the ODM XML file that is being read. The filename reference odmxml is defined in the SASReferences data set. This filename reference is used in the submitted SAS code when referring to the ODM XML file.
The metadata type referencexml refers to the SAS map file that is used to generate the SAS data sets that represent the ODM file metadata and content. The filename reference odmmap is defined in the SASReferences data set. This filename reference is used in the submitted SAS code when referring to the SAS map file. If a path and filename for the map file are not specified, then a temporary map file is created as part of the odm_read processing.

Process Outputs

When the driver program finishes running, the read_results data set is created in the Results library. This data set contains informational, warning, and any error messages that were generated by the submitted driver program.
This display shows an example of the contents of a Results data set that was built while reading the sample ODM XML file that was provided by SAS.
Example of a Partial Results Data Set Created by the create_sasodm_fromxml.sas Driver
The odm_read macro creates the source_tables and source_columns data sets in the Srcmeta library. These data sets contain the table and column metadata for each of the SAS data sets that are derived from the ODM XML file.
Example of Partial Source_Tables Data Set Derived during odm_read
Example of Partial Source_Columns Data Set Derived during odm_read
The Srcdata library contains the SAS data sets that represent the ODM file metadata and content. By default, the odm_read macro creates 66 unique data sets in the SAS Clinical Standards Toolkit for ODM 1.3.0. Some of these data sets might be empty if no associated content was derived from the ODM XML file. There is a one-to-one correspondence between the tables listed in the Srcdata library and the tables contained in the source_tables metadata file in the Srcmeta library.
Example of Partial Srcdata Library Derived during odm_read

Extracting Clinical Data and Reference Data from the SAS Representation of an ODM XML File: odm_extractdomaindata Macro

As the primary interchange format for CDISC, ODM XML is a common format for electronic data capture (EDC) data management views of clinical data. This format often does not closely approximate submission (SDTM) and analysis (ADaM) data structures unless the EDC views have been built using the CDISC-CDASH standard. From a SAS perspective, you might want to extract clinical data from an ODM XML file to serve as source data for transformations that derive SDTM domain data sets.
The odm_extractdomaindata macro supports extracting clinical data or reference data from the SAS data sets that were created by the odm_read macro.
The odm_extractdomaindata macro makes the following assumptions:
  • An ODM XML file is available that contains sufficient metadata and content for extractable clinical data and reference data.
  • A full SAS representation of an ODM XML file is available (for example, the odm_read macro has been run against the XML file).
  • The SAS representation of an ODM XML file contains both metadata and data.
    By default, the driver assumes all source data files reside in the sample derived folder or the data folder that is typically populated by running the odm_read macro. However, the source data files and the source metadata files can be in different folders.
  • Any codelists defined in the ODM XML file and associated with extracted data set columns are available as part of the output of the odm_read macro.
ODM integer and float data types are converted to SAS numeric data. All other ODM data types are converted to SAS character data. If an integer or float data value cannot be converted, a warning appears in the SAS log and Results data set.
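The conversion rule can be sketched in a self-contained DATA step. This is an illustration only, not the macro's actual implementation: the work.convert_demo data set and the NumValue variable are hypothetical, the last input row is an invented bad value, and the sketch writes its warning only to the SAS log rather than to a Results data set.

```sas
data work.convert_demo;
  length ItemOID $32 DataType $18 Value $40;
  infile datalines dsd truncover;
  input ItemOID $ DataType $ Value $;

  /* integer and float items become SAS numeric values. The ?? modifier */
  /* suppresses the automatic invalid-data notes so that the step can   */
  /* write its own warning instead.                                     */
  if DataType in ('integer', 'float') then do;
    NumValue = input(Value, ?? best32.);
    if missing(NumValue) and not missing(Value) then
      put 'WARNING: Could not convert ' Value= 'for ' ItemOID=;
  end;
  /* All other data types stay in the character variable Value. */
  datalines;
ID.AEENMON,integer,06
ID.AEENYR,integer,1999
ID.AESEV,text,1
ID.AEENDAY,integer,1x
;
run;
```

The first two rows yield numeric values, the text row is left as character data, and the invented value 1x triggers the warning.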
Here is a partial listing of the metadata in a sample ODM XML file:
<ItemGroupDef OID="ItemGroupDefs.OID.AE" Repeating="Yes" 
    SASDatasetName="AE" Name="Adverse Events" Domain="AE" 
    Comment="Some adverse events from this trial">
    <ItemRef ItemOID="ID.TAREA"    OrderNumber="1"  Mandatory="No" />
    <ItemRef ItemOID="ID.PNO"      OrderNumber="2"  Mandatory="No" />
    <ItemRef ItemOID="ID.SCTRY"    OrderNumber="3"  Mandatory="No" />
    <ItemRef ItemOID="ID.F_STATUS" OrderNumber="4"  Mandatory="No" />
    <ItemRef ItemOID="ID.LINE_NO"  OrderNumber="5"  Mandatory="No" />
    <ItemRef ItemOID="ID.AETERM"   OrderNumber="6"  Mandatory="No" />
    <ItemRef ItemOID="ID.AESTMON"  OrderNumber="7"  Mandatory="No" />
    <ItemRef ItemOID="ID.AESTDAY"  OrderNumber="8"  Mandatory="No" />
    <ItemRef ItemOID="ID.AESTYR"   OrderNumber="9"  Mandatory="No" />
    <ItemRef ItemOID="ID.AESTDT"   OrderNumber="10" Mandatory="No" />
    <ItemRef ItemOID="ID.AEENMON"  OrderNumber="11" Mandatory="No" />
    <ItemRef ItemOID="ID.AEENDAY"  OrderNumber="12" Mandatory="No" />
    <ItemRef ItemOID="ID.AEENYR"   OrderNumber="13" Mandatory="No" />
    <ItemRef ItemOID="ID.AEENDT"   OrderNumber="14" Mandatory="No" />
    <ItemRef ItemOID="ID.AESEV"    OrderNumber="15" Mandatory="No" />
    <ItemRef ItemOID="ID.AEREL"    OrderNumber="16" Mandatory="No" />
    <ItemRef ItemOID="ID.AEOUT"    OrderNumber="17" Mandatory="No" />
    <ItemRef ItemOID="ID.AEACTTRT" OrderNumber="18" Mandatory="No" />
    <ItemRef ItemOID="ID.AECONTRT" OrderNumber="19" Mandatory="No" />
</ItemGroupDef>
...
<ItemDef OID="ID.AESTDT" SASFieldName="AESTDT"
    Name="Derived Start Date" DataType="date"/>
<ItemDef OID="ID.AEENMON" SASFieldName="AEENMON"
    Name="Stop Month - Enter Two Digits 01-12" DataType="integer" Length="2" />
<ItemDef OID="ID.AEENDAY" SASFieldName="AEENDAY"
    Name="Stop Day - Enter Two Digits 01-31" DataType="integer" Length="2" />
<ItemDef OID="ID.AEENYR" SASFieldName="AEENYR"
    Name="Stop Year - Enter Four Digit Year" DataType="integer" Length="4" />
<ItemDef OID="ID.AEENDT" SASFieldName="AEENDT"
    Name="Derived Stop Date" DataType="date"/>
<ItemDef OID="ID.AESEV" SASFieldName="AESEV"
    Name="Severity" DataType="text" Length="1">
    <CodeListRef CodeListOID="CL.$AESEV" />
</ItemDef>
<ItemDef OID="ID.AEREL" SASFieldName="AEREL"
    Name="Relationship to study drug" DataType="text" Length="1">
    <CodeListRef CodeListOID="CL.$AEREL" />
</ItemDef>
Here is a partial listing of the data in the same sample ODM XML file:
<ClinicalData StudyOID="Study.OID" MetaDataVersionOID="MetaDataVersion.OID.1">
<SubjectData SubjectKey="S001P011" TransactionType="Insert">
    <StudyEventData StudyEventOID="StudyEventDefs.OID.6.AdverseEvent"
          StudyEventRepeatKey="1">
        <FormData FormOID="FormDefs.OID.AE" FormRepeatKey="1">
        <ItemGroupData ItemGroupOID="ItemGroupDefs.OID.AE"
              ItemGroupRepeatKey="1">
            <ItemData ItemOID="ID.TAREA" Value="ONC" />
            <ItemData ItemOID="ID.PNO" Value="143-02" />
            <ItemData ItemOID="ID.SCTRY" Value="USA" />
            <ItemData ItemOID="ID.F_STATUS" Value="V" />
            <ItemData ItemOID="ID.LINE_NO" Value="1" />
            <ItemData ItemOID="ID.AETERM" Value="HEADACHE" />
            <ItemData ItemOID="ID.AESTMON" Value="06" />
            <ItemData ItemOID="ID.AESTDAY" Value="10" />
            <ItemData ItemOID="ID.AESTYR" Value="1999" />
            <ItemData ItemOID="ID.AESTDT" Value="1999-06-10" />
            <ItemData ItemOID="ID.AEENMON" Value="06" />
            <ItemData ItemOID="ID.AEENDAY" Value="14" />
            <ItemData ItemOID="ID.AEENYR" Value="1999" />
            <ItemData ItemOID="ID.AEENDT" Value="1999-06-14" />
            <ItemData ItemOID="ID.AESEV" Value="1" />
            <ItemData ItemOID="ID.AEREL" Value="0" />
            <ItemData ItemOID="ID.AEOUT" Value="1" />
            <ItemData ItemOID="ID.AEACTTRT" Value="0" />
            <ItemData ItemOID="ID.AECONTRT" Value="1" />
        </ItemGroupData>
The odm_extractdomaindata macro creates the data set shown in AE SAS Data Set (Unformatted) Created by the odm_extractdomaindata Macro and AE SAS Data Set (Formatted) Created by the odm_extractdomaindata Macro. The first 12 columns in this data set are the data set keys. The macro parameter _cstODMMinimumKeyset determines whether these keys are part of the extracted data set.
AE SAS Data Set (Unformatted) Created by the odm_extractdomaindata Macro
AE SAS Data Set (Formatted) Created by the odm_extractdomaindata Macro
The odm_extractdomaindata macro has this signature:
%macro odm_extractdomaindata(
  _cstSourceMetadata=,
  _cstSourceData=,
  _cstIsReferenceData=No,
  _cstSelectAttribute=Name,
  _cstSelectAttributeValue=,
  _cstLang=en,
  _cstMaxLabelLength=256,
  _cstAttachFormats=Yes,
  _cstODMMinimumKeyset=No,
  _cstOutputLibrary=,
  _cstOutputDS=
  );

Here are the parameters:
  • _cstSourceMetadata and _cstSourceData specify the SAS librefs for the SAS representation of the ODM metadata and the ODM data, respectively.
    If a libref is not specified, the macro looks for type=sourcedata in SASReferences. If this is not provided, the data set source is assumed to be in the SAS Work library.
  • _cstIsReferenceData indicates whether the data to extract is reference data or clinical data. Examples of reference data are laboratory reference ranges or trial design data.
  • _cstSelectAttribute contains the ItemGroup attribute that identifies which ItemGroup to extract. Valid values are OID, Name, SASDatasetName, and Domain.
  • _cstSelectAttributeValue contains the value of the attribute defined by _cstSelectAttribute that identifies the ItemGroup to extract.
  • _cstLang specifies a language identifier for the language tag attribute (xml:lang) in the ODM TranslatedText elements.
  • _cstMaxLabelLength determines the maximum length of the labels to be created.
    If this is not provided, 256 is assumed.
  • _cstAttachFormats determines whether formats are attached to the data set variables. Formats are attached when the value is ‘Yes’.
  • _cstODMMinimumKeyset determines the creation of data set keys. If this is not provided, ‘No’ is assumed.
  • _cstOutputLibrary defines the SAS library where the extracted data sets are written.
    If this is not specified, the macro looks for type=targetdata in SASReferences. If this is not provided, the data sets are written to the SAS Work library.
  • _cstOutputDS contains the name of the extracted data set.
    If this is an invalid SAS data set name, an error is generated. If the data set name is not provided, the macro looks for type=targetdata in SASReferences.
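As a hedged illustration of these parameters, the following call would extract the Adverse Events item group from the sample metadata shown earlier by selecting on the Domain attribute. It assumes that odm_read has already populated the srcdata library named in the sample SASReferences data set. Writing the result to the Work library as AE is a choice made for this sketch.

```sas
/* Extract clinical (not reference) data for the item group whose */
/* Domain attribute is AE, with formats attached and without the  */
/* minimum keyset.                                                */
%odm_extractdomaindata(
  _cstSourceMetadata=srcdata,
  _cstSourceData=srcdata,
  _cstIsReferenceData=No,
  _cstSelectAttribute=Domain,
  _cstSelectAttributeValue=AE,
  _cstLang=en,
  _cstMaxLabelLength=256,
  _cstAttachFormats=Yes,
  _cstODMMinimumKeyset=No,
  _cstOutputLibrary=work,
  _cstOutputDS=AE
);
```

Selecting on OID instead (_cstSelectAttribute=OID, _cstSelectAttributeValue=ItemGroupDefs.OID.AE) would identify the same item group in the sample file.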
Two sample driver programs for ODM version 1.3.0 are provided by SAS to demonstrate the use of the odm_extractdomaindata macro:
sample study library directory/cdisc-odm-1.3.0-1.5/programs/extract_domaindata_all.sas
sample study library directory/cdisc-odm-1.3.0-1.5/programs/extract_domaindata.sas
Two sample driver programs for ODM version 1.3.1 are provided by SAS to demonstrate the use of the odm_extractdomaindata macro:
sample study library directory/cdisc-odm-1.3.1-1.5/programs/extract_domaindata_all.sas
sample study library directory/cdisc-odm-1.3.1-1.5/programs/extract_domaindata.sas
The extract_domaindata_all.sas sample driver programs demonstrate how all data sets can be extracted at once. The following shows a code fragment:
filename incCode CATALOG "work._cstCode.domains.source" lrecl=255;

data _null_;
 set srcdata.itemgroupdefs(keep=OID Name IsReferenceData SASDatasetName Domain);
  file incCode;
  length macrocall $400 _cstOutputName $100;

  _cstOutputName=SASDatasetName;
  * If we have to use the Name, only use letters and digits;
  * (the adk modifiers belong in the third argument of COMPRESS);
  if missing(_cstOutputName) then _cstOutputName=cats(compress(Name, , 'adk'));
  * If first character a digit, prepend an underscore;
  if anydigit(_cstOutputName)=1 then _cstOutputName=cats('_', _cstOutputName);
  * Cut long names;
  if length(_cstOutputName) > 32 then _cstOutputName=substr(_cstOutputName, 1, 32);

  macrocall=cats('%odm_extractdomaindata(_cstSelectAttribute=OID',
                                      ', _cstSelectAttributeValue=', OID,
                                      ', _cstIsReferenceData=', IsReferenceData,
                                      ', _cstMaxLabelLength=256',
                                      ', _cstAttachFormats=Yes',
                                      ', _cstODMMinimumKeyset=No',
                                      ', _cstLang=en',
                                      ', _cstOutputDS=', _cstOutputName, ');');
  put macrocall;
run;

%include incCode;
filename incCode clear;
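The way the fragment derives an output data set name can be tried in isolation. The following self-contained sketch uses invented item group names and passes the adk modifiers in the third argument of the COMPRESS function, so that only letters and digits are kept. The work.name_demo data set is hypothetical.

```sas
data work.name_demo;
  length Name $128 SASDatasetName $32 _cstOutputName $100;
  infile datalines dsd truncover;
  input Name $ SASDatasetName $;

  _cstOutputName=SASDatasetName;
  /* No SASDatasetName: fall back to Name, keeping letters and digits */
  if missing(_cstOutputName) then _cstOutputName=compress(Name, , 'adk');
  /* A SAS data set name cannot start with a digit */
  if anydigit(_cstOutputName)=1 then _cstOutputName=cats('_', _cstOutputName);
  /* SAS data set names are limited to 32 characters */
  if length(_cstOutputName) > 32 then _cstOutputName=substr(_cstOutputName, 1, 32);

  put Name= _cstOutputName=;
  datalines;
Adverse Events,AE
Vital Signs (2),
1st Visit,
;
run;
```

For these three rows, the derived names written to the log are AE, VitalSigns2, and _1stVisit.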

Reading CDISC ODM Controlled Terminology XML Files: ct_read Macro

To read an ODM controlled terminology XML file as published quarterly by NCI, a specialized macro named ct_read is available in the CDISC controlled terminology 1.0 standards macros folder. This folder is located at:
global standards library directory/standards/cdisc-ct-1.0-1.5/macros
This macro is referenced from the create_sasct_fromxml.sas driver program. For more information, see Sample Driver Program: create_sasct_fromxml.sas.
File references and other metadata that are required by the macro are set as global macro variable values. These global macro variable values are set through the framework initialization properties and the CDISC controlled terminology 1.0 initialization properties. Throughout the processing of the ct_read macro, the Results data set contains all framework-specific messages and CDISC controlled terminology 1.0-specific messages that were generated during run time.
Based on file references defined in the SASReferences data set, the ct_read macro accesses the ODM controlled terminology XML file.
Here is a partial listing of a sample ODM controlled terminology XML file:
Partial Listing of a Sample ODM Controlled Terminology XML File
After the ct_read macro confirms that the ODM controlled terminology XML file exists, a call is made to the SAS DATA step component JavaObj. JavaObj processing converts the ODM controlled terminology XML file into a cubeXML file through transformations using XSL files and processes.
The cubeXML file is created in the SAS Work library. The name of the cubeXML file is _cubnnnn.xml, where nnnn is a randomly generated number.
The cubeXML file is accessed using the SAS XML LIBNAME engine and XMLMAP processing. A default XMLMAP file is stored in the sample CDISC controlled terminology 1.0 study folder hierarchy (referencexml/odm.map). An odm.map file is required to process the cubeXML file. If it does not exist, the ct_read macro attempts to create one using the CDISC controlled terminology reference metadata.
Here is a partial listing of the odm.map file.
<?xml version="1.0" encoding="UTF-8"?>
<SXLEMAP name="CT100" version="1.2">

<TABLE name="CodeLists">
   <TABLE-PATH syntax="XPath">/LIBRARY/CodeLists</TABLE-PATH>
   <TABLE-DESCRIPTION>Codelist metadata</TABLE-DESCRIPTION>

   <COLUMN name="OID">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/OID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Unique identifier for this codelist</DESCRIPTION>
     <LENGTH>64</LENGTH>
   </COLUMN>
   <COLUMN name="Name">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/Name</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>CodeList name</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>
   <COLUMN name="DataType">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/DataType</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>CodeList item value data type (integer | float | text | string)</DESCRIPTION>
     <LENGTH>7</LENGTH>
   </COLUMN>
   <COLUMN name="SASFormatName">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/SASFormatName</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>SAS format name</DESCRIPTION>
     <LENGTH>8</LENGTH>
   </COLUMN>
   <COLUMN name="ExtCodeID">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/ExtCodeID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Unique numeric code randomly generated by NCI Thesaurus (NCIt)</DESCRIPTION>
     <LENGTH>64</LENGTH>
   </COLUMN>
   <COLUMN name="CodeListExtensible">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/CodeListExtensible</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Defines if controlled terms may be added to the codelist (Yes | No)</DESCRIPTION>
     <LENGTH>3</LENGTH>
   </COLUMN>
   <COLUMN name="CDISCSubmissionValue">
     <PATH syntax="Xpath">/LIBRARY/CodeLists/CDISCSubmissionValue</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Specific value expected for submissions</DESCRIPTION>
     <LENGTH>512</LENGTH>
   </COLUMN>
When the cubeXML file is processed, each of the 15 data sets (such as CodeLists) that are included in the SAS representation of the CDISC controlled terminology model is derived. One input parameter can be specified in the call to the ct_read macro. The parameter offers the option to create source metadata files.
The parameter is shown in this table:
ct_read Macro Parameter
Parameter | Description
_cstBuildSrcMetadata | Create the source metadata files (for example, source_tables and source_columns) as a part of the Read operation. Default=Y (yes), otherwise leave blank. Optional.
By default, if a %ct_read() macro call is made with null parameters, source metadata is derived. The target location of the derived metadata files is defined in the SASReferences data set.
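A minimal sketch of the call follows. It assumes that the framework and the CDISC controlled terminology 1.0 standard have been initialized and that SASReferences has been processed, as the sample driver program does. The PROC CONTENTS step is an optional check added for this sketch, and it relies on the srcdata libref from the sample SASReferences data set.

```sas
/* Read the controlled terminology XML file and derive the source metadata. */
%ct_read(_cstBuildSrcMetadata=Y);

/* Optional check: list the data sets that were derived in Srcdata. */
proc contents data=srcdata._all_ nods;
run;
```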

Sample Driver Program: create_sasct_fromxml.sas

Overview

Each primary SAS Clinical Standards Toolkit task, such as reading CDISC ODM controlled terminology XML files, is guided by a sample driver program that is provided by SAS. For reading ODM controlled terminology XML files, this driver program is create_sasct_fromxml.sas.
This driver program is located in:
sample study library directory/cdisc-ct-1.0-1.5/programs

The SASReferences Data Set

As part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. The SASReferences data set references the input files that are needed (such as the ODM controlled terminology XML file), the librefs and filenames to use, and the names and locations of the data sets to create. The SASReferences data set can be modified to point to study-specific files.
For more information, see SASReferences File.
In the SASReferences data set, there are two input file references and five output data set references that are key to the successful completion of the driver program. Key Components of the SASReferences Data Set for the create_sasct_fromxml.sas Macro lists these files and data sets. In the sample create_sasct_fromxml.sas macro, the following values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=sample study library directory/cdisc-ct-1.0-1.5
&studyOutputPath=sample study library directory/cdisc-ct-1.0-1.5
Key Components of the SASReferences Data Set for the create_sasct_fromxml.sas Macro
Metadata Type | SAS LIBNAME or Fileref to Use | Reference Type | Path | Name of File
Input
externalxml | crtxml | fileref | &studyRootPath/sourcexml/sdtm/201212 | sdtm_terminology.xml
referencexml | ctmap | fileref | &studyRootPath/referencexml | ct-1.0.0.map
Output
sourcedata | srcdata | libref | &studyOutputPath/data/sdtm/201212 | *.*
results | results | libref | &studyOutputPath/results | read_results_sdtm_2012.sas7bdat

Process Inputs

The metadata type externalxml refers to the ODM controlled terminology XML file to read. The filename reference crtxml is defined in the SASReferences data set. This filename reference is used in the submitted SAS code to refer to the ODM controlled terminology XML file.
The metadata type referencexml refers to the SAS map file that is used to generate the SAS data sets that represent the ODM file metadata and content. The filename reference ctmap is defined in the SASReferences data set. This filename reference is used in the submitted SAS code to refer to the SAS map file. If a path and filename for the map file are not specified, a temporary map file is created as part of the ct_read macro processing.

Process Outputs

When the driver program finishes, the read_results_sdtm_201212 data set is created in the Results library. This data set contains informational messages, warnings, and error messages that were generated by the program.
This display shows an example of the contents of a Results data set that was created while reading the sample NCI-released ODM controlled terminology XML file that was provided by SAS.
Example of a Partial Results Data Set Created by the create_sasct_fromxml.sas Macro
The Srcdata library contains the SAS data sets that represent the ODM controlled terminology XML file metadata and content. By default, the ct_read macro creates 15 unique data sets in the SAS Clinical Standards Toolkit. Some of these data sets can be empty if no associated content was derived from the ODM controlled terminology XML file. There is a one-to-one correspondence between the tables listed in the Srcdata library and the tables contained in the source_tables metadata file in the Srcmeta library.
Example of Partial Srcdata Library Derived from the ct_read Macro

Creating a Format Catalog and a Controlled Terminology Data Set from the SAS Representation of a CDISC ODM Controlled Terminology XML File: ct_createformats Macro

To use the NCI CDISC controlled terminology in a SAS Clinical Standards Toolkit process, the SAS data sets created by the ct_read macro must be converted to a SAS format catalog. To enable SAS Clinical Data Integration to import controlled terminology, the SAS data set representation created by the ct_read macro must be combined into one SAS data set.
This display shows an example of controlled terminology in ODM XML (the Action Taken with Study Treatment codelist):
Example of Controlled Terminology in ODM XML
The ct_createformats macro creates the data set shown in this display:
Partial cterms SAS Data Set Created by the ct_createformats Macro
The ct_createformats macro uses the data set to create the $ACN SAS format shown in this display:
$ACN SAS Format Created by the ct_createformats Macro
The ct_createformats macro has this signature:
%macro ct_createformats(
  _cstLang=en,                /* Language tag in TranslatedText to use      */
  _cstCreateCatalog=1,        /* Create format catalog                      */
  _cstKillCatFirst=0,         /* Empty catalog first                        */
  _cstUseExpression=,         /* Expression to create the SAS format name   */
  _cstAppendChar=F,           /* Letter to append in case SAS format name
                                 ends with digit                            */
  _cstDeleteEmptyColumns=1,   /* Delete columns in output data set that are   
                                 completely missing                         */
  _cstTrimCharacterData=1     /* Truncate character data in output data set
                                 to the minimum value needed.               */
  );
The ct_createformats macro attempts to map the CodeList/nciodm:CDISCSubmissionValue in the codelist variable to the fmtname variable. The fmtname variable value must contain a valid SAS format name. The ct_createformats macro uses the following steps to create a valid SAS format name:
  1. Apply a user-defined expression to create the fmtname variable.
  2. If the value of fmtname is empty, use the CodeList/SASFormatName attribute (typically empty in NCI EVS ODM XML files).
  3. If the value of fmtname is empty, use the CodeList/nciodm:CDISCSubmissionValue value in the codelist variable.
  4. If the value of fmtname ends with a digit, add the character specified by the _cstAppendChar macro parameter (default=F).
After these steps, the value of the fmtname variable is validated against the following regular expression:
'm/^(?=.{1,32}$)([\$a-zA-Z_][a-zA-Z0-9_]*[a-zA-Z_])$/'
If the value of the fmtname variable fails validation, it is not a valid SAS format name. The value is set to missing, and the codelist is not used to create a SAS format.
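The naming steps and the validation pattern described above can be tried outside SAS. The following Python sketch is an illustration only and is not the macro's implementation. The function name derive_fmtname and its arguments are hypothetical, the user-defined expression is represented by its already-evaluated result, and the pattern is the one quoted above without the Perl m/.../ delimiters.

```python
import re

# Validation pattern quoted in the text, without the Perl m/.../ delimiters.
FMTNAME_PATTERN = re.compile(
    r"^(?=.{1,32}$)([\$a-zA-Z_][a-zA-Z0-9_]*[a-zA-Z_])$"
)


def derive_fmtname(expression_result, sas_format_name, submission_value,
                   append_char="F"):
    """Hypothetical helper that mirrors the four documented steps.

    Returns the candidate format name, or None (standing in for a SAS
    missing value) when the candidate fails validation.
    """
    # Step 1: result of the user-defined expression, if any.
    fmtname = (expression_result or "").strip()
    # Step 2: fall back to the CodeList/SASFormatName attribute.
    if not fmtname:
        fmtname = (sas_format_name or "").strip()
    # Step 3: fall back to the CDISCSubmissionValue of the codelist.
    if not fmtname:
        fmtname = (submission_value or "").strip()
    # Step 4: a name that ends with a digit gets the append character.
    if fmtname and fmtname[-1] in "0123456789":
        fmtname += append_char
    # Final validation against the quoted pattern.
    return fmtname if FMTNAME_PATTERN.match(fmtname) else None


if __name__ == "__main__":
    for value in ("ACN", "CL1", "1ABC"):
        print(value, "->", derive_fmtname("", "", value))
```

In this sketch, a submission value such as ACN passes unchanged, and a value that ends with a digit receives the append character before validation. Read literally, the quoted pattern needs at least two characters, so a one-character candidate is rejected.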
Two sample driver programs are provided by SAS to demonstrate the use of the ct_createformats macro:
sample study library directory/cdisc-ct-1.0-1.5/programs/create_ctformats.sas
sample study library directory/cdisc-ct-1.0-1.5/programs/create_ctformats_qs.sas
Both of these sample driver programs demonstrate how the CDISCSubmissionValue can be mapped to a valid SAS format name.

Reading CDISC CRT-DDS define.xml Files: crtdds_read Macro

The process for reading CDISC CRT-DDS define.xml files is similar to the process for reading CDISC ODM XML files. The SAS Clinical Standards Toolkit supports reading a define.xml file and translating the file metadata into a SAS representation of the CDISC CRT-DDS model. To read the define.xml file, a specialized macro named crtdds_read is available in the CRT-DDS 1.0 standards macro folder, located in global standards library directory/standards/cdisc-crtdds-1.0-1.5/macros. This macro is referenced from the create_sascrtdds_fromxml.sas driver program. There are no input parameters in the call to the crtdds_read macro. File references and other metadata that are required by the macro are set as global macro variables. Currently, their values are set through the framework initialization properties and the CDISC CRT-DDS 1.0 initialization properties. Throughout the processing of the crtdds_read macro, the Results data set contains all framework and CRT-DDS 1.0 specific messages generated during run time.
Based on file references defined in the SASReferences data set, the crtdds_read macro accesses the define.xml file.
Here is a partial listing of a define.xml file.
<ODM xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:def="http://www.cdisc.org/ns/def/v1.0" 
   xmlns="http://www.cdisc.org/ns/odm/v1.2" FileOID="1" 
   CreationDateTime="2011-07-13T17:15:43-04:00"
   AsOfDateTime="2011-07-13T17:12:42" 
   Description="define1" FileType="Snapshot" Id="define1"
   ODMVersion="1.0">
<Study OID="1">
  <GlobalVariables>
    <StudyName>study1</StudyName>
    <StudyDescription>first study</StudyDescription>
    <ProtocolName>Protocol abc</ProtocolName>
  </GlobalVariables>
  <MetaDataVersion OID="1" Name="CDISC-SDTM 3.1.2" 
                   Description="CDISC-SDTM 3.1.2"
                   def:DefineVersion="1.0.0" 
                   def:StandardName="CDISC SDTM"
                   def:StandardVersion="3.1.2">
   <ItemGroupDef 
     OID="AE1" Name="AE" Repeating="Yes"
     IsReferenceData="No" 
     SASDatasetName="AE" Domain="AE"
     Purpose="Tabulation" def:Label="Adverse Events" 
     def:Class="Events" 
     def:Structure="One record per adverse event per subject"
     def:DomainKeys="STUDYID USUBJID AEDECOD AESTDTC" 
     def:ArchiveLocationID="AE1">
      <ItemRef ItemOID="COL1" Mandatory="Yes"
        OrderNumber="1" KeySequence="1" Role="Identifier"/>
      <ItemRef ItemOID="COL2" Mandatory="Yes"
        OrderNumber="2" Role="Identifier"/>
      <ItemRef ItemOID="COL3" Mandatory="Yes"
        OrderNumber="3" KeySequence="2" Role="Identifier"/>
      <ItemRef ItemOID="COL4" Mandatory="Yes"
        OrderNumber="4" Role="Identifier"/>
      <ItemRef ItemOID="COL5" Mandatory="No"
        OrderNumber="5" Role="Identifier"/>
      <ItemRef ItemOID="COL6" Mandatory="No"
        OrderNumber="6" Role="Identifier"/>
      <ItemRef ItemOID="COL7" Mandatory="No"
        OrderNumber="7" Role="Identifier"/>
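To make the structure of the listing above concrete, the following Python sketch lists the ItemRef elements of each ItemGroupDef. It is independent of the crtdds_read macro and is only an illustration. The sample text is a shortened copy of the listing with closing tags added so that it is well formed, and the function name list_item_refs is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces taken from the root element of the listing.
ODM_NS = "http://www.cdisc.org/ns/odm/v1.2"
DEF_NS = "http://www.cdisc.org/ns/def/v1.0"

# Shortened fragment of the listing; closing tags were added for this sketch.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.2"
     xmlns:def="http://www.cdisc.org/ns/def/v1.0" FileOID="1">
<Study OID="1">
  <MetaDataVersion OID="1" Name="CDISC-SDTM 3.1.2">
    <ItemGroupDef OID="AE1" Name="AE" def:Label="Adverse Events">
      <ItemRef ItemOID="COL1" Mandatory="Yes"
        OrderNumber="1" KeySequence="1" Role="Identifier"/>
      <ItemRef ItemOID="COL2" Mandatory="Yes"
        OrderNumber="2" Role="Identifier"/>
      <ItemRef ItemOID="COL3" Mandatory="Yes"
        OrderNumber="3" KeySequence="2" Role="Identifier"/>
    </ItemGroupDef>
  </MetaDataVersion>
</Study>
</ODM>"""


def list_item_refs(xml_text):
    """Return one dictionary per ItemRef, in document order."""
    root = ET.fromstring(xml_text)
    ns = {"odm": ODM_NS}
    rows = []
    for group in root.iterfind(".//odm:ItemGroupDef", ns):
        # def:Label lives in the define.xml extension namespace.
        label = group.get("{%s}Label" % DEF_NS)
        for ref in group.iterfind("odm:ItemRef", ns):
            rows.append({
                "group": group.get("Name"),
                "label": label,
                "item_oid": ref.get("ItemOID"),
                "order": int(ref.get("OrderNumber")),
                "key_sequence": ref.get("KeySequence"),  # None when absent
            })
    return rows


if __name__ == "__main__":
    for row in list_item_refs(SAMPLE):
        print(row)
```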
After the crtdds_read macro confirms that the define.xml file exists, a call is made to the SAS DATA step component JavaObj. The JavaObj processing converts the define.xml file into the cubeXML file through transformations using XSL files and processes. The cubeXML file is created in the Work library. The name of the cubeXML file is _cubnnnn.xml, where nnnn is a randomly generated number. The cubeXML file is accessed using the SAS XML LIBNAME engine and XMLMAP processing. A default XMLMAP file is stored in the sample CRT-DDS 1.0 study folder hierarchy under /referencexml as define.map. The define.map file must exist to process the cubeXML file. If it does not exist, then the crtdds_read macro attempts to create one using the CRT-DDS reference metadata.
Here is a partial listing of the define.map file.
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP version="1.2">

<TABLE name="AnnotatedCRFs">
   <TABLE-PATH syntax="XPath">/LIBRARY/AnnotatedCRFs</TABLE-PATH>
   <TABLE-DESCRIPTION>Annotated CRF metadata</TABLE-DESCRIPTION>

   <COLUMN name="DocumentRef">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/DocumentRef</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>The referenced Annotated CRF document</DESCRIPTION>
     <LENGTH>2000</LENGTH>
   </COLUMN>
   <COLUMN name="leafID">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/leafID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>The unique ID of the referenced Annotated CRF</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>
   <COLUMN name="FK_MetaDataVersion">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/FK_MetaDataVersion</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Foreign key: MetaDataVersion.OID</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>

</TABLE>
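A map file of this shape describes, for each table, the columns that the SAS XML LIBNAME engine creates. The following Python sketch summarizes those definitions as an aid to reading the listing. It is an illustration only: the function name summarize_map is hypothetical, and the sample text is a shortened copy of the listing with the XML declaration omitted and a closing SXLEMAP tag added so that it is well formed.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the listing; the closing SXLEMAP tag was added here.
SAMPLE_MAP = """<SXLEMAP version="1.2">
<TABLE name="AnnotatedCRFs">
   <TABLE-PATH syntax="XPath">/LIBRARY/AnnotatedCRFs</TABLE-PATH>
   <COLUMN name="DocumentRef">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/DocumentRef</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <LENGTH>2000</LENGTH>
   </COLUMN>
   <COLUMN name="leafID">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/leafID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <LENGTH>128</LENGTH>
   </COLUMN>
</TABLE>
</SXLEMAP>"""


def summarize_map(map_text):
    """Map each table name to a list of (column, type, length) tuples."""
    root = ET.fromstring(map_text)
    summary = {}
    for table in root.iterfind("TABLE"):
        columns = []
        for column in table.iterfind("COLUMN"):
            columns.append((
                column.get("name"),
                column.findtext("TYPE"),
                int(column.findtext("LENGTH")),
            ))
        summary[table.get("name")] = columns
    return summary


if __name__ == "__main__":
    for table_name, columns in summarize_map(SAMPLE_MAP).items():
        print(table_name)
        for name, col_type, length in columns:
            print("  %-12s %-10s %5d" % (name, col_type, length))
```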
Processing of the cubeXML file results in the derivation of the data sets (such as ItemDefs) currently included in the SAS representation of the CDISC CRT-DDS model.
The final step in crtdds_read processing is the derivation of table and column metadata that describe the data sets in the SAS representation of the define.xml file. At this point, the crtdds_read macro is ready to create the source_tables and source_columns data sets. The tables listed in the source_tables data set are created and copied to the output library as defined in the SASReferences data set.

Sample Driver Program: create_sascrtdds_fromxml.sas

Overview

Each primary SAS Clinical Standards Toolkit task, such as reading CDISC CRT-DDS XML files, is guided by a sample driver program that is provided by SAS. The create_sascrtdds_fromxml.sas driver program is used to read define.xml files.
The driver program is located at:
sample study library directory/cdisc-crtdds-1.0-1.5/programs

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are two input file references and five output references that are key to successful completion of the driver program. The table Key Components of the SASReferences Data Set for the create_sascrtdds_fromxml.sas Macro lists these files and data sets, and they are discussed in separate sections. In the sample create_sascrtdds_fromxml.sas driver program, the following values are set for &studyRootPath and &studyOutputPath and are specific to a SAS release:
&studyRootPath=&_cstSRoot/cdisc-crtdds-1.0-&_cstVersion
&studyOutputPath=&_cstSRoot/cdisc-crtdds-1.0-&_cstVersion
Key Components of the SASReferences Data Set for the create_sascrtdds_fromxml.sas Macro

Metadata Type   SAS LIBNAME or Fileref to Use  Reference Type  Path                              Name of File
Input
externalxml     crtxml                         fileref         &studyRootPath/sourcexml          define.xml
referencexml    crtmap                         fileref         &studyRootPath/referencexml       define.map
Output
sourcedata      srcdata                        libref          &studyOutputPath/deriveddata      *.*
sourcemetadata  srcmeta                        libref          &studyOutputPath/derivedmetadata  source_tables.sas7bdat
sourcemetadata  srcmeta                        libref          &studyOutputPath/derivedmetadata  source_columns.sas7bdat
sourcemetadata  srcmeta                        libref          &studyOutputPath/derivedmetadata  source_study.sas7bdat
results         results                        libref          &studyOutputPath/results          read_results.sas7bdat
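Each row of the table combines a path that contains macro variable references with a file name. The following Python sketch shows how such rows resolve to physical locations. It only imitates simple macro variable substitution and is not how SAS or the toolkit resolves references. The values given for _cstSRoot and _cstVersion are assumptions made for illustration, and the names SASReference, resolve, and location are hypothetical.

```python
import posixpath
import re
from collections import namedtuple

SASReference = namedtuple(
    "SASReference", "metadata_type sasref reftype path filename"
)

# Rows copied from the table above.
ROWS = [
    SASReference("externalxml", "crtxml", "fileref",
                 "&studyRootPath/sourcexml", "define.xml"),
    SASReference("referencexml", "crtmap", "fileref",
                 "&studyRootPath/referencexml", "define.map"),
    SASReference("sourcedata", "srcdata", "libref",
                 "&studyOutputPath/deriveddata", "*.*"),
    SASReference("sourcemetadata", "srcmeta", "libref",
                 "&studyOutputPath/derivedmetadata", "source_tables.sas7bdat"),
    SASReference("sourcemetadata", "srcmeta", "libref",
                 "&studyOutputPath/derivedmetadata", "source_columns.sas7bdat"),
    SASReference("sourcemetadata", "srcmeta", "libref",
                 "&studyOutputPath/derivedmetadata", "source_study.sas7bdat"),
    SASReference("results", "results", "libref",
                 "&studyOutputPath/results", "read_results.sas7bdat"),
]

MACRO_VARS = {
    "_cstSRoot": "/cst/sample",   # assumption: installation-specific root
    "_cstVersion": "1.5",         # assumption: toolkit version suffix
    "studyRootPath": "&_cstSRoot/cdisc-crtdds-1.0-&_cstVersion",
    "studyOutputPath": "&_cstSRoot/cdisc-crtdds-1.0-&_cstVersion",
}

# An ampersand, a name, and an optional period that ends the reference.
MACRO_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*)\.?")


def resolve(text, macro_vars, max_passes=10):
    """Substitute &name references repeatedly until the text stops changing."""
    lookup = {name.lower(): value for name, value in macro_vars.items()}

    def substitute(match):
        # Unknown names are left untouched.
        return lookup.get(match.group(1).lower(), match.group(0))

    for _ in range(max_passes):
        resolved = MACRO_REF.sub(substitute, text)
        if resolved == text:
            break
        text = resolved
    return text


def location(row, macro_vars=MACRO_VARS):
    """Return the directory for *.* rows, else the full file path."""
    directory = resolve(row.path, macro_vars)
    if row.filename == "*.*":
        return directory
    return posixpath.join(directory, row.filename)


if __name__ == "__main__":
    for row in ROWS:
        print("%-15s %-8s %s" % (row.metadata_type, row.sasref, location(row)))
```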

Process Inputs

The metadata type externalxml refers to the define.xml file that is being read. The filename reference crtxml is defined in the SASReferences data set. This filename reference is used in the submitted SAS code to refer to the define.xml file.
The metadata type referencexml refers to the SAS map file that is used to generate the SAS data sets that represent the define.xml file metadata and content. The filename reference crtmap is defined in the SASReferences data set. This filename reference is used in the submitted SAS code to refer to the SAS map file. If a path and filename for the map file are not specified, then a temporary map file is created as part of the crtdds_read processing.

Process Outputs

The sourcedata type is the library where the metadata files are created. These metadata files are the data sets that comprise the CRT-DDS information.
The sourcemetadata type refers to two data sets that are created from the cubeXML file: source_tables and source_columns. The source_tables data set contains metadata about each table that is derived from the CRT-DDS process. The source_columns data set contains similar metadata at the column level. Both data sets are written to the Srcmeta library. The sourcemetadata type also refers to the source_study data set, which is created in the Srcmeta library and contains study metadata.
The results type refers to the Results data set that contains information from running the CRT-DDS process. This information is written to the read_results data set in the Results library.

Process Results

When the driver program finishes running, the read_results data set is created in the Results library. This data set contains informational, warning, and any error messages that were generated by the submitted driver program.
This display shows an example of the contents of a Results data set in the CRT-DDS sample study.
Example of a Partial Results Data Set Created by the create_sascrtdds_fromxml.sas Driver
The crtdds_read macro creates the source_tables and source_columns data sets in the Srcmeta library. These data sets contain the table and column metadata for the SAS representation of CRT-DDS that is derived from the define.xml file. The Srcmeta library corresponds to the location specified in SASReferences (&studyOutputPath/derivedmetadata).
Example of Partial Source_Tables Data Set Derived during crtdds_read
Example of Partial Source_Columns Data Set Derived during crtdds_read
The Srcdata library contains the driver-generated tables that comprise the SAS representation of the CRT-DDS model. There is a one-to-one correspondence between the tables listed in the Srcdata library and the tables contained in the source_tables metadata file in the Srcmeta library. The Srcdata library corresponds to the location specified in SASReferences (&studyOutputPath/deriveddata).
Example of Partial Srcdata Library Derived during crtdds_read
When running the driver programs against non-sample data, you must populate the SASReferences data set in the driver program with the proper values. For an explanation of the SASReferences data set, see SASReferences File.