Reading XML Files

Overview of Basic Workflow

Here is the basic workflow for reading XML files:
  1. Determine the existence of a valid XML file.
  2. Use valid XSL style sheets for each target data set (such as ItemDefs.xsl).
  3. Use the SAS DATA step component JavaObj to create a standardized intermediate cubeXML file using the XSL style sheets.
  4. Read the standardized cubeXML file using the SAS XML LIBNAME engine and XMLMAP processing.
This basic workflow is used by all XML-based standards that are supported by the SAS Clinical Standards Toolkit.
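The four steps above can be sketched in Python. This is only an illustration of the flow, not the toolkit's implementation: a plain Python function stands in for the XSL style sheet and the JavaObj transformation, xml.etree.ElementTree stands in for the SAS XML LIBNAME engine, and the trimmed ItemDef content, the build_cube and read_cube names, and the temporary file are assumptions made for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical, heavily trimmed ODM content used only for this sketch.
SAMPLE_ODM = """<?xml version="1.0" encoding="ISO-8859-1"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" FileOID="Study1234" ODMVersion="1.3">
  <Study OID="1234">
    <MetaDataVersion OID="MDV000" Name="Study 1234, Data Definitions">
      <ItemDef OID="ID.AGE" Name="AGE" DataType="integer" Length="3"/>
      <ItemDef OID="ID.SEX" Name="SEX" DataType="text" Length="1"/>
    </MetaDataVersion>
  </Study>
</ODM>
"""


def build_cube(odm_path):
    """Flatten ItemDef elements into a LIBRARY/ItemDefs structure."""
    # Step 1: confirm that the XML file exists.
    if not os.path.isfile(odm_path):
        raise FileNotFoundError(odm_path)
    odm_root = ET.parse(odm_path).getroot()

    # Steps 2-3: this loop plays the role of one style sheet (ItemDefs.xsl)
    # and writes one row element per ItemDef into the intermediate cube.
    library = ET.Element("LIBRARY")
    for item in odm_root.iterfind(".//odm:ItemDef", ODM_NS):
        row = ET.SubElement(library, "ItemDefs")
        for attr in ("OID", "Name", "DataType", "Length"):
            ET.SubElement(row, attr).text = item.get(attr, "")
    return library


def read_cube(library, table):
    """Step 4: read one table of the cube into a list of row dictionaries."""
    return [{col.tag: col.text for col in row} for row in library.findall(table)]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "odm_sample.xml")
    with open(path, "w", encoding="iso-8859-1") as handle:
        handle.write(SAMPLE_ODM)
    rows = read_cube(build_cube(path), "ItemDefs")

for row in rows:
    print(row)
```

The point of the intermediate cube is that every target data set becomes a flat, repeating element under a single root, which is the shape that a map-driven reader can consume without knowing anything about the original standard.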

Reading CDISC ODM XML Files: odm_read Macro

To read an ODM XML file, use the specialized odm_read macro, which is available in the ODM 1.3.0 standards macro folder. This folder is located here:
<global standards library directory>/standards/cdisc-odm-1.3.0-1.4/macros
This macro is referenced from the create_sasodm_fromxml.sas driver program (described more fully below).
File references and other metadata that are required by the macro are set as global macro variable values. Currently, these global macro variable values are set through the framework initialization properties and the CDISC ODM 1.3.0 initialization properties. Throughout the processing of the odm_read macro, the Results data set contains all framework and ODM 1.3.0 specific messages generated during run time.
Based on file references defined in the SASReferences data set, the odm_read macro accesses the ODM XML file.
Here is a partial listing of a sample ODM XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<ODM 
  xmlns="http://www.cdisc.org/ns/odm/v1.3"
  FileOID="Study1234" 
  ODMVersion="1.3" 
  FileType="Snapshot" 
  CreationDateTime="2004-07-28T12:34:13-06:00" 
  SourceSystem="ss00"
  AsOfDateTime="2004-07-29T12:34:13-06:00"
  Granularity="SingleSite"
  Description="Study to determine existence of ischemic stroke"
  Archival="Yes" 
  PriorFileOID="Study-4321"
  Originator="SAS Institute"
  SourceSystemVersion="Version 0.0.0"
  Id="DSSignature123">
  <Study OID="1234">
    <GlobalVariables>
      <StudyName>1234</StudyName>
      <StudyDescription>1234 Data Definition</StudyDescription>
      <ProtocolName>1234</ProtocolName> 
    </GlobalVariables>
    <BasicDefinitions>
      <MeasurementUnit OID="MeasurementUnits.OID.MMHG" Name="MMHG">
        <Symbol>
          <TranslatedText xml:lang="en">mmHG</TranslatedText>
          <TranslatedText xml:lang="fr-CA">mmHG</TranslatedText>
        </Symbol>
      </MeasurementUnit>
      <MeasurementUnit OID="MeasurementUnits.OID.YRS" Name="YEARS">
        <Symbol>
          <TranslatedText xml:lang="de">Jahren</TranslatedText>
          <TranslatedText xml:lang="en">Years of age</TranslatedText>
          <TranslatedText xml:lang="fr-CA">Ans</TranslatedText>
        </Symbol>
      </MeasurementUnit>
    </BasicDefinitions>
    <MetaDataVersion OID="CDISC.SDTM.3.1.0"
      Name="Study 1234, Data Definitions" 
      Description="Study 1234, Data Definitions">
      <Include StudyOID="1234" MetaDataVersionOID="MDV000">
      </Include>
      <Protocol>
        <Description>
After the odm_read macro confirms that the ODM XML file exists, a call is made to the SAS DATA step component JavaObj. JavaObj processing converts the ODM XML file into the cubeXML file through transformations using XSL files and processes. The cubeXML file is created in the Work library. The name of the cubeXML file is _cubnnnn.xml, where nnnn is a randomly generated number. The cubeXML file is accessed using the SAS XML LIBNAME engine and XMLMAP processing. A default XMLMAP file is stored in the sample ODM 1.3.0 study folder hierarchy under /referencexml as odm.map. The odm.map file is required to process the cubeXML file. If it does not exist, then the odm_read macro attempts to create one using the ODM reference metadata.
Here is a partial listing of the odm.map file.
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP name="ODM130" version="1.2">

<TABLE name="ItemDefs">
   <TABLE-PATH syntax="XPath">/LIBRARY/ItemDefs</TABLE-PATH>
   <TABLE-DESCRIPTION>Item metadata</TABLE-DESCRIPTION>

   <COLUMN name="OID">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/OID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Unique identifier for this item</DESCRIPTION>
     <LENGTH>64</LENGTH>
   </COLUMN>
   <COLUMN name="Name">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/Name</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Item (variable) name</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>
   <COLUMN name="DataType">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/DataType</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Item (variable) data type (text, integer, float)</DESCRIPTION>
     <LENGTH>18</LENGTH>
   </COLUMN>
   <COLUMN name="Length">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/Length</PATH>
     <TYPE>numeric</TYPE>
     <DATATYPE>numeric</DATATYPE>
     <DESCRIPTION>Item (variable) length</DESCRIPTION>
     <LENGTH>8</LENGTH>
   </COLUMN>
When the cubeXML is processed, each of the 66 data sets (such as ItemDefs) that are included in the SAS representation of the CDISC ODM model is derived.
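To make the role of the map file concrete, here is a minimal Python sketch of map-driven reading. It is not how the SAS XML LIBNAME engine works internally. It assumes a cube in which each row is a direct child of the LIBRARY root, it resolves only the last segment of each PATH, and the read_tables name and the trimmed map and cube content are made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed map in the style of odm.map (two columns only).
MAP_XML = """<SXLEMAP name="ODM130" version="1.2">
<TABLE name="ItemDefs">
   <TABLE-PATH syntax="XPath">/LIBRARY/ItemDefs</TABLE-PATH>
   <COLUMN name="OID">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/OID</PATH>
     <TYPE>character</TYPE>
     <LENGTH>64</LENGTH>
   </COLUMN>
   <COLUMN name="Length">
     <PATH syntax="Xpath">/LIBRARY/ItemDefs/Length</PATH>
     <TYPE>numeric</TYPE>
     <LENGTH>8</LENGTH>
   </COLUMN>
</TABLE>
</SXLEMAP>"""

# Hypothetical cube content with two ItemDefs rows.
CUBE_XML = """<LIBRARY>
  <ItemDefs><OID>ID.AGE</OID><Length>3</Length></ItemDefs>
  <ItemDefs><OID>ID.SEX</OID><Length>1</Length></ItemDefs>
</LIBRARY>"""


def read_tables(map_xml, cube_xml):
    """Return {table name: [row dict, ...]} as described by the map."""
    sxlemap = ET.fromstring(map_xml)
    library = ET.fromstring(cube_xml)
    tables = {}
    for table in sxlemap.findall("TABLE"):
        # /LIBRARY/ItemDefs -> rows are the ItemDefs children of the root.
        row_tag = table.findtext("TABLE-PATH").strip().rsplit("/", 1)[-1]
        rows = []
        for row in library.findall(row_tag):
            record = {}
            for column in table.findall("COLUMN"):
                leaf = column.findtext("PATH").strip().rsplit("/", 1)[-1]
                text = row.findtext(leaf)
                is_numeric = column.findtext("TYPE").strip() == "numeric"
                # Numeric columns are converted; missing values stay None.
                record[column.get("name")] = float(text) if is_numeric and text else text
            rows.append(record)
        tables[table.get("name")] = rows
    return tables


tables = read_tables(MAP_XML, CUBE_XML)
print(tables["ItemDefs"])
```

Each TABLE element in the map yields one data set, and each COLUMN element contributes one variable whose type comes from the TYPE element.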
A number of input parameters can be specified in the call to the odm_read macro. These parameters offer the options of building source metadata files and SAS format catalogs for codelist translated text. These parameters are itemized in this table.
ODM_read Macro Parameters
Parameter              Description
_cstBuildSrcMetadata   Create the source metadata files (for example, source_tables and source_columns) as a part of the Read operation. Default=Y (yes), otherwise leave blank. Optional.
_cstBuildFmtCat        Build format catalog(s), representing language-specific codelist TranslatedText, as a part of the Read operation. Default=Y (yes), otherwise leave blank. Optional.
_cstFmtLib             Where catalog(s) are to be written. Optional. If not specified, default first to value derived from SASReferences, then Work.
_cstReplaceFmtCat      Should an existing format catalog by that name in _cstFmtLib be replaced? Optional. Values: N | Y. Default behavior: Y (overwrite existing catalog).
_cstFmtCatPrefix       Use this prefix for catalog names. Optional. If not specified, default is <standard mnemonic>FmtCat (such as ODMFmtCat). This default will produce an English format catalog name of ODMFmtCat_en.
_cstFmtCatLang         If specified, create a format catalog ONLY for the specified language. Optional. Example: _cstFmtCatLang=en. If no records exist for the specified language, an empty catalog is created.
_cstFmtCatLangOption   If no language tag is provided in the XML, what action should be taken with these records? Optional. Values: Ignore | English | Use_cstFmtCatLang. If Ignore, records are ignored (but reported in the SAS log). If English, records are added to the English catalog (default). If Use_cstFmtCatLang, records are added to the language catalog specified in the _cstFmtCatLang parameter.
By default, when the %odm_read() macro is called with no parameters, it first derives the SAS data sets that represent the ODM XML metadata and data content. It then creates the source metadata files and a SAS format catalog for each language found in the clitemdecodetranslatedtext data set. The target location of the derived metadata files is defined in the SASReferences data set. Derived SAS format catalogs are written to the SAS Work library unless another location is defined in the SASReferences data set.
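The interaction of the format catalog parameters can be modeled with a short Python sketch. This is an interpretation of the parameter descriptions, not the macro's code. The route_records and catalog_name functions and the sample records are hypothetical. Two behaviors are assumptions that the descriptions do not settle: an untagged record is treated as ignored when Use_cstFmtCatLang is requested without a _cstFmtCatLang value, and the _cstFmtCatLang filter is applied after untagged records have been routed.

```python
from collections import defaultdict


def catalog_name(language, prefix="ODMFmtCat"):
    # Default prefix plus language tag, for example ODMFmtCat_en.
    return f"{prefix}_{language}"


def route_records(records, fmt_cat_lang=None, lang_option="English",
                  prefix="ODMFmtCat"):
    """Assign (language, code, decode) records to format catalogs.

    fmt_cat_lang models _cstFmtCatLang, lang_option models
    _cstFmtCatLangOption, and prefix models _cstFmtCatPrefix.
    """
    catalogs = defaultdict(list)
    ignored = []
    if fmt_cat_lang:
        # The requested catalog exists even if no record ends up in it.
        catalogs[catalog_name(fmt_cat_lang, prefix)] = []
    for language, code, decode in records:
        if not language:
            if lang_option == "English":
                language = "en"
            elif lang_option == "Use_cstFmtCatLang" and fmt_cat_lang:
                language = fmt_cat_lang
            else:
                # "Ignore", or no usable target language (assumption).
                ignored.append((code, decode))
                continue
        if fmt_cat_lang and language != fmt_cat_lang:
            continue  # only the requested language is built
        catalogs[catalog_name(language, prefix)].append((code, decode))
    return dict(catalogs), ignored


# Hypothetical codelist records; the last one has no language tag.
records = [("en", "M", "Male"), ("de", "M", "Maennlich"), (None, "F", "Female")]

default_catalogs, default_ignored = route_records(records)
ignore_catalogs, ignore_ignored = route_records(records, lang_option="Ignore")
only_fr, _ = route_records(records, fmt_cat_lang="fr")
only_de, _ = route_records(records, fmt_cat_lang="de",
                           lang_option="Use_cstFmtCatLang")
print(sorted(default_catalogs))
```

With the defaults, the untagged record joins the English catalog. Requesting a language that has no records yields a single empty catalog, which matches the description of _cstFmtCatLang.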

Sample Driver Program: create_sasodm_fromxml.sas

Overview

Each primary SAS Clinical Standards Toolkit task, such as reading CDISC ODM XML files, is guided by a sample driver module that is provided by SAS. For reading ODM XML files, this module is create_sasodm_fromxml.sas.
The driver program is located at:
!sasroot/../../SASClinicalStandardsToolkitODM130/1.4/sample/cdisc-odm-1.3.0/programs

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are two input file references and five output references that are key to the successful completion of the driver program. Key Components of the SASReferences Data Set lists these files and data sets, and they are discussed in separate sections. In the sample create_sasodm_fromxml.sas driver module, these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=!sasroot/../../SASClinicalStandardsToolkitODM130/&_cstVersion/sample/cdisc-odm-1.3.0
&studyOutputPath=!sasroot/../../SASClinicalStandardsToolkitODM130/&_cstVersion/sample/cdisc-odm-1.3.0
Key Components of the SASReferences Data Set
Input or Output  Metadata Type   SAS LIBNAME or Fileref to Use  Reference Type  Path                               Name of File
Input            externalxml     odmxml                         fileref         &studyRootPath/sourcexml           odm_sample.xml
Input            referencexml    odmmap                         fileref         &studyRootPath/referencexml        odm.map
Output           sourcedata      srcdata                        libref          &studyOutputPath/derived/data      *.*
Output           sourcemetadata  srcmeta                        libref          &studyOutputPath/derived/metadata  source_tables.sas7bdat
Output           sourcemetadata  srcmeta                        libref          &studyOutputPath/derived/metadata  source_columns.sas7bdat
Output           targetdata      trgdata                        libref          &studyOutputPath/derived/formats
Output           results         results                        libref          &studyOutputPath/results           read_results.sas7bdat
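The references in the table above can be modeled as plain records to show how the path placeholders resolve. The following Python sketch is illustrative only. The SasReference class, the resolve function, and the /studies/demo locations are hypothetical, and the real SASReferences data set contains more columns than the six shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SasReference:
    direction: str      # Input or Output
    metadata_type: str
    sasref: str         # libref or fileref to use
    ref_type: str       # "libref" or "fileref"
    path: str
    filename: str = ""


REFERENCES = [
    SasReference("Input", "externalxml", "odmxml", "fileref",
                 "&studyRootPath/sourcexml", "odm_sample.xml"),
    SasReference("Input", "referencexml", "odmmap", "fileref",
                 "&studyRootPath/referencexml", "odm.map"),
    SasReference("Output", "sourcedata", "srcdata", "libref",
                 "&studyOutputPath/derived/data", "*.*"),
    SasReference("Output", "sourcemetadata", "srcmeta", "libref",
                 "&studyOutputPath/derived/metadata", "source_tables.sas7bdat"),
    SasReference("Output", "sourcemetadata", "srcmeta", "libref",
                 "&studyOutputPath/derived/metadata", "source_columns.sas7bdat"),
    SasReference("Output", "targetdata", "trgdata", "libref",
                 "&studyOutputPath/derived/formats"),
    SasReference("Output", "results", "results", "libref",
                 "&studyOutputPath/results", "read_results.sas7bdat"),
]


def resolve(ref, macro_vars):
    """Substitute &name placeholders; filerefs point at a file, librefs at a folder."""
    path = ref.path
    for name, value in macro_vars.items():
        path = path.replace("&" + name, value)
    if ref.ref_type == "fileref" and ref.filename:
        return f"{path}/{ref.filename}"
    return path


# Hypothetical study locations, standing in for the sample values.
macro_vars = {"studyRootPath": "/studies/demo",
              "studyOutputPath": "/studies/demo/out"}

inputs = [resolve(r, macro_vars) for r in REFERENCES if r.direction == "Input"]
outputs = [resolve(r, macro_vars) for r in REFERENCES if r.direction == "Output"]
print(inputs)
```

Pointing the same records at a different study then only requires new values for the two path variables.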

Process Inputs

The metadata type externalxml refers to the ODM XML file that is being read. The filename reference odmxml is defined in the SASReferences data set. This filename reference is used in the submitted SAS code when referring to the ODM XML file.
The metadata type referencexml refers to the SAS map file that is used to generate the SAS data sets that represent the ODM file metadata and content. The filename reference odmmap is defined in the SASReferences data set. This filename reference is used in the submitted SAS code when referring to the SAS map file. If a path and filename for the map file are not specified, then a temporary map file is created as part of the odm_read processing.

Process Outputs

When the driver program finishes running, the read_results data set is created in the Results library. This data set contains informational, warning, and any error messages that were generated by the submitted driver program.
This display shows an example of the contents of a Results data set that was built while reading the sample ODM XML file that was provided by SAS.
Example of a Partial Results Data Set Created by the create_sasodm_fromxml.sas Driver
The odm_read macro creates the source_tables and source_columns data sets in the Srcmeta library. These data sets contain the table and column metadata for each of the SAS data sets that are derived from the ODM XML file.
Example of Partial Source_Tables Data Set Derived during odm_read
Example of Partial Source_Columns Data Set Derived during odm_read
The Srcdata library contains the SAS data sets that represent the ODM file metadata and content. By default, the odm_read macro creates 66 unique data sets in the SAS Clinical Standards Toolkit. Some of these data sets might be empty if no associated content was derived from the ODM XML file. There is a one-to-one correspondence between the tables listed in the Srcdata library and the tables contained in the source_tables metadata file in the Srcmeta library.
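The one-to-one correspondence between the Srcdata library and the source_tables metadata can be checked with a simple set comparison. The Python sketch below assumes that the table names have already been extracted from both places. The compare_tables function, the "table" column name, and the ItemGroupDefs and CodeLists data set names are hypothetical.

```python
def compare_tables(srcdata_tables, source_tables_rows, name_column="table"):
    """Return (tables without metadata, metadata rows without a table)."""
    data = {name.lower() for name in srcdata_tables}
    meta = {row[name_column].lower() for row in source_tables_rows}
    return sorted(data - meta), sorted(meta - data)


# Hypothetical library listing and metadata rows.
srcdata = ["ItemDefs", "ItemGroupDefs", "CodeLists"]
source_tables = [{"table": "ItemDefs"}, {"table": "CodeLists"}]

missing_metadata, missing_data = compare_tables(srcdata, source_tables)
print(missing_metadata, missing_data)
```

Both returned lists are empty when the correspondence holds. Names are compared without regard to case because SAS data set names are not case sensitive.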
Example of Partial Srcdata Library Derived during odm_read

Reading CDISC CRT-DDS define.xml Files: crtdds_read Macro

The process for reading CDISC CRT-DDS define.xml files is similar to reading CDISC ODM XML files. The SAS Clinical Standards Toolkit supports reading a define.xml file and translating the file metadata into a SAS representation of the CDISC CRT-DDS model. To read the define.xml file, a specialized macro named crtdds_read is available in the CRT-DDS 1.0 standards macro folder, located in <global standards library directory>/standards/cdisc-crtdds-1.0-1.4/macros. This macro is referenced from the create_sascrtdds_fromxml.sas driver program. There are no input parameters in the call to the crtdds_read macro. File references and other metadata that are required by the macro are set as global macro variables. Currently, their values are set through the framework initialization properties and the CDISC CRT-DDS 1.0 initialization properties processes. Throughout processing of the crtdds_read macro, the Results data set contains all framework and CRT-DDS 1.0 specific messages generated during run time.
Based on file references defined in the SASReferences data set, the crtdds_read macro accesses the define.xml file.
Here is a partial listing of a define.xml file.
<ODM xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:def="http://www.cdisc.org/ns/def/v1.0" 
   xmlns="http://www.cdisc.org/ns/odm/v1.2" FileOID="1" 
   CreationDateTime="2011-07-13T17:15:43-04:00"
   AsOfDateTime="2011-07-13T17:12:42" 
   Description="define1" FileType="Snapshot" Id="define1"
   ODMVersion="1.0">
<Study OID="1">
  <GlobalVariables>
    <StudyName>study1</StudyName>
    <StudyDescription>first study</StudyDescription>
    <ProtocolName>Protocol abc</ProtocolName>
  </GlobalVariables>
  <MetaDataVersion OID="1" Name="CDISC-SDTM 3.1.2" 
                   Description="CDISC-SDTM 3.1.2"
                   def:DefineVersion="1.0.0" 
                   def:StandardName="CDISC SDTM"
                   def:StandardVersion="3.1.2">
   <ItemGroupDef 
     OID="AE1" Name="AE" Repeating="Yes"
     IsReferenceData="No" 
     SASDatasetName="AE" Domain="AE"
     Purpose="Tabulation" def:Label="Adverse Events" 
     def:Class="Events" 
     def:Structure="One record per adverse event per subject"
     def:DomainKeys="STUDYID USUBJID AEDECOD AESTDTC" 
     def:ArchiveLocationID="AE1">
      <ItemRef ItemOID="COL1" Mandatory="Yes"
        OrderNumber="1" KeySequence="1" Role="Identifier"/>
      <ItemRef ItemOID="COL2" Mandatory="Yes"
        OrderNumber="2" Role="Identifier"/>
      <ItemRef ItemOID="COL3" Mandatory="Yes"
        OrderNumber="3" KeySequence="2" Role="Identifier"/>
      <ItemRef ItemOID="COL4" Mandatory="Yes"
        OrderNumber="4" Role="Identifier"/>
      <ItemRef ItemOID="COL5" Mandatory="No"
        OrderNumber="5" Role="Identifier"/>
      <ItemRef ItemOID="COL6" Mandatory="No"
        OrderNumber="6" Role="Identifier"/>
      <ItemRef ItemOID="COL7" Mandatory="No"
        OrderNumber="7" Role="Identifier"/>
After the crtdds_read macro confirms that the define.xml file exists, a call is made to the SAS DATA step component JavaObj. The JavaObj processing converts the define.xml file into the cubeXML file through transformations using XSL files and processes. The cubeXML file is created in the Work library. The name of the cubeXML file is _cubnnnn.xml, where nnnn is a randomly generated number. The cubeXML file is accessed using the SAS XML LIBNAME engine and XMLMAP processing. A default XMLMAP file is stored in the sample CRT-DDS 1.0 study folder hierarchy under /referencexml as define.map. The define.map file is required to process the cubeXML file. If it does not exist, then the crtdds_read macro attempts to create one using the CRT-DDS reference metadata.
Here is a partial listing of the define.map file.
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP version="1.2">

<TABLE name="AnnotatedCRFs">
   <TABLE-PATH syntax="XPath">/LIBRARY/AnnotatedCRFs</TABLE-PATH>
   <TABLE-DESCRIPTION>Annotated CRF metadata</TABLE-DESCRIPTION>

   <COLUMN name="DocumentRef">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/DocumentRef</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>The referenced Annotated CRF document</DESCRIPTION>
     <LENGTH>2000</LENGTH>
   </COLUMN>
   <COLUMN name="leafID">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/leafID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>The unique ID of the referenced Annotated CRF</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>
   <COLUMN name="FK_MetaDataVersion">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/FK_MetaDataVersion</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Foreign key: MetaDataVersion.OID</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>

</TABLE>
Processing of the cubeXML file results in the derivation of the data sets (such as ItemDefs) currently included in the SAS representation of the CDISC CRT-DDS model.
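The flattening of nested define.xml elements into table rows can be illustrated in Python. This sketch is not the toolkit's XSL logic. It reads a trimmed define.xml fragment directly, shows how an attribute in the def namespace is addressed, and carries the parent OID along as a foreign key in the style of the FK_MetaDataVersion column in define.map. The item_group_rows function and the row layout are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "odm": "http://www.cdisc.org/ns/odm/v1.2",
    "def": "http://www.cdisc.org/ns/def/v1.0",
}

# Trimmed fragment modeled on the define.xml listing.
DEFINE_XML = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.2"
     xmlns:def="http://www.cdisc.org/ns/def/v1.0" FileOID="1">
<Study OID="1">
  <MetaDataVersion OID="1" Name="CDISC-SDTM 3.1.2">
    <ItemGroupDef OID="AE1" Name="AE" Repeating="Yes"
                  def:Label="Adverse Events" def:Class="Events">
      <ItemRef ItemOID="COL1" Mandatory="Yes" OrderNumber="1"/>
    </ItemGroupDef>
  </MetaDataVersion>
</Study>
</ODM>"""


def item_group_rows(define_xml):
    """Return one flat row per ItemGroupDef, keyed back to its parent."""
    root = ET.fromstring(define_xml)
    rows = []
    for mdv in root.iterfind(".//odm:MetaDataVersion", NS):
        for group in mdv.iterfind("odm:ItemGroupDef", NS):
            rows.append({
                "OID": group.get("OID"),
                "Name": group.get("Name"),
                # Namespaced attributes are stored as {namespace-uri}name.
                "Label": group.get("{%s}Label" % NS["def"]),
                # Foreign key back to the enclosing MetaDataVersion.
                "FK_MetaDataVersion": mdv.get("OID"),
            })
    return rows


group_rows = item_group_rows(DEFINE_XML)
print(group_rows)
```

Because every row carries the key of its parent, the hierarchy of the original file can be rebuilt later by joining the derived tables.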
The final step in crtdds_read processing is the derivation of table and column metadata that describe the data sets in the SAS representation of the define.xml file. At this point, the crtdds_read macro creates the source_tables and source_columns data sets. The tables that are listed in the source_tables data set are created and copied to the output library as defined in the SASReferences data set.

Sample Driver Program: create_sascrtdds_fromxml.sas

Overview

Each primary SAS Clinical Standards Toolkit task, such as reading CDISC CRT-DDS XML files, is guided by a sample driver program that is provided by SAS. The create_sascrtdds_fromxml.sas driver program is used to read define.xml files.
The driver program is located in:
!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.4/sample/cdisc-crtdds-1.0/programs

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are two input file references and five output references that are key to successful completion of the driver program. Key Components of the SASReferences Data Set lists these files and data sets, and they are discussed in separate sections. In the sample create_sascrtdds_fromxml.sas driver program, these values are set for &studyRootPath and &studyOutputPath and are specific to a SAS release.
&studyRootPath=!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/&_cstVersion/sample/cdisc-crtdds-1.0
&studyOutputPath=!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/&_cstVersion/sample/cdisc-crtdds-1.0
Key Components of the SASReferences Data Set
Input or Output  Metadata Type   SAS LIBNAME or Fileref to Use  Reference Type  Path                               Name of File
Input            externalxml     crtxml                         fileref         &studyRootPath/sourcexml           define.xml
Input            referencexml    crtmap                         fileref         &studyRootPath/referencexml        define.map
Output           sourcedata      srcdata                        libref          &studyOutputPath/deriveddata       *.*
Output           sourcemetadata  srcmeta                        libref          &studyOutputPath/derivedmetadata   source_tables.sas7bdat
Output           sourcemetadata  srcmeta                        libref          &studyOutputPath/derivedmetadata   source_columns.sas7bdat
Output           sourcemetadata  srcmeta                        libref          &studyOutputPath/derivedmetadata   source_study.sas7bdat
Output           results         results                        libref          &studyOutputPath/results           read_results.sas7bdat

Process Inputs

The metadata type externalxml refers to the define.xml file that is being read. The filename reference crtxml is defined in the SASReferences data set. This filename reference is used in the submitted SAS code when referring to the define.xml file.
The metadata type referencexml refers to the SAS map file that is used to generate the SAS data sets that represent the define.xml file metadata and content. The filename reference crtmap is defined in the SASReferences data set. This filename reference is used in the submitted SAS code when referring to the SAS map file. If a path and filename for the map file are not specified, then a temporary map file is created as part of the crtdds_read processing.

Process Outputs

The sourcedata type is the library where the metadata files are created. These metadata files are the data sets that comprise the CRT-DDS information.
The sourcemetadata type refers to two data sets that are created from the cubeXML file: source_tables and source_columns. The source_tables data set contains metadata about each table that is derived from the CRT-DDS process. The source_columns data set contains similar metadata at the column level. Both data sets are written to the Srcmeta library. The sourcemetadata type also refers to the source_study data set, which is also created in the Srcmeta library and contains study metadata.
The results type refers to the Results data set that contains information from running the CRT-DDS process. This information is written to the read_results data set in the Results library.

Process Results

When the driver program finishes running, the read_results data set is created in the Results library. This data set contains informational, warning, and any error messages that were generated by the submitted driver program.
This display shows an example of the contents of a Results data set in the CRT-DDS sample study.
Example of a Partial Results Data Set Created by the create_sascrtdds_fromxml.sas Driver
The crtdds_read macro creates the source_tables and source_columns data sets in the Srcmeta library. These data sets contain the table and column metadata for the SAS representation of CRT-DDS that is derived from the define.xml file. The Srcmeta library corresponds to the location specified in SASReferences (&studyOutputPath/derivedmetadata).
Example of Partial Source_Tables Data Set Derived during crtdds_read
Example of Partial Source_Columns Data Set Derived during crtdds_read
The Srcdata library contains the driver-generated tables that comprise the SAS representation of the CRT-DDS model. There is a one-to-one correspondence between the tables listed in the Srcdata library and the tables contained in the source_tables metadata file in the Srcmeta library. The Srcdata library corresponds to the location specified in SASReferences (&studyOutputPath/deriveddata).
Example of Partial Srcdata Library Derived during crtdds_read
When running the driver programs against non-sample data, you must populate the SASReferences data set in the driver program with the proper values. For an explanation of the SASReferences data set, see SASReferences File.