Reading XML Files

Overview of Basic Workflow

The following is the basic workflow for reading XML files:
  1. Determine the existence of a valid XML file.
  2. Use valid XSL style sheets for each target data set (such as ItemDefs.xsl).
  3. Use the SAS DATA step component JavaObj to create a standardized intermediate cubeXML file using the XSL style sheets.
  4. Read the standardized cubeXML file using the SAS XML LIBNAME engine and XMLMAP processing.
This basic workflow is used by all XML-based standards that are supported by the SAS Clinical Standards Toolkit.
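Step 1 of the workflow can be pictured with a short sketch. The toolkit performs this check inside its SAS macros; the Python below is only an illustration, and it assumes that "valid" means the file exists and is well-formed XML (no schema validation is attempted). The function name and the file names are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def is_readable_xml(path):
    """Return True if path exists and parses as well-formed XML."""
    if not os.path.isfile(path):
        return False
    try:
        ET.parse(path)
    except ET.ParseError:
        return False
    return True


# Exercise the check with throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "odm.xml")
    with open(good, "w", encoding="utf-8") as fh:
        fh.write('<ODM FileOID="Study1234"/>')

    bad = os.path.join(tmp, "broken.xml")
    with open(bad, "w", encoding="utf-8") as fh:
        fh.write("<ODM><Study></ODM>")  # mismatched tags

    results = (
        is_readable_xml(good),
        is_readable_xml(bad),
        is_readable_xml(os.path.join(tmp, "missing.xml")),
    )
```

Only the first file passes: the second is not well-formed and the third does not exist.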

Reading CDISC ODM XML Files: odm_read Macro

The current SAS Clinical Standards Toolkit release supports the reading of portions of an odm.xml file. It supports the translation of only the metadata (<Study>) and clinical data (<ClinicalData>) sections of the file into a SAS representation of the file content.
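As an illustration of what "only the Study and ClinicalData sections" means for a namespaced ODM file, the following Python sketch lists the top-level children of an ODM root and separates the two supported sections from anything else. It is not the toolkit's implementation (which uses XSL transformations driven from SAS). The miniature document is an assumption built from the attribute values in the sample listing, with an AdminData element added to show a section that would not be translated.

```python
import xml.etree.ElementTree as ET

# Sections that the text above says are translated.
SUPPORTED_SECTIONS = ("Study", "ClinicalData")

# Miniature ODM document (assumption: trimmed, illustrative content).
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
  FileOID="Study1234" ODMVersion="1.3" FileType="Snapshot">
  <Study OID="1234"/>
  <AdminData/>
  <ClinicalData StudyOID="1234" MetaDataVersionOID="CDISC.SDTM.3.1.0"/>
</ODM>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(SAMPLE)

# Root attributes are not namespace-qualified, so plain names work.
header = {name: root.get(name) for name in ("FileOID", "ODMVersion", "FileType")}

sections = [local_name(child.tag) for child in root]
supported = [name for name in sections if name in SUPPORTED_SECTIONS]
skipped = [name for name in sections if name not in SUPPORTED_SECTIONS]
```

Here `supported` holds the Study and ClinicalData sections in document order, and `skipped` holds AdminData.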
In order to read an odm.xml file, a specialized macro named odm_read is available in the ODM 1.3.0 standards macro folder. (For SAS 9.2, this folder is located at <global standards library directory>/standards/cdisc-odm-1.3.0-1.3/macros.) This macro is referenced from the create_sasodm_fromxml.sas driver program (described more fully below). There are no input parameters in the call to the odm_read macro. File references and other metadata that are required by the macro are set as global macro variable values. Currently, those global macro variable values are set through the framework initialization properties and the CDISC ODM 1.3.0 initialization properties. Throughout the processing of the odm_read macro, the Results data set contains all framework and ODM 1.3.0 specific messages generated during run time.
Based on file references from the SASReferences data set, odm_read accesses the odm.xml file.
The following is a partial listing of the sample odm.xml file.
<?xml version="1.0" encoding="ISO-8859-1"?>
<ODM 
  xmlns="http://www.cdisc.org/ns/odm/v1.3"
  FileOID="Study1234" 
  ODMVersion="1.3" 
  FileType="Snapshot" 
  CreationDateTime="2004-07-28T12:34:13-06:00" 
  SourceSystem="ss00"
  AsOfDateTime="2004-07-29T12:34:13-06:00"
  Granularity="SingleSite"
  Description="Study to determine existence of ischemic stroke"
  Archival="Yes"
  PriorFileOID="Study-4321"
  Originator="SAS Institute"
  SourceSystemVersion="Version 0.0.0"
  Id="DSSignature123">
<Study OID="1234">
<GlobalVariables>
    <StudyName>1234</StudyName>
    <StudyDescription>1234 Data Definition</StudyDescription>
    <ProtocolName>1234</ProtocolName>
</GlobalVariables>
<BasicDefinitions>
    <MeasurementUnit Name="My Unit" OID="MU_0001">
       <Symbol>
         <TranslatedText xml:lang="enus">Hello there text</TranslatedText>
       </Symbol>
    </MeasurementUnit>
    <MeasurementUnit Name="My Other Unit" OID="MU_0002">
       <Symbol>
          <TranslatedText xml:lang="jpn">Bye there text</TranslatedText>
       </Symbol>
    </MeasurementUnit>
</BasicDefinitions>
<MetaDataVersion OID="CDISC.SDTM.3.1.0"
    Name="Study 1234, Data Definitions"
    Description="Study 1234, Data Definitions">
    <Include StudyOID="1234" 
      MetaDataVersionOID="MDV000">
    </Include>
    <Protocol>
      <Description>
After the odm_read macro confirms that the odm.xml file exists, a call is made to the SAS DATA step component JavaObj. In SAS 9.1.3, you get a warning in the log that states that JavaObj is experimental. JavaObj processing converts the odm.xml file into the cubeXML file through transformations using XSL files and processes. The cubeXML file is created in the Work library. The name of the cubeXML file is _cubnnnn.xml, where nnnn is a randomly generated number. The cubeXML file is accessed using the SAS XML LIBNAME engine and XMLMAP processing. A default XMLMAP file is stored in the sample ODM 1.3.0 study folder hierarchy under /referencexml as odm.map. The odm.map file is required to process the cubeXML file. If it does not exist, then the odm_read macro attempts to create one using the ODM reference metadata.
The following is a partial listing of the odm.map file.
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP version="1.2">

<TABLE name="Annotations">
   <TABLE-PATH syntax="XPath">/LIBRARY/Annotations</TABLE-PATH>
   <TABLE-DESCRIPTION>Annotations associated with data</TABLE-DESCRIPTION>

   <COLUMN name="ID">
     <PATH syntax="Xpath">/LIBRARY/Annotations/ID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Unique ID for a specific Annotation element</DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>
   <COLUMN name="SeqNum">
     <PATH syntax="Xpath">/LIBRARY/Annotations/SeqNum</PATH>
     <TYPE>numeric</TYPE>
     <DATATYPE>numeric</DATATYPE>
     <DESCRIPTION>Uniquely identifies the annotation within its parent 
             entity</DESCRIPTION>
     <LENGTH>8</LENGTH>
   </COLUMN>
   <COLUMN name="Comment">
     <PATH syntax="Xpath">/LIBRARY/Annotations/Comment</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Free-text (uninterpreted) comment about clinical data</DESCRIPTION>
     <LENGTH>2000</LENGTH>
   </COLUMN>
   <COLUMN name="SponsorOrSite">
     <PATH syntax="Xpath">/LIBRARY/Annotations/SponsorOrSite</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Comment source (Sponsor | Site)</DESCRIPTION>
     <LENGTH>2000</LENGTH>
   </COLUMN>
   <COLUMN name="FlagType">
     <PATH syntax="Xpath">/LIBRARY/Annotations/FlagType</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION>Type of flag</DESCRIPTION>
     <LENGTH>2000</LENGTH>
   </COLUMN>
   <COLUMN name="FlagValue">
When the cubeXML file is processed, the data sets (such as ItemDefs) that are included in the SAS representation of the CDISC ODM model are derived. The final step for the odm_read macro is the derivation of table and column metadata that describe the data sets in the SAS representation of the odm.xml file. At this point, the odm_read macro is ready to create the source_tables and source_columns data sets. The tables listed in the source_tables data set are created and copied to the output library that is defined in the SASReferences data set.
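The role of the map file can be illustrated with a small sketch. In the toolkit, the SAS XML LIBNAME engine applies the XMLMAP to the cubeXML file; the Python below only imitates the idea that TABLE-PATH selects the rows and each COLUMN PATH selects a value within a row. The cube content (the ID and SeqNum values) is hypothetical, and the sketch assumes that every column path sits directly below its table path, as it does in the odm.map listing.

```python
import xml.etree.ElementTree as ET

# Two columns taken from the odm.map listing.
MAP_XML = """<SXLEMAP version="1.2">
<TABLE name="Annotations">
   <TABLE-PATH syntax="XPath">/LIBRARY/Annotations</TABLE-PATH>
   <COLUMN name="ID">
     <PATH syntax="Xpath">/LIBRARY/Annotations/ID</PATH>
     <TYPE>character</TYPE>
   </COLUMN>
   <COLUMN name="SeqNum">
     <PATH syntax="Xpath">/LIBRARY/Annotations/SeqNum</PATH>
     <TYPE>numeric</TYPE>
   </COLUMN>
</TABLE>
</SXLEMAP>"""

# Hypothetical cube content shaped to match the paths in the map.
CUBE_XML = """<LIBRARY>
  <Annotations><ID>AN.1</ID><SeqNum>1</SeqNum></Annotations>
  <Annotations><ID>AN.2</ID><SeqNum>2</SeqNum></Annotations>
</LIBRARY>"""


def read_table(map_root, cube_root, table_name):
    """Return one dict per row of the named table."""
    table = next(t for t in map_root.iter("TABLE") if t.get("name") == table_name)
    table_path = table.findtext("TABLE-PATH").strip()
    # "/LIBRARY/Annotations" -> "Annotations", relative to the LIBRARY root.
    row_path = table_path.split("/", 2)[2]
    rows = []
    for row_el in cube_root.findall(row_path):
        row = {}
        for col in table.findall("COLUMN"):
            col_path = col.findtext("PATH").strip()
            relative = col_path[len(table_path):].lstrip("/")
            text = row_el.findtext(relative)
            if col.findtext("TYPE") == "numeric" and text is not None:
                row[col.get("name")] = float(text)
            else:
                row[col.get("name")] = text
        rows.append(row)
    return rows


rows = read_table(ET.fromstring(MAP_XML), ET.fromstring(CUBE_XML), "Annotations")
```

Each resulting row is a dictionary keyed by the column names in the map, with SeqNum converted to a number because its TYPE is numeric.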

Sample Driver Program: create_sasodm_fromxml.sas

Overview

Each primary SAS Clinical Standards Toolkit task, such as reading CDISC ODM XML files, is guided by a sample driver program that is provided by SAS. The create_sasodm_fromxml.sas driver program is used to read ODM XML files.
For SAS 9.1.3, this driver program is located at:
!sasroot/../SASClinicalStandardsToolkitODM130/1.3/sample/cdisc-odm-1.3.0/programs/create_sasodm_fromxml.sas
For SAS 9.2, the driver program is located at:
!sasroot/../../SASClinicalStandardsToolkitODM130/1.3/sample/cdisc-odm-1.3.0/programs/create_sasodm_fromxml.sas
The value for !sasroot is the location of your SAS installation directory.
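The two locations differ only in how many directory levels they climb from !sasroot. The following Python sketch shows how such a path resolves once !sasroot is known. The installation directories are hypothetical, and the helper is an illustration rather than anything the toolkit provides.

```python
import posixpath

SAMPLE_TAIL = (
    "SASClinicalStandardsToolkitODM130/1.3/sample/"
    "cdisc-odm-1.3.0/programs/create_sasodm_fromxml.sas"
)
DRIVER_SAS913 = "!sasroot/../" + SAMPLE_TAIL
DRIVER_SAS92 = "!sasroot/../../" + SAMPLE_TAIL


def resolve_driver_path(sasroot, template):
    """Substitute the !sasroot token and collapse the '..' segments."""
    return posixpath.normpath(template.replace("!sasroot", sasroot))


# Hypothetical installation directories, for illustration only.
path_913 = resolve_driver_path("/opt/sas913/sas", DRIVER_SAS913)
path_92 = resolve_driver_path("/opt/sas92/SASFoundation/9.2", DRIVER_SAS92)
```

With these assumed directories, `path_913` resolves under /opt/sas913 and `path_92` resolves under /opt/sas92, each followed by the SASClinicalStandardsToolkitODM130 hierarchy.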

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed (such as the odm.xml file), the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, two input file references and four output references are key to the successful completion of the driver program. The following table lists these files and data sets, and each is discussed in a separate section. In the sample create_sasodm_fromxml.sas driver program, the following values are set for &studyRootPath and &studyOutputPath. The values are specific to a SAS release.
SAS 9.1.3
&studyRootPath=!sasroot/../SASClinicalStandardsToolkitODM130/1.3/sample/cdisc-odm-1.3.0
&studyOutputPath=!sasroot/../SASClinicalStandardsToolkitODM130/1.3/sample/cdisc-odm-1.3.0
SAS 9.2
&studyRootPath=!sasroot/../../SASClinicalStandardsToolkitODM130/1.3/sample/cdisc-odm-1.3.0
&studyOutputPath=!sasroot/../../SASClinicalStandardsToolkitODM130/1.3/sample/cdisc-odm-1.3.0
Key Components of the SASReferences Data Set

Input or Output  Metadata Type   SAS LIBNAME or Fileref to Use  Reference Type  Path                         Name of File
---------------  --------------  -----------------------------  --------------  ---------------------------  -----------------------
Input            externalxml     odmxml                         fileref         &studyRootPath/sourcexml     odm.xml
Input            referencexml    odmmap                         fileref         &studyRootPath/referencexml  odm.map
Output           sourcedata      srcdata                        LIBNAME         &studyOutputPath/data        *.*
Output           sourcemetadata  srcmeta                        LIBNAME         &studyOutputPath/metadata    Source_tables.sas7bdat
Output           sourcemetadata  srcmeta                        LIBNAME         &studyOutputPath/metadata    Source_columns.sas7bdat
Output           results         results                        LIBNAME         &studyOutputPath/results     Read_results.sas7bdat
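To make the table concrete, the following Python sketch holds the same six references as records and resolves each path once &studyRootPath and &studyOutputPath have values. The field names are this sketch's own labels for the table columns, the macro variable values are hypothetical, and the simple text substitution stands in for SAS macro resolution, which the toolkit performs itself.

```python
from collections import namedtuple

# Field names are illustrative labels, not the actual SASReferences columns.
Reference = namedtuple(
    "Reference", "direction metadata_type sasref reftype path filename"
)

ODM_REFERENCES = [
    Reference("Input", "externalxml", "odmxml", "fileref",
              "&studyRootPath/sourcexml", "odm.xml"),
    Reference("Input", "referencexml", "odmmap", "fileref",
              "&studyRootPath/referencexml", "odm.map"),
    Reference("Output", "sourcedata", "srcdata", "LIBNAME",
              "&studyOutputPath/data", "*.*"),
    Reference("Output", "sourcemetadata", "srcmeta", "LIBNAME",
              "&studyOutputPath/metadata", "Source_tables.sas7bdat"),
    Reference("Output", "sourcemetadata", "srcmeta", "LIBNAME",
              "&studyOutputPath/metadata", "Source_columns.sas7bdat"),
    Reference("Output", "results", "results", "LIBNAME",
              "&studyOutputPath/results", "Read_results.sas7bdat"),
]


def resolve_location(ref, macro_vars):
    """Substitute macro variables; a fileref also needs the file name."""
    path = ref.path
    for name, value in macro_vars.items():
        path = path.replace("&" + name, value)
    if ref.reftype == "fileref":
        return path + "/" + ref.filename
    return path  # a LIBNAME points to a directory


# Hypothetical study locations.
MACRO_VARS = {
    "studyRootPath": "/studies/odm130",
    "studyOutputPath": "/studies/odm130/out",
}
locations = {
    (ref.sasref, ref.filename): resolve_location(ref, MACRO_VARS)
    for ref in ODM_REFERENCES
}
```

A fileref resolves to a single file, whereas a LIBNAME resolves to the directory that holds the listed data sets.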

Process Inputs

The metadata type externalxml refers to the odm.xml file that is being read. The filename odmxml is defined in the SASReferences data set. This filename is used in the submitted SAS code when referring to the ODM file.
The metadata type referencexml refers to the SAS map file that is used to generate the SAS data sets that represent the ODM file metadata and content. The filename odmmap is defined in the SASReferences data set. This filename is used in the submitted SAS code when referring to the SAS map file. If a path and filename for the map file are not specified, then a temporary map file is created as part of the odm_read processing.

Process Outputs

When the driver program finishes running, the read_results.sas7bdat data set is created in the Results library. This data set contains informational, warning, and any error messages that were generated by the submitted driver program. The following display shows an example of the contents of a Results data set that was built while reading the sample odm.xml file that was provided by SAS.
Example of a Partial Results Data Set Created by the create_sasodm_fromxml.sas Driver
The odm_read macro creates the source_tables and source_columns data sets in the Srcmeta library. These data sets contain the table and column metadata for each of the SAS data sets that is derived from the odm.xml file.
Example of Partial Source_Tables Data Set Derived During odm_read
Example of Partial Source_Columns Data Set Derived During odm_read
The Srcdata library contains the SAS data sets that represent the ODM file metadata and content. By default, odm_read creates 52 unique data sets in SAS Clinical Standards Toolkit 1.3. Some of these data sets might be empty if no associated content was derived from the odm.xml file. There is a one-to-one correspondence between the tables listed in the Srcdata library and the tables contained in the source_tables metadata file in the Srcmeta library.
Example of Partial Srcdata Library Derived During odm_read
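The one-to-one correspondence between the Srcdata library and the source_tables metadata can be checked mechanically. The Python sketch below compares the data set files in a directory with a list of expected table names. It is an assumption-laden illustration: the table names are passed in as a plain list because reading source_tables.sas7bdat itself requires SAS or a third-party reader, and the names used in the example are only a small, illustrative subset.

```python
import os
import tempfile


def compare_library_to_metadata(library_dir, table_names):
    """Return (missing, unlisted) data set names, lowercased and sorted.

    missing:  listed in the metadata but absent from the directory
    unlisted: present in the directory but absent from the metadata
    """
    on_disk = {
        os.path.splitext(name)[0].lower()
        for name in os.listdir(library_dir)
        if name.lower().endswith(".sas7bdat")
    }
    expected = {name.lower() for name in table_names}
    return sorted(expected - on_disk), sorted(on_disk - expected)


# Stand-in library with two empty files (hypothetical content).
with tempfile.TemporaryDirectory() as srcdata_dir:
    for filename in ("itemdefs.sas7bdat", "study.sas7bdat"):
        open(os.path.join(srcdata_dir, filename), "w").close()
    missing, unlisted = compare_library_to_metadata(
        srcdata_dir, ["ItemDefs", "Study", "Annotations"]
    )
```

Both lists are empty when the correspondence holds; in this example `missing` reports the Annotations table because no matching file was created.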

Reading CDISC CRT-DDS define.xml Files: crtdds_read Macro

The process for reading CDISC CRT-DDS define.xml files is similar to reading CDISC ODM XML files. SAS Clinical Standards Toolkit 1.3 supports reading a define.xml file and translating the file metadata into a SAS representation of the CDISC CRT-DDS model. To read the define.xml file, a specialized macro named crtdds_read is available in the CRT-DDS 1.0 standards macro folder, located at <global standards library directory>/standards/cdisc-crtdds-1.0-1.3/macros. This macro is referenced from the create_sascrtdds_fromxml.sas driver program. There are no input parameters in the call to the crtdds_read macro. File references and other metadata that are required by the macro are set as global macro variable values. Currently, those global macro variable values are set through the framework initialization properties and the CDISC CRT-DDS 1.0 initialization properties. Throughout the processing of the crtdds_read macro, the Results data set contains all framework and CRT-DDS 1.0 specific messages generated during run time.
Based on file references retrieved from the SASReferences data set, crtdds_read accesses the define.xml file.
The following is a partial listing of a define.xml file.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="define1-0-0.xsl"?>

<!--Produced from SAS data using the SAS Clinical Toolkit.-->
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.2" 
xmlns:def="http://www.cdisc.org/ns/def/v1.0" 
xmlns:xlink="http://www.w3.org/1999/xlink" FileOID="1" CreationDateTime=
"2010-10-07T11:41:05-04:00" AsOfDateTime="2010-08-05T09:35:59" 
Description="define1" FileType="Snapshot" Id="define1" ODMVersion="1.0" 
Originator="SAS Institute">
   <Study OID="1">
        <GlobalVariables>
            <StudyName>study1</StudyName>
            <StudyDescription>first study</StudyDescription>
            <ProtocolName>Protocol abc</ProtocolName>
        </GlobalVariables>
        <MetaDataVersion OID="1" Name="CDISC-SDTM 3.1.2" 
Description="CDISC-SDTM 3.1.2" def:DefineVersion="1.2" 
def:StandardName="CDISC-SDTM" def:StandardVersion="3.1.2">
            <ItemGroupDef OID="AE1" Name="AE" Repeating="Yes" IsReferenceData="No" 
SASDatasetName="AE" Domain="AE" Purpose="Tabulation" def:Label="Adverse Events" 
def:Class="Events" def:Structure="One record per adverse event per subject" 
def:DomainKeys="STUDYID USUBJID AEDECOD AESTDTC" def:ArchiveLocationID="AE1">
<ItemRef ItemOID="COL1" Mandatory="Yes" OrderNumber="1" 
         KeySequence="1" Role="Identifier"/>
<ItemRef ItemOID="COL2" Mandatory="Yes" OrderNumber="2" 
         Role="Identifier"/>
<ItemRef ItemOID="COL3" Mandatory="Yes" OrderNumber="3" 
         KeySequence="2" Role="Identifier"/>
<ItemRef ItemOID="COL4" Mandatory="Yes" OrderNumber="4" 
         Role="Identifier"/>
<ItemRef ItemOID="COL5" Mandatory="No" OrderNumber="5" 
         Role="Identifier"/>
<ItemRef ItemOID="COL6" Mandatory="No" OrderNumber="6" 
         Role="Identifier"/>
<ItemRef ItemOID="COL7" Mandatory="No" OrderNumber="7" 
         Role="Identifier"/>
<ItemRef ItemOID="COL8" Mandatory="Yes" OrderNumber="8" 
         Role="Topic"/>
<ItemRef ItemOID="COL9" Mandatory="No" OrderNumber="9" 
         Role="SynonymQualifier"/>
<ItemRef ItemOID="COL10" Mandatory="Yes" OrderNumber="10" 
         KeySequence="3" Role="SynonymQualifier"/>
<ItemRef ItemOID="COL11" Mandatory="No" OrderNumber="11" 
         Role="GroupingQualifier"/>
<ItemRef ItemOID="COL12" Mandatory="No" OrderNumber="12" 
         Role="GroupingQualifier"/>
<ItemRef ItemOID="COL13" Mandatory="No" OrderNumber="13" 
         Role="RecordQualifier"/>
<ItemRef ItemOID="COL14" Mandatory="No" OrderNumber="14" 
         Role="RecordQualifier"/>
<ItemRef ItemOID="COL15" Mandatory="No" OrderNumber="15" 
         Role="RecordQualifier"/>
<ItemRef ItemOID="COL16" Mandatory="No" OrderNumber="16" 
         Role="RecordQualifier"/>
<ItemRef ItemOID="COL17" Mandatory="No" OrderNumber="17" 
         Role="RecordQualifier"/>
<ItemRef ItemOID="COL18" Mandatory="No" OrderNumber="18" 
         Role="RecordQualifier"/>
<ItemRef ItemOID="COL19" Mandatory="No" OrderNumber="19" 
         Role="RecordQualifier"/>
<ItemRef ItemOID="COL20" Mandatory="No" OrderNumber="20" 
         Role="RecordQualifier"/>
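The ItemRef elements in the listing carry the column order and, where present, the key order of the AE domain. As a hedged illustration in Python (the toolkit derives this information through its own XSL and SAS processing), the following sketch reads a shortened copy of the ItemGroupDef and recovers the key columns in KeySequence order. Only four of the ItemRef elements are reproduced, and the def: attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.2"}

# Shortened copy of the ItemGroupDef from the listing above.
FRAGMENT = """<ItemGroupDef xmlns="http://www.cdisc.org/ns/odm/v1.2"
    OID="AE1" Name="AE" Repeating="Yes" IsReferenceData="No">
  <ItemRef ItemOID="COL1" Mandatory="Yes" OrderNumber="1"
           KeySequence="1" Role="Identifier"/>
  <ItemRef ItemOID="COL2" Mandatory="Yes" OrderNumber="2"
           Role="Identifier"/>
  <ItemRef ItemOID="COL3" Mandatory="Yes" OrderNumber="3"
           KeySequence="2" Role="Identifier"/>
  <ItemRef ItemOID="COL10" Mandatory="Yes" OrderNumber="10"
           KeySequence="3" Role="SynonymQualifier"/>
</ItemGroupDef>"""

group = ET.fromstring(FRAGMENT)
item_refs = group.findall("odm:ItemRef", NS)

# Keep only the references that take part in the key, in key order.
keyed = sorted(
    (ref for ref in item_refs if ref.get("KeySequence")),
    key=lambda ref: int(ref.get("KeySequence")),
)
key_oids = [ref.get("ItemOID") for ref in keyed]

mandatory_oids = [ref.get("ItemOID") for ref in item_refs if ref.get("Mandatory") == "Yes"]
```

In this fragment the key columns are COL1, COL3, and COL10, in that order; COL2 is mandatory but is not part of the key.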
After the crtdds_read macro confirms that the define.xml file exists, a call is made to the SAS DATA step component JavaObj. In SAS 9.1.3, you get a warning in the log that states that JavaObj is experimental. The JavaObj processing converts the define.xml file into the cubeXML file through transformations using XSL files and processes. The cubeXML file is created in the Work library. The name of the cubeXML file is _cubnnnn.xml, where nnnn is a randomly generated number. The cubeXML file is accessed using the SAS XML LIBNAME engine and XMLMAP processing. A default XMLMAP file is stored in the sample CRT-DDS 1.0 study folder hierarchy under /referencexml as define.map. The define.map file is required to process the cubeXML file. If it does not exist, then the crtdds_read macro attempts to create one using the CRT-DDS reference metadata.
The following is a partial listing of the define.map file.
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP version="1.2">

<TABLE name="AnnotatedCRFs">
   <TABLE-PATH syntax="XPath">/LIBRARY/AnnotatedCRFs</TABLE-PATH>
   <TABLE-DESCRIPTION></TABLE-DESCRIPTION>

   <COLUMN name="DocumentRef">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/DocumentRef</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION></DESCRIPTION>
     <LENGTH>2000</LENGTH>
   </COLUMN>
   <COLUMN name="leafID">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/leafID</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION></DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>
   <COLUMN name="FK_MetaDataVersion">
     <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/FK_MetaDataVersion</PATH>
     <TYPE>character</TYPE>
     <DATATYPE>character</DATATYPE>
     <DESCRIPTION></DESCRIPTION>
     <LENGTH>128</LENGTH>
   </COLUMN>

</TABLE>
Processing of the cubeXML file results in the derivation of the data sets (such as ItemDefs) currently included in the SAS representation of the CDISC CRT-DDS model.
The final step in crtdds_read processing is the derivation of table and column metadata that describe the data sets in the SAS representation of the define.xml file. At this point, the crtdds_read macro is ready to create the source_tables and source_columns data sets. The tables listed in the source_tables data set are created and copied to the output library that is defined in the SASReferences data set.
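Table-level and column-level metadata of this kind can be pictured with a short sketch. The Python below derives one record per table and one record per column from a copy of the define.map fragment shown earlier. This is an assumption for illustration only: the section does not state which inputs crtdds_read uses for this step, and the record keys (table, column, order, type, length, column_count) are hypothetical rather than the actual source_tables and source_columns variables.

```python
import xml.etree.ElementTree as ET

# Copy of the define.map fragment, trimmed to the elements used here.
MAP_XML = """<SXLEMAP version="1.2">
<TABLE name="AnnotatedCRFs">
   <TABLE-PATH syntax="XPath">/LIBRARY/AnnotatedCRFs</TABLE-PATH>
   <COLUMN name="DocumentRef">
     <TYPE>character</TYPE>
     <LENGTH>2000</LENGTH>
   </COLUMN>
   <COLUMN name="leafID">
     <TYPE>character</TYPE>
     <LENGTH>128</LENGTH>
   </COLUMN>
   <COLUMN name="FK_MetaDataVersion">
     <TYPE>character</TYPE>
     <LENGTH>128</LENGTH>
   </COLUMN>
</TABLE>
</SXLEMAP>"""


def describe_map(map_root):
    """Return (tables, columns) lists of metadata records."""
    tables, columns = [], []
    for table in map_root.iter("TABLE"):
        table_name = table.get("name")
        cols = table.findall("COLUMN")
        tables.append({"table": table_name, "column_count": len(cols)})
        for order, col in enumerate(cols, start=1):
            columns.append({
                "table": table_name,
                "column": col.get("name"),
                "order": order,
                "type": col.findtext("TYPE"),
                "length": int(col.findtext("LENGTH")),
            })
    return tables, columns


tables, columns = describe_map(ET.fromstring(MAP_XML))
```

The `tables` list plays the role of source_tables (one record per data set) and the `columns` list plays the role of source_columns (one record per variable).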

Sample Driver Program: create_sascrtdds_fromxml.sas

Overview

Each primary SAS Clinical Standards Toolkit task, such as reading CDISC CRT-DDS XML files, is guided by a sample driver program that is provided by SAS. The create_sascrtdds_fromxml.sas driver program is used to read define.xml files.
For SAS 9.1.3, the driver program is located at:
!sasroot/../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0/programs/create_sascrtdds_fromxml.sas
For SAS 9.2, the driver program is located at:
!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0/programs/create_sascrtdds_fromxml.sas
The value for !sasroot is the location of your SAS installation directory.

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are two input file references and four output references that are key to successful completion of the driver program. The following table lists these files and data sets, and they are discussed in separate sections. In the sample create_sascrtdds_fromxml.sas driver program, the following values are set for &studyRootPath and &studyOutputPath and are specific to a SAS release.
SAS 9.1.3
&studyRootPath=!sasroot/../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0
&studyOutputPath=!sasroot/../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0
SAS 9.2
&studyRootPath=!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0
&studyOutputPath=!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0
Key Components of the SASReferences Data Set

Input or Output  Metadata Type   SAS LIBNAME or Fileref to Use  Reference Type  Path                         Name of File
---------------  --------------  -----------------------------  --------------  ---------------------------  -----------------------
Input            externalxml     crtxml                         fileref         &studyRootPath/sourcexml     define.xml
Input            referencexml    crtmap                         fileref         &studyRootPath/referencexml  define.map
Output           sourcedata      srcdata                        LIBNAME         &studyOutputPath/data        *.*
Output           sourcemetadata  srcmeta                        LIBNAME         &studyOutputPath/metadata    Source_tables.sas7bdat
Output           sourcemetadata  srcmeta                        LIBNAME         &studyOutputPath/metadata    Source_columns.sas7bdat
Output           results         results                        LIBNAME         &studyOutputPath/results     Read_results.sas7bdat

Process Inputs

The metadata type externalxml refers to the define.xml file that is being read. The filename crtxml is defined in the SASReferences data set. This filename is used in the submitted SAS code when referring to the define.xml file.
The metadata type referencexml refers to the SAS map file that is used to generate the SAS data sets that represent the define.xml file metadata and content. The filename crtmap is defined in the SASReferences data set. This filename is used in the submitted SAS code when referring to the SAS map file. If a path and filename for the map file are not specified, then a temporary map file is created as part of the crtdds_read processing.

Process Outputs

The sourcedata type is the library where the metadata files are created. These metadata files are the data sets that comprise the CRT-DDS information. In the SAS Clinical Standards Toolkit sample study, these data sets are written to the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0/deriveddata directory. This location is represented in the driver program by the Srcdata library name.
The sourcemetadata type refers to two data sets that are created from the cubeXML file, source_tables and source_columns. Both data sets are stored in the same library. The source_tables data set contains metadata about each table that is derived from the CRT-DDS process. The source_columns data set contains similar metadata, but it is at the column level. In the SAS Clinical Standards Toolkit sample study, this metadata is written to the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0/derivedmetadata directory. This location is represented in the driver program by the Srcmeta library name.
The results type refers to the Results data set that contains information from running the CRT-DDS process. In the SAS Clinical Standards Toolkit sample study, this information is written to the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0/results directory. This location is represented in the driver program by the Results library name.

Process Results

When the driver program finishes running, the read_results.sas7bdat data set is created in the Results library. This data set contains informational, warning, and any error messages that were generated by the submitted driver program. The following display shows an example of the contents of a Results data set in the CRT-DDS sample study.
Example of a Partial Results Data Set Created by the create_sascrtdds_fromxml.sas Driver
The crtdds_read macro creates the source_tables and source_columns data sets in the Srcmeta library. These data sets contain the table and column metadata for the SAS representation of CRT-DDS that is derived from the define.xml file. The Srcmeta library corresponds to the location specified in SASReferences (&studyOutputPath/derivedmetadata).
Example of Partial Source_Tables Data Set Derived During crtdds_read
Example of Partial Source_Columns Data Set Derived During crtdds_read
The Srcdata library contains the driver-generated tables that comprise the SAS representation of the CRT-DDS model. There is a one-to-one correspondence between the tables listed in the Srcdata library and the tables contained in the source_tables metadata file in the Srcmeta library. The Srcdata library corresponds to the location specified in SASReferences (&studyOutputPath/deriveddata).
Example of Partial Srcdata Library Derived During crtdds_read
When running the driver programs against non-sample data, you need to populate the SASReferences data set in the driver program with the proper values. For an explanation of the SASReferences data set, see SASReferences File.