Importing a CDISC ODM XML Document Using a Language Identifier

Overview

This example imports clinical trials data from a CDISC ODM XML document by specifying a language identifier with the LANGUAGE= option in the PROC CDISC statement. When you specify the LANGUAGE= option, PROC CDISC locates the matching language identifier in the ODM TranslatedText elements. It creates a SAS format from the TranslatedText items whose language tag attribute (xml:lang) matches, and then applies that format to the data that is imported from the XML document.
This example imports the following XML document:
<?xml version="1.0" encoding="windows-1252" ?>
<!--
      Clinical Data Interchange Standards Consortium (CDISC)
      Operational Data Model (ODM) for clinical data interchange

      You can learn more about CDISC standards efforts at 
      http://www.cdisc.org/standards/index.html
  -->

<ODM xmlns="http://www.cdisc.org/ns/odm/v1.2"
     xmlns:ds="http://www.w3.org/2000/09/xmldsig#"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.cdisc.org/ns/odm/v1.2 ODM1-2-0.xsd"

     ODMVersion="1.2"
     FileOID="000-00-0000"
     FileType="Snapshot"
     Description="testing codelist stuff"

     AsOfDateTime="2006-11-03T09:47:53"
     CreationDateTime="2006-11-03T09:47:53"
     SourceSystem="SAS"
     SourceSystemVersion="GENERIC"
     >

   <Study OID="AStudyOID">

      <!--
            GlobalVariables is a REQUIRED section in ODM markup
        -->
      <GlobalVariables>
         <StudyName>CODELIST</StudyName>
         <StudyDescription>Checking Codelists</StudyDescription>
         <ProtocolName>Protocol</ProtocolName>
      </GlobalVariables>

      <BasicDefinitions />

      <!--
            Internal ODM markup required metadata
        -->
      <MetaDataVersion OID="MDV_CODELIST" Name="MDV Codelist">
         <Protocol>
            <StudyEventRef StudyEventOID="StudyEventOID" OrderNumber="1" 
                   Mandatory="Yes" />
         </Protocol>

         <StudyEventDef OID="StudyEventOID" Name="Study Event Definition" 
                  Repeating="Yes" Type="Common">
            <FormRef FormOID="X" OrderNumber="1" Mandatory="No" />
         </StudyEventDef>

         <FormDef OID="X" Name="Form Definition" Repeating="Yes">
            <ItemGroupRef ItemGroupOID="X" Mandatory="No" />
         </FormDef>


         <!--
               Columns defined in the table
           -->
         <ItemGroupDef OID="X" Repeating="Yes"
                       SASDatasetName="X"
                       Name="ODM Examples"
                       Comment="Examples of ODM Datatypes">
            <ItemRef ItemOID="ID.x" OrderNumber="1" Mandatory="No" />
         </ItemGroupDef>


         <!--
               Column attributes as defined in the table
           -->
         <ItemDef OID="ID.x" SASFieldName="x" Name="x" DataType="float" Length="12" 
                  SignificantDigits="2" Comment="x">
            <CodeListRef CodeListOID="CL.NUMBERS" />
         </ItemDef>


         <!--
               Translation to ODM markup for any PROC FORMAT style
               user defined or SAS internal formatting specifications
               applied to columns in the table
           -->
         <CodeList OID="CL.NUMBERS" SASFormatName="NUMBERS" Name="NUMBERS" 
                    DataType="float">
            <CodeListItem CodedValue="1">
               <Decode>
                  <TranslatedText xml:lang="de">eins</TranslatedText>
                  <TranslatedText xml:lang="en">one</TranslatedText> 
                  <TranslatedText xml:lang="es">uno</TranslatedText>
               </Decode>
            </CodeListItem>
            <CodeListItem CodedValue="2">
               <Decode>
                  <TranslatedText xml:lang="de">zwei</TranslatedText>
                  <TranslatedText xml:lang="en">two</TranslatedText> 
                  <TranslatedText xml:lang="es">dos</TranslatedText>
               </Decode>
            </CodeListItem>
            <CodeListItem CodedValue="3">
               <Decode>
                  <TranslatedText xml:lang="de">drei</TranslatedText>
                  <TranslatedText xml:lang="en">three</TranslatedText>    
                  <TranslatedText xml:lang="es">tres</TranslatedText>
               </Decode>
            </CodeListItem>
         </CodeList>
      </MetaDataVersion>
   </Study>


   <!--
         Administrative metadata
     -->
   <AdminData />


   <!--
         Actual data content begins here
         This section represents each data record in the table
     -->
   <ClinicalData StudyOID="AStudyOID" MetaDataVersionOID="MDV_CODELIST">
      <SubjectData SubjectKey="001">
         <StudyEventData StudyEventOID="StudyEventOID" StudyEventRepeatKey="1">
            <FormData FormOID="X" FormRepeatKey="1">
               <ItemGroupData ItemGroupOID="X" ItemGroupRepeatKey="1">
                  <ItemData ItemOID="ID.x" Value="3" />
               </ItemGroupData>
            </FormData>
         </StudyEventData>
      </SubjectData>
   </ClinicalData>
</ODM>

Program

The following SAS program imports the XML document as a SAS data set:
  1. The LIBNAME statement assigns the libref RESULTS to the physical location of the output SAS data set.
  2. The FILENAME statement assigns the fileref XMLINP to the physical location of the input XML document (complete pathname, filename, and file extension) to be imported.
  3. The PROC CDISC statement specifies the following:
    • CDISC ODM as the model.
    • Fileref XMLINP, which references the physical location of the input XML document to be imported.
    • FORMATACTIVE=YES to convert CDISC ODM CodeList content in the XML document to SAS formats.
    • FORMATNOREPLACE=NO to replace existing SAS formats in the FORMAT catalog that have the same name as the converted formats.
    • LANGUAGE="DE" to specify a language identifier with a two-letter language code. PROC CDISC locates the DE language identifier in the ODM TranslatedText element and creates a SAS format by using the TranslatedText items with a matching language tag attribute. The created SAS format is then applied to the data that is imported from the XML document.
  4. The ODM statement specifies ODMVERSION="1.2", which matches the ODMVersion attribute in the XML document. ODMMINIMUMKEYSET=NO specifies that all KeySet members are written to the output SAS data set. This is the default setting for ODMMINIMUMKEYSET=.
  5. The CLINICALDATA statement identifies the output SAS data set, which is RESULTS.NUMBERS. SASDATASETNAME="X" specifies the value of the CDISC ODM ItemGroupDef attribute SASDatasetName, which indicates where the data content in the XML document begins.
  6. The CONTENTS procedure lists the contents of the output SAS data set.
  7. The PRINT procedure prints the rows in the output SAS data set. The VAR statement selects just the X variable.
libname results 'C:\My Documents\'; /* 1 */

filename xmlinp 'C:\XML\numbers.xml'; /* 2 */

proc cdisc model=odm /* 3 */
           read=xmlinp
           formatactive=yes
           formatnoreplace=no
           language="de";
   odm odmversion="1.2" odmminimumkeyset=no; /* 4 */
   clinicaldata out=results.numbers sasdatasetname="X"; /* 5 */
run;

filename xmlinp clear;

proc contents data=results.numbers; /* 6 */
run;

proc print data=results.numbers; /* 7 */
   var x;
run;

libname results clear;
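
As a variation on the program above, the following sketch imports the same document with LANGUAGE="en" instead of LANGUAGE="de". The output data set name NUMBERS_EN is a hypothetical name chosen for illustration; the paths and all other options are the ones used above. Given the behavior described for the LANGUAGE= option, the NUMBERS format would then be built from the English TranslatedText items, so the value 3 would be expected to print as three.

```sas
/* Sketch: same import as above, but selecting the English decodes. */
libname results 'C:\My Documents\';
filename xmlinp 'C:\XML\numbers.xml';

proc cdisc model=odm
           read=xmlinp
           formatactive=yes
           formatnoreplace=no   /* replace the NUMBERS format if it already exists */
           language="en";       /* match TranslatedText items tagged xml:lang="en" */
   odm odmversion="1.2" odmminimumkeyset=no;
   /* NUMBERS_EN is a hypothetical output name, not part of the original example */
   clinicaldata out=results.numbers_en sasdatasetname="X";
run;

filename xmlinp clear;

proc print data=results.numbers_en;
   var x;
run;

libname results clear;
```

Because FORMATNOREPLACE=NO is in effect, rerunning the import with a different language identifier replaces a NUMBERS format left over from an earlier run.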

Output

The output from PROC CONTENTS displays the attributes of each interpreted variable, including the SAS variable X and all KeySet members.
PROC CONTENTS Output for RESULTS.NUMBERS
The output from PROC PRINT lists the value of the imported SAS variable X. The procedure applies the SAS format NUMBERS, which is created from the TranslatedText items whose language tag attribute matches DE. The value imported from the XML document is 3, so the formatted result is the German word drei.
PROC PRINT Output for Variable X
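
To check which decodes went into the NUMBERS format, the format can be listed with PROC FORMAT. This is a minimal sketch, not part of the original example. It assumes that it runs in the same SAS session after the import program, and that PROC CDISC stored the format in the default WORK.FORMATS catalog; adjust LIBRARY= if the formats were written elsewhere.

```sas
/* Assumption: the NUMBERS format is in WORK.FORMATS (same session as the import). */
proc format library=work fmtlib;
   select numbers;   /* list only the NUMBERS format */
run;

/* Apply the format directly to a literal value and write the result to the log. */
data _null_;
   decoded = put(3, numbers.);
   put decoded=;
run;
```

With LANGUAGE="de", the listing should show the German decodes for the coded values 1, 2, and 3.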