The %DATASETXML_WRITE
macro creates a Dataset-XML file from a SAS data set or from a library
of SAS data sets.
Here is an example:
libname srcdata "&studyRootPath/data";
filename srcmeta "&studyRootPath/sourcexml/define.xml";
libname xmldata "&studyOutputPath/sourcexml";
%datasetxml_write(
_cstSourceLibrary=srcdata,
_cstOutputLibrary=xmldata,
_cstSourceMetadataDefineFileRef=srcmeta,
_cstCheckLengths=Y,
_cstIndent=N,
_cstZip=Y,
_cstDeleteAfterZip=N
);
In this example, the Dataset-XML files are compressed into ZIP files
(_cstZip=Y), with one ZIP file per Dataset-XML file. The Dataset-XML
files are not deleted after compression (_cstDeleteAfterZip=N).
Instead of specifying
inputs (_cstSourceLibrary and _cstSourceMetadataDefineFileRef) and
outputs (_cstOutputLibrary) for the process with the parameters, you
can use the more traditional SASReferences data set. These different
ways of specifying parameters are demonstrated in two sample programs:
create_datasetxml_standalone.sas and create_datasetxml.sas. These
sample programs are located here:
sample study library directory/cdisc-datasetxml-1.0.0-1.7/programs
Note: The create_datasetxml_standalone.sas
sample program does not use a SASReferences data set and writes reports
only in the SAS log file.
The Define-XML file
that describes the SAS data sets must contain metadata about all SAS
data sets and all variables to convert. The Dataset-XML files by themselves
do not have any information about the SAS data sets (name and label)
or the SAS variables (name, label, data type, length, and display
format). When the Dataset-XML file is converted back to SAS data sets,
this information must be provided by the Define-XML file.
Here is an example
of a Dataset-XML file:
<?xml version="1.0" encoding="UTF-8"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
xmlns:data="http://www.cdisc.org/ns/Dataset-XML/v1.0"
ODMVersion="1.3.2" FileType="Snapshot" FileOID="cdisc01.AE"
PriorFileOID="www.cdisc.org.Studycdisc01-Define-XML_2.0.0"
CreationDateTime="2014-06-23T13:18:18"
data:DatasetXMLVersion="1.0.0">
<ClinicalData StudyOID="cdisc01"
MetaDataVersionOID="MDV.CDISC01.SDTMIG.3.1.2.SDTM.1.2">
<ItemGroupData ItemGroupOID="IG.AE" data:ItemGroupDataSeq="1">
...
<ItemData ItemOID="IT.AE.AETERM" Value="AGITATED"/>
Here is an example
of a Define-XML file:
<ODM ... >
<Study OID="cdisc01">
...
<MetaDataVersion OID="MDV.CDISC01.SDTMIG.3.1.2.SDTM.1.2"
Name="Study CDISC01, Data Definitions"
Description="Study CDISC01, Data Definitions"
def:DefineVersion="2.0.0" def:StandardName="SDTM-IG"
def:StandardVersion="3.1.2">
...
<ItemGroupDef OID="IG.AE"
Domain="AE" Name="AE" Repeating="Yes" IsReferenceData="No"
SASDatasetName="AE" Purpose="Tabulation"
def:Structure="One record per adverse event per subject"
def:Class="EVENTS" def:ArchiveLocationID="LF.AE">
...
<ItemRef ItemOID="IT.AE.AETERM" OrderNumber="6" Mandatory="Yes"/>
...
<ItemDef OID="IT.AE.AETERM" Name="AETERM" DataType="text" Length="25"
SASFieldName="AETERM">
A Dataset-XML file must satisfy these requirements:
- The ClinicalData attributes StudyOID and MetaDataVersionOID must
  have the same values as the corresponding OID attributes in the
  define.xml document.
- The ItemGroupOID value must be the same as the OID attribute of the
  corresponding ItemGroupDef element in the define.xml document.
- All ItemOID attributes in the ItemData elements must have values
  identical to the values of the corresponding ItemOID attributes in
  the ItemRef elements that are child elements of the corresponding
  ItemGroupDef element in the define.xml document.
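The three requirements above can be checked outside SAS before a conversion is attempted. The following Python sketch is an illustration only and is not part of the %DATASETXML_WRITE macro. The function name check_oids, the message texts, and the trimmed sample documents are hypothetical, and the sketch assumes that a ReferenceData element is checked the same way as a ClinicalData element.

```python
import xml.etree.ElementTree as ET

ODM = "{http://www.cdisc.org/ns/odm/v1.3}"  # ODM 1.3 namespace used by both files

# Hypothetical, heavily trimmed sample documents (illustration only).
DEFINE_XML = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="cdisc01">
    <MetaDataVersion OID="MDV.CDISC01.SDTMIG.3.1.2.SDTM.1.2" Name="Example">
      <ItemGroupDef OID="IG.AE" Name="AE" Repeating="Yes" SASDatasetName="AE">
        <ItemRef ItemOID="IT.AE.AETERM" OrderNumber="6" Mandatory="Yes"/>
      </ItemGroupDef>
      <ItemDef OID="IT.AE.AETERM" Name="AETERM" DataType="text" Length="25"/>
    </MetaDataVersion>
  </Study>
</ODM>"""

DATASET_XML = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:data="http://www.cdisc.org/ns/Dataset-XML/v1.0">
  <ClinicalData StudyOID="cdisc01"
                MetaDataVersionOID="MDV.CDISC01.SDTMIG.3.1.2.SDTM.1.2">
    <ItemGroupData ItemGroupOID="IG.AE" data:ItemGroupDataSeq="1">
      <ItemData ItemOID="IT.AE.AETERM" Value="AGITATED"/>
    </ItemGroupData>
  </ClinicalData>
</ODM>"""


def check_oids(dataset_xml, define_xml):
    """Return a list of messages, one per violated requirement."""
    define = ET.fromstring(define_xml)
    study = define.find(f"{ODM}Study")
    mdv = study.find(f"{ODM}MetaDataVersion")
    # ItemGroupDef OID -> set of ItemOIDs referenced by its ItemRef children
    item_refs = {
        group.get("OID"): {ref.get("ItemOID") for ref in group.findall(f"{ODM}ItemRef")}
        for group in mdv.findall(f"{ODM}ItemGroupDef")
    }

    problems = []
    for container in ET.fromstring(dataset_xml):
        # Assumption: ReferenceData is validated like ClinicalData.
        if container.tag not in (f"{ODM}ClinicalData", f"{ODM}ReferenceData"):
            continue
        # Requirement 1: StudyOID and MetaDataVersionOID match the Define-XML OIDs.
        if container.get("StudyOID") != study.get("OID"):
            problems.append(f"StudyOID {container.get('StudyOID')} does not match")
        if container.get("MetaDataVersionOID") != mdv.get("OID"):
            problems.append(
                f"MetaDataVersionOID {container.get('MetaDataVersionOID')} does not match"
            )
        for group_data in container.findall(f"{ODM}ItemGroupData"):
            group_oid = group_data.get("ItemGroupOID")
            # Requirement 2: ItemGroupOID matches an ItemGroupDef OID.
            if group_oid not in item_refs:
                problems.append(f"ItemGroupOID {group_oid} has no ItemGroupDef")
                continue
            # Requirement 3: every ItemOID is referenced by an ItemRef of that group.
            for item in group_data.findall(f"{ODM}ItemData"):
                if item.get("ItemOID") not in item_refs[group_oid]:
                    problems.append(
                        f"ItemOID {item.get('ItemOID')} is not referenced by {group_oid}"
                    )
    return problems


# The sample documents are consistent, so no messages are reported.
print(check_oids(DATASET_XML, DEFINE_XML))
```

An empty list means that every OID in the Dataset-XML document was matched in the Define-XML document.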
It would be an error to try to extract the SAS data set name from an
ItemGroup object identifier in the Dataset-XML file
(ItemGroupOID="IG.AE"). It would also be an error to try to extract
the variable name from an item object identifier
(ItemOID="IT.AE.AETERM"). There is no requirement concerning the
values of the identifiers.
SAS tables are matched to the @SASDatasetName attribute of the
ItemGroupDef element (or, if this value is not specified, @Name).
SAS columns are matched to the @SASFieldName attribute of the ItemDef
element (or, if this value is not specified, @Name). @SASDatasetName
and @SASFieldName are optional, but @Name is required, so @Name is
always available.
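The matching rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the macro's implementation: the function name sas_names and the sample Define-XML fragment (including its deliberately opaque OIDs and the Name value AdverseEvents) are hypothetical. The point of the sketch is that names come from the Define-XML attributes, never from the text of an OID.

```python
import xml.etree.ElementTree as ET

ODM = "{http://www.cdisc.org/ns/odm/v1.3}"  # ODM 1.3 namespace

# Hypothetical fragment: the OIDs carry no name information at all.
DEFINE_XML = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="cdisc01">
    <MetaDataVersion OID="MDV.1" Name="Example">
      <ItemGroupDef OID="IG.0001" Name="AdverseEvents" Repeating="Yes"
                    SASDatasetName="AE"/>
      <ItemDef OID="IT.0042" Name="AETERM" DataType="text" Length="25"/>
    </MetaDataVersion>
  </Study>
</ODM>"""


def sas_names(define_xml):
    """Map OIDs to SAS names, preferring the optional SAS attributes."""
    root = ET.fromstring(define_xml)
    # Table name: SASDatasetName if present, otherwise the required Name.
    tables = {
        group.get("OID"): group.get("SASDatasetName") or group.get("Name")
        for group in root.iter(f"{ODM}ItemGroupDef")
    }
    # Column name: SASFieldName if present, otherwise the required Name.
    columns = {
        item.get("OID"): item.get("SASFieldName") or item.get("Name")
        for item in root.iter(f"{ODM}ItemDef")
    }
    return tables, columns


tables, columns = sas_names(DEFINE_XML)
print(tables)   # table name taken from SASDatasetName
print(columns)  # column name falls back to Name
```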
If the ItemGroupDef or ItemDef is not found in the Define-XML file,
the XML is generated with this pattern for @ItemGroupOID and @ItemOID:
ItemGroupOID = "IG.<table>"
ItemOID = "IT.<table>.<column>"
Although ItemGroupOID
and ItemOID are generated for missing ItemGroups or ItemDefs, it is
important to realize that this can lead to problems later when converting
Dataset-XML files to SAS data sets. For example, when converting a
Dataset-XML file into a SAS data set, ItemGroupOIDs or ItemOIDs that
cannot be matched in the corresponding Define-XML file can lead to
missing SAS data sets or missing SAS data set variables.
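The fallback pattern described above can be sketched as follows. The function names and the dictionary that stands in for the Define-XML metadata are hypothetical, and the sketch shows only the naming pattern, not how the macro looks up metadata.

```python
def fallback_item_group_oid(table):
    """ItemGroupOID used when no ItemGroupDef is found for the table."""
    return f"IG.{table}"


def resolve_item_oids(table, columns, defined_item_oids):
    """Return (oids, missing) for the columns of one table.

    defined_item_oids maps (table, column) to the ItemOID found in the
    metadata. Columns without an entry get the fallback pattern and are
    reported in the missing list.
    """
    oids, missing = {}, []
    for column in columns:
        oid = defined_item_oids.get((table, column))
        if oid is None:
            oid = f"IT.{table}.{column}"  # fallback pattern from the text
            missing.append(f"{table}.{column}")
        oids[column] = oid
    return oids, missing


# Hypothetical metadata: only USUBJID is described for table ADAE.
metadata = {("ADAE", "USUBJID"): "IT.USUBJID"}
oids, missing = resolve_item_oids("ADAE", ["USUBJID", "AEDECOD", "AETERM"], metadata)
print(oids)
print(missing)
```

Any column that appears in the missing list has an ItemOID that cannot be matched in the Define-XML file later, which is the situation that leads to missing variables when the file is read back.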
Warnings are written
to the SAS log file and the write_results data set in the results
folder.
Here is an example
of the SAS log file:
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Columns not found in metadata:
ADAE.AEDECOD ADAE.AETERM
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Missing ItemData/@ItemOID for column=AEDECOD
has been set to IT.ADAE.AEDECOD
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Missing ItemData/@ItemOID for column=AETERM
has been set to IT.ADAE.AETERM
The @IsReferenceData attribute of the ItemGroupDef element in the
Define-XML file determines whether the data set is written as
ReferenceData or ClinicalData. Here is an example of each:
<ReferenceData StudyOID="cdisc01"
MetaDataVersionOID="MDV.CDISC01.SDTMIG.3.1.2.SDTM.1.2">
<ItemGroupData ItemGroupOID="IG.TE" data:ItemGroupDataSeq="1">
<ItemData ItemOID="IT.STUDYID" Value="CDISC01"/>
<ItemData ItemOID="IT.TE.DOMAIN" Value="TE"/>
<ItemData ItemOID="IT.TE.ETCD" Value="EOS"/>
<ItemData ItemOID="IT.TE.ELEMENT" Value="End of Study"/>
<ItemData ItemOID="IT.TE.TESTRL" Value="Study Termination"/>
<ItemData ItemOID="IT.TE.TEDUR" Value="P1D"/>
</ItemGroupData>
<ClinicalData StudyOID="cdisc01"
MetaDataVersionOID="MDV.CDISC01.SDTMIG.3.1.2.SDTM.1.2">
<ItemGroupData ItemGroupOID="IG.AE" data:ItemGroupDataSeq="1">
<ItemData ItemOID="IT.STUDYID" Value="CDISC01"/>
<ItemData ItemOID="IT.AE.DOMAIN" Value="AE"/>
<ItemData ItemOID="IT.USUBJID" Value="CDISC01.100008"/>
<ItemData ItemOID="IT.AE.AESEQ" Value="1"/>
<ItemData ItemOID="IT.AE.AESPID" Value="1"/>
<ItemData ItemOID="IT.AE.AETERM" Value="AGITATED"/>
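The choice between the two container elements can be sketched as follows. This is a minimal illustration that assumes IsReferenceData="Yes" selects ReferenceData and that any other value, or an absent attribute, selects ClinicalData. The function name container_tag and the sample elements are hypothetical.

```python
import xml.etree.ElementTree as ET


def container_tag(item_group_def):
    """Return the container element name for one ItemGroupDef element."""
    # Assumption: only the value "Yes" marks reference data.
    if item_group_def.get("IsReferenceData") == "Yes":
        return "ReferenceData"
    return "ClinicalData"


# Hypothetical ItemGroupDef elements, trimmed to the relevant attributes.
te = ET.fromstring('<ItemGroupDef OID="IG.TE" Name="TE" IsReferenceData="Yes"/>')
ae = ET.fromstring('<ItemGroupDef OID="IG.AE" Name="AE" IsReferenceData="No"/>')

print(container_tag(te))
print(container_tag(ae))
```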
The _cstCheckLengths macro parameter enables the %DATASETXML_WRITE
macro to determine whether the lengths defined in the metadata are
long enough for character data. This check is important to avoid
data truncation problems when importing the Dataset-XML files into
SAS data sets with the %DATASETXML_READ macro. Warnings are written
to the SAS log file and to the write_results data set in the results
folder.
Here is an example
of the SAS log file:
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Length too short: __ItemGroupOID=IG.ADAE
__ItemOID=IT.ADAE.AETERM Length=20 _valueLength=24 value=HEARTBURN-LIKE DYSPEPSIA
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Length too short: __ItemGroupOID=IG.ADAE
__ItemOID=IT.ADAE.AETERM Length=20 _valueLength=25 value=ACID REFLUX (OESOPHAGEAL)
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Length too short: __ItemGroupOID=IG.ADAE
__ItemOID=IT.ADAE.AEDECOD Length=20 _valueLength=32 value=Gastrooesophageal reflux disease
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Length too short: __ItemGroupOID=IG.ADAE
__ItemOID=IT.ADAE.AETERM Length=20 _valueLength=25 value=ACID REFLUX (OESOPHAGEAL)
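The kind of comparison behind these warnings can be sketched in Python. This is an illustration only, not the macro's implementation: the function name find_short_lengths and the input structures are hypothetical, and the sketch counts characters, which equals the SAS byte length only for single-byte encodings.

```python
def find_short_lengths(rows, defined_lengths):
    """Return (column, defined_length, value_length, value) for each
    character value that is longer than its defined length."""
    findings = []
    for row in rows:
        for column, value in row.items():
            # Only character data with a defined length is checked.
            if not isinstance(value, str) or column not in defined_lengths:
                continue
            if len(value) > defined_lengths[column]:
                findings.append((column, defined_lengths[column], len(value), value))
    return findings


# Values taken from the log example; the lengths mirror Length=20.
rows = [
    {"AETERM": "HEARTBURN-LIKE DYSPEPSIA", "AEDECOD": "Dyspepsia"},
    {"AETERM": "ACID REFLUX (OESOPHAGEAL)",
     "AEDECOD": "Gastrooesophageal reflux disease"},
]
lengths = {"AETERM": 20, "AEDECOD": 20}

for column, defined, actual, value in find_short_lengths(rows, lengths):
    print(f"Length too short: {column} Length={defined} _valueLength={actual} value={value}")
```

Raising the Length value in the Define-XML metadata to at least the largest reported value length removes the warning and the truncation risk.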
The %DATASETXML_WRITE
macro also checks that numeric variables in ADaM data sets that represent
date and time information have a DisplayFormat defined in the Define-XML
file.