CDISC Define-XML 2.0

Purpose

The CDISC Define-XML 2.0 standard defines metadata structures in a machine-readable XML format. These metadata structures describe tabulation and analysis data sets and variables for regulatory submissions, as well as any proprietary (non-CDISC) data set structure. The XML schema that defines these metadata structures is an extension of the CDISC Operational Data Model (ODM).

Release Date

CDISC Define-XML Version 2.0 specification, Production Version 2.0.0, March 5, 2013.

Regulatory Basis

(Source: CDISC Define-XML Version 2.0 Specification)
“In the United States, the approval process for regulated human and animal health products requires the submission of data from clinical trials and other studies as expressed in the Code of Federal Regulations (CFR). The FDA established the regulatory basis for wholly electronic submission of data in 1997 with the publication of regulations on the use of electronic records in place of paper records (21 CFR Part 11). In 1999, the FDA standardized the submission of clinical and non-clinical data using the SAS Version 5 XPORT Transport Format and the submission of metadata using Portable Document Format (PDF), respectively. In 2005, the Study Data Specifications published by the FDA included the recommendation that data definitions (metadata) be provided as a Define-XML file. In December 2011, the CDER Common Data Standards Issues Document stated that ‘a properly functioning define.xml file is an important part of the submission of standardized electronic datasets and should not be considered optional.’”

CDISC Define-XML 2.0 Reference Standard

Overview

The domain and column metadata that constitute the SAS representation of the CDISC Define-XML 2.0 standard are derived from the global standards library in these formats:
CDISC Define-XML 2.0 reference_tables
CDISC Define-XML 2.0 reference_columns
The tablecore column in the reference_tables data set indicates whether the table is a required (Req) or optional (Opt) part of the Define-XML 2.0 metadata according to the XML schema. Tables with tablecore equal to Ext are part of the underlying ODM metadata model, but they should be considered extensions to the Define-XML 2.0 metadata model. The core column in the reference_columns data set indicates whether a column is required (Req) or optional (Opt) in a table when the table is part of the metadata.
As a general rule, the SAS representation of the CDISC Define-XML 2.0 standard is patterned to match the XML element (data set) and attribute (column) structure of define.xml. The SAS representation of the CDISC Define-XML 2.0 metadata model contains fewer tables than the CDISC Define-XML 2.0 metadata model. This reduction was accomplished by combining tables with the same structure.
This display shows an example of combining tables.
CDISC Define-XML 2.0 TranslatedText Table
The TranslatedText table contains the contents of the TranslatedText child elements of various parent elements (ItemGroupDefs, ItemDefs, ItemOrigin, CodeLists, CodeListItems, MethodDefs, CommentDefs, and others). Other tables that combine similar table structures into one table are the Aliases table, the DocumentRefs table, and the FormalExpressions table.
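The idea of combining identically structured tables can be sketched in a few lines of Python. This is only an illustration of the approach: the column names (parent, parent_key, lang, text) and the sample OIDs are hypothetical and are not taken from the Toolkit's reference_columns data set.

```python
from collections import namedtuple

# Hypothetical row layout for a combined table; not the Toolkit's column names.
TranslatedText = namedtuple("TranslatedText", ["parent", "parent_key", "lang", "text"])

def combine(descriptions_by_parent):
    """Flatten {parent_type: {parent_oid: {lang: text}}} into one list of rows.

    Each row records which kind of parent element it came from, so one table
    can hold the TranslatedText children of many different parent elements.
    """
    rows = []
    for parent, by_key in descriptions_by_parent.items():
        for key, by_lang in by_key.items():
            for lang, text in by_lang.items():
                rows.append(TranslatedText(parent, key, lang, text))
    return rows

# Illustrative input only.
translated_text = combine({
    "ItemGroupDefs": {"IG.DM": {"en": "Demographics"}},
    "ItemDefs": {"IT.DM.AGE": {"en": "Age"}},
    "CodeLists": {"CL.SEX": {"en": "Sex"}},
})
```

Because every row carries its parent type and key, a single table replaces what would otherwise be one translated-text table per parent element.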
The highly structured nature of CDISC Define-XML 2.0 data requires that any mapping to a relational format include a large number of data sets. Foreign key relationships help preserve the intended non-relational object structure. In SAS Clinical Standards Toolkit 1.7, these foreign key relationships are enforced when CDISC Define-XML 2.0 data sets are validated, in a way that is similar to how they are enforced for the CDISC CRT-DDS 1.0 data sets.
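A foreign key check of this kind can be sketched as follows. This is a minimal illustration in Python, not the Toolkit's validation code, and the table and column names (OID, ItemOID, FK_ItemGroupDefs) are assumptions made for the example.

```python
def orphaned_rows(child_rows, parent_rows, fk, pk="OID"):
    """Return the child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]

# Illustrative data only; column names are hypothetical.
item_group_defs = [{"OID": "IG.DM", "Name": "DM"}]
item_group_item_refs = [
    {"ItemOID": "IT.DM.AGE", "FK_ItemGroupDefs": "IG.DM"},
    {"ItemOID": "IT.AE.TERM", "FK_ItemGroupDefs": "IG.AE"},  # parent does not exist
]

orphans = orphaned_rows(item_group_item_refs, item_group_defs, fk="FK_ItemGroupDefs")
```

A child row that points to a missing parent would break the nesting of the XML elements when the data sets are written back to define.xml, which is why such rows are worth reporting.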
Field lengths in the CDISC Define-XML 2.0 data sets are consistent by core data type. CDISC has not specified a limit to the length of most character fields. Arbitrary lengths have been chosen by data type. Here are the lengths:
CDISC Define-XML 2.0 Default Lengths by Data Type
Type Name    Length    Description
oid          128       A unique object identifier or a reference
text         2000      A character field that can accommodate a large number of characters
name         128       A descriptive identifier
value        512       An item of collected or reference data
path         512       An absolute or relative file system path or URL
Note: CRT-DDS 1.0 and Define-XML 2.0 use the same default lengths.
In the table, standard data types are distilled into core data types. Larger lengths have been chosen to ensure that no data loss occurs in the SAS Clinical Standards Toolkit pre-installed data sets. Production tables can be compressed using SAS mechanisms to conserve disk space.
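As a rough illustration of how these default lengths guard against data loss, the following Python sketch reports by how many bytes a value would exceed the length for its core data type. The lengths come from the table above; the function name and the assumption that lengths are counted in UTF-8 bytes are hypothetical (the applicable encoding depends on the SAS session).

```python
# Default lengths by core data type, as listed in the table above.
DEFAULT_LENGTHS = {"oid": 128, "text": 2000, "name": 128, "value": 512, "path": 512}

def overflow(value, core_type, encoding="utf-8"):
    """Return how many bytes `value` exceeds the default length by (0 if it fits).

    Assumption: the length is measured in bytes of the given encoding.
    """
    limit = DEFAULT_LENGTHS[core_type]
    return max(0, len(value.encode(encoding)) - limit)

fits = overflow("IG.DM", "oid")            # short OID, well within 128 bytes
truncated_by = overflow("x" * 130, "name")  # 130 bytes against a limit of 128
```

A nonzero result indicates a value that would be truncated if it were stored in a column of the default length.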

CDISC Define-XML 2.0 SAS Data Set Construction

The SAS Clinical Standards Toolkit CDISC Define-XML 2.0 reference standard supports these actions:
  • reading and representing a define.xml file in SAS
  • building a define.xml file
  • validating the structural integrity of the define.xml file against an XML schema
To support this functionality, supplemental files include these global standards library files:
  • A SAS format catalog (defct.sas7bcat) in the formats folder provides valid values for selected columns in the 46 data sets of the SAS representation.
  • The Messages data set in the messages folder provides unified error messaging for all Define-XML processes.
  • The macros folder contains SAS code that is specific to CDISC Define-XML 2.0. This SAS code augments code that is provided in the primary SAS Clinical Standards Toolkit autocall library (!sasroot/cstframework/sasmacro).
  • The style sheet folder contains the define2-0-0.xsl XSL style sheet. The define2-0-0.xsl style sheet is based on the style sheet that was published by CDISC in 2013. The CDISC style sheet can be found at http://www.cdisc.org/define-xml.
    A define.xml file can be rendered in a human-readable form (such as HTML) with an XSL style sheet.
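The first supported action, reading a define.xml file and representing it in tabular form, can be sketched with the Python standard library. This is not how the Toolkit macros work internally; it is a minimal illustration that assumes the ODM 1.3 and Define-XML 2.0 namespaces shown in the code, and the sample document and output column names are invented for the example.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
DEF_NS = "http://www.cdisc.org/ns/def/v2.0"

# A tiny, illustrative define.xml fragment (not a complete, valid submission file).
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <Study OID="ST.1">
    <MetaDataVersion OID="MDV.1" Name="Example">
      <ItemGroupDef OID="IG.DM" Name="DM" Repeating="No"
                    def:Structure="One record per subject">
        <Description>
          <TranslatedText xml:lang="en">Demographics</TranslatedText>
        </Description>
      </ItemGroupDef>
    </MetaDataVersion>
  </Study>
</ODM>"""

def item_groups(xml_text):
    """Return one dictionary (row) per ItemGroupDef element in the document."""
    root = ET.fromstring(xml_text)
    rows = []
    for ig in root.iter(f"{{{ODM_NS}}}ItemGroupDef"):
        label = ig.find(f"{{{ODM_NS}}}Description/{{{ODM_NS}}}TranslatedText")
        rows.append({
            "OID": ig.get("OID"),
            "Name": ig.get("Name"),
            "Structure": ig.get(f"{{{DEF_NS}}}Structure"),
            "Description": label.text if label is not None else None,
        })
    return rows

rows = item_groups(SAMPLE)
```

Each XML element becomes a row and each attribute a column, which mirrors the element-to-data-set and attribute-to-column pattern described for the SAS representation.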