Validation of XML-Based Standards

XML Validation

When validating XML-based standards (such as CDISC ODM and CDISC CRT-DDS), the SAS Clinical Standards Toolkit offers two complementary methodologies.
The first methodology is described in Validation. It relies on the definition of a master set of validation checks that are specific to the table and column metadata that define a set of data, and checks that are specific to the data itself. This method uses SAS files and SAS code to validate the SAS representation of the XML-based standard. Example checks include the assessment of foreign key relationships across data sets and value conformance to a set of expected values.
The second methodology involves verification that an XML file is valid structurally and syntactically according to the XML schema for that standard.
The SAS Clinical Standards Toolkit provides both methodologies to support the validation of CDISC CRT-DDS 1.0 and CDISC ODM 1.3.0 files.

Validating CDISC CRT-DDS 1.0 Files

The crtdds_xmlvalidate Macro

The crtdds_xmlvalidate macro validates the structure and syntax of the define.xml file against the XML schema for the CRT-DDS standard. It can be run at any time. The SAS Clinical Standards Toolkit includes a call to the crtdds_xmlvalidate macro immediately following the call to the crtdds_write macro as the last step of the create_crtdds_define.sas sample driver program. If you customize the define.xml file after it is generated, then this macro can be used to validate the changes. The SAS Clinical Standards Toolkit also includes a call to the crtdds_xmlvalidate macro immediately before the call to the crtdds_read macro in the create_crtdds_fromxml.sas sample driver program.
Here is an example of a call to the crtdds_xmlvalidate.sas macro:
%crtdds_xmlvalidate(_cstLogLevel=info,_cstResultsOverrideDS=work.xmlvalidate);
In this example, the %crtdds_xmlvalidate macro is being submitted with a log level of Info. The Results data set is named XMLVALIDATE and resides in the Work library.
Parameters for the crtdds_xmlvalidate.sas Macro
| Parameter | Required | Description |
| --- | --- | --- |
| _cstLogLevel | Yes | Identifies the log level. Valid values are Info, Warning, Error, and Fatal Error. The default value is Info. |
| _cstResultsOverrideDS | No | Designates [LIBNAME.]member as the name of the Results data set. If this parameter is omitted (default setting), then the Results data set specified by the &_cstResultsDS global macro variable is used. |
XML schema validation results are logged using four log level settings. These log levels refer to the XML-generated log, not the log that is generated by SAS.
Log Levels for the crtdds_xmlvalidate.sas Macro
| Log Level | Description |
| --- | --- |
| Info | Messages such as the system properties of the current Java environment and progress messages. This is the default value. |
| Warning | Messages that indicate that there might be an issue with the CRT-DDS document or with the execution of the validation process. |
| Error | Messages that indicate that something in the define.xml document is invalid with respect to the XML schema for CRT-DDS, or that a non-fatal error has occurred during processing. |
| Fatal Error | Messages that indicate that the XML document could not be processed at all. There are many causes, including file system access errors, incorrect file paths, and malformed XML. |
Each message that is generated during XML validation is associated with one of these levels. The level that you choose determines which other messages are generated. For example, if you choose the Warning level, then all Warning messages and anything more severe, such as Error and Fatal Error messages, are generated. If you choose the Error level, then only Error and Fatal Error messages are generated.
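As a sketch of how the threshold behaves, the call below raises the log level to Warning so that Info messages are not generated. It uses only the two parameters documented above. The lowercase spelling warning mirrors the info value in the earlier example and is an assumption, as is the Results data set name work.xmlvalidate_warn.

```sas
/* Sketch: run schema validation, generating only Warning, Error, and    */
/* Fatal Error messages. Assumes the toolkit environment and the         */
/* SASReferences setup from the sample driver program are already in     */
/* place. The data set name work.xmlvalidate_warn is hypothetical.       */
%crtdds_xmlvalidate(_cstLogLevel=warning,_cstResultsOverrideDS=work.xmlvalidate_warn);

/* Review whatever was written to the override Results data set. */
proc print data=work.xmlvalidate_warn;
run;
```

Writing to a separate Work data set keeps these results apart from the data set named by &_cstResultsDS, which can make it easier to compare runs at different log levels.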

Validation of the SAS Representation: crtdds_validate Macro

The crtdds_validate macro supports the first XML validation methodology outlined above. This method is based on SAS and validates the SAS representation of the XML-based standard.
In the SAS Clinical Standards Toolkit, CDISC CRT-DDS validation uses the same types of metadata and the same workflow process that is common to validation of all data standards. SAS provides a set of validation checks for CDISC CRT-DDS that are designed to verify the metadata definitions and values of the 39 data sets that comprise the SAS representation of the CRT-DDS model. These checks were created by SAS. For more information about these checks, see Validation and Validation Checks. Metadata about each check is provided in the Validation Master data set in <global standards library directory>/standards/cdisc-crtdds-1.0-1.4/validation/control.
The crtdds_validate macro controls the validation workflow for CRT-DDS. As each check is processed from the run-time validation check data set, the check determines the source of the table and column metadata to use. The reference_tables and reference_columns data sets contain the metadata for the 39 data sets that comprise the SAS representation for CDISC CRT-DDS. Unless you make customizations or run-time modifications, the source metadata source_tables and source_columns data sets contain the same content as the reference metadata reference_tables and reference_columns data sets.
If all 39 CRT-DDS tables contribute information to the define.xml file, then the validation process can run directly against the reference_tables and reference_columns data sets. In this case, the Use source data flag in the validation check data set must be set to N. However, you will probably run validation against a subset of the 39 tables. In that case, create a source_tables data set that contains the subset from the reference_tables data set, and create a corresponding source_columns data set from the reference_columns data set. The run-time validation check data set can still contain all of the checks, and Use source data can be left set to Y, which is the default value.
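One way to derive the subset metadata is sketched below. The librefs refmeta and srcmeta, the column name TABLE, and the list of table names are all assumptions made for illustration. Check the actual reference metadata in your installation before adapting this.

```sas
/* Hypothetical subset of CRT-DDS tables; replace with the tables that */
/* your study actually uses.                                           */
%let subsetTables = "DEFINEDOCUMENT" "STUDY" "METADATAVERSION" "ITEMGROUPDEFS" "ITEMDEFS";

/* Assumed librefs: refmeta = reference metadata, srcmeta = source metadata. */
/* Assumed column: TABLE holds the data set name in both metadata data sets. */

/* Keep only the table-level metadata records for the subset. */
data srcmeta.source_tables;
  set refmeta.reference_tables;
  where upcase(table) in (&subsetTables);
run;

/* Keep the column-level metadata records for the same tables. */
data srcmeta.source_columns;
  set refmeta.reference_columns;
  where upcase(table) in (&subsetTables);
run;
```

Driving both DATA steps from the same macro variable keeps source_tables and source_columns consistent with each other.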
There are no parameters for the crtdds_validate macro.

Sample Driver Program: validate_crtdds_data.sas

The validate_crtdds_data.sas driver program sets up the required environment variables and library references before a call is made to the crtdds_validate macro.
The driver program is located in:
!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.4/sample/cdisc-crtdds-1.0/programs

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are four input file references, one input library reference, and one output file reference that are key to successful completion of the validation process. Key Components of the SASReferences Data Set lists these libraries and data sets, and they are discussed in separate sections. In the sample validate_crtdds_data.sas driver program, these values are set for &studyRootPath and &studyOutputPath.
Note: The &studyRootPath and &studyOutputPath paths are the same for this driver. Two macro variables have been retained to maintain consistency across the SAS Clinical Standards Toolkit driver programs.
&studyRootPath=!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/&_cstVersion/sample/cdisc-crtdds-1.0
&studyOutputPath=!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/&_cstVersion/sample/cdisc-crtdds-1.0
Key Components of the SASReferences Data Set
| Input or Output | Metadata Type | LIBNAME or Fileref to Use | Reference Type | Path | Name of File |
| --- | --- | --- | --- | --- | --- |
| Input | control | cntl_s | libref | &workpath | sasreferences.sas7bdat |
| Input | control | cntl_v | libref | &studyRootPath/control | validation_control.sas7bdat |
| Input | sourcemetadata | srcmeta | libref | &studyRootPath/metadata | source_tables.sas7bdat |
| Input | sourcemetadata | srcmeta | libref | &studyRootPath/metadata | source_columns.sas7bdat |
| Input | sourcedata | srcdata | libref | &studyRootPath/data | |
| Output | results | results | libref | &studyOutputPath/results | validation_results.sas7bdat |

Process Inputs

The use of the cntl_s LIBNAME that points to the &workPath path illustrates a technique of documenting the derivation of the SASReferences data set in the SAS Work library. The driver program initializes the macro variable &workPath with this statement:
%let workPath=%sysfunc(pathname(work));
In this case, the cntl_s LIBNAME points to the same directory as the Work LIBNAME. The second control record points to the validation_control (run-time validation check) data set, and is accessed by the cntl_v LIBNAME statement. This LIBNAME is assigned to the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.4/sample/cdisc-crtdds-1.0/control directory.
The sourcemetadata type references two metadata data sets that describe the table (source_tables) and column (source_columns) metadata for the 39 data sets that comprise the SAS representation of the CRT-DDS model. Both data sets are stored in the same library. In the SAS Clinical Standards Toolkit, this source metadata is read from the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.4/sample/cdisc-crtdds-1.0/metadata directory. This location is represented in the driver program using the Srcmeta library name.
The sourcedata type is the library where the 39 data sets that comprise the SAS representation of the CRT-DDS model are stored. These are the data sets that are being validated. In the SAS Clinical Standards Toolkit, this library is read from the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.4/sample/cdisc-crtdds-1.0/data directory. This location is represented in the driver program by the Srcdata library name.

Process Outputs

For the SAS Clinical Standards Toolkit validation processes, the only process outputs that are generated are the Validation Results and Validation Metrics data sets. These data sets are described in the following section.

Process Results

When the validate_crtdds_data.sas driver program finishes running, the validation_results data set is created in the Results library. The Results data set contains informational, warning, and error messages that were generated by the validation program. Reporting of validation process metrics is supported, though it is not implemented for CDISC CRT-DDS validation.
Example of a CDISC CRT-DDS Results Data Set
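A quick way to review the Results data set after the driver finishes is sketched below. The libref and member name come from the SASReferences table above. The column names CHECKID and RESULTSEVERITY are assumptions, so the sketch lists the actual columns first.

```sas
/* List the columns that are actually present in the Results data set. */
proc contents data=results.validation_results;
run;

/* Count messages by check and severity.                               */
/* Assumed columns: CHECKID and RESULTSEVERITY. Substitute the names   */
/* reported by PROC CONTENTS if they differ in your installation.      */
proc freq data=results.validation_results;
  tables checkid*resultseverity / list missing;
run;
```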

Validating CDISC ODM 1.3.0 Files

XML Schema Validation

When an ODM XML file is created using the create_odmxml driver (and the odm_write macro), the structure and syntax of the XML file are validated against the XML schema for the ODM standard. The results of this validation are written to the Results data set. Here is a sample of the validation results.
Example of Schema Validation Reported in a CDISC ODM Results Data Set
XML schema validation results are logged using four log level settings. These log levels refer to the XML-generated log, not the log that is generated by SAS.
Log Levels for Schema Validation
| Log Level | Description |
| --- | --- |
| Info | Messages such as the system properties of the current Java environment and progress messages. This is the default value. |
| Warning | Messages that indicate that there might be an issue with the ODM document or with the execution of the validation process. |
| Error | Messages that indicate that something in the ODM document is invalid with respect to the XML schema for ODM, or that a non-fatal error has occurred during processing. |
| Fatal Error | Messages that indicate that the XML document could not be processed at all. There are many causes, including file system access errors, incorrect file paths, and malformed XML. |
Each message that is generated during XML validation is associated with one of these levels. The level specified determines what other messages are generated. For example, if the Warning level is specified, then all Warning messages and anything more severe, such as Error and Fatal Error messages, are generated. In the SAS Clinical Standards Toolkit, the Log Level is set to Info by default when using the create_odmxml driver (and the odm_write macro).
It is also possible to use the odm_xmlvalidate macro to validate the structure and syntax of an ODM XML file against the XML schema for the ODM standard. It can be run at any time. The SAS Clinical Standards Toolkit includes a call to the odm_xmlvalidate macro immediately following the call to the odm_write macro as the last step of the create_odmxml.sas sample driver program. If you customize the ODM XML file after it is generated, then this macro can be used to validate the changes. The SAS Clinical Standards Toolkit also includes a call to the odm_xmlvalidate macro immediately before the call to the odm_read macro in the create_sasodm_fromxml.sas sample driver program.
Here is an example of a call to the odm_xmlvalidate macro:
%odm_xmlvalidate(_cstLogLevel=info,_cstResultsOverrideDS=work.xmlvalidate);
In this example, the odm_xmlvalidate macro is being submitted with a log level of Info. The Results data set is named XMLVALIDATE and resides in the Work library.

Validation of the SAS Representation: odm_validate Macro

The odm_validate macro supports the XML validation methodology described above that relies on the definition of a master set of validation checks that are specific to the table and column metadata that define a set of data, and checks that are specific to the data itself. This method is based on SAS and validates the SAS representation of the XML-based standard.
In the SAS Clinical Standards Toolkit, CDISC ODM validation uses the same types of metadata and the same workflow process that is common to validation of all data standards. SAS provides a set of validation checks for CDISC ODM that are designed to verify the metadata definitions and values of the default 66 data sets that comprise the SAS representation of the ODM model. These checks were created by SAS. For more information about these checks, see Validation and Validation Checks. Metadata about each check is provided in the Validation Master data set in <global standards library directory>/standards/cdisc-odm-1.3.0-1.4/validation/control.
The odm_validate macro controls the validation workflow for ODM. As each check is processed from the run-time validation check data set, the check determines the source of the table and column metadata to use. The reference_tables and reference_columns data sets contain the metadata for the default 66 data sets that comprise the SAS representation for CDISC ODM. Unless you make customizations or run-time modifications, the source metadata source_tables and source_columns data sets contain the same content as the reference metadata reference_tables and reference_columns data sets.
If all 66 ODM tables contribute information to the ODM XML file, then the validation process can run directly against the reference tables and columns data sets. In this case, the Use source data flag in the validation check data set needs to be set to N. However, you can elect to run validation against a subset of the 66 tables. In this case, a source_tables data set that contains the subset needs to be created from the reference_tables data set. And, a corresponding source_columns data set needs to be created from the reference_columns data set. The run-time validation check data set can contain all of the checks, and Use source data can be left set to Y, which is the default value.
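For the case where validation runs directly against the reference metadata, the Use source data flag could be switched off along the lines of the sketch below. The column name USESOURCEMETADATA is an assumption about how the flag is stored in the validation check data set, and work.validation_control_ref is a hypothetical output name. Confirm the column name against your installation before relying on this.

```sas
/* Sketch: build a run-time validation check data set in which every   */
/* check uses the reference metadata instead of the source metadata.   */
/* Assumed column: USESOURCEMETADATA stores the "Use source data" flag */
/* as Y or N. The cntl_v libref is the one listed in the SASReferences */
/* table for validation_control.                                       */
data work.validation_control_ref;
  set cntl_v.validation_control;
  usesourcemetadata = "N";
run;
```

Writing the modified copy to the Work library leaves the sample validation_control data set unchanged. The control record in the SASReferences data set would then need to point to the copy.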
There are no parameters for the odm_validate macro.

Sample Driver Program: validate_odm_data.sas

The validate_odm_data.sas driver program sets up the required environment variables and library references before a call is made to the odm_validate macro.
The driver program is located in:
!sasroot/../../SASClinicalStandardsToolkitODM130/1.4/sample/cdisc-odm-1.3.0/programs

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are three input file references, one input library reference, and one output file reference that are key to successful completion of the validation process. These libraries and data sets are listed in Key Components of the SASReferences Data Set, and they are addressed in separate sections. In the sample validate_odm_data.sas driver program, these values are set for &studyRootPath and &studyOutputPath.
Note: The &studyRootPath and &studyOutputPath paths are the same for this driver. These two macro variables have been retained to maintain consistency across the SAS Clinical Standards Toolkit driver programs.
&studyRootPath=!sasroot/../../SASClinicalStandardsToolkitODM130/&_cstVersion/sample/cdisc-odm-1.3.0
&studyOutputPath=!sasroot/../../SASClinicalStandardsToolkitODM130/&_cstVersion/sample/cdisc-odm-1.3.0
Key Components of the SASReferences Data Set
| Input or Output | Metadata Type | LIBNAME or Fileref to Use | Reference Type | Path | Name of File |
| --- | --- | --- | --- | --- | --- |
| Input | control | cntl_v | libref | &studyRootPath/control | validation_control.sas7bdat |
| Input | sourcemetadata | srcmeta | libref | &studyRootPath/metadata | source_tables.sas7bdat |
| Input | sourcemetadata | srcmeta | libref | &studyRootPath/metadata | source_columns.sas7bdat |
| Input | sourcedata | srcdata | libref | &studyRootPath/data | |
| Output | results | results | libref | &studyOutputPath/results | validation_results.sas7bdat |

Process Inputs

The control record points to the validation_control (run-time validation check) data set and is accessed by the cntl_v LIBNAME statement. This LIBNAME is assigned to the !sasroot/../../SASClinicalStandardsToolkitODM130/1.4/sample/cdisc-odm-1.3.0/control directory.
The sourcemetadata type references two metadata data sets that describe the table (source_tables) and column (source_columns) metadata for the default 66 data sets that comprise the SAS representation of the ODM model. Both data sets are stored in the same library. In the SAS Clinical Standards Toolkit, this source metadata is read from the !sasroot/../../SASClinicalStandardsToolkitODM130/1.4/sample/cdisc-odm-1.3.0/metadata directory. This location is represented in the driver program using the Srcmeta library name.
The sourcedata type is the library where the default 66 data sets that comprise the SAS representation of the ODM model are stored. These are the data sets that are being validated. In the SAS Clinical Standards Toolkit, this library is read from the !sasroot/../../SASClinicalStandardsToolkitODM130/1.4/sample/cdisc-odm-1.3.0/data directory. This location is represented in the driver program by the Srcdata library name.

Process Outputs

For the SAS Clinical Standards Toolkit validation processes, the only process outputs that are generated are the Validation Results and Validation Metrics data sets. These data sets are described in the following section.

Process Results

When the validate_odm_data.sas driver program finishes running, the validation_results data set is created in the Results library. The Results data set contains informational, warning, and error messages that were generated by the validation program. Reporting of validation process metrics is also supported for CDISC ODM validation.
Example of a CDISC ODM Validation Results Data Set