Validation of XML-Based Standards

XML Validation

When validating XML-based standards (such as CDISC ODM, CDISC CT, CDISC CRT-DDS 1.0, and CDISC Define-XML 2.0), the SAS Clinical Standards Toolkit offers two complementary methodologies.
The first methodology is described in Compliance Assessment Against a Reference Standard. It relies on the definition of a master set of validation checks that are specific to the table and column metadata that define a set of data and on checks that are specific to the data itself. This method uses SAS files and SAS code to validate the SAS representation of the XML-based standard. Example checks include the assessment of foreign key relationships across data sets and value conformance to a set of expected values.
The second methodology involves verification that an XML file is valid structurally and syntactically according to the XML schema for that standard.
The SAS Clinical Standards Toolkit provides both methodologies to support the validation of CDISC CRT-DDS 1.0 and CDISC ODM 1.3.0 and 1.3.1 files.
For CDISC Define-XML 2.0 files, the SAS Clinical Standards Toolkit supports validation against an XML schema.

Validating an XML File against an XML Schema: %CSTUTILXMLVALIDATE Macro

The %CSTUTILXMLVALIDATE macro validates the structure and syntax of an XML file against the XML schema associated with the XML file. It can be run at any time.
Note: This macro replaces the standard-specific macros crtdds_xmlvalidate.sas, ct_xmlvalidate.sas, and odm_xmlvalidate.sas. These macros are deprecated and are deleted in SAS Clinical Standards Toolkit 1.7. It is recommended that you replace calls to these macros with a call to the %CSTUTILXMLVALIDATE macro.
In its sample driver programs, the SAS Clinical Standards Toolkit calls the %CSTUTILXMLVALIDATE macro immediately after the call that creates an XML file (for example, the %DEFINE_WRITE macro, which creates a CDISC Define-XML 2.0 file). This is typically the last step of the sample driver program (for example, create_definexml.sas). If you customize the XML file after it is generated, you can use this macro to validate the customizations. The SAS Clinical Standards Toolkit also calls the %CSTUTILXMLVALIDATE macro immediately before the call that reads an XML file (for example, the %CRTDDS_READ macro, which reads a CDISC CRT-DDS 1.0 file) in the associated sample driver program (for example, create_sascrtdds_fromxml.sas).
Here is an example of a call to the %CSTUTILXMLVALIDATE macro:
%cstutilxmlvalidate(_cstSASReferences=work.sasreferences,_cstLogLevel=info);
In this example, the %CSTUTILXMLVALIDATE macro is being submitted with a log level of Info.
Note: For more information about the %CSTUTILXMLVALIDATE macro, see the SAS Clinical Standards Toolkit: Macro API Documentation.
XML schema validation results are logged using four log-level settings. These log levels refer to the XML-generated log, not the log that is generated by SAS.
The following table shows the log levels:
Log Levels for the %CSTUTILXMLVALIDATE Macro
Log Level    Description
Info         Messages such as the system properties of the current Java environment and progress messages. This is the default value.
Warning      Messages that indicate that there might be an issue with the XML document or with the execution of the validation process.
Error        Messages that indicate that something in the XML document is invalid with respect to the XML schema for the standard, or that a non-fatal error has occurred during processing.
Fatal Error  Messages that indicate that the XML document could not be processed at all. There are many causes, including file system access errors, incorrect file paths, and malformed XML.
Each message that is generated during XML validation is associated with one of these levels. The level that you choose acts as a threshold: messages at that level and at any more severe level are generated. For example, if you choose the Warning level, then all Warning, Error, and Fatal Error messages are generated. If you choose the Error level, then only Error and Fatal Error messages are generated.
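As a hedged illustration of the threshold behavior, the following call repeats the earlier example with a higher log level. The parameter names come from that example. The spelling of the value (warning) is an assumption made by analogy with info, so confirm the accepted values in the Macro API Documentation before relying on it.

```sas
/* Sketch only: same call as the earlier example, with a higher threshold.  */
/* The value "warning" is assumed by analogy with "info"; it is not taken   */
/* from the source text.                                                    */
%cstutilxmlvalidate(_cstSASReferences=work.sasreferences,_cstLogLevel=warning);
```

With this setting, Warning, Error, and Fatal Error messages would be expected in the XML-generated log, and Info messages would be omitted.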

Validating the SAS Representation of a CDISC CRT-DDS 1.0 XML File: %CRTDDS_VALIDATE Macro

Overview

The %CRTDDS_VALIDATE macro supports the first XML validation methodology. This method is based on SAS and validates the SAS representation of the XML-based standard.
In the SAS Clinical Standards Toolkit, CDISC CRT-DDS validation uses the same types of metadata and the same workflow process that is common to validation of all data standards. SAS provides a set of validation checks for CDISC CRT-DDS that are designed to verify the metadata definitions and values of the 39 data sets that comprise the SAS representation of the CRT-DDS model. These checks were created by SAS. For more information about these checks, see Compliance Assessment Against a Reference Standard. Metadata about each check is provided in the Validation Master data set in the global standards library directory/standards/cdisc-crtdds-1.0-1.7/validation/control directory.
The %CRTDDS_VALIDATE macro controls the validation workflow for CRT-DDS. As each check is processed from the run-time validation check data set, the check determines the source of the table and column metadata to use. The reference_tables and reference_columns data sets contain the metadata for the 39 data sets that comprise the SAS representation for CDISC CRT-DDS. Unless you make customizations or run-time modifications, the source metadata source_tables and source_columns data sets contain the same content as the reference metadata reference_tables and reference_columns data sets.
If all 39 CRT-DDS tables contribute information to the define.xml file, then the validation process can run directly against the reference_tables and reference_columns data sets. In this case, the Use source data flag in the validation check data set needs to be set to N. However, you are likely to run validation against a subset of the 39 tables. In this case, a source_tables data set that contains the subset needs to be created from the reference_tables data set, and a corresponding source_columns data set needs to be created from the reference_columns data set. The run-time validation check data set can contain all of the checks, and the Use source data flag can be set to Y, which is the default value.
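The subsetting step described above might be sketched as follows. This is a minimal illustration, not the Toolkit's own code. It assumes that the refmeta and srcmeta librefs are already assigned, that both reference data sets identify each table in a column named table, and that the three table names listed are placeholders for the tables in your study.

```sas
/* Assumptions (not from the source): librefs refmeta and srcmeta exist,   */
/* the column that names each table is called TABLE, and the names in      */
/* keepTables are placeholders to be replaced with your own subset.        */
%let keepTables='DEFINEDOCUMENT' 'STUDY' 'ITEMGROUPDEFS';

/* Table-level metadata for the subset */
data srcmeta.source_tables;
  set refmeta.reference_tables;
  where upcase(table) in (&keepTables);
run;

/* Matching column-level metadata for the same tables */
data srcmeta.source_columns;
  set refmeta.reference_columns;
  where upcase(table) in (&keepTables);
run;
```

Driving both steps from one macro variable keeps the two source metadata data sets in agreement about which tables are in scope.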
There are no parameters for the %CRTDDS_VALIDATE macro.

Sample Driver Program: validate_crtdds_data.sas

The validate_crtdds_data.sas driver program sets up the required environment variables and library references before a call is made to the %CRTDDS_VALIDATE macro.
The driver program is located here:
sample study library directory/cdisc-crtdds-1.0-1.7/programs

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are four input file references, one input library reference, and one output data set reference that are key to the successful completion of the validation process. These files, libraries, and data sets are listed in Key Components of the SASReferences Data Set for the validate_crtdds_data.sas Driver Program, and they are discussed in separate sections. In the sample validate_crtdds_data.sas driver program, these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=sample study library directory/cdisc-crtdds-1.0-1.7
&studyOutputPath=sample study library directory/cdisc-crtdds-1.0-1.7
Note: The &studyRootPath and &studyOutputPath paths are the same for this driver program. Two macro variables have been retained to maintain consistency across the SAS Clinical Standards Toolkit driver programs.
Key Components of the SASReferences Data Set for the validate_crtdds_data.sas Driver Program
Metadata Type   SAS LIBNAME or Fileref to Use   Reference Type   Path                       Name of File
Input
control         cntl_s                          libref           &workPath                  sasreferences.sas7bdat
control         cntl_v                          libref           &studyRootPath/control     validation_control.sas7bdat
sourcemetadata  srcmeta                         libref           &studyRootPath/metadata    source_tables.sas7bdat
sourcemetadata  srcmeta                         libref           &studyRootPath/metadata    source_columns.sas7bdat
sourcedata      srcdata                         libref           &studyRootPath/data
Output
results         results                         libref           &studyOutputPath/results   validation_results.sas7bdat

Process Inputs

The use of the cntl_s LIBNAME that points to the &workPath path demonstrates a technique of documenting the derivation of the SASReferences data set in the SAS Work library. The driver program initializes the macro variable &workPath with this statement:
%let workPath=%sysfunc(pathname(work));
In this case, the cntl_s LIBNAME points to the same directory as the Work LIBNAME. The second control record points to the validation_control data set (run-time validation check data set), and is accessed by the cntl_v LIBNAME statement. This LIBNAME is assigned to the sample study library directory/cdisc-crtdds-1.0-1.7/control directory.
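The relationship between cntl_s and the Work library can be checked with a short sketch such as the following. The explicit LIBNAME statement is for illustration only; it is an assumption that you are experimenting outside the driver program, where the Toolkit would normally assign librefs from the SASReferences records.

```sas
/* Capture the physical path of the Work library, as the driver program does. */
%let workPath=%sysfunc(pathname(work));

/* Illustration only: assign cntl_s explicitly to that same path. */
libname cntl_s "&workPath";

/* Both lines written to the log should show the same directory. */
%put NOTE: work   = %sysfunc(pathname(work));
%put NOTE: cntl_s = %sysfunc(pathname(cntl_s));
```

Because both librefs resolve to one directory, a SASReferences data set created as work.sasreferences is also reachable as cntl_s.sasreferences.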
The sourcemetadata type references two metadata data sets that describe the table (source_tables) and column (source_columns) metadata for the 39 data sets that comprise the SAS representation of the CRT-DDS model. Both data sets are stored in the same library. In the SAS Clinical Standards Toolkit, this source metadata is read from the sample study library directory/cdisc-crtdds-1.0-1.7/metadata directory. This location is represented in the driver program by the Srcmeta library name.
The sourcedata type is the library where the 39 data sets that comprise the SAS representation of the CRT-DDS model are stored. These are the data sets that are being validated. In the SAS Clinical Standards Toolkit, this library is read from the sample study library directory/cdisc-crtdds-1.0-1.7/data directory. This location is represented in the driver program by the Srcdata library name.

Process Outputs

For the SAS Clinical Standards Toolkit validation processes, the only process outputs that are generated are the Validation Results and Validation Metrics data sets. These data sets are described in the following section.

Process Results

When the validate_crtdds_data.sas driver program finishes running, the validation_results data set is created in the Results library. The Results data set contains informational, warning, and error messages that were generated by the driver program. Reporting of validation process metrics is supported, although it is not implemented for CDISC CRT-DDS validation.
[Figure: Example of a CDISC CRT-DDS Results Data Set]
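After the driver program finishes, the Results data set can be reviewed with standard SAS procedures. The following sketch assumes only that the results libref is still assigned to the &studyOutputPath/results directory. It makes no assumptions about column names, so it lists the structure first and then prints a few rows.

```sas
/* Assumes the results libref from SASReferences is still assigned. */

/* Show the columns that the validation process wrote. */
proc contents data=results.validation_results;
run;

/* Print the first 20 messages for a quick review. */
proc print data=results.validation_results(obs=20);
run;
```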

Validating the SAS Representation of ODM Files: %ODM_VALIDATE Macro

Overview

The %ODM_VALIDATE macro supports the first XML validation methodology. This method relies on the definition of a master set of validation checks that are specific to the table and column metadata that define a set of data and on checks that are specific to the data itself. This method uses SAS files and SAS code to validate the SAS representation of the XML-based standard.
In the SAS Clinical Standards Toolkit, CDISC ODM validation uses the same types of metadata and the same workflow process that is common to validation of all data standards. SAS provides a set of validation checks for CDISC ODM that are designed to verify the metadata definitions and values of the default 66 data sets that comprise the SAS representation of the ODM model. These checks were created by SAS. For more information about these checks, see Compliance Assessment Against a Reference Standard. Metadata about each check is provided in the Validation Master data set in the global standards library directory/standards/cdisc-odm-1.3.0-1.7/validation/control directory.
The %ODM_VALIDATE macro controls the validation workflow for ODM. As each check is processed from the run-time validation check data set, the check determines the source of the table and column metadata to use. The reference_tables and reference_columns data sets contain the metadata for the 66 data sets that comprise the SAS representation for CDISC ODM. Unless you make customizations or run-time modifications, the source metadata source_tables and source_columns data sets contain the same content as the reference metadata reference_tables and reference_columns data sets.
If all 66 ODM tables contribute information to the ODM XML file, then the validation process can run directly against the reference_tables and reference_columns data sets. In this case, the Use source data flag in the validation check data set needs to be set to N. However, you can choose to run validation against a subset of the 66 tables. In this case, a source_tables data set that contains the subset needs to be created from the reference_tables data set, and a corresponding source_columns data set needs to be created from the reference_columns data set. The run-time validation check data set can contain all of the checks, and the Use source data flag can be set to Y, which is the default value.
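One way to see whether the source metadata describes all of the tables or only a subset is to compare row counts. The sketch below assumes that the refmeta and srcmeta librefs are assigned and that each data set holds one row per table; the macro variable names nRef and nSrc are hypothetical.

```sas
/* Assumptions (not from the source): librefs refmeta and srcmeta exist    */
/* and each metadata data set contains one row per table.                  */
proc sql noprint;
  select count(*) into :nRef trimmed from refmeta.reference_tables;
  select count(*) into :nSrc trimmed from srcmeta.source_tables;
quit;

/* Report how much of the reference model the source metadata covers. */
%put NOTE: source_tables describes &nSrc of the &nRef reference tables.;
```

If the two counts differ, validation is running against a subset, and the Use source data flag would be expected to stay at Y.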
There are no parameters for the %ODM_VALIDATE macro.

Sample Driver Program: validate_odm_data.sas

The validate_odm_data.sas driver program sets up the required environment variables and library references before a call is made to the %ODM_VALIDATE macro.
The driver program is located here:
sample study library directory/cdisc-odm-1.3.0-1.7/programs

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It references the input files that are needed, the librefs and filenames to use, and the names and locations of data sets to be created by the process. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are three input file references, one input library reference, and one output data set reference that are key to the successful completion of the validation process. These files, libraries, and data sets are listed in Key Components of the SASReferences Data Set for the validate_odm_data.sas Driver Program, and they are discussed in separate sections. In the sample validate_odm_data.sas driver program, these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=sample study library directory/cdisc-odm-1.3.0-1.7
&studyOutputPath=sample study library directory/cdisc-odm-1.3.0-1.7
Note: The &studyRootPath and &studyOutputPath paths are the same for this driver program. These two macro variables have been retained to maintain consistency across the SAS Clinical Standards Toolkit driver programs.
Key Components of the SASReferences Data Set for the validate_odm_data.sas Driver Program
Metadata Type   LIBNAME or Fileref to Use   Reference Type   Path                       Name of File
Input
control         cntl_v                      libref           &studyRootPath/control     validation_control.sas7bdat
sourcemetadata  srcmeta                     libref           &studyRootPath/metadata    source_tables.sas7bdat
sourcemetadata  srcmeta                     libref           &studyRootPath/metadata    source_columns.sas7bdat
sourcedata      srcdata                     libref           &studyRootPath/data
Output
results         results                     libref           &studyOutputPath/results   validation_results.sas7bdat

Process Inputs

The control record points to the validation_control data set (run-time validation check data set). It is accessed by the cntl_v LIBNAME statement. This LIBNAME is assigned to the sample study library directory/cdisc-odm-1.3.0-1.7/control directory.
The sourcemetadata type references two metadata data sets that describe the table (source_tables) and column (source_columns) metadata for the 66 data sets that comprise the SAS representation of the ODM model. Both data sets are stored in the same library. In the SAS Clinical Standards Toolkit, this source metadata is read from the sample study library directory/cdisc-odm-1.3.0-1.7/metadata directory. This location is represented in the driver program by the Srcmeta library name.
The sourcedata type is the library where the 66 data sets that comprise the SAS representation of the ODM model are stored. These are the data sets that are being validated. In the SAS Clinical Standards Toolkit, this library is read from the sample study library directory/cdisc-odm-1.3.0-1.7/data directory. This location is represented in the driver program by the Srcdata library name.

Process Outputs

For the SAS Clinical Standards Toolkit validation processes, the only process outputs that are generated are the Validation Results and Validation Metrics data sets. These data sets are described in the following section.

Process Results

When the validate_odm_data.sas driver program finishes running, the validation_results data set is created in the Results library. The Results data set contains informational, warning, and error messages that were generated by the driver program. Reporting of validation process metrics is supported, although it is not implemented for CDISC ODM validation.
[Figure: Example of a CDISC ODM Validation Results Data Set]
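Because the Results data set mixes informational, warning, and error messages, a frequency count by severity can give a quick overview. The sketch below assumes that the results libref is assigned and that the data set has a column named resultseverity; that column name is an assumption and should be confirmed against the data set before use.

```sas
/* Assumption (not from the source): the severity of each message is       */
/* stored in a column named resultseverity.                                */
proc freq data=results.validation_results;
  tables resultseverity / missing;
run;
```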