Validation of XML-Based Standards

XML Validation

When validating XML-based standards (such as CDISC ODM and CDISC CRT-DDS), SAS Clinical Standards Toolkit offers two complementary methodologies. The first methodology is described in Validation. It relies on the definition of a master set of validation checks that are specific to the table and column metadata that define a set of data, and checks that are specific to the data itself. This method uses SAS files and SAS code to validate the SAS representation of the XML-based standard. Example checks include the assessment of foreign key relationships across data sets and value conformance to a set of expected values. The second methodology involves verification that an XML file is valid structurally and syntactically according to the XML schema for that standard.
SAS Clinical Standards Toolkit 1.3 provides both methodologies to support the validation of CDISC CRT-DDS 1.0 files. CDISC ODM validation capabilities are under development. (See the SAS Customer Support Web site for SAS Clinical Standards Toolkit at http://support.sas.com/rnd/base/cdisc/cst/index.html for the latest updates.)

Validating CDISC CRT-DDS 1.0 Files

The crtdds_xmlvalidate Macro

The crtdds_xmlvalidate.sas macro validates the structure and syntax of the define.xml file against the XML schema for the ODM standard. It can be run at any time. The SAS Clinical Standards Toolkit includes a call to the crtdds_xmlvalidate.sas macro immediately following the call to the crtdds_write.sas macro as the last step of the create_crtdds_define.sas sample driver program. If you customize the define.xml file after it is generated, then this macro can be used to validate the changes.
The following is an example of a call to the crtdds_xmlvalidate.sas macro:
%crtdds_xmlvalidate(_cstLogLevel=info,_cstResultsOverrideDS=work.xmlvalidate);
In this example, the %crtdds_xmlvalidate macro is being submitted with a log level of Info. The Results data set is named XMLVALIDATE and resides in the Work library.
Parameters for the crtdds_xmlvalidate.sas Macro

| Parameter | Required | Description |
| --- | --- | --- |
| _cstLogLevel | Yes | Identifies the log level. Valid values are Info, Warning, Error, and Fatal Error. The default value is Info. |
| _cstResultsOverrideDS | Yes | Provides the opportunity to designate [LIBNAME.]member as the name of the Results data set. If this parameter is omitted (default setting), then the Results data set specified by the &_cstResultsDS global macro variable is used. |

XML schema validation results are logged using four log level settings. These log levels refer to the XML-generated log, not the log that is generated by SAS.
Log Levels for the crtdds_xmlvalidate.sas Macro

| Log Level | Description |
| --- | --- |
| Info | Informational messages, such as the system properties of the current Java environment, and progress messages. This is the default value. |
| Warning | Messages that indicate that there might be an issue with the CRT-DDS document or with the execution of the validation process. |
| Error | Messages that indicate that something in the define.xml document is invalid with respect to the normative XML schema for CRT-DDS, or that a non-fatal error has occurred during processing. |
| Fatal Error | Messages that indicate that the XML document could not be processed at all. There are many possible causes, including file system access errors, incorrect file paths, and malformed XML. |

Each message that is generated during XML validation is associated with one of these levels. The level that you choose determines what other messages are generated. For example, if you choose warning, then all Warning messages and anything more severe, such as Error and Fatal Error messages, are generated. If you choose error, then only Error and Fatal Error messages are generated.
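As a sketch of the threshold behavior described above, the following call raises the level to warning so that Info messages are suppressed, and then lists whatever was written to the Results data set. It assumes that the toolkit environment has already been initialized (for example, by a driver program) and that the value is accepted in lowercase, as in the info example. The PROC PRINT step is an illustrative addition and is not part of the macro.

```sas
/* Log only Warning, Error, and Fatal Error messages from XML validation. */
/* Assumes the SAS Clinical Standards Toolkit session is already set up.  */
%crtdds_xmlvalidate(_cstLogLevel=warning,
                    _cstResultsOverrideDS=work.xmlvalidate);

/* Illustrative follow-up: list the records in the Results data set. */
proc print data=work.xmlvalidate;
run;
```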

Validation of the SAS Representation: crtdds_validate Macro

The crtdds_validate.sas macro supports the first XML validation methodology outlined above. This method is based on SAS and validates the SAS representation of the XML-based standard.
In SAS Clinical Standards Toolkit, CDISC CRT-DDS validation uses the same types of metadata and the same workflow process that are common to validation of all data standards. SAS provides a set of validation checks for CDISC CRT-DDS that are designed to verify the metadata definitions and values of the 39 data sets that comprise the SAS representation of the CRT-DDS model. These checks were created by SAS. For more information about these checks, see Validation and Validation Checks. Metadata about each check is provided in the Validation Master data set, which can be found in <global standards library directory>/standards/cdisc-crtdds-1.0-1.3/validation/control.
The crtdds_validate.sas macro controls the validation workflow for CRT-DDS. As each check is processed from the run-time validation check data set, the check determines the source of the table and column metadata to use. The reference_tables and reference_columns data sets contain the metadata for the 39 data sets that comprise the SAS representation for CDISC CRT-DDS. Unless you make customizations or run-time modifications, the source metadata data sets (source_tables and source_columns) contain the same content as the reference metadata data sets (reference_tables and reference_columns).
If all 39 CRT-DDS tables contribute information to the define.xml file, then the validation process can run directly against the reference_tables and reference_columns data sets. In this case, the Use source data flag in the validation check data set needs to be set to N. However, most users will run validation against a subset of the 39 tables. In this case, a source_tables data set that contains only the subset needs to be created from the reference_tables data set, and a corresponding source_columns data set needs to be created from the reference_columns data set. The run-time validation check data set can then contain all of the checks, and Use source data can be left set to Y, which is the default value.
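The subsetting step described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's own code: the refmeta and srcmeta librefs are assumed to be already assigned, both reference data sets are assumed to identify the table in a column named table, and the list of table names is hypothetical.

```sas
/* Hypothetical list of the CRT-DDS tables that this study populates. */
%let keepTables='DEFINEDOCUMENT' 'STUDY' 'METADATAVERSION' 'ITEMGROUPDEFS' 'ITEMDEFS';

/* Table-level metadata: keep only the rows for the selected tables.  */
/* Assumes librefs refmeta and srcmeta exist and that the table name  */
/* is stored in a column named TABLE (an assumption of this sketch).  */
data srcmeta.source_tables;
  set refmeta.reference_tables;
  where upcase(table) in (&keepTables);
run;

/* Column-level metadata: keep the columns of the same tables. */
data srcmeta.source_columns;
  set refmeta.reference_columns;
  where upcase(table) in (&keepTables);
run;
```

Driving both steps from a single macro variable keeps the table-level and column-level subsets consistent with each other.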
There are no parameters for the crtdds_validate macro.

Sample Driver Program: validate_crtdds_data.sas

The validate_crtdds_data.sas driver program sets up the required environment variables and library references before a call is made to the crtdds_validate.sas macro.
For SAS 9.1.3, the driver program is located at:
!sasroot/../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0/programs/validate_crtdds_data.sas
For SAS 9.2, the driver program is located at:
!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0/programs/validate_crtdds_data.sas
The value for !sasroot is the location of your SAS installation directory.

The SASReferences Data Set

As a part of each SAS Clinical Standards Toolkit process setup, a valid SASReferences data set is required. It can be modified to point to study-specific files. For an explanation of the SASReferences data set, see SASReferences File.
In the SASReferences data set, there are four input file references, one input library reference, and one output file reference that are key to successful completion of the validation process. The following table lists these libraries and data sets, and they are discussed in separate sections. In the sample validate_crtdds_data.sas driver program, the following values are set for &studyRootPath and &studyOutputPath and are specific to a SAS release.
Note: The &studyRootPath and &studyOutputPath paths are the same for this driver. Two macro variables have been retained to maintain consistency across SAS Clinical Standards Toolkit driver programs.
SAS 9.1.3
&studyRootPath=!sasroot/../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0
&studyOutputPath=!sasroot/../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0
SAS 9.2
&studyRootPath=!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0
&studyOutputPath=!sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0
Key Components of the SASReferences Data Set

| Input or Output | Metadata Type | LIBNAME or Fileref to Use | Reference Type | Path | Name of File |
| --- | --- | --- | --- | --- | --- |
| Input | control | cntl_s | LIBNAME | &workpath | sasreferences.sas7bdat |
| Input | control | cntl_v | LIBNAME | &studyRootPath/control | validation_control.sas7bdat |
| Input | sourcemetadata | srcmeta | LIBNAME | &studyRootPath/metadata | source_tables.sas7bdat |
| Input | sourcemetadata | srcmeta | LIBNAME | &studyRootPath/metadata | source_columns.sas7bdat |
| Input | sourcedata | srcdata | LIBNAME | &studyRootPath/data | |
| Output | results | results | LIBNAME | &studyOutputPath/results | validation_results.sas7bdat |

Process Inputs

The use of the cntl_s LIBNAME that points to the &workpath path illustrates a technique of documenting the derivation of the SASReferences data set in the SAS Work library. The driver program initializes the macro variable &workPath with the following statement:
%let workPath=%sysfunc(pathname(work));
In this case, the cntl_s LIBNAME points to the same directory as the Work LIBNAME. The second control record points to the validation_control.sas7bdat (run-time validation check) data set, and is accessed by the cntl_v LIBNAME statement. This LIBNAME is assigned to the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0/control directory.
The sourcemetadata type references two metadata data sets that describe the table (source_tables) and column (source_columns) metadata for the 39 data sets that comprise the SAS representation of the CRT-DDS model. Both data sets are stored in the same library. In the SAS Clinical Standards Toolkit, this source metadata is read from the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0/metadata directory. This location is represented in the driver program using the Srcmeta library name.
The sourcedata type is the library where the 39 data sets that comprise the SAS representation of the CRT-DDS model are stored. These are the data sets that are being validated. In the SAS Clinical Standards Toolkit, this library is read from the !sasroot/../../SASClinicalStandardsToolkitCRTDDS10/1.3/sample/cdisc-crtdds-1.0/data directory. This location is represented in the driver program by the Srcdata library name.
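For orientation only, the library references described in this section correspond roughly to the LIBNAME statements sketched below. The toolkit allocates these references itself from the SASReferences data set, so these statements are an illustration of the mapping rather than code that the driver program needs to contain. The sketch assumes that &studyRootPath and &studyOutputPath have already been set as shown earlier.

```sas
/* Illustration of the SASReferences mapping; not required in a driver. */
/* Assumes &studyRootPath and &studyOutputPath are already defined.     */
%let workPath=%sysfunc(pathname(work));

libname cntl_s  "&workPath";                 /* sasreferences                 */
libname cntl_v  "&studyRootPath/control";    /* validation_control            */
libname srcmeta "&studyRootPath/metadata";   /* source_tables, source_columns */
libname srcdata "&studyRootPath/data";       /* data sets being validated     */
libname results "&studyOutputPath/results";  /* validation_results            */
```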

Process Outputs

For SAS Clinical Standards Toolkit validation processes, the only process outputs that are generated are the Validation Results and Validation Metrics data sets. These data sets are described in the following section.

Process Results

When the validate_crtdds_data.sas driver program finishes running, the validation_results.sas7bdat data set is created in the Results library. The Results data set contains informational, warning, and error messages that were generated by the validation program. Reporting of validation process metrics is supported, though it is not implemented for CDISC CRT-DDS validation.
Figure: Example of a CDISC CRT-DDS Results Data Set (display of the Results data set)
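After the driver program finishes, the Results data set can be inspected with standard SAS procedures. The following is a minimal sketch that assumes the results libref is still assigned in the current session. It lists the structure of the data set first, so no column names need to be assumed.

```sas
/* Assumes the results libref from SASReferences is still assigned. */
/* Show the structure of the Results data set. */
proc contents data=results.validation_results;
run;

/* List the first 20 records for a quick review of the messages. */
proc print data=results.validation_results(obs=20);
run;
```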