Validation Checks by Standard

Overview

The SAS Clinical Standards Toolkit 1.5 provides a set of defined checks for each standard for which the supportsvalidation flag is set to “Y” in the Standards data set (in the global standards library directory/metadata folder). By default, each Validation Master data set is located in the global standards library directory/standards/<standard>/validation/control folder.
This table summarizes the content of each standard-specific validation_master data set that is provided by SAS:
Summary of Checks in Each validation_master Data Set That Is Provided by SAS
| CDISC Standard and Version | Total Number of Check Records | Number of Unique Checks | Number of Check Macros Used |
| --- | --- | --- | --- |
| ADaM 2.1 | 264 | 257 | 14 |
| CRT-DDS 1.0 | 83 | 12 | 7 |
| CT 1.0.0 | 34 | 14 | 7 |
| ODM 1.3.0 | 179 | 39 | 10 |
| ODM 1.3.1 | 190 | 38 | 10 |
| SDTM 3.1.1 | 257 | 150 | 14 |
| SDTM 3.1.2 | 247 | 243 | 15 |
| SDTM 3.1.3 | 290 | 263 | 15 |
| CST-FRAMEWORK | 130 | 86 | 11 |

ADaM 2.1

The CDISC ADaM validation checks are derived from the SAS interpretation of the CDISC ADaM Validation Checks Version 1.0 (final production version dated September 20, 2010) and the CDISC ADaM Validation Checks Version 1.1 maintenance release (dated and released January 21, 2011, to correct errors and remove duplicate checks).
In addition, SAS has added 45 unique checks (52 total records) to the Validation Master data set. These checks can be identified where checksource=“SAS”.
ADaM data sets are typically derived from a tabulation study, such as SDTM or SEND. Some checks require the comparison of ADaM content with data and metadata from the tabulation source. Of the 264 validation_master records, 22 involve a comparison with another CDISC standard such as SDTM 3.1.3.

CDISC CRT-DDS 1.0

The SAS Clinical Standards Toolkit provides check macros that validate the data in the SAS data sets representing CDISC CRT-DDS data. The goal of these check macros is to ensure that all data is correctly specified and that referential integrity is maintained. As a result, a standards-compliant CDISC define.xml file can be produced from these data sets.
The validity of CRT-DDS data is determined by the standard in the form of XML schema definitions. These XML schema definitions must be translated into checks appropriate for the relational and tabular format.
Checks fall into these general categories:
  • Ensures that all cross-table references are satisfied and that the referenced item actually exists (referential integrity).
  • Ensures that required variables are not missing or empty for an observation or row.
  • Ensures that character data conforms to a particular format.
Formats are specified in the standard in one of two ways:
  • an enumeration
  • a regular expression
The SAS Clinical Standards Toolkit 1.5 provides 83 CDISC CRT-DDS validation checks. These validation checks were developed by SAS and are based on CRT-DDS and ODM implementation experience and careful review of the associated implementation guides, with special emphasis on the occurrence of “should” within each implementation guide. CRT-DDS Validation Check Types lists the types of checks for CRT-DDS data.
Each check type is assumed to operate on data that exists in a source column in a source data set. A check type can reference one or more parameters that validate the source column data. A parameter can be a character string or a representation of some column other than the source column against which the source column data must be compared.
All character comparisons are case sensitive. Character data is assumed to have been trimmed of leading or trailing white space.
CRT-DDS Validation Check Types
| Check Type | Category | Description |
| --- | --- | --- |
| Unique in data set | Structural | No two values for the source column can be the same in the same source data set. |
| Required character value | Data | The trimmed (white space removed) value of the character data must consist of one or more characters. |
| Required numeric value | Data | The numeric value of the column cannot be missing. |
| Enumeration(s0,s1,...) | Data | If character data exists, its value must match one of the enumerated character strings. All string comparisons are case sensitive. |
| Foreign key(targetColumn) | Structural | Each existing value in this column must have an equivalent value in the target column. |
| Foreign key required(targetColumn) (1) | Structural | A value is required for this column in every row. Each value must have an equivalent value in the target column. This check is equivalent to running the required character value check first and failing if that check fails. If the required character value check passes, the foreign key() check is run. |
| Character format: language | Data | The character data must consist of 1 to 8 alphabetical characters of any case. It can be followed by a hyphen and a sequence of 1 to 8 alphabetical characters (of any case) or numeric digits. For example, e is a legal value, as are en-us, english, and english-d842. Invalid values include 1en, mumblespeak, and en_us. The hyphen character sequence can be repeated, making a value such as english-mumbly-growly-47 a legal value. Regular expression: [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*. |
| Character format: fileName | Data | The character data must not contain any characters other than uppercase and lowercase letters of the alphabet, numeric digits, an underscore (_), or a period. Regular expression: [A-Za-z0-9_.]+. |
| Character format: sasFormat | Data | The first character must be either a lowercase or uppercase letter, an underscore (_), or the dollar sign ($). Any subsequent character must be either an uppercase or lowercase letter, a numeric digit, an underscore (_), or a period. Regular expression: [A-Za-z_$][A-Za-z0-9_.]*. |
| Character format: sasName | Data | The first character must be either a lowercase or uppercase letter or an underscore (_). Any subsequent character must be either an uppercase or lowercase letter, a numeric digit, or an underscore (_). Regular expression: [A-Za-z_][A-Za-z0-9_]*. |
| Unique across data sets(targetcolumn0,...) | Structural | No value in this column can be the same as any value in any of the given data set columns. |
| Primary key (2) | Data | Must satisfy the unique in data set check type and the required character value check type. |
| Must Have Corresponding Value(targetColumn) | Structural | For each distinct value in this column, there must be at least one equivalent value in the target column. |
| No Duplicates Per Unique Value(targetColumn) | Structural | For each distinct value in the target column, each value in the source column must be unique. That is, the same value cannot appear more than once in the source column for each distinct value in the target column. |
(1) This validation is a combination of checks CRT0101 and CRT0110.
(2) This validation is a combination of checks CRT0100 and CRT0101.
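The four character format check types each reduce to a regular expression, so they can be tried out outside SAS. The following Python sketch is illustrative only and is not the toolkit's check macro code: the dictionary and function names are hypothetical, and only the patterns themselves are taken from the check type descriptions. It assumes whole-string matching and trims white space before comparing, in line with the assumptions stated for character data.

```python
import re

# Patterns copied from the character format check type descriptions.
# The dictionary and function names are hypothetical.
CHARACTER_FORMATS = {
    "language": r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*",
    "fileName": r"[A-Za-z0-9_.]+",
    "sasFormat": r"[A-Za-z_$][A-Za-z0-9_.]*",
    "sasName": r"[A-Za-z_][A-Za-z0-9_]*",
}


def matches_format(format_name, value):
    """Return True when the trimmed value matches the entire pattern."""
    # Assumption: the whole value must match, not just a prefix of it.
    pattern = CHARACTER_FORMATS[format_name]
    return re.fullmatch(pattern, value.strip()) is not None


# The language examples given in the check type description.
examples = [
    "e", "en-us", "english", "english-d842", "english-mumbly-growly-47",
    "1en", "mumblespeak", "en_us",
]
for candidate in examples:
    print(candidate, matches_format("language", candidate))
```

Under these patterns, the first five values are accepted and the last three are rejected, which agrees with the examples listed for the language format.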
Each check type belongs to one of two categories.
  1. Data checks have no dependencies on data outside of the source table. An example is ensuring that a value exists in a column in which values cannot be missing.
  2. Structural checks deal with relationships and data integrity between tables. Foreign key enforcement is an example of a structural check. Structural conditions must be met for the successful generation of a define.xml file. You might want to defer structural checks until later in the process of populating the CRT-DDS data sets. This is because foreign key relationships require that the data be made available in a particular order (that is, a referenced key must be available before the foreign key to it can exist).
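The difference between the two categories can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the toolkit's implementation: tables are modeled as lists of dictionaries, and the table and column names (item_refs, item_defs, ItemOID, OID) are hypothetical. The data check reads only the source table, whereas the structural check also needs the target table to be populated.

```python
def trimmed(row, column):
    """Return the value with leading and trailing white space removed."""
    return (row.get(column) or "").strip()


def required_character_value(rows, column):
    """Data check: indexes of rows whose trimmed value is empty."""
    return [i for i, row in enumerate(rows) if not trimmed(row, column)]


def foreign_key(source_rows, source_column, target_rows, target_column):
    """Structural check: indexes of rows whose existing value has no
    case-sensitive match in the target column."""
    target_values = {trimmed(row, target_column) for row in target_rows}
    failures = []
    for i, row in enumerate(source_rows):
        value = trimmed(row, source_column)
        if value and value not in target_values:
            failures.append(i)
    return failures


def foreign_key_required(source_rows, source_column, target_rows, target_column):
    """Run the required-value check first; run the foreign key check
    only when every row has a value."""
    missing = required_character_value(source_rows, source_column)
    if missing:
        return missing
    return foreign_key(source_rows, source_column, target_rows, target_column)


# Hypothetical data: the referenced keys must exist before the references.
item_defs = [{"OID": "IT.AGE"}, {"OID": "IT.SEX"}]
item_refs = [{"ItemOID": "IT.AGE"}, {"ItemOID": "IT.RACE"}, {"ItemOID": ""}]

print(required_character_value(item_refs, "ItemOID"))
print(foreign_key(item_refs, "ItemOID", item_defs, "OID"))
print(foreign_key_required(item_refs, "ItemOID", item_defs, "OID"))
```

If item_defs were still empty when the foreign key check ran, every populated reference would fail, which is why deferring structural checks until the data sets are fully populated can be practical.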
The CDISC CRT-DDS validation also checks the data against a set of expected values. The expected values have been stored in a format catalog (crtddsct.sas7bcat) and a data set (crtddsct.sas7bdat). They are in the global standards library directory/standards/cdisc-crtdds-1.0-1.5/formats folder.
The SASReferences data set must contain a row for fmtsearch in which the SAS libref is set to crtfmt and the filename refers to crtddsct.sas7bcat.

CDISC ODM 1.3.0 and 1.3.1

The SAS Clinical Standards Toolkit provides check macros that validate the data in the SAS data sets representing CDISC ODM data. The structure of this data is similar to CDISC CRT-DDS. Therefore, the process for validating the data is similar. The goal of these check macros is to ensure that all data is correctly specified, and that referential integrity is maintained. As a result, a standards-compliant CDISC ODM XML file can be produced from these data sets.
As in CRT-DDS, the validity of ODM data is determined by the standard in the form of XML schema definitions. These XML schema definitions must be translated into checks appropriate for the relational and tabular formats.
Checks fall into these general categories:
  • Ensures that all cross-table references are satisfied and that the referenced item actually exists (referential integrity).
  • Ensures that required variables are not missing or empty for an observation or row.
  • Ensures that character data conforms to a particular format.
Formats are specified in the standard in one of two ways:
  • an enumeration
  • a regular expression
The SAS Clinical Standards Toolkit 1.5 provides 179 ODM 1.3.0 and 190 ODM 1.3.1 validation checks. These validation checks were developed by SAS and are based on ODM implementation experience and careful review of the CDISC ODM Implementation Guide, with special emphasis on the occurrence of “should” within the Implementation Guide.
By default, the ODM 1.3.0 and ODM 1.3.1 Validation Master data sets are located in the global standards library directory/standards/cdisc-odm-1.3.0-1.5/validation/control and the global standards library directory/standards/cdisc-odm-1.3.1-1.5/validation/control folders, respectively.
ODM Validation Check Types lists the types of checks for ODM data.
Each check type is assumed to operate on data that exists in a source column in a source data set. A check type can reference one or more parameters that validate the source column data. A parameter can be a character string or a representation of a column other than the source column against which the source column data must be compared.
All character comparisons are case sensitive. Character data is assumed to have been trimmed of leading and trailing white space.
ODM Validation Check Types
| Check Type | Category | Description |
| --- | --- | --- |
| Unique in data set | Structural | No two values for the source column can be equivalent within the same source data set. |
| Unique in data set | Structural | Duplicate OrderNumber element. The OrderNumber attribute must be unique within the same source data set when not null. |
| Required character value | Data | The trimmed (white space removed) value of the character data must consist of one or more characters. |
| Required numeric value | Data | The numeric value of the column cannot be missing. |
| Enumeration(s0,s1,…) | Data | If character data exists, its value must match one of the given enumerated character strings. All string comparisons are case sensitive. |
| Foreign key(targetColumn) | Structural | Each existing value in this column must have an equivalent value in the given target column. |
| Foreign key required(targetColumn) (1) | Structural | A value is required for this column in every row, and each value must have an equivalent value in the given target column. This check is equivalent to running the required character value check first and failing if that check fails. If the required character value check passes, the foreign key() check is run. |
| Character format: language | Data | The character data must consist of 1 to 8 alphabetical characters of either case, optionally followed by a hyphen and a sequence of 1 to 8 alphabetical characters (of either case) or numeric digits. For example, e is a legal value, as are en-us, english, and english-d842. Invalid values include 1en, mumblespeak, and en_us. The hyphen character sequence can be repeated any number of times, making a value such as english-mumbly-growly-47 a legal value. Regular expression: [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*. |
| Character format: fileName | Data | The character data must not contain any characters other than uppercase and lowercase letters of the alphabet, numeric digits, an underscore (_), or a period. Regular expression: [A-Za-z0-9_.]+. |
| Character format: sasName | Data | The first character must be either a lowercase or uppercase letter or an underscore (_). Any subsequent character must be either an uppercase or lowercase letter, a numeric digit, or an underscore (_). Regular expression: [A-Za-z_][A-Za-z0-9_]*. |
| Character format: sasFormat | Data | The first character must be either a lowercase or uppercase letter, an underscore (_), or the dollar sign ($). Any subsequent character must be either an uppercase or lowercase letter, a numeric digit, an underscore (_), or a period. Regular expression: [A-Za-z_$][A-Za-z0-9_.]*. |
| Must Have Corresponding Value(targetColumn) | Structural | For each distinct value in this column, there must be at least one equivalent value in the supplied target column. |
| Unique across data sets(targetcolumn0,…) | Structural | No value in this column can be equal to any value in any of the given data set columns. |
| Primary key (2) | Data | Must satisfy the unique in data set check type and the required character value check type. |
| Invalid Value | Data | Documents based on ODM 1.3 should have ODM version set to 1.3. |
| Invalid Value | Data | An invalid SAS format name. If the data type is character, the format name must start with the $ character. |
| Invalid Value | Data | An invalid integer value. The attribute is defined as an integer, but the text string does not match the named data format. The allowed string pattern for an integer is: -?digit+. |
| Invalid Value | Data | An invalid float value. The attribute is defined as a float, but the text string does not match the named data format. The allowed string pattern for a float is: -?digit+(.digit+)?. |
| Invalid Value | Data | An invalid date value. The attribute is defined as a date, but the text string does not match the named data format. The allowed string pattern for a date is: YYYY-MM-DD. |
| Invalid Value | Data | An invalid time value. The attribute is defined as a time, but the text string does not match the named data format. The allowed string pattern for a time is: hh:mm:ss(.n+)?((+|-)hh:mm)?. |
| Invalid Value | Data | An invalid datetime value. The attribute is defined as a datetime, but the text string does not match the named data format. The allowed string pattern for a datetime is: YYYY-MM-DDThh:mm:ss(.n+)?((+|-)hh:mm)?. |
| External File Reference Found | Data | External file reference found because the prior file OID is not missing (for example, ODM.PriorFileOID ne ‘’). |
| Referenced OID Not Found | Data | If Metadata version IncludedOID is non-null, the referenced OID must be found in this XML file. |
| Referenced OID Not Found | Data | If Metadata version IncludedStudyOID is non-null, the referenced OID must be found in this XML file. |
| Attribute is Required | Column | The ItemDef length attribute is required when the data type is text, string, integer, or float and can be ignored for the other types. |
| Attribute is Required | Column | The required attribute SignificantDigits cannot be empty or missing when the data type is float. |
| Attribute is Required | Column | Only numeric (integer or float) items should have measurement units. The MeasurementUnitRefs list the acceptable measurement units for this type of item. If only one MeasurementUnitRef is present, all items of this type carry this measurement unit by default. If no MeasurementUnitRef is present, the item's value is scalar (for example, a pure number). |
| Data Set Does Not Exist | Metadata | Invalid root element. The ODM file must contain a root element called ODM. In other words, the ODM data set must exist. |
| Mixed Data Exists | Multirecord | Typed and Untyped data transmission should not be mixed within a single ODM file. |
| Multiple Records Exists | Column | To avoid ambiguity, a particular language tag should not occur more than once in a series of TranslatedText elements. |
(1) This validation is a combination of checks ODM0101 and ODM0110.
(2) This validation is a combination of checks ODM0100 and ODM0101.
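The Invalid Value checks for integer, float, date, time, and datetime attributes describe string patterns informally. The Python sketch below shows one possible translation into regular expressions. It is an interpretation, not the toolkit's code: it assumes that digit and n each mean a single decimal digit, that the period in the float and time patterns is a literal period, and that a datetime is a date, a literal T, and a time. The names are hypothetical. The patterns test only the shape of the string, so a value such as 2011-13-45 would still pass the date pattern.

```python
import re

# Assumed building blocks; "digit" and "n" are read as one decimal digit.
DATE = r"[0-9]{4}-[0-9]{2}-[0-9]{2}"
TIME = r"[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?([+-][0-9]{2}:[0-9]{2})?"

# Hypothetical mapping from data type to an assumed regular expression.
TYPED_VALUE_PATTERNS = {
    "integer": r"-?[0-9]+",
    "float": r"-?[0-9]+(\.[0-9]+)?",
    "date": DATE,
    "time": TIME,
    "datetime": DATE + "T" + TIME,  # assumption: literal T separator
}


def is_valid_typed_value(data_type, text):
    """Return True when the whole string matches the pattern for its type."""
    return re.fullmatch(TYPED_VALUE_PATTERNS[data_type], text) is not None


samples = [
    ("integer", "-42"),
    ("integer", "4.2"),
    ("float", "3.14"),
    ("date", "2011-01-21"),
    ("time", "13:45:00.5+01:00"),
    ("datetime", "2011-01-21T13:45:00"),
    ("datetime", "2011-01-21 13:45:00"),
]
for data_type, text in samples:
    print(data_type, text, is_valid_typed_value(data_type, text))
```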
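Most check types belong to one of two categories (the ODM Validation Check Types table also assigns a few check types to the Column, Metadata, and Multirecord categories):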
  1. Data checks have no dependencies on data outside of the source table. An example is ensuring that a value exists in a column in which values cannot be missing.
  2. Structural checks deal with relationships and data integrity between tables. An example is foreign key enforcement. Structural conditions must be met for the successful generation of an ODM XML file. You might want to defer structural checks until later in the process when populating the ODM data sets. This is because foreign key relationships require that the data is made available in a particular order (that is, a referenced key must be available before the foreign key to it can exist).
For the CDISC ODM validation checks that compare the data against a set of expected values, the expected values are stored in a format catalog (odmct.sas7bcat) and a data set (odmct.sas7bdat). For ODM 1.3.0, these are in the global standards library directory/standards/cdisc-odm-1.3.0-1.5/formats folder. Case-sensitivity compliance is required by the XML schema validation.

CDISC SDTM 3.1.1, 3.1.2, and 3.1.3

The SAS Clinical Standards Toolkit 1.5 provides validation checks in support of CDISC SDTM 3.1.1, 3.1.2, and 3.1.3. These checks are derived from multiple sources that have evolved over time, including:
  • The SAS interpretation of the CDISC SDTM WebSDM 2.6 and 3.0 documented checks.
  • Checks supporting loads into the FDA Janus study data repository.
  • The SAS interpretation of the OpenCDISC CDISC SDTM validation rules (http://www.opencdisc.org).
  • SAS checks based on SAS data management and cleaning experiences building CDISC SDTM domains.
Future updates will be guided in part by the FDA/PhUSE Working Groups (http://www.phusewiki.org), such as the SDTM Validation Rules project.
Each version of the CDISC SDTM Validation Master data set (such as SDTM 3.1.3) contains a different number of checks based on the rules that are in effect at the time of each version and the number and type of supported tabulation domains. For more information about the distribution of checks by version, see Summary of Checks in Each validation_master Data Set That Is Provided by SAS.
By default, the Validation Master data set is located in the global standards library directory/standards/<specific standard and version>/validation/control folder. It is named validation_master.sas7bdat.
Each Validation Master data set is built with multiple instances of the checks. This better supports check selection by version or checksource (that is, WebSDM, Janus, or customer-defined checks) and enables unique check logic and messaging by version or checksource.
Multiple instances of specific checks are provided to handle different sets of SDTM domains. For example, check SDTM0604 assesses whether the sequence numbers (**SEQ) are consecutively numbered. For most domains, this is assessed in each patient (USUBJID). However, the trial summary (TS) domain does not contain patient-level data, so the check logic differs. The Validation Master metadata differs for these two instances of the SDTM0604 check, but it reports the same error message for the check.
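The idea behind the two instances of SDTM0604 can be illustrated with a short Python sketch. It is not the toolkit's check macro logic. It assumes that "consecutively numbered" means the sorted sequence numbers in a group increase by exactly 1 with no gaps or repeats, and it treats a data set without patient-level data as a single group, which is only one possible reading of how the TS instance differs. The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def non_consecutive_groups(rows, seq_column, group_column=None):
    """Return the group keys whose sequence numbers are not consecutive.

    When group_column is None, the whole data set is treated as one group.
    """
    groups = defaultdict(list)
    for row in rows:
        key = row[group_column] if group_column else None
        groups[key].append(row[seq_column])

    failures = []
    for key, values in groups.items():
        ordered = sorted(values)
        # Assumption: consecutive means start, start+1, start+2, and so on.
        expected = list(range(ordered[0], ordered[0] + len(ordered)))
        if ordered != expected:
            failures.append(key)
    return failures


# Hypothetical AE rows: subject S2 skips sequence number 2.
ae = [
    {"USUBJID": "S1", "AESEQ": 1},
    {"USUBJID": "S1", "AESEQ": 2},
    {"USUBJID": "S2", "AESEQ": 1},
    {"USUBJID": "S2", "AESEQ": 3},
]
# Hypothetical TS rows: no USUBJID, so no per-patient grouping is possible.
ts = [{"TSSEQ": 1}, {"TSSEQ": 2}, {"TSSEQ": 3}]

print(non_consecutive_groups(ae, "AESEQ", group_column="USUBJID"))
print(non_consecutive_groups(ts, "TSSEQ"))
```

Both calls answer the same question and could share one message, while the grouping argument differs, which mirrors how the two Validation Master records differ in metadata but not in the reported error.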
Note: The validation check data set column checkstatus indicates the state of each check. It indicates that the check is ready to be run in its current defined state, or that the check can be run based on some external criteria. Current valid values are 1 (active), 0 (inactive), -1 (deprecated), and -2 (not yet implemented). Values are extensible to meet your requirements. You can elect to use other values such as 1 (draft), 2 (test), and 3 (production). If a check is included in the run-time Validation Control data set, the SAS Clinical Standards Toolkit attempts to run the check as defined if the checkstatus value is > 0.
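The run rule in the note (a check is attempted when checkstatus is greater than 0) can be expressed as a simple filter. The Python sketch below is illustrative only: the records are hypothetical and carry just two columns, and the check identifiers other than SDTM0604 are invented placeholders. It also shows why a custom scheme such as 1 (draft), 2 (test), and 3 (production) still runs every check included in the run-time data set.

```python
# Default meanings of checkstatus as listed in the note.
CHECKSTATUS_LABELS = {
    1: "active",
    0: "inactive",
    -1: "deprecated",
    -2: "not yet implemented",
}


def runnable_checks(validation_control):
    """Return the checks that would be attempted: checkstatus > 0."""
    return [check for check in validation_control if check["checkstatus"] > 0]


# Hypothetical run-time Validation Control records.
validation_control = [
    {"checkid": "SDTM0604", "checkstatus": 1},
    {"checkid": "CHECK_A", "checkstatus": 0},
    {"checkid": "CHECK_B", "checkstatus": -1},
    {"checkid": "CHECK_C", "checkstatus": -2},
    {"checkid": "CHECK_D", "checkstatus": 3},  # custom value, e.g. production
]

for check in runnable_checks(validation_control):
    label = CHECKSTATUS_LABELS.get(check["checkstatus"], "custom value")
    print(check["checkid"], label)
```

Because any positive value passes the filter, a site that wants draft or test checks to be skipped would need to exclude them from the run-time Validation Control data set rather than rely on the status value alone.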
Consider how closely the SAS Clinical Standards Toolkit validation check metadata files are interrelated. The run-time Validation Control data sets, any programs that build or derive from these data sets, the corresponding Messages data sets, and the Validation_StdRef data set are all interconnected. For more information, see Messages. By default, the Validation_StdRef data set is located in the global standards library directory/standards/<specific standard and version>/validation/control folder.

CDISC CT 1.0.0

The CDISC CT validation checks are patterned in part after the CDISC ODM checks. The checks ensure that SAS rules for format names and non-duplicate values are followed. A total of 34 records are defined in the Validation Master data set, which, by default, is located in the global standards library directory/standards/cdisc-ct-1.0.0-1.5/validation/control folder.

The SAS Clinical Standards Toolkit Framework

Validation of the SAS Clinical Standards Toolkit framework files is referred to as internal validation. For more information, see Internal Validation.