Metadata Requirements

Overview

As noted in Supported Standards, a standard consists of properties, messages, and metadata files that collectively represent the standard in the SAS Clinical Standards Toolkit. Each SAS Clinical Standards Toolkit registered standard can support validation if the standards.supportsvalidation flag is set to Y. This setting indicates that the required set of validation files defining the standard exists. By default, the set of validation files that supports the standards that are supplied by SAS is in the cstGlobalLibrary folder hierarchy.
For example, validation files that define the CDISC SDTM 3.1.3 standard are in this folder hierarchy:
global standards library directory/standards/cdisc-sdtm-3.1.3-1.5
The following sections describe each metadata type used by typical validation processes. For information about metadata files that are common to all SAS Clinical Standards Toolkit processes, see Metadata File Descriptions. Metadata characteristics specific to compliance assessments are described in the sections in this chapter.

Reference Metadata

For CDISC standards, reference metadata about data sets is defined in a reference_tables data set, and metadata about columns is defined in a reference_columns data set. An example of a reference_tables record is provided in reference_tables Data Set and an example of a reference_columns record is provided in reference_columns Data Set.
Note: The structure and content of the reference metadata data sets can vary for other standards.
As noted in Supported Standards, each standard that is supplied by SAS provides a SAS interpretation of the published source guidelines or specification of that standard. Each standard is designed to serve as a representative model or template of the source specification. Each model or template can be modified to establish your own gold standard.
reference_tables Data Set
Column Name | Column Length | Description
sasref | $8 | The SAS libref that refers to the table in the SAS Clinical Standards Toolkit process. This value should match the value of the SASReferences.sasref field, where type=referencemetadata and subtype=table. This column is required.
table | $32 | The name of the tabulation domain or analysis data set being defined in the standard. The value must conform to SAS naming conventions. This column is required.
label | $40 | The label of the domain being defined in the standard. The value must conform to SAS naming conventions. This column is optional.
class | $40 | The observation class in the standard. Example CDISC SDTM values are Events, Findings, Interventions, Relates, Special Purpose, and Trial Design. This column is optional and not relevant for all standards.
xmlpath | $200 | The path to the SAS transport file. This path can be specified as a relative path. The value can be used when creating a define.xml file to populate the value for the def:leaf xlink:href link to the domain file. The value should be the pathname and filename of the SAS transport file relative to the location of the define.xml file. This column is optional and not relevant for all standards.
xmltitle | $200 | The title of the SAS transport file. The value can be used when creating a define.xml file to populate the value for the def:leaf def:title value. It can provide a meaningful description, label, or location of the domain leaf (for example, crt/datasets/Protocol 1234/AE.xpt). This column is optional and not relevant for all standards.
structure | $200 | The description of the general structure of the table. An example value is one record per event per subject. This column is optional and not relevant for all standards.
purpose | $20 | The description of the general purpose of the table. Examples are Tabulation (required for CDISC SDTM) and Analysis (required for CDISC ADaM). This column is optional and not relevant for all standards.
keys | $200 | A space-delimited string of keys that captures the table columns that uniquely define records in the table. This set of keys can also define the sort order of records in the table. An example is STUDYID USUBJID. This column is required.
state | $20 | A description of the table state, such as Draft or Final. This column is optional.
date | $20 | A meaningful, distinguishing date that describes the table, such as the release date, the creation date, or the modified date. This column is optional.
standard | $20 | This value captures the standard name. This value must match the name of a registered standard in the SAS Clinical Standards Toolkit framework. For a discussion of registered standards, see Framework. This value must match the standard field in the SASReferences data set. Examples are CDISC SDTM and CDISC CRT-DDS. This column is required.
standardversion | $20 | This value captures a specific version of a standard. This value must match one of the standard versions associated with a registered standard. This value must match the standardversion field in the SASReferences data set. Examples are 3.1.1 and 1.0. This column is required.
standardref | $200 | Any reference to an associated standard definition, implementation guide, schema, and so on, that provides additional information about the table or describes the table in greater detail. This column is optional.
comment | $200 | Any character string that provides comments relevant to the table. This column is optional.
Note: The column length can vary to match submission requirements or corporate conventions.
reference_columns Data Set
Column Name | Column Length | Description
sasref | $8 | The SAS libref that refers to the table containing the column in the SAS Clinical Standards Toolkit process. This value should match the value of the SASReferences.sasref field, where type=referencemetadata and subtype=column. This column is required.
table | $32 | The name of the tabulation domain or analysis data set being defined in the standard. The value must conform to SAS naming conventions. This column is required.
column | $32 | The name of the column in the table. The value must conform to SAS naming conventions. This column is required.
label | $200 | The label of the column. The value must conform to SAS naming conventions. This column is optional.
order | 8. | The order of the columns in each table. Values must be integers greater than 0 and unique in each table. This column is required.
type | $1 | The SAS type: N for numeric or C for character. This column is required.
length | 8. | The length of the column. Numeric columns have a length of 8. This column is required.
displayformat | $32 | The display format for numeric variables. For example, 8.2 indicates that floating-point variable values should be displayed to the second decimal place. This column is optional and not relevant for all standards.
xmldatatype | $8 | The data type of the column as it is defined in the define.xml file. Values are integer, float, date, datetime, time, and text. This column is optional and not relevant for all standards.
xmlcodelist | $32 | A SAS format name that is used to assess conformance to controlled terminology. This value does not have a $ prefix for character formats and does not have the trailing period. This value is also the codelist name in the define.xml file. The SAS format name must be in the format search path for successful column-value validation. This column is optional and not relevant for all standards.
core | $10 | The value indicates whether the column is required. Sample CDISC SDTM values are Req (required), Exp (expected), Perm (permissible), and Dep (deprecated). This column is optional and not relevant for all standards.
origin | $40 | Information about the source of the column. Values can include CRF page numbers and derived or variable references. Values are user extensible. This column is optional and not relevant for all standards.
role | $200 | Space-delimited column classification. Examples are Identifier, Topic, Qualifier, Timing, Selection, and Analysis. Columns can have multiple roles. This column is optional and not relevant for all standards.
term | $80 | The value indicates whether the column is subject to controlled terminology as defined in each standard source specification. This column is optional and not relevant for all standards.
algorithm | $1000 | Imputation or computation method to derive the column value. This column is optional and not relevant for all standards.
qualifiers | $200 | Space-delimited string containing supplemental column attributes. Example CDISC SDTM values are MIXEDCASE, UPPERCASE, DATETIME, and DURATION. This column is optional and not relevant for all standards.
standard | $20 | This value captures the standard name. This value must match the name of a registered standard in the SAS Clinical Standards Toolkit framework. For a discussion of registered standards, see Framework. This value must match the standard field in the SASReferences data set. Examples are CDISC SDTM and CDISC CRT-DDS. This column is required.
standardversion | $20 | This value captures a specific version of a standard. This value must match one of the standard versions associated with a registered standard. This value must match the standardversion field in the SASReferences data set. Examples are 3.1.1 and 1.0. This column is required.
standardref | $200 | Any reference to an associated standard definition, implementation guide, schema, and so on, that provides additional information about the column or describes the column in greater detail. This column is optional.
comment | $1000 | Any character string that provides comments relevant to the column. This column is optional.
Note: The column length can vary to match submission requirements or corporate conventions.
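The rules stated for the required reference_columns fields (required values present, type of N or C, numeric length of 8, and a positive order that is unique in each table) can be checked mechanically. The following Python sketch is not part of the SAS Clinical Standards Toolkit. It assumes that the reference_columns records have already been exported to a list of dictionaries, and the function name check_reference_columns is hypothetical.

```python
from collections import defaultdict

# Columns that the reference_columns description marks as required.
REQUIRED = ("sasref", "table", "column", "order", "type", "length",
            "standard", "standardversion")

def check_reference_columns(rows):
    """Return (table, column, problem) tuples for rows that break the
    rules stated in the reference_columns column descriptions."""
    problems = []
    seen_orders = defaultdict(set)  # table name -> order values already used
    for row in rows:
        table, column = row.get("table"), row.get("column")
        # Required columns must be present and non-blank.
        for name in REQUIRED:
            if row.get(name) in (None, ""):
                problems.append((table, column, f"missing {name}"))
        # type is N (numeric) or C (character).
        if row.get("type") not in ("N", "C"):
            problems.append((table, column, "type must be N or C"))
        # Numeric columns have a length of 8.
        if row.get("type") == "N" and row.get("length") != 8:
            problems.append((table, column, "numeric length must be 8"))
        # order is an integer > 0 and unique within each table.
        order = row.get("order")
        if not isinstance(order, int) or order <= 0:
            problems.append((table, column, "order must be an integer > 0"))
        elif order in seen_orders[table]:
            problems.append((table, column, "duplicate order in table"))
        else:
            seen_orders[table].add(order)
    return problems
```

An empty result means that the records satisfy these particular rules. It does not imply that the metadata is otherwise valid for a given standard.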
The standard reference metadata provided by SAS is in the SAS Clinical Standards Toolkit global standards library. By default, this library is here:
global standards library directory/standards/<specific standard>/metadata
For example, for the CDISC SDTM 3.1.3 standard, the location is:
global standards library directory/standards/cdisc-sdtm-3.1.3-1.5/metadata
This global standards library metadata folder can contain other standard-specific metadata. For example, CDISC SDTM includes class_tables and class_columns data sets. These data sets have more generic metadata than specific domain instances like DM or AE, and they are most useful when deriving new, custom domains. For example, if a new CDISC SDTM events domain is required, you can initialize table metadata based on the EVENTS record in the class_tables data set and initialize column metadata based on the EVENTS, IDENTIFIERS, and TIMING records in the class_columns data set.

Source Metadata

The SAS Clinical Standards Toolkit validation processes require source metadata that describes source (study) domains and columns. This is the study data that is to be validated. The SAS Clinical Standards Toolkit assumes that the reference metadata (that is, reference_tables and reference_columns) for a standard serves as a model or template for the source metadata (that is, source_tables and source_columns). It is recommended that these two sets of metadata be structurally equivalent. However, additional metadata attributes might exist if they are used for other purposes or for custom extensions to the SAS Clinical Standards Toolkit.
The SAS Clinical Standards Toolkit assumes that source_tables and source_columns data sets accurately reflect and are consistent with the source data that they describe. Although some standard-specific validation checks might look for discrepancies and report them in detail, failure to accurately reflect and be consistent with the source data can lead to errors in the SAS Clinical Standards Toolkit validation process. It can even halt the execution of the process.

Validation Check Metadata: Validation Master

The Validation Master data set contains all validation checks defined for a standard. By default, this data set is deployed to this directory in each supported standard:
global standards library directory/standards/<standard>/validation/control
By default, the Validation Master SAS data set’s actual name is validation_master.sas7bdat.
The SAS Clinical Standards Toolkit requires that this data set have a fixed structure.
This table lists the columns in the Validation Master data set. These columns are described and examples are reviewed in the following sections.
Column Descriptions of the Validation Master Data Set
Column Name | Column Length | Description
checkid | $8 | Validation check ID. The SAS Clinical Standards Toolkit has adopted a naming convention matching each standard to be validated. The checkid values begin with a prefix of up to four characters (CDISC examples: ODM, SDTM, ADAM, and CRT). By convention, the prefix matches the mnemonic field in the Standards data set in global standards library directory/metadata. This prefix is followed by a 4-digit number that is unique within the standard (for example, SDTM1234). You can use any naming convention limited to eight characters. By default, the checkid column is the first (primary) sort field in the Validation Master data set provided by SAS. Sorting by checkid is not required. This column is required.
standard | $20 | This value captures the standard name. This value must match the name of a registered standard in the SAS Clinical Standards Toolkit framework. For a discussion of registered standards, see Framework. This value must match the standard field in the SASReferences data set. Examples are CDISC SDTM and CDISC CRT-DDS. This column is required.
standardversion | $20 | This value captures a specific version of a standard. This value must match one of the standard versions associated with a registered standard. This value must match the standardversion field in the SASReferences data set. The only exception to this rule is that *** can be used to signify that the check applies to all supported versions of the standard. Examples are 3.1.1, 1.0, and ***. If a subsequent version of the standard is released, then *** would be applicable if the check is valid for the new version. This column is required.
checksource | $40 | A string that identifies the source of the check. CDISC examples include Janus, JanusFR (FAIL-REJECT), SAS, WebSDM, and OpenCDISC. This field can contain any user-defined value. A primary use of this field is to subset the full set of checks in the run-time Validation Control data set. This column is required.
sourceid | $8 | A reference identifier for this check from the checksource. An example is IR4000 (a WebSDM identifier). In the Validation Master data set, a SAS identifier (for example, SAS0001) is used for checks provided by SAS with no external source. This column is optional.
checkseverity | $40 | The severity as assigned by checksource. This value is mapped to these standardized values: Note (Low), Warning (Medium), Error (High). A value is expected, although it is not technically required. It is used in messages and reporting.
checktype | $20 | General type of check. This value categorizes checks and helps register customized checks. Values are user extensible and can be standard specific. A primary use of this field is to subset the full set of checks in the run-time Validation Control data set. Example CDISC SDTM values are:
  Metadata-structural: Checks some metadata-only property (no data access required).
  ColumnValue-content: Checks a column value or compares two column values.
  Date-content: Checks ISO 8601 compliance or compares two date values.
  Multirecord-content: Looks across multiple records in a single domain.
  Multitable-content: Looks across multiple domains.
  Controlterm-content: Assesses whether a column value is consistent with controlled terminology.
  This column is optional.
codesource | $32 | The name of the check macro. The name must conform to SAS naming conventions. The macro must be in the SAS autocall path. An example is cstcheck_notunique. This column is required.
usesourcemetadata | $1 | The value indicates whether to use source metadata rather than reference metadata. The metadata controls the derivation of domains and column lists to be validated, program flow, and looping. Values are Y and N (default). This column is optional.
tablescope | $200 | The value specifies the domains to be validated by the check. The domains must exist in the reference metadata, the source metadata, or both. The value can be in the form:
  _ALL_-DM-DS: Multiple domains that exclude one or more specific domains that are delimited with a -.
  DM: Any single domain; can be specified as libref.domain.
  DM+AE: Multiple domains delimited with a +.
  _ALL_: All domains.
  SUPP**: Wildcard to include multiple domains.
  CLASS:EVENTS: All domains capturing event results. (This syntax specifies to use the table metadata column CLASS with EVENTS as the value. Similar syntax can be used for all other fields and values.)
  [_ALL_-DM][DM]: Bracket syntax to define sublists for comparative purposes. In this example, all non-DM domains are compared with the DM domain.
  See the Validation Master data set for a full set of values.
  This column is required.
columnscope | $200 | The value specifies one or more space-delimited columns identified for inclusion or exclusion in the specified check. The value can be in the form:
  _ALL_: All columns (equivalent to ** or a null value).
  _NA_: Not applicable (that is, domain-level check).
  AGE: Any single column. This value can be specified as libref.domain.column or domain.column.
  ARM+ARMCD: Multiple columns delimited with a +.
  **BLFL-LBBLFL: Multiple columns that exclude specific columns delimited with a -.
  **DTC: Wildcard to include multiple columns with ** representing the domain name.
  xxx**: A fixed prefix followed by a column wildcard (for example, AE**, where ** is a column wildcard).
  [**STDTC][**ENDTC]: Bracket syntax to define sublists for comparative purposes. In this example, all start dates are compared with all end dates. The number of columns in each sublist must be equivalent.
  See the Validation Master data set for a full set of values.
  This column is optional. (If null, the value is equivalent to _ALL_.)
codelogic | $2000 | Check-specific code segment that is inserted into the check macro defined in codesource and consistent with codetype. The codelogic value enables check-level customization and allows the reuse of more general check macros. The field length of $2000 limits the code to short code segments, although referencing another macro or using %include expands this capability. The codelogic value can use global and local macro variables (for example, variables provided as macro input parameters and variables set within the calling code). Examples include:
    if (. < &_cstColumn1 < &_cstColumn2) then _cstError=1;
    %include <fileref>;
    /* where <fileref> can be set outside of the SAS Clinical Standards Toolkit
       or in the SASReferences control data set */
  The previous code is limited to filerefs set outside of the SAS Clinical Standards Toolkit or in the SASReferences control data set.
    %sdtmcheckutil_recordlookup
    data _cstProblems;
      set &_cstDSName;
      if <some condition>;
    run;
  This column is optional.
codetype | 8. | This value defines whether to use codelogic and what type of codelogic can be used in the validation code. Values include:
  0: No codelogic used.
  1: DATA step statement level. (For example, if &_cstColumn < 0 then _cstError=1;)
  2: Full DATA step, PROC SQL step, or multiple steps.
  3: Calls a SAS macro or %include that can contain only DATA step statement level code (as with codetype=1).
  4: Calls a SAS macro or %include that can contain only full DATA step or PROC SQL step code (as with codetype=2).
  This column is required.
lookuptype | $20 | This value defines the type of information to use for value comparison to some standard. Values include:
  Metadata: Use the SAS Clinical Standards Toolkit metadata. Specifically, use the value of the column metadata field xmlcodelist to identify the codelist (rendered as a SAS format).
  Format: Use a SAS format from the SAS format search path.
  Dataset: Use a reference SAS data set (for example, medDRA). There are no SAS Clinical Standards Toolkit requirements for the structure and content of the reference SAS data set.
  <extensible>: Other user-defined values can be used if they are explicitly referenced in user-written code.
  This column is optional.
lookupsource | $32 | The specific SAS format or file associated with lookuptype. For example:
  If lookuptype is metadata, then lookupsource should be blank. The code gets the value from the source_columns.xmlcodelist field.
  If lookuptype is format, then lookupsource should be the SAS format and must be in the format search path if it is specified. This value should generally match any value in source_columns.xmlcodelist for the columns specified in columnscope. This field allows a run-time validation check against another format.
  If lookuptype is dataset, then lookupsource should be the name of a SAS data set. This value is specified as the data set name (for example, meddra) or libref.dataset. If a value is provided without a libref, then the SAS Clinical Standards Toolkit looks for any SASReferences type=referencecterm records for the sasref value.
  This column is optional.
standardref | $200 | Any reference to an associated standard definition, implementation guide, schema, and so on, that provides additional information about the check or describes the basis for the check in greater detail. This column is optional.
reportingcolumns | $200 | This value includes columns not included in columnscope for code-processing purposes and to help resolve errors. If this value is specified, then it should be a space-delimited list of columns in the domains specified in the tablescope field. The values of these columns can be reported in the Results data set. This column is optional.
checkstatus | 8. | This value determines whether the check is ready to be used and included in any Validation Control run-time data set. If the check is ready, then the value should be set to any positive integer. Values include:
  0: (inactive, default)
  >0: (active)
  -1: (deprecated, archived)
  -2: (not implemented in this SAS Clinical Standards Toolkit release)
  This column is optional, although it is expected.
reportall | $1 | This value enables more concise reporting of errors. Values include:
  Y: (yes, report all records, default)
  N: (no)
  This column is required, although not all check macro modules support abbreviated (N) reporting.
uniqueid | $48 | This value provides a unique ID for the check. It ensures uniqueness in the data set and in the SAS Clinical Standards Toolkit. This value allows any provided or derived check to be uniquely identifiable over time. An example is SDTM000100CST120SDTM3112009-05-12T12:00:00CDI.
  Legend:
  characters 1-8: checkid
  characters 9-10: checkid repeat indicator (00 unless multiple invocations of checkid are included)
  characters 11-16: the version of the SAS Clinical Standards Toolkit where the check metadata was last materially modified
  characters 17-23: standard version
  characters 24-42: implementation datetime of the last metadata update
  characters 43-48: assigning authority
  This column is optional, although it is expected.
comment | $200 | Any character string that provides comments relevant to the check. This column is optional.
The content of the Validation Master data set is based on a combination of compliance requirements and the SAS representation of the standard.
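Because the uniqueid value is built from fixed character positions, it can be split back into its parts by slicing. The following Python sketch is illustrative only and is not part of the SAS Clinical Standards Toolkit. The function name parse_uniqueid and the dictionary keys are hypothetical; the positions come from the uniqueid legend in the column descriptions.

```python
def parse_uniqueid(uniqueid):
    """Split a uniqueid value into its fixed-position parts.

    The legend uses 1-based character positions; Python slices are
    0-based with an exclusive end, so characters 1-8 become [0:8].
    """
    return {
        "checkid": uniqueid[0:8],               # characters 1-8
        "repeat": uniqueid[8:10],               # characters 9-10
        "toolkit_version": uniqueid[10:16],     # characters 11-16
        "standard_version": uniqueid[16:23],    # characters 17-23
        "updated": uniqueid[23:42],             # characters 24-42
        "authority": uniqueid[42:48].strip(),   # characters 43-48
    }

# The example value from the uniqueid description.
parts = parse_uniqueid("SDTM000100CST120SDTM3112009-05-12T12:00:00CDI")
print(parts["checkid"], parts["updated"], parts["authority"])
```

Slicing tolerates an assigning authority shorter than six characters, as in the example value, where the last part is CDI.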
This table describes a sample Validation Master data set record for the CDISC SDTM 3.1.2 standard.
Sample CDISC SDTM 3.1.2 Validation Master Data Set Record
Column Name | Column Value | Comment
checkid | SDTM0207 | The SAS Clinical Standards Toolkit check identifier used in validation results and reports.
standard | CDISC-SDTM | The registered standard.
standardversion | *** | The standard version. A value of *** indicates that the check is applicable to all versions of the standard.
checksource | WebSDM | This check originated as a WebSDM check.
sourceid | IR5010 | WebSDM check IR5010.
checkseverity | Warning |
checktype | ColumnValue |
codesource | cstcheck_column | This check uses the cstcheck_column check macro in the SAS Clinical Standards Toolkit autocall library.
usesourcemetadata | Y | This check is run on source data domains.
tablescope | _ALL_ | This check is run on all domains.
columnscope | VISITNUM | This check evaluates VISITNUM values from each domain.
codelogic | _vnum=kstrip(put(&_cstColumn,best.));_dot=kindexc(_vnum,".");if _dot then if length(ksubstr(_vnum,_dot+1))>3 then _cstError=1; | This logic is used in cstcheck_column. Errors are documented in a work._cstProblems data set.
lookuptype | |
lookupsource | |
standardref | |
reportingcolumns | |
checkstatus | 1 |
reportall | Y | This check reports all errors that are identified.
uniqueid | SDTM020701CST150SDTM3122012-06-08T10:49:21CST |
codetype | 1 | This code logic is used in the DATA step.
comment | |
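The codelogic value in this sample record converts VISITNUM to text, locates the decimal point, and sets _cstError when more than three characters follow it. The following Python sketch mirrors that logic for illustration. It is not toolkit code, the function name visitnum_has_error is hypothetical, and Python's ".12g" formatting is used as an assumed stand-in for the SAS BEST. format, so the two might not render every value identically.

```python
def visitnum_has_error(value):
    """Return True when the text form of a numeric value has more than
    three digits after the decimal point (the condition that sets
    _cstError in the sample codelogic)."""
    text = format(value, ".12g").strip()  # assumed stand-in for put(x, best.)
    dot = text.find(".")                  # -1 when there is no decimal point
    return dot >= 0 and len(text[dot + 1:]) > 3

for visitnum in (1, 2.5, 3.125, 3.1255):
    print(visitnum, visitnum_has_error(visitnum))
```

Only the last value in the loop, which has four decimal places, is flagged.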
The Validation Master data set contains all validation checks for a standard, whereas the Validation Control data set is the run-time equivalent and contains just the validation checks to be run in a validation process. The Validation Control data set is structurally equivalent to the Validation Master data set. For additional information about how the validation check metadata in the Validation Control data set is used in the SAS Clinical Standards Toolkit validation processes, see Special Topic: How the SAS Clinical Standards Toolkit Interprets Validation Check Metadata.
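As an illustration of how a run-time subset might be derived, the following Python sketch selects active checks for one standard version from a list of Validation Master records. It is a simplified sketch, not the toolkit's own mechanism. The function name build_validation_control is hypothetical, and the records are assumed to be dictionaries keyed by the column names described earlier.

```python
def build_validation_control(master, standardversion, checksources=None):
    """Select the checks to run, keeping the Validation Master order.

    master          -- list of dicts, one per Validation Master record
    standardversion -- version being validated, for example "3.1.2"
    checksources    -- optional collection of checksource values to keep
    """
    control = []
    for check in master:
        # checkstatus: 0 = inactive, >0 = active, negative = deprecated
        # or not implemented.
        if check.get("checkstatus", 0) <= 0:
            continue
        # *** means that the check applies to all supported versions.
        if check["standardversion"] not in (standardversion, "***"):
            continue
        if checksources is not None and check["checksource"] not in checksources:
            continue
        control.append(check)
    return control
```

Filtering on checktype could be added in the same way as the checksource filter.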

Supplemental Validation Check Metadata: Validation Standard References

The validation standard references data set contains additional information about each of the checks in the Validation Master data set. This data set is used in the validation metadata reporting process to provide additional information to you about the origin of the check. It also provides any supporting documentation about the check. By default, this data set is deployed to this directory in each supported standard:
global standards library directory/standards/<standard>/validation/control
Column Descriptions of the Validation_StdRef Data Set
Column Name | Column Length | Description
checkid | $8 | The validation check ID, as specified in the Validation Master data set. (See Column Descriptions of the Validation Master Data Set.)
standard | $20 | This value captures the standard name. This value must match the standard in the associated Validation Master data set. This column is required.
standardversion | $20 | This value captures a specific version of a standard. This value should be the version for which the supplemental reference information is applicable. This column is required.
informationsource | $80 | This value captures the origin of the reference information. The value can be an implementation guide, web site, harmonization document, and so on. It can be any source that can be referenced that provides insight into the check.
sourcelocation | $200 | This value contains the location in the information source, such as a page number or a section number.
seqno | 8. | This value provides a sequence number for checkid if multiple sources of information are available for a check. This column is required.
sourcetext | $2000 | This value captures descriptive information from the source that supports the check. This information attempts to provide a basis for inclusion of the check.
The content of the Validation_StdRef data set is based on information from any source that supports the check.
This table describes information about a specific check in the Validation_StdRef data set (record 1) for the CDISC SDTM 3.1.2 standard.
Sample CDISC SDTM 3.1.2 Validation_StdRef Data Set for Check SDTM0207 — Record 1
Column Name | Column Value | Comment
checkid | SDTM0207 | The SAS Clinical Standards Toolkit check identifier used in results and reports.
standard | CDISC-SDTM | The registered standard.
standardversion | 3.1.2 | The standard version.
informationsource | SDTM 3.1.2 Implementation Guide | This reference information originated from the SDTM 3.1.2 Implementation Guide.
sourcelocation | 5.3.2, page 72 | Section 5.3.2, page 72 of the SDTM 3.1.2 Implementation Guide.
seqno | 1 | The first record for this checkid.
sourcetext | Clinical encounter number. (Decimal numbering might be useful for inserting unplanned visits.) | The text of the information retrieved from section 5.3.2, page 72 of the SDTM 3.1.2 Implementation Guide.
This table describes information about a specific check in the Validation_StdRef data set (record 2) for the CDISC SDTM 3.1.2 standard.
Sample CDISC SDTM 3.1.2 Validation_StdRef Data Set for Check SDTM0207 — Record 2
Column Name | Column Value | Comment
checkid | SDTM0207 | The SAS Clinical Standards Toolkit check identifier used in results and reports.
standard | CDISC-SDTM | The registered standard.
standardversion | 3.1.2 | The standard version.
informationsource | WebSDM | This reference information originated from the WebSDM validation checks.
sourcelocation | Convention | Compliance convention set by WebSDM.
seqno | 2 | The second record for this checkid.
sourcetext | Compliance convention set by WebSDM. No supporting implementation guide found. | Representative text for an accepted convention.

Supplemental Validation Check Metadata: CDISC SDTM Domains by Check

The SAS Clinical Standards Toolkit validation metadata, as specified in the Validation Master data set, uses the tablescope and columnscope columns to define the scope of the check. The scope determines which domains (tables) and which columns are validated when the check is run. The SAS Clinical Standards Toolkit uses a shorthand syntax in these columns that is interpreted by the SAS Clinical Standards Toolkit framework macros to build a list of target tables and columns. For more information, see Special Topic: How the SAS Clinical Standards Toolkit Interprets Validation Check Metadata. The Validation_DomainsByCheck data set is supplied in global standards library directory/standards/cdisc-sdtm-3.1.x/validation/control. It contains a record for each domain that is to be validated by each check in the Validation Master data set. This data set is used by reporting tools that are provided with the SAS Clinical Standards Toolkit to report domain-specific errors. For more information, see Reporting. It is also available to other programs and applications that might need to subset checks that are applicable to specific domains.
The SDTM version of the Validation_DomainsByCheck data set that is supplied by SAS is built from the version of the Validation Master data set that is also supplied by SAS. If the tablescope and columnscope columns are modified, then the Validation_DomainsByCheck data set must also be modified or rebuilt.
Column Descriptions of the Validation_DomainsByCheck Data Set
Column Name | Column Length | Description
checkid | $8 | The validation check ID, as specified in the Validation Master data set. (See Column Descriptions of the Validation Master Data Set.)
table | $32 | This value captures the domain or table name. This column is required.
standardversion | $20 | This value captures a specific version of a standard. This value must match standardversion in the associated Validation Master data set.
checksource | $40 | A string that identifies the source of the check. This value must match checksource in the associated Validation Master data set.
resultseq | 8. | The unique invocation of a check within the Validation Master data set. This value is incremented if multiple record or domain combinations exist.
For CDISC SDTM 3.1.2 validation check SDTM0207, the Validation_DomainsByCheck data set contains records for 14 domains. These 14 domains are DA, EG, FA, IE, LB, MB, MS, PC, PE, PP, QS, SV, TV, and VS. The target domains and columns for check SDTM0207 are defined as tableScope=_ALL_ and columnScope=VISITNUM. This means there are 14 domains in the sample study metadata provided for CDISC SDTM 3.1.2 that contain the column VISITNUM.
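The way a tablescope and columnscope pair resolves to a list of domains can be sketched in a few lines. The following Python example is a simplified illustration, not the toolkit's framework macros. It handles only _ALL_, single names, the + and - delimiters, and the ** wildcard for tablescope, and only a single column name for columnscope; CLASS: values, libref prefixes, and bracket sublists are left out. The function names and the sample metadata are hypothetical.

```python
from fnmatch import fnmatchcase

def resolve_tablescope(tablescope, tables):
    """Expand a simplified tablescope value against known table names."""
    # Everything before the first "-" is included; the rest is excluded.
    include, *exclude = tablescope.split("-")

    def matches(pattern, name):
        if pattern == "_ALL_":
            return True
        # Treat the toolkit's ** wildcard as a shell-style * wildcard.
        return fnmatchcase(name, pattern.replace("**", "*"))

    selected = [t for t in tables
                if any(matches(p, t) for p in include.split("+"))]
    return [t for t in selected
            if not any(matches(p, t) for p in exclude)]

def domains_for_check(tablescope, columnscope, table_columns):
    """Return the tables in scope that contain the single column named in
    columnscope. table_columns maps table name -> list of column names."""
    tables = resolve_tablescope(tablescope, list(table_columns))
    return [t for t in tables if columnscope in table_columns[t]]

# Hypothetical study metadata, much smaller than the supplied sample study.
meta = {
    "DM": ["STUDYID", "USUBJID", "AGE"],
    "AE": ["STUDYID", "USUBJID", "AETERM"],
    "VS": ["STUDYID", "USUBJID", "VISITNUM"],
    "LB": ["STUDYID", "USUBJID", "VISITNUM"],
}
print(domains_for_check("_ALL_", "VISITNUM", meta))
```

With tablescope=_ALL_ and columnscope=VISITNUM, only the tables that contain VISITNUM remain, which is the same reasoning that yields the 14 domains listed for check SDTM0207.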

Supplemental Validation Check Metadata: CDISC ADaM Class by Check

For CDISC ADaM, the supplemental data set is called Validation_ClassByCheck. It is located here:
global standards library directory/standards/cdisc-adam-2.1-1.5/validation/control
This data set is patterned after the data set that is described in Column Descriptions of the Validation_DomainsByCheck Data Set. However, the column class ($40, Observation Class within Standard) has been added. This addition accommodates the different way that the ADaM reference standard is defined. For example, the reference_tables data set, located in /standards/cdisc-adam-2.1-1.5/metadata, includes a BDS record that serves as a class template for all specific implementations of BDS that are required for a study. The SAS Clinical Standards Toolkit does not know each of the specific analysis data sets, so the Validation_ClassByCheck data set includes records by class, not by domain, for each check in the ADaM Validation Master data set.

Validation.Properties

Properties specific to validation processes are provided with the SAS Clinical Standards Toolkit. These properties enable you to specify how validation checks are to be processed and whether metrics are to be reported.
As with all SAS Clinical Standards Toolkit properties files, a call to the %cst_setproperties macro is required to translate the properties into SAS global macro variables. This call can be made explicitly as a driver module setup task, or it can be made by including the Validation.Properties file as a record in the SASReferences data set. For all standards that support validation, the Validation.Properties file is required even if no metrics are wanted, because the SAS Clinical Standards Toolkit validation process expects and uses the metrics global macro variables.
This table describes the properties in the Validation.Properties file.
Properties in the Validation.Properties File
Property Name
Description
_cstCheckSortOrder
This property determines the order in which validation checks are processed. If no value is provided, or the default value _DATA_ is used, then the data set order is assumed. Alternatively, _cstCheckSortOrder can be set to sort the Validation Control data set at run time by any fields in that data set (for example, CHECKSOURCE CHECKID).
_cstMetrics
This property determines whether to calculate and report metrics. An example value is 1=Yes.
_cstMetricsDS
This property sets the SAS data set name to use to accumulate metrics during the process. The default value is work._cstmetrics.
_cstMetricsNumSubj
_cstMetricsCntNumSubj
This property determines whether to calculate and report subject-level counts. An example value is 1=Yes, initialize _cstMetricsCntNumSubj to 0. The calculation of subject-level counts might not be appropriate for all check macros.
_cstMetricsNumRecs
_cstMetricsCntNumRecs
This property determines whether to calculate and report record-level counts. An example value is 1=Yes, initialize _cstMetricsCntNumRecs to 0.
_cstMetricsNumChecks
_cstMetricsCntNumChecks
This property determines whether to summarize and report the number of checks run. An example value is 1=Yes, initialize _cstMetricsCntNumChecks to 0.
_cstMetricsNumBadChecks
_cstMetricsCntNumBadChecks
This property determines whether to summarize and report the number of check invocations that failed. An example value is 1=Yes, initialize _cstMetricsCntNumBadChecks to 0.
_cstMetricsNumErrors
_cstMetricsCntNumErrors
This property determines whether to summarize and report the total number of errors (resultseverity=Error) found. An example value is 1=Yes, initialize _cstMetricsCntNumErrors to 0.
_cstMetricsNumWarnings
_cstMetricsCntNumWarnings
This property determines whether to summarize and report the total number of warnings (resultseverity=Warning) found. An example value is 1=Yes, initialize _cstMetricsCntNumWarnings to 0.
_cstMetricsNumNotes
_cstMetricsCntNumNotes
This property determines whether to summarize and report the total number of notes (resultseverity=Note) found. An example value is 1=Yes, initialize _cstMetricsCntNumNotes to 0.
_cstMetricsNumStructural
_cstMetricsCntNumStructural
This property determines whether to summarize and report the total number of structural (metadata) errors found. An example value is 1=Yes, initialize _cstMetricsCntNumStructural to 0.
_cstMetricsNumContent
_cstMetricsCntNumContent
This property determines whether to summarize and report the total number of content (data) errors found. An example value is 1=Yes, initialize _cstMetricsCntNumContent to 0.
_cstMetricsTimer
This property determines whether to report the elapsed time for each check invocation. An example value is 1=Yes.
By default, for all standards that support validation, Validation.Properties is here:
global standards library directory/standards/<standard>/programs
Properties can logically be associated with each study. Using the CDISC SDTM 3.1.3 sample study provided with the SAS Clinical Standards Toolkit as an example, a study-specific instance of the Validation.Properties file is located at: sample study library directory/cdisc-sdtm-3.1.3-1.5.
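As a rough illustration of how properties become global macro variables, the following sketch writes a small properties file to the WORK location and reads it back with a plain DATA step. The DATA step is only a stand-in for the %cst_setproperties macro and is not the toolkit's implementation. The file name validation_sketch.properties is hypothetical, and the sketch assumes that each property occupies one name=value line. The property values are the example values listed in the table above.

```sas
/* Hypothetical properties file written to the WORK folder (assumed name=value layout) */
%let _propfile=%sysfunc(pathname(work))/validation_sketch.properties;

data _null_;
  file "&_propfile";
  put '_cstCheckSortOrder=_DATA_';
  put '_cstMetrics=1';
  put '_cstMetricsDS=work._cstmetrics';
  put '_cstMetricsNumRecs=1';
  put '_cstMetricsCntNumRecs=0';
  put '_cstMetricsTimer=1';
run;

/* Stand-in for %cst_setproperties: turn each name=value line into a global macro variable */
data _null_;
  infile "&_propfile" truncover;
  length line $200 name $32 value $200;
  input line $200.;
  if index(line, '=') > 0 then do;
    name  = strip(scan(line, 1, '='));
    value = strip(substr(line, index(line, '=') + 1));
    call symputx(name, value, 'G');
  end;
run;

/* Confirm that the properties are now available as macro variables */
%put NOTE: _cstMetrics=&_cstMetrics _cstMetricsDS=&_cstMetricsDS;
```

In a real process, the same result is obtained by calling %cst_setproperties or by registering the Validation.Properties file in the SASReferences data set, as described above.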

Messages

Each SAS Clinical Standards Toolkit registered standard that supports validation has a Validation Master data set and an associated Messages data set. The Validation Master data set provides the superset of checks defined for that standard. The Messages data set provides messages to be generated during the execution of each validation process. A distinct Messages data set record is expected for each set of checkid and checksource values in the Validation Master data set. Messages can be parameterized and internationalized.
By default, the standard-specific Messages data set is deployed to this directory in each supported standard:
global standards library directory/standards/<standard>/messages
All Messages data sets in the SAS Clinical Standards Toolkit should have the same structure. The structure is defined in Metadata File Descriptions.
During a process, the SAS Clinical Standards Toolkit appends any standard-specific messages that are required by the process to any generic SAS Clinical Standards Toolkit framework messages that are available to all processes. This appended Messages data set follows the naming convention that is defined within the global macro variable _cstMessages.

Validation Metrics

Generating the SAS Clinical Standards Toolkit validation metrics provides a meaningful denominator for most validation checks. This enables you to more accurately assess the relative scope of errors that are detected. Generally, the calculated denominator is a count of the number of records processed in a domain.
This code segment, which is extracted from a validation check macro, shows a typical calculation of the number of records in a domain. It also shows the macro call to add the count to the Validation Metrics data set:
* Get the record count from the data set header without reading any records *;
data _null_;
  if 0 then set &_cstDSName nobs=_numobs;
  call symputx('_cstMetricsCntNumRecs',_numobs);
  stop;
run;

* Write applicable metrics *;
%if &_cstMetrics %then %do;
  %if &_cstMetricsNumRecs %then
    %cstutil_writemetric(
      _cstMetricParameter=# of records tested,
      _cstResultID=&_cstCheckID,
      _cstResultSeqParm=&_cstResultSeq,
      _cstMetricCnt=&_cstMetricsCntNumRecs,
      _cstSrcDataParm=&_cstDSName
    );
%end;
Because a check can evaluate multiple columns in a domain, the reported count can be greater than the number of records in that domain. In addition, a metadata-level check that does not access the domain data directly might report the number of metadata records instead.
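To show how such a denominator can put a finding count in context, here is a minimal stand-alone sketch that runs outside the toolkit. It uses the SASHELP.CLASS sample table in place of a domain, and the rule (age above 14) and the macro variable names _recsTested, _recsFlagged, and _pctFlagged are hypothetical.

```sas
/* Denominator: number of records, taken from the data set header */
data _null_;
  if 0 then set sashelp.class nobs=_numobs;
  call symputx('_recsTested', _numobs);
  stop;
run;

/* Numerator: records that fail a hypothetical rule */
proc sql noprint;
  select count(*) into :_recsFlagged trimmed
    from sashelp.class
    where age > 14;
quit;

/* Express the findings relative to the number of records tested */
%let _pctFlagged=%sysevalf(100 * &_recsFlagged / &_recsTested);
%put NOTE: &_recsFlagged of &_recsTested records flagged (&_pctFlagged percent);
```

Without the denominator, the raw count of flagged records says little about how widespread a problem is in the domain.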
Metrics processing is enabled based on settings in the Validation.Properties file. See Properties in the Validation.Properties File.
This table provides a description of the Validation Metrics data set, including the meaning of each field.
Column Descriptions of the Validation Metrics Data Set
Column Name
Column Length
Description
metricparameter
$40
A descriptive text string that specifies the metric of interest. This string is hardcoded in the check macro and cannot be modified without code changes. Values should be non-null.
reccount
8.
A count of the number of records specific to the combination of metricparameter and resultid. This number is derived in the check macro and cannot be modified without code changes. This column can contain a summary count of records written to the Results data set (resultid=METRICS). Reccount can be null for selected metricparameters, such as the assessment of elapsed time for each check.
resultid
$8
The resultid is either the checkid or a hardcoded constant such as METRICS. The SAS Clinical Standards Toolkit has adopted a naming convention that matches each standard. The checkid (resultid) values begin with a prefix of up to four characters (CST for framework messaging; CDISC examples: ODM, SDTM, ADAM, and CRT). By convention, the prefix matches the mnemonic field in the Standards data set in global standards library directory/metadata. This prefix is followed by a 4-digit number that is unique within the standard (for example, SDTM1234). You can use any naming convention, provided that values are limited to eight characters. Values should be non-null.
srcdata
$200
The string that specifies the domain or check macro to which the metricparameter applies. Values should be non-null.
resultseq
8.
A counter that identifies the record for a given checkid in the run-time set of checks in the Validation Control data set. The counter is generally set to 1 and is incremented only with each repeat invocation of a check. This value enables you to link to the Validation Control and Results data sets. Values should be non-null.
This display illustrates Validation Metrics output from a SAS Clinical Standards Toolkit validation process running CDISC SDTM 3.1.1 validation. The Validation Control data set contains three records: two SDTM0451 checks and one SDTM0623 check.
Sample Validation Metrics Data Set
Lines 1 through 2 document that the SDTM0451 check was invoked twice. The missing reccount value and the absence of other metrics indicate that the two check invocations failed. This should be reported in the Results data set.
Lines 3 through 7 provide metrics information about the SDTM0623 check. SDTM0623 checks that multiple standard units do not exist for any test in the findings domains. The SDTM0623 check was run on two domains using the cstcheck_notunique check macro. The number of subjects and records tested, and the elapsed time to run the check are reported.
Lines 8 through 14 are summary metrics reported at the end of the SDTM validation process in the sdtm_validate macro. There are no errors. The summary also shows that two checks could not be run (lines 9 and 14).
For more information about the Validation Metrics data set, see Column Descriptions of the Validation Metrics Data Set.
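The columns described in this section lend themselves to simple summaries. The following sketch builds a small stand-in data set with the documented column names and lengths, then totals the records tested for each check invocation. The data set name work._cstmetrics_demo and every data value are made up for illustration. Only the metricparameter string "# of records tested" is taken from the check macro excerpt shown earlier.

```sas
/* Stand-in metrics data set with the documented columns; all values are made up */
data work._cstmetrics_demo;
  infile datalines dsd truncover;
  length metricparameter $40 reccount 8 resultid $8 srcdata $200 resultseq 8;
  input metricparameter reccount resultid srcdata resultseq;
  datalines;
# of records tested,120,SDTM0623,LB,1
# of records tested,80,SDTM0623,VS,1
# of records tested,45,SDTM0207,SV,1
;
run;

/* Total the records tested for each check invocation (resultid and resultseq) */
title 'Records tested per check invocation (illustrative data)';
proc sql;
  select resultid,
         resultseq,
         count(*)      as domains_tested,
         sum(reccount) as records_tested
    from work._cstmetrics_demo
    where metricparameter = '# of records tested'
    group by resultid, resultseq;
quit;
title;
```

Against a real process, the same query could be pointed at the data set named by the _cstMetricsDS property (work._cstmetrics by default). Because resultid and resultseq also appear in the Validation Control and Results data sets, they are the natural keys for joining the counts to reported findings.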