Special Topic: Validation Customization

Overview

One of the significant benefits of the SAS Clinical Standards Toolkit is that you can customize the solution to meet your needs. From a validation perspective, this includes:
  • modifying an existing standard or defining a new reference standard
  • using any set of source data and metadata
  • modifying the SAS validation checks for supported standards
  • adding new validation checks for supported standards
  • modifying existing validation check macros or adding new macros
  • modifying the SAS Clinical Standards Toolkit messaging, including internationalization
  • attempting to validate multiple studies in a single validation process

Case Study 1: Modifying an Existing Standard or Defining a New Reference Standard

Source data and metadata are validated in the SAS Clinical Standards Toolkit against a reference standard. For CDISC standards, the SAS Clinical Standards Toolkit provides a SAS interpretation of the supported CDISC standards. Because CDISC standards are guidelines, they are open to interpretation and customer-specific implementations. Not all clinical studies have all CDISC-defined standard domains, and most clinical studies have additional domains reflecting the focus of the clinical study. In addition, CDISC SDTM domain classes (findings, events, and interventions) enable the inclusion and exclusion of most columns, depending on the clinical data points collected in the study. CDISC guidelines generally do not specify column lengths.
Each of these factors suggests that the SAS Clinical Standards Toolkit CDISC reference standards will be modified or replaced with customer-derived standards. The SAS Clinical Standards Toolkit offers the option of building a reference standard to encompass domain and column customizations. Or, you can customize check macros and check logic to perform specific compliance assessments against a standard. For example, in CDISC SDTM, it is not uncommon to build multiple supplemental qualifier domains (for example, SUPPAE) associated with a core reference domain (for example, AE). It is at the customer's discretion whether to modify the reference standard to include each unique supplemental qualifier domain, or to validate the custom domains by using either existing SAS Clinical Standards Toolkit validation check macros with unique code logic or custom check macros. These latter options are discussed in the following case studies.
It is likely that customers will derive multiple reference standards. From a SAS Clinical Standards Toolkit validation perspective, the only relevant reference standard is the one defined in the SASReferences data set (as type=referencemetadata).
For information about registering a new standard in the SAS Clinical Standards Toolkit, see Registering a New Version of a Standard.

Case Study 2: Using Any Set of Source Data and Metadata

From a SAS Clinical Standards Toolkit perspective, a source study is defined by the study domains, the study metadata represented in the source_tables and source_columns data sets, and anything that might be unique to a specific study, including controlled terminologies, properties, validation checks, and associated messages.
One key SAS Clinical Standards Toolkit requirement is that source study elements should be kept in synchronization. Another key requirement is that all relevant source study elements should be accurately represented in a SASReferences data set. The synchronization of study elements is a task that is often performed outside the SAS Clinical Standards Toolkit. The study data libraries must contain the domains of interest, the study metadata must provide the complete set of table-level and column-level metadata necessary to describe the source data, and any format catalogs and coding dictionaries supporting the study must be available.
Tip
Best Practice Recommendation: If a standard folder hierarchy is adopted for source studies, such as in the SAS Clinical Standards Toolkit CDISC SDTM 3.1.3 sample study (sample study library directory/cdisc-sdtm-3.1.3-1.6/sascstdemodata), using generic SASReferences files that use &studyRootPath in the path field might facilitate referencing new source studies.
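The path substitution idea in this tip can be sketched as follows. This is a minimal Python model rather than Toolkit code: the record layout is reduced to three fields, and the sasref values, the subfolder names, the study paths, and the function name resolve_paths are all hypothetical. In a SAS session, the macro processor resolves &studyRootPath itself.

```python
# Hypothetical, simplified SASReferences records: only the fields needed here.
# The sasref values and the "data"/"metadata" subfolders are assumptions.
GENERIC_SASREFERENCES = [
    {"type": "sourcedata", "sasref": "srcdata", "path": "&studyRootPath/data"},
    {"type": "sourcemetadata", "sasref": "srcmeta", "path": "&studyRootPath/metadata"},
]


def resolve_paths(records, study_root_path):
    """Return copies of the records with &studyRootPath replaced in each path."""
    resolved = []
    for record in records:
        updated = dict(record)  # leave the generic template untouched
        updated["path"] = record["path"].replace("&studyRootPath", study_root_path)
        resolved.append(updated)
    return resolved


# The same generic records serve two studies that share a folder hierarchy.
study_a = resolve_paths(GENERIC_SASREFERENCES, "/studies/study_a")
study_b = resolve_paths(GENERIC_SASREFERENCES, "/studies/study_b")
```

Because only the root path differs between studies, a new study that follows the same folder hierarchy needs no new SASReferences records in this model.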

Case Study 3: Modifying the SAS Validation Checks for Supported Standards

This case study addresses adding multiple instances of existing checks. The most common ways to modify SAS validation checks include:
  • Altering the scope of the domains and columns to be validated. Many checks are defined to be run against specific domains or columns, against specific classes of domains (for example, CDISC SDTM findings, events, or interventions), or against all available domains or columns. As you find it useful to modify a reference standard (for example, to include other domains you consistently use) or you have one or more studies that have new domains, changes are likely to involve alterations to the Validation Master and Validation Control (run time) tablescope or columnscope fields.
  • Changing the Validation Control codelogic field to alter the logic used to identify error conditions. This might be a necessary change if a check needs to be generalized to accommodate new domains or columns. Or, customer conventions might differ from those in the SAS Clinical Standards Toolkit checks.
  • If customer code changes are sufficiently significant, then it might be better to create a new validation check macro. (See Case Study 5: Modifying Existing Validation Check Macros or Adding New Macros.) If a new validation check macro is required, then the Validation Control codesource field needs to be modified to contain the name of the new check macro.
  • The Validation Control uniqueid field provides a way to uniquely identify a specific validation check for reference. Any substantive change to any Validation Control data set check field normally leads to a new uniqueid. For information about the structure of uniqueid, see Column Descriptions of the Validation Master Data Set.
  • The Validation Control checkstatus field provides an easy way to identify selected checks with a user-defined status (for example, draft, deprecated, or not available for a given study). The SAS Clinical Standards Toolkit does not reference this field within any validation check macro.
  • The Validation Control lookupsource field can be changed to reference a different SAS format or lookup data set (for example, a new version of MedDRA). In the latter case, a change to the pathname, memname, or both fields in the SASReferences data set might be a more appropriate action.
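The kinds of changes listed above can be pictured with a small Python model. It is a sketch under stated assumptions, not the Toolkit's own mechanism: the check IDs, uniqueid values, and checkstatus strings are hypothetical, only a few Validation Master fields are shown, and build_validation_control is an illustrative helper name. The model subsets checks by a user-defined checkstatus and applies a customer override to tablescope, with the customer supplying a new uniqueid for the changed record.

```python
import copy

# Hypothetical Validation Master records; all values are illustrative.
validation_master = [
    {"checkid": "CHK0001", "uniqueid": "MASTER-0001", "tablescope": "_ALL_",
     "codesource": "cstcheck_notunique", "checkstatus": "active"},
    {"checkid": "CHK0002", "uniqueid": "MASTER-0002", "tablescope": "DM",
     "codesource": "cstcheck_columncompare", "checkstatus": "draft"},
]


def build_validation_control(master, wanted_status, overrides=None):
    """Copy checks with the wanted status and apply per-check field overrides."""
    overrides = overrides or {}
    control = []
    for record in master:
        if record["checkstatus"] != wanted_status:
            continue  # the subsetting is done by the user, not by a check macro
        run_record = copy.deepcopy(record)
        run_record.update(overrides.get(record["checkid"], {}))
        control.append(run_record)
    return control


# Narrow the scope of one check and give the changed record a new uniqueid.
validation_control = build_validation_control(
    validation_master,
    wanted_status="active",
    overrides={"CHK0001": {"tablescope": "_ALL_-DM", "uniqueid": "CUSTOM-0001"}},
)
```

The master records are left unchanged, which mirrors the separation between the Validation Master data set and the run-time Validation Control data set.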

Case Study 4: Adding New Validation Checks for Supported Standards

To add a new validation check, consider this checklist:
  • Check metadata must conform to the Validation Master structure. (For more information, see Framework.)
  • Certain Validation Master fields accept any user-defined value (for example, checksource, sourceid, checktype, standardref, and checkstatus). These fields are not referenced by the validation check macros. The remaining fields are used in the validation check macros, so you must abide by the SAS Clinical Standards Toolkit conventions. These conventions are described in Framework.
  • A new check should be added to the (run time) Validation Control data set for testing. After testing, it can be promoted to the Validation Master data set to be available to applications and processes. These requirements follow a typical development process.
  • For each new validation check, a matching message is required. This is the message that you want written to the Results data set when an error condition is detected. For details, see Messages.
  • Use a similar validation check as a template to build the check metadata required by the SAS Clinical Standards Toolkit. Ask yourself the following types of questions:
    • What category or type of check is it?
      Look at the Validation Master data set checktype column. Does it look only at table or column metadata, and not at data values (Metadata)? Does it require a specific raw column value (ColumnValue), or a value that complies with some controlled terminology (Cntlterm)? Must the assessment look across multiple records (Multirecord) or multiple tables (Multitable)?
    • Does the check compare columns within a single table?
      Consider Validation Master records where the codesource column is cstcheck_columncompare, cstcheck_columnvarlist, or cstcheck_notunique.
    • Does the check compare tables?
      Consider Validation Master records where the codesource column is cstcheck_comparedomains or cstcheck_recnotfound.
    • Does the check look across multiple standards?
      Consider Validation Master records where the codesource column is cstcheck_crossstdcomparedomains or cstcheck_crossstdmetamismatch.
    • What tablescope and columnscope values are appropriate?
      • Tablescope
        Does the check apply to a specific class of tables (for example, Class:Findings)? Does the check apply to all tables for the standard (_ALL_)? Does the check apply only to one or more specific tables (for example, DM+TA)? Does the check apply to all tables except one (for example, _ALL_-DM)? Does the check compare the same column in two tables (for example, [DM][TA])?
      • Columnscope
        Does the check apply to all columns in the selected tables (_ALL_)? Does the check apply only to one column (for example, USUBJID)? Does the check compare two columns in the same table (for example, [AESDTH][AEOUT])? Does the check apply to all column names that end in a particular suffix (for example, **DTC)?
    • If column values are to be compared against an external source (coding dictionary or specific codelist), how are these values referenced for other checks in the lookuptype and lookupsource Validation Master columns?
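To make the tablescope and columnscope questions above concrete, the following Python sketch interprets a few of the example values. It is a simplified reading of the syntax shown in this checklist, not the Toolkit's parser: the table-to-class mapping and both function names are hypothetical, and the bracketed sublist forms (for example, [DM][TA]) are deliberately not handled.

```python
import re

# Hypothetical study tables mapped to a class; illustrative only.
TABLES = {"DM": "Special Purpose", "TA": "Trial Design", "AE": "Events",
          "LB": "Findings", "VS": "Findings"}


def resolve_tablescope(scope, tables):
    """Resolve _ALL_, Class:<name>, and +/- combinations to a sorted table list."""
    selected = set()
    for sign, token in re.findall(r"([+-]?)([^+-]+)", scope):
        if token == "_ALL_":
            names = set(tables)
        elif token.lower().startswith("class:"):
            wanted = token.split(":", 1)[1].lower()
            names = {name for name, cls in tables.items() if cls.lower() == wanted}
        else:
            names = {token} & set(tables)  # a single named table, if it exists
        if sign == "-":
            selected -= names
        else:
            selected |= names
    return sorted(selected)


def resolve_columnscope(scope, columns):
    """Resolve _ALL_, a single column name, or a **SUFFIX pattern."""
    if scope == "_ALL_":
        return list(columns)
    if scope.startswith("**"):
        suffix = scope[2:]
        return [column for column in columns if column.endswith(suffix)]
    return [column for column in columns if column == scope]


all_but_dm = resolve_tablescope("_ALL_-DM", TABLES)
findings = resolve_tablescope("Class:Findings", TABLES)
date_columns = resolve_columnscope("**DTC", ["USUBJID", "AESTDTC", "AEENDTC", "AETERM"])
```

Working through a candidate scope value in this way before adding it to the check metadata helps confirm that the check reaches the intended tables and columns.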

Case Study 5: Modifying Existing Validation Check Macros or Adding New Macros

The SAS Clinical Standards Toolkit provides 21 validation check macros. These macros, located in the primary SAS Clinical Standards Toolkit autocall library, offer a variety of code examples that are available to all standards supporting validation. For information about the purpose and use of each check macro, see Special Topic: Validation Check Macros and the SAS Clinical Data Standards Toolkit: Macro API Documentation.
Some validation scenarios might require modifications to the SAS Clinical Standards Toolkit check macros or the derivations of new macros. If so, these guidelines should be followed. These guidelines facilitate the use of these macros in the general SAS Clinical Standards Toolkit framework and in the specific SAS Clinical Standards Toolkit validation framework.
  • Follow the current naming convention or adopt a consistent naming convention that conforms to SAS naming conventions.
  • Use the current autocall library or use a customized autocall library that has been defined in the SASReferences data set (type=autocall).
  • Conform to the basic check macro workflow. This workflow is described in Special Topic: Validation Check Macros.
  • Ensure that the macro correctly accepts and interprets the metadata provided as input from the Validation Control data set. If the new macro does not use this metadata, then it can instead be hardcoded to provide any specific functionality that is desired.
  • Ensure that the macro writes appropriate output to the Results and Metrics data sets.

Case Study 6: Modifying the SAS Clinical Standards Toolkit Messaging, Including Internationalization

This case study considers these three issues related to the support of the SAS Clinical Standards Toolkit messaging:
  1. Maintain the relationship between the SAS Clinical Standards Toolkit standard-specific messages and standard-specific validation checks.
  2. Maintain the relationship between messages and validation check macro code.
    (Deviations are acceptable to the extent that missing parameters have suitable defaults.)
  3. Internationalize messages.
A SAS Clinical Standards Toolkit message is created for each distinct combination of the Validation Master standard and checksource fields. This allows the SAS Clinical Standards Toolkit to support checksource-specific messaging and severity. A unique SAS Clinical Standards Toolkit message is required for each value of the Validation Master standardversion field if that value is not the wildcard ***.
Consider this CDISC SDTM 3.1.1 Validation Master record excerpt:
[Figure: Validation Master data set excerpt for check SDTM0013]
The SAS Clinical Standards Toolkit representation of the SDTM0013 check in the Messages data set is:
[Figure: Messages data set excerpt for check SDTM0013]
The Messages data set contains two records because there are two distinct checksource values for Validation Master checkid SDTM0013.
Consider this CDISC SDTM Validation Master record excerpt:
[Figure: Validation Master data set excerpt for check CUST0073]
Three separate invocations of CUST0073 are represented. Each record points to a different domain (tablescope). This example assumes that the CDISC SDTM 3.1.2 standard has been registered. The first and third records (AE and MH domains) indicate that this specific implementation of the check is applicable to all versions of CDISC SDTM. However, the second record is applicable to only CDISC SDTM 3.1.2 (because CE is a new domain in SDTM 3.1.2).
Only two Messages data set records are required:
[Figure: Messages data set excerpt for check CUST0073]
It is the distinct combinations of the Validation Master checkid, standardversion, and checksource fields that control the associated Messages data set records.
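The counting rule for CUST0073 can be checked with a short Python sketch. The three records follow the description in the text (AE and MH with the wildcard version, CE with 3.1.2). The checksource value "CUSTOM" is a hypothetical placeholder, and required_message_keys is an illustrative helper, not a Toolkit function.

```python
# Validation Master records for CUST0073 as described in the text.
# The checksource value "CUSTOM" is a hypothetical placeholder.
validation_master = [
    {"checkid": "CUST0073", "standardversion": "***",
     "checksource": "CUSTOM", "tablescope": "AE"},
    {"checkid": "CUST0073", "standardversion": "3.1.2",
     "checksource": "CUSTOM", "tablescope": "CE"},
    {"checkid": "CUST0073", "standardversion": "***",
     "checksource": "CUSTOM", "tablescope": "MH"},
]


def required_message_keys(records):
    """Return the distinct (checkid, standardversion, checksource) combinations."""
    return sorted(
        {(r["checkid"], r["standardversion"], r["checksource"]) for r in records}
    )


# tablescope is not part of the key, so AE and MH share one message record.
message_keys = required_message_keys(validation_master)
```

Three check records collapse to two keys here, one for the wildcard version and one for 3.1.2, which matches the two Messages data set records.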
It is important to maintain the relationship between messages and validation check macro code. If the validation check macro code references an unknown resultid, the text <Message lookup failed to find matching record> is written to the Results data set.
The CUST0073 check defines a substitution parameter (&_cstParm1). (The SAS Clinical Standards Toolkit code assumes that message substitution parameters begin with the string &_cst.) For the calling validation check macro to support parameters when writing output to the Results data set, the parameters that are passed should be syntactically consistent with the messagetext field in the Messages data set.
Building the message record to use a default value (as specified in the parameter1 field) solves the problem when the calling macro fails to pass a substitution value. Using parameters is optional. Parameters might be needed only if the message is to be used in multiple contexts where substitutions of parameter values help interpret the message.
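The interplay of substitution parameters, default values, and failed lookups can be modeled in a few lines of Python. This is a sketch only: the message text and default value are invented for illustration, messages are keyed by resultid alone for brevity, and build_message is a hypothetical helper rather than a Toolkit macro. The lookup-failure text is the one quoted above.

```python
LOOKUP_FAILED = "<Message lookup failed to find matching record>"

# Hypothetical Messages record; the text and default value are illustrative.
messages = {
    "CUST0073": {
        "messagetext": "Value of &_cstParm1 is not valid",
        "parameter1": "column",
    },
}


def build_message(resultid, parm1=None):
    """Return the message text with &_cstParm1 substituted, or the failure text."""
    record = messages.get(resultid)
    if record is None:
        return LOOKUP_FAILED  # the check macro referenced an unknown resultid
    # Fall back to the default in parameter1 when the caller passes no value.
    value = parm1 if parm1 else record["parameter1"]
    return record["messagetext"].replace("&_cstParm1", value)


with_value = build_message("CUST0073", "AESEV")
with_default = build_message("CUST0073")
unknown = build_message("CUST9999")
```

The default in parameter1 keeps the message readable even when the calling macro passes nothing, which is the situation the preceding paragraph describes.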
The SAS Clinical Standards Toolkit supports the internationalization of messages through specifying message file references in the SASReferences data set (type=messages). If referenced message files conform to the structure expected by the SAS Clinical Standards Toolkit, any text, including internationalized text, can be included.

Case Study 7: Validation of Multiple Studies

Most illustrations and discussions in this chapter assume a reference to a single clinical study. But what if you need to validate multiple clinical studies at one time? A key consideration is the information that source data libraries and source metadata files contain, and how they should be referenced in the SASReferences data set used by the validation process.
Consider the following four methodologies, which are ordered based on estimated rates of adoption. Other candidate methodologies are possible.
  • A common methodology is to build single source data and metadata libraries that contain pooled data sets where metadata reconciliation has already occurred. (This is frequently done in integrated summaries of efficacy and safety.) In this case, the SASReferences data set will contain a single type=sourcedata record pointing to the pooled integrated data library. The SASReferences SAS librefs (where type=sourcemetadata) must match the source metadata library references in the sasref column of the table and column metadata data sets.
  • A second methodology is to build a SAS Clinical Standards Toolkit process that daisy-chains multiple job streams, where each study is defined in a unique SASReferences data set and validated independently. Within the same SAS session, unless your validation process deletes work files, the results and metrics files are appended. The files at the end of the process contain results for all studies.
  • An alternative approach defines a single SASReferences libref for multiple type=sourcedata records, each pointing to a different study source library. The SAS Clinical Standards Toolkit supports library concatenation, but SAS only reads data sets from the first defined library when the same data set name occurs in multiple libraries. Because standard domain names are expected, this approach does not work unless a unique domain-naming convention across studies is used. A similar approach is required for source metadata. These constraints make this approach less tenable.
  • Another alternative methodology is to use multiple SASReferences librefs (multiple type=sourcedata records), one for each study source library, and a single source metadata library (with one table and one column metadata data set, setting the SASRef column to each libref used in SASReferences). This methodology works for any validation check that does not compare columns across domains or compare domains.
    Source data libraries are considered when tablescope and columnscope parsing occurs in the SAS Clinical Standards Toolkit. However, if tablescope does not include the libref, unintended comparisons of multiple columns or multiple domains from different studies can occur. As a result, this methodology is not recommended unless you consistently use multiple librefs in the source metadata and validation check metadata.
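The limitation of the library concatenation approach can be illustrated with a small Python model. It mimics only the rule stated above (when the same data set name occurs in several concatenated libraries, the first library is the one read). The study contents and the function name read_from_concatenation are hypothetical.

```python
# Hypothetical study libraries: data set name -> a label for its contents.
study1 = {"DM": "study1.DM", "AE": "study1.AE"}
study2 = {"DM": "study2.DM", "AE": "study2.AE", "CE": "study2.CE"}


def read_from_concatenation(libraries, member):
    """Mimic a concatenated libref: the first library holding the member wins."""
    for library in libraries:
        if member in library:
            return library[member]
    raise KeyError(member)


# One libref that points at both study libraries, in this order.
concatenated = [study1, study2]

dm_read = read_from_concatenation(concatenated, "DM")
ce_read = read_from_concatenation(concatenated, "CE")
```

In this model the DM data from the second study is never read, which is why standard domain names make the concatenation approach unworkable without a unique naming convention across studies.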