Building a SASReferences File

Each SASReferences file requires content that is specific to its planned use. For example, a SAS Clinical Standards Toolkit process that creates a define.xml file requires the specification of XML and recommends the specification of style sheet information. A SAS Clinical Standards Toolkit process that validates data against a standard requires the specification of the validation checks to be run.
The SAS Clinical Standards Toolkit offers several ways to create a SASReferences file for use in subsequent processes.
  1. Use sample SASReferences files that are provided with the SAS Clinical Standards Toolkit. These sample SASReferences files contain the required and optional contents for specific tasks. For example, the task of validating the functionality of CDISC SDTM 3.1.2 uses the SASReferences file found in this location in SAS 9.3:
    !sasroot/../../SASClinicalStandardsToolkitSDTM312/1.4/sample/cdisc-sdtm-3.1.2/sascstdemodata/control
    An excerpt of this sample SASReferences file is provided in A Sample SASReferences Data Set.
  2. Use the SASReferences templates that are provided with the SAS Clinical Standards Toolkit. These templates are either zero-observation data sets or data sets that contain records that must be modified. A SASReferences data set template is here:
    <global standards library directory>/standards/cst-framework-1.4/templates
    The SAS Clinical Standards Toolkit also provides a default SASReferences data set for each supported standard. These default data sets contain records that are commonly required for certain SAS Clinical Standards Toolkit tasks (such as validation). However, they might omit records that a task requires, and they might include records that a task does not require. In addition, the SAS librefs, filerefs, paths, and memname values might require modification. For example, see the StandardSASReferences data set found in:
    <global standards library directory>/standards/cdisc-sdtm-3.1.2-1.4/control
  3. Use the utility macros that are provided with the SAS Clinical Standards Toolkit to build and return many of its metadata data sets.
    • The %cst_getstandardsasreferences macro returns the StandardSASReferences data set. (See the file description in Metadata File Descriptions for the specified standard.)
    • The %cst_createds macro can be used to return an empty SASReferences data set.
      Use of these utility macros is illustrated later in this chapter.
The primary function of the SASReferences file is to define the SAS Clinical Standards Toolkit process inputs and outputs. What information does the process need to reference? What does the process produce? Where does the information come from and go? The “what” information is determined by the use of two SASReferences fields—type and subtype. The “where” information is determined by path and memname. The values for all of these fields are restricted for the SAS Clinical Standards Toolkit to values itemized in the framework Standardlookup data set found in:
<global standards library directory>/standards/cst-framework-1.4/control/standardlookup.sas7bdat
You can customize the type and subtype values in the Standardlookup data set. A new value must be added to the Standardlookup data set before it can be used in any SASReferences data set that the SAS Clinical Standards Toolkit processes.
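The following is a minimal sketch of one way to browse the framework Standardlookup data set before you add or change values. It assumes that the global macro variable &_cstGRoot is already defined in the session and resolves to the global standards library directory. The libref cstlook is a hypothetical name chosen for this illustration. No column names are assumed; PROC CONTENTS lists them.

```sas
/* Assumption: &_cstGRoot resolves to the global standards library directory. */
/* The libref CSTLOOK is a hypothetical name used only for this sketch.       */
libname cstlook "&_cstGRoot/standards/cst-framework-1.4/control" access=readonly;

/* List the columns of the Standardlookup data set. */
proc contents data=cstlook.standardlookup varnum;
run;

/* Browse the first records to see how valid values are stored. */
proc print data=cstlook.standardlookup (obs=25);
run;

libname cstlook clear;
```

Opening the library as read-only guards against accidental changes while browsing. Customizations would be made separately and deliberately.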
This table lists and describes the acceptable type and subtype values in the framework Standardlookup data set.
SAS Clinical Standards Toolkit SASReferences Type and Subtype Values
Type | Subtype | Comments
autocall | (none) | One record for each library that contains macros to be included in the SAS autocall path. Typically, this includes one record for each standard that is referenced in the SASReferences file, excluding the SAS Clinical Standards Toolkit framework. The framework and cross-standard macros are already included in the autocall path at product deployment. User-written macros, as referenced in one or more additional code libraries, require an autocall record for each library.
classmetadata | column or table | Identifies the SAS data sets (sasref.memname) that contain the column and table metadata for specific CDISC SDTM template data sets that are used to build standard SDTM-compliant data sets. This type is provided by default in StandardSASReferences and is optional.
codemodule | (none) | Identifies an "external" code segment, which is identified using a SAS fileref, that might be included (%include) in a SAS Clinical Standards Toolkit process. Examples include code that derives a CDISC ADaM data set or that generates an ADaM report.
control | validation or reference | Identifies any run-time process control file, including the SASReferences data set itself. (In other words, it is a self-documentation record.) For the SAS Clinical Standards Toolkit validation processes, the Validation Control data set that specifies the validation checks to be run is identified with subtype=validation.
externalxml | xml or tlfxml | Identifies an external XML file. Depending on the standard version and the subsequent macro that is called, this file can be read or written. Using CDISC CRT-DDS as an example, this type specifies the define.xml file that is created when the %crtdds_write() macro is called. When the %crtdds_read() macro is supported, this type identifies the XML file to be read. TLFXML refers to the tables, listings, and figures XML file that is used in ADaM 2.1.
fmtsearch | (none) | Provides a way to build the format search path for a validation process. The SAS Clinical Standards Toolkit sets the SAS FMTSEARCH option from these records, each of which specifies a SAS catalog, in the sequence given by order=n. This type is not provided by default in StandardSASReferences, so you must specify a value. The type=fmtsearch value is optional unless one or more checks are to be run that assess value compliance against a SAS format.
lookup | (none) | Identifies a data set (Standardlookup) that is associated with each SAS Clinical Standards Toolkit standard and that contains valid values for discrete metadata fields. This type is provided by default in StandardSASReferences and is required for each standard. For example, the valid values for type and subtype that are documented in this table have been defined in one or more SAS Clinical Standards Toolkit Standardlookup data sets.
messages | (none) | Identifies one or more Messages data sets that are associated with each SAS Clinical Standards Toolkit standard. This type is provided by default in StandardSASReferences. You must specify a value only with user customizations that require new or modified messages. The SAS Clinical Standards Toolkit populates the data set that is referenced by the global macro variable &_cstMessages with all Messages data sets that are included in SASReferences. This type is required for each standard.
properties | initialize, validation, or report | Initializes a standard version's required macro variables. Specification in SASReferences is optional. (These macro variables can be defined with calls to %cst_setstandardproperties or %cst_setproperties instead.) Each standard should have at least one properties (initialize) file. Each standard can have any additional files that are needed. A subtype=validation value is specific to SAS Clinical Standards Toolkit validation processes.
referencecontrol | validation or standardref | If subtype=validation, then the value identifies the standard-supplied master super-set of supported validation checks. Although this is key metadata, it is not typically referenced at run time and does not need to be included. It is the Validation Control file that is identified with type=control and subtype=validation that must be included. If subtype=standardref, then the value identifies an optional data set that contains a list of references that provide the basis for each validation check that is included in the subtype=validation data set.
referencecterm | (none) | Identifies a SAS data set (sasref.memname) that most often contains controlled terminology, as opposed to a SAS format containing controlled terminology (for example, MedDRA). The type=referencecterm value is optional unless one or more checks are to be run that assess value compliance against a SAS data set.
referencemetadata | column or table | Identifies the SAS data sets (sasref.memname) that contain the column and table metadata for a standard version. This type is provided by default in StandardSASReferences, so you must specify a value only to override the default for the standard. Records for both subtypes are required.
referencexml | stylesheet, map, or tlfxml | If subtype=stylesheet, then this value identifies the directory and filename of an XML style sheet. In the production of CDISC CRT-DDS XML files, this value should point to the style sheet to be copied into the directory with the XML file. If subtype=map, then this value identifies the persisted location of a SAS XML map file. The SAS XML map file reads the Work cube.xml file generated by the SAS Clinical Standards Toolkit that translates an XML file into the SAS representation of the XML-based standard (such as CDISC CRT-DDS and CDISC ODM).
report | library or outputfile | Specifies the storage location of the SAS Clinical Standards Toolkit process reports. If a single, specific report is referenced, then it can be specified with a subtype of outputfile, a valid path, and valid memname values. If the process produces multiple reports, then a subtype of library is used with a valid path to the directory or folder. In the latter case, default report names as defined in the code are used.
results | results or validationresults, metrics or validationmetrics | Specifies the storage location of the Results and Metrics data sets that are generated by the SAS Clinical Standards Toolkit process. The Metrics data set is specific to the SAS Clinical Standards Toolkit validation processes and is optional depending on property settings. A results/validationresults record is required.
resultspackage | xml or log | This type is not used in the SAS Clinical Standards Toolkit 1.4. This type bundles a set of process inputs and outputs together for later access.
sourcedata | (none) | Defines the folder location of the data for a specific study. This type is required for validation processes if one or more checks are to be run that access a specific source data domain.
sourcemetadata | column, table, or study | Identifies the SAS data sets (sasref.memname) that contain the column and table metadata for a study or set of source data. This type is not provided by default in StandardSASReferences, so you must specify a value. Records for both subtypes are required.
standards | registeredstandards or registeredsasreferences | Identifies the template for the registered Standards and SASReferences data sets, respectively. This value is used by the framework when the global metadata library is created. This type is not used in post-deployment processes.
targetdata | (none) | Defines the location of the data to be derived for a specific standard. For example, for CDISC CRT-DDS, the %crtdds_read macro derives a set of CRT-DDS data sets from the referenced define.xml file. This type is optional.
targetmetadata | column, table, or study | Identifies the SAS data sets (sasref.memname) that contain the column, table, and study metadata to be derived for a specific standard. For example, for CDISC CRT-DDS, the %crtdds_read macro derives files that describe metadata about the targetdata data sets that are derived from the referenced define.xml file. If this type is used, then a record for each subtype is required.
transport | (none) | This type is not used in the SAS Clinical Standards Toolkit 1.4. This type identifies a library of SAS transport files that are optionally referenced by a define.xml file.
Not every instance of the SASReferences file requires a specific path and filename. At the beginning of this section, a call to this macro was described:
%cst_getstandardsasreferences(_cstStandard=CST-FRAMEWORK,_cstStandardVersion=1.2,
_cstOutputDS=sasreferences);
This macro call produces this SASReferences file:
[Display: Standard SASReferences File for CST-FRAMEWORK]
Note the SASref and path fields. For most rows, SASref is set to csttmp and path is set to &_cstGRoot/standards/cst-framework-1.4/templates. The memname field points to empty examples of each file type. From a generic SAS Clinical Standards Toolkit framework perspective, these are the best available file references. All SAS Clinical Standards Toolkit processes require specification of some of these data and metadata sources (for example, generic properties, messages, and process results).
Standard SASReferences for CDISC SDTM shows the information returned by this call to %cst_getstandardsasreferences for the CDISC SDTM standard.
%cst_getstandardsasreferences(_cstStandard=CDISC-SDTM, _cstOutputDS=sasreferences);
[Display: Standard SASReferences for CDISC SDTM]
A comparison of Standard SASReferences File for CST-FRAMEWORK and Standard SASReferences for CDISC SDTM shows little similarity in the record types and no overlap in references to specific files. The target inputs and outputs for CDISC SDTM are more focused on the task (for example, validating SDTM domains). The SAS Clinical Standards Toolkit validation processes require specification of a comparative reference standard. Here, there are references to a standard-specific macro library (autocall), Messages data set, and properties files. Unique SASref values by type are provided, pointing to distinct files and folders in the global standards library.
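The comparison described above can be reproduced with a short sketch like the following. It reuses the two macro calls shown earlier and prints the fields discussed in this section. It assumes that the SAS Clinical Standards Toolkit is installed and that the framework global macro variables have already been initialized for the session. The output data set names fw_refs and sdtm_refs are hypothetical names chosen for this illustration.

```sas
/* Assumption: the framework properties have already been initialized, */
/* so that the toolkit's global macro variables exist in this session. */
/* FW_REFS and SDTM_REFS are hypothetical data set names.              */
%cst_getstandardsasreferences(_cstStandard=CST-FRAMEWORK, _cstStandardVersion=1.2,
  _cstOutputDS=work.fw_refs);
%cst_getstandardsasreferences(_cstStandard=CDISC-SDTM, _cstOutputDS=work.sdtm_refs);

/* Print the "what" (type, subtype) and "where" (sasref, path, memname) fields. */
title "StandardSASReferences for CST-FRAMEWORK";
proc print data=work.fw_refs noobs;
  var type subtype sasref path memname;
run;

title "StandardSASReferences for CDISC-SDTM";
proc print data=work.sdtm_refs noobs;
  var type subtype sasref path memname;
run;
title;
```

Printing the two data sets side by side makes it easier to see which record types each standard supplies by default and where the SASref and path values differ.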
Consider an actual SASReferences file built to support CDISC SDTM 3.1.2 validation. The task of validating the functionality of CDISC SDTM 3.1.2 uses the SASReferences file in this location in SAS 9.3:
!sasroot/../../SASClinicalStandardsToolkitSDTM312/1.4/sample/cdisc-sdtm-3.1.2/sascstdemodata/control
This display shows the complete contents of the SASReferences file.
[Display: Sample SASReferences File for CDISC SDTM Validation]
Explanation of Sample SASReferences File for CDISC SDTM Validation
Lines | Comment
1 | Instructs the SAS Clinical Standards Toolkit to add any SDTM-specific macros to the autocall path.
2 | Documents the name and location of this file. This information is used in the sample reports that are discussed in this document.
3 | Points to the set of validation checks to be run in this validation assessment. The framework default values for SASref, path, and memname have been overridden.
4, 18 | Two standards are referenced to create a format search path. Line 4 references the SDTM study-specific formats catalog. Line 18 references the more general CDISC Terminology cterms catalog. The precedence is set by the order column.
5, 19 | These records are identical to the CST-FRAMEWORK and CDISC-SDTM StandardSASReferences records.
6 | Illustrates the call to a standard-specific properties file that is used to initialize a global macro variable that is specific to that standard. Referencing a standard-specific properties file in the SASReferences data set is recommended. The call to the CST-FRAMEWORK initialize.properties file is a prerequisite setup step that is performed outside of SASReferences, before SASReferences is processed.
7 | The validation properties path has been modified to point to a location in the study hierarchy, rather than to the global standards library that is defined in the StandardSASReferences file.
8–9, 11–12 | Points to the reference standard for CDISC SDTM 3.1.2, but unlike the template defaults in Standard SASReferences for CDISC SDTM, path and memname are blank. Leaving them blank tells the SAS Clinical Standards Toolkit to look in the CDISC SDTM 3.1.2 StandardSASReferences file and use the defaults for that standard and version. This convention facilitates portability of the data set by doing a run-time lookup for the current information. The lookup results in the inclusion of the path and memname values as defined in Standard SASReferences for CDISC SDTM.
10 | References a MedDRA data set that is maintained in the study-specific hierarchy. A more common implementation might reference a non-study-specific coding dictionary.
13–14 | Specifies that process results are to be stored in a location in the study hierarchy.
15 | This is a new type not in the template files (StandardSASReferences). It defines the location of the study (source) data. The use of &studyRootPath, coupled with the assumption of a fixed-folder hierarchy, enables portability across studies. The memname value is not relevant for a library of SAS data sets.
16–17 | These source metadata references are new. These values follow the style used in line 15 for source data. The same SASref is used for multiple subtypes in a single type because the subtypes reference two differently named SAS data sets from the same folder.
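As a rough illustration of what the two fmtsearch records (lines 4 and 18) amount to, the sketch below sets the SAS FMTSEARCH option by hand, with the study-specific formats catalog ahead of the more general cterms catalog. This is not the toolkit's internal code. The librefs srcfmt and cstfmt are hypothetical names, and the statement assumes that they have been assigned to the folders that hold the two catalogs.

```sas
/* Hypothetical librefs: SRCFMT for the study-specific formats catalog,   */
/* CSTFMT for the CDISC Terminology cterms catalog.                       */
/* The order of the entries mirrors the order column in SASReferences:    */
/* the catalog listed first is searched first.                            */
options fmtsearch=(srcfmt.formats cstfmt.cterms);

/* Write the current format search path to the SAS log. */
%put NOTE: fmtsearch=%sysfunc(getoption(fmtsearch));
```

Because the study-specific catalog is listed first, a format defined in both catalogs would be resolved from the study-specific one.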
An alternative way to build the SASReferences file is to use the %cst_createds utility macro.
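%cst_createds(_cstStandard=CST-FRAMEWORK,_cstType=control,_cstSubType=reference,
  _cstOutputDS=work.sasreferences);
proc sql;
  /* Column names are assumed to match the SASReferences template,   */
  /* and order is assumed to be numeric. Columns that are not listed */
  /* (for example, subtype, path, and memname) are left blank.       */
  insert into work.sasreferences
    (standard, standardversion, type, sasref, reftype, order)
    values("CST-FRAMEWORK", "1.2", "messages", "messages", "libref", 1);
  /* ... insert additional records here ... */
quit;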
The %cst_createds macro copies the template. New records can be added in various ways, including the PROC SQL technique shown above. The SASReferences file is not required to be stored outside the SAS Work library or to be retained after the SAS Clinical Standards Toolkit process ends. However, doing both is a best practice that enables future capabilities such as process reruns and reporting.
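One way to follow this best practice is to copy the finished data set from the Work library to a permanent library, as in the sketch below. It assumes that work.sasreferences already exists, that the macro variable &studyRootPath is defined, and that the study hierarchy contains a control folder. The folder name and the libref studyctl are assumptions made for this illustration.

```sas
/* Assumptions: &studyRootPath is defined, and a control folder exists */
/* beneath it. STUDYCTL is a hypothetical libref.                      */
libname studyctl "&studyRootPath/control";

/* Copy the SASReferences data set from Work to the permanent library. */
proc copy in=work out=studyctl memtype=data;
  select sasreferences;
run;
```

A later process rerun or report can then read studyctl.sasreferences instead of rebuilding the data set.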