Special Topic: Creating Study Source Metadata to Create a CDISC Define-XML 2.0 define.xml File

Overview

The typical SAS Clinical Standards Toolkit workflow that supports the creation of a Define-XML 2.0 file includes the definition of metadata that describes the study, domains, columns, codelists, value-level metadata, and supporting documents. A CDISC ADaM study can also include analysis results metadata.
This metadata is in the following SAS data sets:
  • source_study
  • source_tables
  • source_columns
  • source_codelists
  • source_values
  • source_documents
  • source_analysisresults
The %CST_CREATEDSFROMTEMPLATE macro can create each of these source metadata data sets, with zero observations, from a template. Here is the syntax:
%cst_createdsfromtemplate(
   _cstStandard=CDISC-DEFINE-XML,
   _cstStandardVersion=2.0.0,
   _cstType=studymetadata,
   _cstSubType=study,
   _cstOutputDS=work.source_study
   ); 
The valid values for the _cstSubType parameter are study, table, column, codelist, value, analysisresults, and document.
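As a rough sketch, the call above can be repeated for every subtype with a small wrapper macro. The wrapper name %make_source_shells, its outlib parameter, and the pairing of each subtype with an output data set name are assumptions made for illustration. Only the %CST_CREATEDSFROMTEMPLATE parameters come from the syntax shown above, and the toolkit must be installed for the call to resolve.

```sas
/* Hypothetical wrapper: one zero-observation data set per subtype. */
%macro make_source_shells(outlib=work);
  %local pairs i pair subtype dsname;
  /* subtype:dataset pairs (the pairing itself is an assumption) */
  %let pairs=study:source_study table:source_tables column:source_columns;
  %let pairs=&pairs codelist:source_codelists value:source_values;
  %let pairs=&pairs document:source_documents analysisresults:source_analysisresults;
  %do i=1 %to %sysfunc(countw(&pairs, %str( )));
    %let pair=%scan(&pairs, &i, %str( ));
    %let subtype=%scan(&pair, 1, :);
    %let dsname=%scan(&pair, 2, :);
    %cst_createdsfromtemplate(
      _cstStandard=CDISC-DEFINE-XML,
      _cstStandardVersion=2.0.0,
      _cstType=studymetadata,
      _cstSubType=&subtype,
      _cstOutputDS=&outlib..&dsname
      );
  %end;
%mend make_source_shells;
```

A call such as %make_source_shells(outlib=work); would then request all seven data sets in one step.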
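Part of the metadata in these data sets can be derived by macros in the SAS Clinical Standards Toolkit from inputs such as these:
  • study domain data sets in a SAS library
  • the SAS representation of an imported Define-XML 2.0 define.xml file for a similar study
  • source metadata that was used to create a CRT-DDS 1.0 define.xml file for the study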
Note: These macros attempt to create an approximation of source metadata. No assumptions should be made that the result completely represents the study metadata. Incomplete reference metadata might not enable imputation of missing metadata. You might need to add or update some metadata.
These macros are called by driver programs that are responsible for properly setting up each SAS Clinical Standards Toolkit process to perform a specific task. These driver programs are examples that are provided with the SAS Clinical Standards Toolkit. You can use these driver programs or create your own. The names of these driver programs are not important. However, the content is important and demonstrates how the various SAS Clinical Standards Toolkit framework macros are used to generate the required metadata files.

Creating Study Source Metadata from Study Domain Data Sets

The %DEFINE_CREATESRCMETAFROMSASLIB macro derives source metadata files from a data library that contains SAS study domain data sets.
Here is the general strategy:
  1. Use PROC CONTENTS output as the primary source of the information.
  2. Use reference_tables, reference_columns, class_tables, and class_columns for matching the columns to impute missing metadata when _cstUseRefLib=Y is specified.
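Step 1 can be pictured with a small sketch. It builds a toy DM data set in WORK (standing in for a study domain) and captures the PROC CONTENTS output that a table-level and column-level derivation could start from. The data set, its variables, and the output names work.toy_contents and work.toy_columns are illustrative assumptions; this is not the macro's actual implementation.

```sas
/* Toy study domain standing in for a real SDTM data set (assumption). */
data work.dm(label="Demographics");
  length studyid $12 usubjid $20 age 8;
  label studyid="Study Identifier"
        usubjid="Unique Subject Identifier"
        age="Age";
  studyid="CDISC01"; usubjid="CDISC01.100008"; age=34;
run;

/* Capture data set and variable attributes without printed output. */
proc contents data=work.dm noprint
              out=work.toy_contents(keep=memname memlabel name label type length varnum);
run;

/* Reshape into a column-level listing. TYPE is 1 for numeric, 2 for character. */
proc sql;
  create table work.toy_columns as
  select memname as table_name,
         name    as column_name,
         label   as column_label,
         varnum  as column_order,
         case when type=1 then 'N' else 'C' end as column_type,
         length  as column_length
  from work.toy_contents
  order by column_order;
quit;
```

Attributes that PROC CONTENTS cannot supply are the ones that the reference metadata in step 2 is meant to fill in.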
The source data is read from a single SAS library. You can modify the code to reference multiple libraries by using library concatenation. Only one study reference can be specified. Multiple study references require modification of the code.
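Library concatenation is standard SAS functionality and might be sketched as follows. The librefs sdtm_a and sdtm_b and the subdirectory names are hypothetical; they point below the WORK directory only so that the sketch runs without the sample study.

```sas
/* Let LIBNAME create the two toy directories if they do not exist. */
options dlcreatedir;
libname sdtm_a "%sysfunc(pathname(work))/sdtm_a";
libname sdtm_b "%sysfunc(pathname(work))/sdtm_b";

/* One toy domain in each physical library. */
data sdtm_a.dm;
  usubjid="CDISC01.100008";
run;
data sdtm_b.ae;
  usubjid="CDISC01.100008";
  aeterm="HEADACHE";
run;

/* Concatenate both libraries under the single libref that is passed to the macro. */
libname srcdata (sdtm_a sdtm_b);

/* List the members that are visible through the concatenated libref. */
proc contents data=srcdata._all_ nods;
run;
```

If the same member name exists in more than one library of a concatenation, SAS uses the member from the library that is listed first.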
The create_sourcemetadata_fromsaslib.sas driver program is provided by SAS. It is ready to run on any of the SDTM or ADaM study data samples. The driver program can be run interactively or in batch. To run the driver program interactively, start a SAS session, and load the driver program into the SAS editor.
The driver program is located here:
sample study library directory/cdisc-definexml-2.0.0-1.7/programs
To create the source_codelists study metadata data set, you must specify two items: a list of format catalogs that define the study formats and a SAS data set that contains CDISC/NCI codelist metadata.
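To see what a format catalog contributes, the following sketch defines one toy study format and lists its value and label pairs with the CNTLOUT= option of PROC FORMAT. The catalog work.formats, the format $SEX, and the output data set work.toy_fmtvalues are assumptions for illustration; the toolkit's own extraction logic might differ.

```sas
/* Toy study format stored in a format catalog (assumption). */
proc format library=work.formats;
  value $sex
    "F"="Female"
    "M"="Male"
    "U"="Unknown";
run;

/* Write the catalog entry out as a data set: one row per coded value. */
proc format library=work.formats
            cntlout=work.toy_fmtvalues(keep=fmtname type start label);
  select $sex;
run;

proc print data=work.toy_fmtvalues noobs;
run;
```

Each row pairs a coded value (START) with its decode (LABEL), which is the kind of information a codelist needs.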
You might need to specify study metadata in the driver program.
Here is an example:
data work.studymetadata;
   studyname="CDISC01";
   studydescription="CDISC Test Study";
   protocolname="CDISC01";
   studyversion="MDV.CDISC01.SDTMIG.3.1.2.SDTM.1.2";
run;
The parameters can be specified by using a SASReferences file or by specifying the parameters in the macro call.
Here are examples of calls to the %DEFINE_CREATESRCMETAFROMSASLIB macro using the two methods to specify the parameters:
%define_createsrcmetafromsaslib(
  _cstTrgStandard=&_cstTrgStandard,
  _cstTrgStandardVersion=&_cstTrgStandardVersion,
  _cstLang=en,
  _cstUseRefLib=Y,
  _cstKeepAllCodeLists=N
);	

%define_createsrcmetafromsaslib(
  _cstSASDataLib=srcdata,
  _cstStudyMetadata=work.studymetadata,
  _cstTrgStandard=&_cstTrgStandard,
  _cstTrgStandardVersion=&_cstTrgStandardVersion,
  _cstTrgStudyDS=trgmeta.source_study,
  _cstTrgTableDS=trgmeta.source_tables,
  _cstTrgColumnDS=trgmeta.source_columns,
  _cstTrgCodeListDS=trgmeta.source_codelists,
  _cstTrgValueDS=trgmeta.source_values,
  _cstTrgDocumentDS=trgmeta.source_documents,
  _cstTrgAnalysisResultDS=trgmeta.source_analysisresults,
  _cstLang=en,
  _cstUseRefLib=Y,
  _cstRefTableDS=refmeta.reference_tables,
  _cstRefColumnDS=refmeta.reference_columns,
  _cstClassTableDS=refmeta.class_tables,
  _cstClassColumnDS=refmeta.class_columns,
  _cstKeepAllCodeLists=Y,
  _cstFormatCatalogs=cstfmt.formats ncifmt.cterms,
  _cstNCICTerms=ncifmt.cterms
  );
For more information about the %DEFINE_CREATESRCMETAFROMSASLIB macro, see the SAS Clinical Standards Toolkit: Macro API Documentation.
After the driver program runs, the srcmeta_saslib_results data set is created. This data set contains the informational, warning, and error messages that were generated by the driver program.
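A quick way to review the results data set might look like the sketch below. The macro name %show_problems, the WORK libref, and the column name resultseverity are assumptions; the macro checks that both the data set and the column exist before using them, so adjust the names to match your results data set.

```sas
/* Hypothetical helper: summarize messages by severity if the column exists. */
%macro show_problems(ds=work.srcmeta_saslib_results);
  %local dsid hasvar rc;
  %if not %sysfunc(exist(&ds)) %then %do;
    %put NOTE: &ds does not exist. Adjust the libref to where your driver writes results.;
    %return;
  %end;
  /* Look up the assumed severity column before referencing it. */
  %let dsid=%sysfunc(open(&ds));
  %let hasvar=%sysfunc(varnum(&dsid, resultseverity));
  %let rc=%sysfunc(close(&dsid));
  %if &hasvar = 0 %then %do;
    %put NOTE: Column resultseverity was not found in &ds..;
    %return;
  %end;
  proc freq data=&ds;
    tables resultseverity / missing;
  run;
%mend show_problems;

%show_problems(ds=work.srcmeta_saslib_results);
```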

Deriving Study Source Metadata from an Imported Define-XML 2.0 File for a Similar Study

The %DEFINE_CREATESRCMETAFROMDEFINE macro derives source metadata files from a data library that contains the SAS representation of a Define-XML V2.0.0 define.xml file for a study.
Here is the general strategy:
  1. Use the SAS representation of a Define-XML V2.0.0 define.xml file as the primary source of the information.
  2. Use reference_tables, reference_columns, class_tables, and class_columns for matching the columns to impute missing metadata when _cstUseRefLib=Y is specified.
The following SAS data sets must exist in this Define-XML V2.0.0 SAS data set library:
  • aliases
  • codelistitems
  • codelists
  • definedocument
  • documentrefs
  • enumerateditems
  • externalcodelists
  • formalexpressions
  • itemdefs
  • itemgroupdefs
  • itemgroupitemrefs
  • itemgroupleaf
  • itemgroupleaftitles
  • itemorigin
  • itemrefwhereclauserefs
  • itemvaluelistrefs
  • mdvleaf
  • mdvleaftitles
  • metadataversion
  • methoddefs
  • pdfpagerefs
  • study
  • translatedtext
  • valuelistitemrefs
  • valuelists
  • whereclausedefs
  • whereclauserangechecks
  • whereclauserangecheckvalues
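Before calling the macro, it can be useful to confirm that the library really contains these data sets. The following is a minimal sketch; the macro name %check_define_lib and the global macro variable definelib_missing are hypothetical, and the data set names are copied from the list above.

```sas
/* Hypothetical helper: report which required data sets are absent from a library. */
%macro check_define_lib(lib=srcdata);
  %global definelib_missing;
  %local required i ds;
  %let required=aliases codelistitems codelists definedocument documentrefs;
  %let required=&required enumerateditems externalcodelists formalexpressions;
  %let required=&required itemdefs itemgroupdefs itemgroupitemrefs itemgroupleaf;
  %let required=&required itemgroupleaftitles itemorigin itemrefwhereclauserefs;
  %let required=&required itemvaluelistrefs mdvleaf mdvleaftitles metadataversion;
  %let required=&required methoddefs pdfpagerefs study translatedtext;
  %let required=&required valuelistitemrefs valuelists whereclausedefs;
  %let required=&required whereclauserangechecks whereclauserangecheckvalues;
  %let definelib_missing=;
  %do i=1 %to %sysfunc(countw(&required, %str( )));
    %let ds=%scan(&required, &i, %str( ));
    /* EXIST returns 0 when the member cannot be found. */
    %if not %sysfunc(exist(&lib..&ds)) %then %let definelib_missing=&definelib_missing &ds;
  %end;
  %if %length(&definelib_missing) > 0 %then
    %put WARNING: Not found in &lib: &definelib_missing;
  %else %put NOTE: All required data sets were found in &lib..;
%mend check_define_lib;

%check_define_lib(lib=srcdata);
```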
When creating the source_analysisresults data set, the following SAS data sets must exist in this Define-XML V2.0.0 SAS data set library:
  • analysisdataset
  • analysisdatasets
  • analysisdocumentation
  • analysisprogrammingcode
  • analysisresultdisplays
  • analysisresults
  • analysisvariables
  • analysiswhereclauserefs
The create_sourcemetadata_fromsasdefine.sas driver program is provided by SAS. It is ready to run on any SAS representation of a Define-XML V2.0.0 define.xml file for an ADaM or SDTM study. The driver program can be run interactively or in batch. To run the program interactively, start a SAS session, and load the driver program into the SAS editor.
The driver program is located here:
sample study library directory/cdisc-definexml-2.0.0-1.7/programs
The parameters can be specified by using a SASReferences file or by specifying the parameters in the macro call.
Here are examples of calls to the %DEFINE_CREATESRCMETAFROMDEFINE macro using the two methods to specify the parameters:
%define_createsrcmetafromdefine(
  _cstTrgStandard=&_cstTrgStandard,
  _cstTrgStandardVersion=&_cstTrgStandardVersion,
  _cstLang=en,
  _cstUseRefLib=Y
);

%define_createsrcmetafromdefine(
  _cstDefineDataLib=srcdata,
  _cstTrgStandard=&_cstTrgStandard,
  _cstTrgStandardVersion=&_cstTrgStandardVersion,
  _cstTrgMetaLibrary=trgmeta,
  _cstTrgStudyDS=trgmeta.source_study,
  _cstTrgTableDS=trgmeta.source_tables,
  _cstTrgColumnDS=trgmeta.source_columns,
  _cstTrgCodeListDS=trgmeta.source_codelists,
  _cstTrgValueDS=trgmeta.source_values,
  _cstTrgDocumentDS=trgmeta.source_documents,
  _cstTrgAnalysisResultDS=trgmeta.source_analysisresults,
  _cstLang=en,
  _cstUseRefLib=Y,
  _cstRefTableDS=refmeta.reference_tables,
  _cstRefColumnDS=refmeta.reference_columns,
  _cstClassTableDS=refmeta.class_tables,
  _cstClassColumnDS=refmeta.class_columns,
  _cstReturn=_cst_rc,
  _cstReturnMsg=_cst_rcmsg
  );
For more information about the %DEFINE_CREATESRCMETAFROMDEFINE macro, see the SAS Clinical Standards Toolkit: Macro API Documentation.
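The second call above names _cst_rc and _cst_rcmsg as the return macro variables. A minimal sketch of checking them after the call follows; the helper name %check_cst_rc is hypothetical, and treating a value of 0 as success is an assumption that you should confirm in the Macro API Documentation.

```sas
/* Hypothetical helper: report the return code that a toolkit macro left behind. */
%macro check_cst_rc(step=);
  /* Make sure both variables exist even if the step was never run. */
  %global _cst_rc _cst_rcmsg;
  %if %length(&_cst_rc) = 0 %then
    %put NOTE: _cst_rc is not set, so &step was probably not run.;
  %else %if &_cst_rc ne 0 %then
    %put WARNING: &step returned _cst_rc=&_cst_rc: &_cst_rcmsg;
  %else %put NOTE: &step completed with _cst_rc=0.;
%mend check_cst_rc;

%check_cst_rc(step=define_createsrcmetafromdefine);
```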
After the driver program runs, the srcmeta_define_results data set is created. This data set contains informational, warning, and error messages that were generated by the driver program.

Migrating Study Source Metadata Used for the Creation of a CRT-DDS 1.0 define.xml File for the Study

The %CSTUTILMIGRATECRTDDS2DEFINE macro migrates source metadata data sets from CRT-DDS v1.0 to Define-XML v2.0.
For CRT-DDS 1.0.0, the following source metadata SAS data sets are defined in SAS Clinical Standards Toolkit starting with version 1.5:
  • source_study
  • source_tables
  • source_columns
  • source_values
  • source_documents
For Define-XML 2.0.0, the source metadata SAS data set source_codelists contains all metadata needed to create codelists in the define.xml file. The metadata includes external codelists (for example, MedDRA and WHODRUG) and NCI metadata (for example, the so-called C-codes).
To create the source_codelists study metadata data set, you must specify two items: a list of format catalogs that define the study formats and a SAS data set that contains CDISC/NCI codelist metadata.
The migrate_crtdds_to_definexml_sdtm.sas and migrate_crtdds_to_definexml_adam.sas sample driver programs provide examples of migrating CRT-DDS 1.0.0 source metadata to Define-XML 2.0.0 source metadata. The drivers for ADaM and SDTM are similar in structure, so only the SDTM driver program is explained.
The driver program is located here:
sample study library directory/cdisc-definexml-2.0.0-1.7/programs
Here is an example of the librefs that are defined after the initial setup:
%**********************************************************************************;
%* Define libnames for input                                                      *;
%**********************************************************************************;
%* Original CRT-DDS v1 source metadata for SDTM 3.1.2 in CST 1.7;
libname crtdds "&studyRootPath/sascstdemodata/metadata";

%**********************************************************************************;
%* Define libnames for output                                                     *;
%**********************************************************************************;
%* Migrated Define-XML v2 source metadata;
libname defv2 "&studyOutputPath/derivedstudymetadata_crtdds/%lowcase(&_cstTrgStandard)-&_cstTrgStandardVersion";
%**********************************************************************************;
%* Define formats                                                                 *;
%**********************************************************************************;

*********************************************************************;
* Set CDISC NCI Controlled Terminology version for this process.    *;
*********************************************************************;
%cst_getstandardsubtypes(_cstStandard=CDISC-TERMINOLOGY,_cstOutputDS=work._cstStdSubTypes);
data _null_;
  set work._cstStdSubTypes (where=(standardversion="&_cstTrgStandard" and isstandarddefault='Y'));
 * User can override CT version of interest by specifying a different where clause:         *;
 * Example: (where=(standardversion="&_cstTrgStandard" and standardsubtypeversion='201104'))*;
  call symputx('_cstCTPath',path);
  call symputx('_cstCTMemname',memname);
run;

proc datasets lib=work nolist;
  delete _cstStdSubTypes;
quit;

%* SDTM Study formats in CST 1.7;
libname studyfmt "&studyRootPath/sascstdemodata/terminology/formats";

%* CDISC-NCI Terminology to be used in CST 1.7;
libname ncisdtm "&_cstCTPath";

%* Formats to be used for SDTM;
options fmtsearch = (studyfmt.formats ncisdtm.&_cstCTMemname);
Note: You might need to modify the librefs.
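Because later steps rely on the FMTSEARCH option, it can help to confirm which catalogs are actually searched. The following sketch uses the GETOPTION function; the toy catalog work.toyfmt, the format $YN, and the data set work.toy_decoded are assumptions so that the sketch runs without the sample study. The current FMTSEARCH value is saved and restored because the OPTIONS statement replaces the whole list.

```sas
/* Toy catalog with one format (assumption). */
proc format library=work.toyfmt;
  value $yn "Y"="Yes" "N"="No";
run;

/* Save the current search list, then point FMTSEARCH at the toy catalog. */
%let _oldfmtsearch=%sysfunc(getoption(fmtsearch));
options fmtsearch=(work.toyfmt);
%put NOTE: FMTSEARCH is now %sysfunc(getoption(fmtsearch));

/* The format resolves only because its catalog is on the search list. */
data work.toy_decoded;
  length decoded $3;
  decoded=put("Y", $yn.);
run;

/* Restore the previous search list. */
options fmtsearch=&_oldfmtsearch;
```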
Here is an example of some of the CRT-DDS 1.0.0 metadata that must be mapped to values as expected by Define-XML 2.0.0:
%**********************************************************************************;
%* Create some formats for mapping                                                *;
%**********************************************************************************;
proc format;
  value $_cststd
   /* Maps from CRT-DDS values to required Define-XML v2 values */
   "CDISC SDTM"="SDTM-IG"
   "CDISC SEND"="SEND-IG"
   "CDISC ADAM"="ADAM-IG"
  ;
  value $_cstdom
   /* Map to ItemGroup/@Domain attribute */
   "QSCG" = "QS"
   "QSCS" = "QS"
   "QSMM" = "QS"
  ;
  value $_cstdomd
   /* Map to ItemGroup/Alias[@Context='DomainDescription']/@Name attribute */
   "QSCG" = "Questionnaires"
   "QSCS" = "Questionnaires"
   "QSMM" = "Questionnaires"
  ;
  value $_cstcls
   /* Maps from CRT-DDS values to required Define-XML v2 values */
   "SPECIAL PURPOSE DOMAINS" = "SPECIAL PURPOSE"
   "SPECIAL PURPOSE DATASETS" = "SPECIAL PURPOSE"
   "FINDINGS ABOUT" = "FINDINGS"
   "ADSL" = "SUBJECT LEVEL ANALYSIS DATASET"
   "ADAE" = "ADAM OTHER"
   "BDS" = "BASIC DATA STRUCTURE"
  ;
  value $_cstvlm
   /* For SDTM maps to variables that are being described by Value Level Metadata */
   "EG.EGTESTCD" = "EGORRES"
   "IE.IETESTCD" = "IEORRES"
   "TI.IETESTCD" = "IECAT"
   "LB.LBTESTCD" = "LBORRES"
   "PE.PETESTCD" = "PEORRES"
   "SC.SCTESTCD" = "SCORRES"
   "VS.VSTESTCD" = "VSORRES"
   "SUPPAE.QNAM" = "QVAL"
  ;
run; 
Note: It is likely that you must modify some mappings based on the specific data values. It is important to use the format names as specified because these formats are used in the conversion macros.
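How such a mapping format behaves can be tried out in isolation. The sketch below deliberately uses a different, hypothetical format name ($toycls) so that it does not overwrite the $_cstcls format that the conversion macros expect; the data set work.toy_classes is also an assumption. An explicit width is used in the PUT function so that values without a mapping are not cut to the default width of the format.

```sas
/* Toy copy of two class mappings under a hypothetical format name. */
proc format;
  value $toycls
    "SPECIAL PURPOSE DOMAINS" = "SPECIAL PURPOSE"
    "FINDINGS ABOUT" = "FINDINGS"
  ;
run;

/* Apply the mapping. Values that are not listed pass through unchanged. */
data work.toy_classes;
  length class mapped $40;
  do class="SPECIAL PURPOSE DOMAINS", "FINDINGS ABOUT", "EVENTS";
    mapped=put(class, $toycls40.);
    output;
  end;
run;

proc print data=work.toy_classes noobs;
run;
```

Running a check like this against the distinct values in your own CRT-DDS metadata is one way to spot values that still need a mapping.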
Here is an example of the metadata conversion:
%**********************************************************************************;
%* Define the studyversion macro variable.                                        *;
%* This will become the MetaDataVersion/@OID attribute                            *;
%* In CRT-DDS this was the source_study.definedocumentname column                 *;
%* Also define the SASRef macro variable to use for the SASRef column in the      *;
%* source_xxx data sets.                                                          *;
%**********************************************************************************;
proc sql noprint;
 select definedocumentname, SASRef into :studyversion, :SASRef
 from crtdds.source_study;
quit;

%**********************************************************************************;
%* Migrate source tables                                                          *;
%**********************************************************************************;
%cstutilmigratecrtdds2define(_cstSrcLib=crtdds, _cstSrcDS=source_study, 
                             _cstTrgDS=defv2.source_study, _cstStudyVersion=&studyversion,
                             _cstStandard=&_cstTrgStandard, _cstCheckValues=Y);
%cstutilmigratecrtdds2define(_cstSrcLib=crtdds, _cstSrcDS=source_tables,
                             _cstTrgDS=defv2.source_tables, _cstStudyVersion=&studyversion,
                             _cstStandard=&_cstTrgStandard, _cstCheckValues=Y);
%cstutilmigratecrtdds2define(_cstSrcLib=crtdds, _cstSrcDS=source_columns,
                             _cstTrgDS=defv2.source_columns, _cstStudyVersion=&studyversion,
                             _cstStandard=&_cstTrgStandard, _cstCheckValues=Y);
%cstutilmigratecrtdds2define(_cstSrcLib=crtdds, _cstSrcDS=source_values,
                             _cstTrgDS=defv2.source_values, _cstStudyVersion=&studyversion,
                             _cstStandard=&_cstTrgStandard, _cstCheckValues=Y);
%cstutilmigratecrtdds2define(_cstSrcLib=crtdds, _cstSrcDS=source_documents,
                             _cstTrgDS=defv2.source_documents, _cstStudyVersion=&studyversion,
                             _cstStandard=&_cstTrgStandard, _cstCheckValues=Y);
The creation of the source_codelists table is a separate task because this table was not available in the CRT-DDS 1.0.0 source metadata.
Here is an example of the call to the %CSTUTILGETNCIMETADATA macro, in which the _cstFormatCatalogs parameter is blank. A blank value indicates that the format catalogs that define the codelists to include in the source_codelists table are taken from the value of the FMTSEARCH system option.
%**********************************************************************************;
%* Create source_codelists                                                        *;
%**********************************************************************************;

%* Get formats ;
 %cstutilgetncimetadata(
  _cstFormatCatalogs=,
  _cstNCICTerms=ncisdtm.cterms,
  _cstLang=en,
  _cstStudyVersion=&studyversion, 
  _cstStandard=&_cstTrgStandard,
  _cstStandardVersion=&_cstTrgStandardVersion,
  _cstFmtDS=work._cstformats,
  _cstSASRef=&SASRef,
  _cstReturn=_cst_rc,
  _cstReturnMsg=_cst_rcmsg
  );

%* Create a data set with all applicable formats. ;
data work.cl_column_value(keep=xmlcodelist);
  set defv2.source_columns defv2.source_values;
    xmlcodelist=upcase(xmlcodelist);
    if xmlcodelist ne '';
run;

proc sort data=work.cl_column_value nodupkey;
  by xmlcodelist;
run;
  
%* Only keep applicable formats. ;
proc sql;
  create table defv2.source_codelists
  as select
    nci.*
  from
    work._cstformats nci, work.cl_column_value cv
  where (upcase(compress(nci.codelist, '$')) = 
         upcase(compress(cv.xmlcodelist, '$')))
  ;
quit;
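The matching rule in the final PROC SQL step (ignore case and ignore the leading dollar sign of character formats) can be checked on toy data. In the sketch below, the data sets work.toy_formats, work.toy_used, and work.toy_codelists and their contents are assumptions; only the WHERE clause mirrors the driver code above.

```sas
/* Toy stand-in for work._cstformats: one row per coded value. */
data work.toy_formats;
  length codelist $32 codedvalue $8;
  codelist="$SEX";  codedvalue="F";     output;
  codelist="$SEX";  codedvalue="M";     output;
  codelist="$RACE"; codedvalue="ASIAN"; output;
run;

/* Toy stand-in for work.cl_column_value: codelists that columns refer to. */
data work.toy_used;
  length xmlcodelist $32;
  xmlcodelist="sex";
  output;
run;

/* Keep only the formats that are referenced, comparing names without $ and case. */
proc sql;
  create table work.toy_codelists as
  select f.*
  from work.toy_formats f, work.toy_used u
  where upcase(compress(f.codelist, '$')) =
        upcase(compress(u.xmlcodelist, '$'));
quit;
```

Only the two $SEX rows survive, because $RACE is not referenced by any column or value in the toy data.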
Here is an example of the last part of the sample driver program, in which metadata for external controlled terminology is added to the source_codelists data set:
%**********************************************************************************;
%* Updates for External Controlled Terminology                                    *;
%**********************************************************************************;

proc sql;
 insert into defv2.source_codelists
   (sasref, codelist, codelistname, codelistdatatype, dictionary, version, 
    studyversion, standard, standardversion)
    values ("&SASRef", "CL.AEDICT", "Adverse Event Dictionary", "text", "MEDDRA", "8.0", 
            "&studyversion", "&_cstTrgStandard", "&_cstTrgStandardVersion")
    values ("&SASRef", "CL.DRUGDCT", "Drug Dictionary", "text", "WHODRUG", "200204", 
            "&studyversion", "&_cstTrgStandard", "&_cstTrgStandardVersion")
            ;
quit;  

data defv2.source_columns;
  set defv2.source_columns;
  if table="AE" and column in ("AEDECOD" "AEBODSYS") then xmlcodelist="CL.AEDICT";
  if table="CM" and column in ("CMDECOD" "CMCLAS" "CMCLASCD")
    then xmlcodelist="CL.DRUGDCT";
run; 
For more information about the %CSTUTILMIGRATECRTDDS2DEFINE macro and the %CSTUTILGETNCIMETADATA macro, see the SAS Clinical Standards Toolkit: Macro API Documentation.