DQSCHEME Procedure

CREATE Statement

Creates a scheme or an analysis data set.

Syntax

Optional Arguments

ANALYSIS=analysis-data-set
names the output data set that stores analytical data.
Restriction: This option is required if the SCHEME= option is not specified.
See: Concepts for additional information.
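The following is a minimal sketch of a CREATE statement that writes only an analysis data set. The data set names (vendors, city_analysis) and the sample city values are hypothetical, and the sketch assumes that the ENUSA locale has already been loaded into memory.

```sas
/* Assumption: the ENUSA locale is already loaded into memory. */

/* Hypothetical input data with variant spellings of city names. */
data vendors;
   input city $char16.;
datalines;
Detroit
Detriot
Dallas
Washington DC
Washington D.C.
;
run;

/* ANALYSIS= is specified and SCHEME= is omitted, */
/* so only the analysis data set is created.     */
proc dqscheme data=vendors;
   create analysis=city_analysis
      matchdef='City (Scheme Build)'
      var=city
      locale='ENUSA';
run;
```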
INCLUDE_ALL
specifies that the scheme is to contain all of the values of the input variable, including values that meet any of these conditions:
  • have unique match codes
  • were not transformed
  • did not receive a cluster number
Note: The INCLUDE_ALL option is not set by default.
LOCALE=locale-name
specifies the locale that contains the specified match definition. The value can be a locale name in quotation marks, the name of a variable whose value is a locale name, or an expression that evaluates to a locale name.
The specified locale must be loaded into memory as part of the locale list.
Default: The first locale in the locale list.
Restriction: If no value is specified, the default locale is used.
See: Load and Unload Locales for additional information.
MATCHDEF=match-definition
names the match definition in the specified locale that is used to establish cluster numbers. You can specify any valid match definition.
The value of the MATCHDEF= option is stored in the scheme as a meta option. This provides a default match definition when a scheme is applied. This meta option is used only when SCHEME_LOOKUP= USE_MATCHDEF. The default value that is supplied by this meta option is superseded by match definitions specified in the APPLY statement or the DQSCHEMEAPPLY CALL routine.
Tip: Use definitions whose names end in (SCHEME BUILD) when using the ENUSA locale. These match definitions yield optimal results in the DQSCHEME procedure.
See: Meta Options for additional information.
MODE= ELEMENT | PHRASE
specifies a mode of scheme application. This information is stored in the scheme as metadata, which specifies a default mode when the scheme is applied. The default mode is superseded by a mode in the APPLY statement, or in the DQSCHEMEAPPLY function or CALL routine. See Applying Schemes for additional information.
ELEMENT
specifies that each element in each value of the input character variable is compared to the data values in the scheme. When SCHEME_LOOKUP= USE_MATCHDEF, the match code for each element is compared to match codes generated for each element in each DATA variable value in the scheme.
PHRASE
(default value) specifies that the entirety of each value of the input character variable is compared to the data values in the scheme. When SCHEME_LOOKUP= USE_MATCHDEF, the match code for the entire input value is compared to match codes that are generated for each data value in the scheme.
SCHEME=scheme-name
specifies the name or the fileref of the scheme that is created. The fileref must reference a fully qualified path with a filename that ends in .sch.bfd. Lowercase letters are required. To create a scheme data set in Blue Fusion Data format, specify the BFD option in the DQSCHEME procedure.
To create a scheme in SAS format, specify the NOBFD option in the DQSCHEME procedure and specify a one-level or two-level SAS data set name.
Restriction: The SCHEME= option is required if the ANALYSIS= option is not specified.
See: Syntax for additional information.
CAUTION:
In the z/OS operating environment, create only schemes in SAS format. BFD schemes can be applied, but not created, in the z/OS operating environment.
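As an illustration of creating a scheme in SAS format, the sketch below specifies NOBFD in the PROC DQSCHEME statement and a two-level data set name in SCHEME=. The names vendors and work.city_scheme and the sample values are hypothetical, and the sketch assumes that the ENUSA locale is already loaded into memory.

```sas
/* Assumption: the ENUSA locale is already loaded into memory. */

/* Hypothetical input data. */
data vendors;
   input city $char16.;
datalines;
Detroit
Detriot
Dallas
;
run;

/* NOBFD requests a scheme in SAS format, so SCHEME= takes  */
/* a one-level or two-level SAS data set name.              */
/* INCLUDE_ALL also stores values that were not transformed. */
proc dqscheme data=vendors nobfd;
   create scheme=work.city_scheme
      matchdef='City (Scheme Build)'
      var=city
      locale='ENUSA'
      include_all;
run;
```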
SCHEME_LOOKUP= EXACT | IGNORE_CASE | USE_MATCHDEF
specifies one of three mutually exclusive methods of applying the scheme to the values of the input character variable. Valid values are defined as follows:
EXACT
(default value) specifies that the values of the input variable are to be compared to the DATA values in the scheme without changing the input values, except that any adjacent blank spaces in the input values are replaced with single blank spaces before comparison. The transformation value in the scheme is written into the output data set only when an input value exactly matches a DATA value in the scheme.
IGNORE_CASE
specifies that capitalization is to be ignored when input values are compared to the DATA values in the scheme.
Interaction: Any adjacent blank spaces in the input values are replaced with single blank spaces before comparison.
USE_MATCHDEF
specifies that comparisons are to be made between the match codes of the input values and the match codes of the DATA values in the scheme.
Interactions: Specifying USE_MATCHDEF enables the options LOCALE=, MATCHDEF=, and SENSITIVITY=, which can be used to override the default values that might be stored in the scheme.

A transformation occurs when the match code of an input value is identical to the match code of a DATA value in the scheme.

The value of the SCHEME_LOOKUP= option is stored in the scheme as a meta option. This specifies a default lookup method when the scheme is applied. The default supplied by this meta option is superseded by a lookup method that is specified in the APPLY statement, or in the DQSCHEMEAPPLY function or CALL routine.
See: Meta Options for additional information.
SENSITIVITY=sensitivity-level
determines the amount of information that is included in the match codes that are generated when the scheme is created and, if match codes are used for lookup, when the scheme is applied. The value of the SENSITIVITY= option is stored in the scheme as a meta option. This provides a default sensitivity value when the scheme is applied.
Higher sensitivity values generate match codes that contain more information. These match codes generally result in the following:
  • fewer matches
  • greater number of clusters
  • fewer values in each cluster
Default: 85
Interactions: The default value supplied by this meta option is superseded by a sensitivity value specified in the APPLY statement, or in the DQSCHEMEAPPLY CALL routine.

This meta option is used at apply time only when SCHEME_LOOKUP= USE_MATCHDEF.

See: Meta Options for additional information.
VAR=input-character-variable
specifies the input character variable that is analyzed and transformed. The maximum length of input values is 1024 bytes.
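The options that are stored as meta options can be combined in one CREATE statement. The sketch below is illustrative only: the data set names, the sample values, and the sensitivity value of 80 are hypothetical choices, and the ENUSA locale is assumed to be loaded into memory already.

```sas
/* Assumption: the ENUSA locale is already loaded into memory. */

/* Hypothetical input data. */
data vendors;
   input company $char24.;
datalines;
Acme Tools Inc
ACME Tools Incorporated
Baker Supply Co
;
run;

/* MODE=, SCHEME_LOOKUP=, MATCHDEF=, and SENSITIVITY= are stored */
/* in the scheme as defaults for when the scheme is applied.     */
/* Values specified at apply time supersede these defaults.      */
proc dqscheme data=vendors nobfd;
   create scheme=work.company_scheme
      matchdef='Company (Scheme Build)'
      var=company
      locale='ENUSA'
      mode=element
      scheme_lookup=use_matchdef
      sensitivity=80;
run;
```

A sensitivity below the default of 85 generally yields more matches and larger clusters, so the resulting scheme is worth reviewing before it is applied.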