DQSCHEME Procedure

CREATE Statement

Creates a scheme or an analysis data set.

Syntax

Optional Arguments

ANALYSIS=analysis-data-set

names the output data set that stores analytical data.

Restriction This option is required if the SCHEME= option is not specified.
See Concepts for additional information.
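
As a minimal sketch of a CREATE statement that produces only an analysis data set, consider the following. The input data set WORK.VENDORS, its character variable CITY, and the output name CITY_ANALYSIS are hypothetical, and the sketch assumes that the ENUSA locale has already been loaded into memory.

```sas
/* Assumption: the ENUSA locale is already loaded, and WORK.VENDORS    */
/* (a hypothetical data set) contains the character variable CITY.     */
proc dqscheme data=vendors;
   create analysis=city_analysis          /* hypothetical output data set      */
          matchdef='City (Scheme Build)'  /* match definition used to cluster  */
          var=city
          locale='ENUSA';
run;
```

Because SCHEME= is omitted, ANALYSIS= is the required output in this sketch.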

INCLUDE_ALL

specifies that the scheme is to contain all of the values of the input variable. This includes values of the following kinds:

  • values with unique match codes
  • values that were not transformed
  • values that did not receive a cluster number
Note The INCLUDE_ALL option is not set by default.
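
The following sketch shows where INCLUDE_ALL might appear in a CREATE statement. The data set VENDORS, the variable CITY, and the scheme name CITY_SCHEME_ALL are hypothetical, and the ENUSA locale is assumed to be loaded.

```sas
/* Assumption: ENUSA is loaded; VENDORS and CITY are hypothetical names. */
proc dqscheme data=vendors nobfd;         /* NOBFD: write a SAS-format scheme */
   create matchdef='City (Scheme Build)'
          var=city
          scheme=city_scheme_all          /* hypothetical scheme data set     */
          include_all                     /* keep every input value           */
          locale='ENUSA';
run;
```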

LOCALE=locale-name

specifies the locale that contains the specified match definition. The value can be a locale name in quotation marks, the name of a variable whose value is a locale name, or an expression that evaluates to a locale name.

The specified locale must be loaded into memory as part of the locale list.
Default The first locale in the locale list.
Restriction If no value is specified, the default locale is used.
See Load and Unload Locales for additional information.

MATCHDEF=match-definition

names the match definition in the specified locale that is used to establish cluster numbers. You can specify any valid match definition.

The value of the MATCHDEF= option is stored in the scheme as a meta option. This provides a default match definition when a scheme is applied. This meta option is used only when SCHEME_LOOKUP= USE_MATCHDEF. The default value that is supplied by this meta option is superseded by match definitions specified in the APPLY statement or the DQSCHEMEAPPLY CALL routine.
Tip Use definitions whose names end in (SCHEME BUILD) when using the ENUSA locale. These match definitions yield optimal results in the DQSCHEME procedure.
See Meta Options for additional information.

MODE= ELEMENT | PHRASE

specifies a mode of scheme application. This information is stored in the scheme as metadata, which specifies a default mode when the scheme is applied. The default mode is superseded by a mode in the APPLY statement, or in the DQSCHEMEAPPLY function or CALL routine. See Applying Schemes for additional information.

ELEMENT

specifies that each element in each value of the input character variable is compared to the data values in the scheme. When SCHEME_LOOKUP= USE_MATCHDEF, the match code for each element is compared to match codes generated for each element in each DATA variable value in the scheme.

PHRASE

(default value) specifies that the entirety of each value of the input character variable is compared to the data values in the scheme. When SCHEME_LOOKUP= USE_MATCHDEF, the match code for the entire input value is compared to match codes that are generated for each data value in the scheme.
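
A minimal sketch of storing a non-default mode in the scheme, together with a match-code lookup method, might look like the following. The names VENDORS, CITY, and CITY_SCHEME_ELEM are hypothetical, and the ENUSA locale is assumed to be loaded.

```sas
/* Assumption: ENUSA is loaded; VENDORS and CITY are hypothetical names. */
proc dqscheme data=vendors nobfd;
   create matchdef='City (Scheme Build)'
          var=city
          scheme=city_scheme_elem        /* hypothetical scheme data set        */
          mode=element                   /* stored default: compare by element  */
          scheme_lookup=use_matchdef     /* stored default: compare match codes */
          locale='ENUSA';
run;
```

Both values are stored only as defaults. A mode or lookup method given at apply time takes precedence, as described above.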

SCHEME=scheme-name

specifies the name or the fileref of the scheme that is created. The fileref must reference a fully qualified path with a filename that ends in .sch.bfd. Lowercase letters are required. To create a scheme data set in QKB scheme file format, specify the BFD option in the DQSCHEME procedure.

To create a scheme in SAS format, specify the NOBFD option in the DQSCHEME procedure and specify a one-level or two-level SAS data set name.
Restriction The SCHEME= option is required if the ANALYSIS= option is not specified.
See Syntax for additional information.
CAUTION:
In the z/OS operating environment, specify only schemes in SAS format. QKB schemes can be applied, but not created, in the z/OS operating environment.
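
The two scheme formats described above can be sketched as follows. The data set VENDORS, the variable CITY, the scheme name CITY_SCHEME, the fileref CITYSCH, and the file path are all hypothetical, and the ENUSA locale is assumed to be loaded.

```sas
/* SAS format: NOBFD with a one-level or two-level data set name. */
proc dqscheme data=vendors nobfd;
   create matchdef='City (Scheme Build)'
          var=city
          scheme=work.city_scheme        /* hypothetical SAS data set */
          locale='ENUSA';
run;

/* QKB scheme file format: BFD with a fileref. The path is made up;  */
/* it is fully qualified, lowercase, and ends in .sch.bfd.           */
filename citysch '/data/schemes/city.sch.bfd';

proc dqscheme data=vendors bfd;
   create matchdef='City (Scheme Build)'
          var=city
          scheme=citysch
          locale='ENUSA';
run;
```

Per the caution above, only the first form applies when schemes are created under z/OS.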

SCHEME_LOOKUP= EXACT | IGNORE_CASE | USE_MATCHDEF

specifies one of three mutually exclusive methods of applying the scheme to the values of the input character variable. Valid values are defined as follows:

EXACT

(default value) specifies that the values of the input variable are to be compared to the DATA values in the scheme without changing the input values, other than the blank-space normalization described here. The transformation value in the scheme is written into the output data set only when an input value exactly matches a DATA value in the scheme. Any adjacent blank spaces in the input values are replaced with single blank spaces before comparison.

IGNORE_CASE

specifies that capitalization is to be ignored when input values are compared to the DATA values in the scheme.

Interaction Any adjacent blank spaces in the input values are replaced with single blank spaces before comparison.

USE_MATCHDEF

specifies that comparisons are to be made between the match codes of the input values and the match codes of the DATA values in the scheme.

Interactions Specifying USE_MATCHDEF enables the options LOCALE=, MATCHDEF=, and SENSITIVITY=, which can be used to override the default values that might be stored in the scheme.
A transformation occurs when the match code of an input value is identical to the match code of a DATA value in the scheme.

The value of the SCHEME_LOOKUP= option is stored in the scheme as a meta option. This specifies a default lookup method when the scheme is applied. The default supplied by this meta option is superseded by a lookup method that is specified in the APPLY statement, or in the DQSCHEMEAPPLY function or CALL routine.
See Meta Options for additional information.
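
The difference between EXACT and IGNORE_CASE can be illustrated by analogy in a plain DATA step. This sketch does not call the DQSCHEME procedure and is not a description of its internals. It only mimics the documented comparison rules with the Base SAS functions COMPBL and UPCASE, using made-up values.

```sas
/* Illustration only: mimics the documented comparison rules. */
data _null_;
   length input_value scheme_value $ 40;
   input_value  = 'new   york  city';    /* made-up input value          */
   scheme_value = 'New York City';       /* made-up DATA value in scheme */

   /* Replace runs of adjacent blanks with a single blank. */
   normalized = compbl(input_value);

   /* EXACT-style comparison: case must match. */
   exact_match = (normalized = scheme_value);

   /* IGNORE_CASE-style comparison: capitalization is ignored. */
   ignore_case_match = (upcase(normalized) = upcase(scheme_value));

   put exact_match= ignore_case_match=;
run;
```

In this sketch the exact comparison fails because of capitalization (exact_match=0), and the case-insensitive comparison succeeds (ignore_case_match=1).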

SENSITIVITY=sensitivity-level

determines the amount of information that is included in the match codes that are generated when the scheme is created and, if match codes are used for lookup, when the scheme is applied. The value of the SENSITIVITY= option is stored in the scheme as a meta option. This provides a default sensitivity value when the scheme is applied.

Higher sensitivity values generate match codes that contain more information. These match codes generally result in the following:
  • fewer matches
  • greater number of clusters
  • fewer values in each cluster
Default 85
Interactions The default value supplied by this meta option is superseded by a sensitivity value specified in the APPLY statement, or in the DQSCHEMEAPPLY CALL routine.
This meta option is used at apply time only when SCHEME_LOOKUP= USE_MATCHDEF.
See Meta Options for additional information.
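
One way to observe the effect of sensitivity is to create two analysis data sets from the same input and compare their clusters. The following is a sketch under assumptions: the names VENDORS, CITY, CITY_SENS70, and CITY_SENS95 are hypothetical, the values 70 and 95 are arbitrary choices, and the ENUSA locale is assumed to be loaded.

```sas
/* Assumption: ENUSA is loaded; VENDORS and CITY are hypothetical names. */
proc dqscheme data=vendors;
   /* Lower sensitivity: match codes carry less information. */
   create analysis=city_sens70
          matchdef='City (Scheme Build)'
          var=city
          sensitivity=70
          locale='ENUSA';

   /* Higher sensitivity: match codes carry more information. */
   create analysis=city_sens95
          matchdef='City (Scheme Build)'
          var=city
          sensitivity=95
          locale='ENUSA';
run;
```

Based on the description above, the higher-sensitivity run would generally be expected to yield more clusters with fewer values in each.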

VAR=input-character-variable

specifies the input character variable that is analyzed and transformed. The maximum length of input values is 1024 bytes.