SAS(R) Data Quality Server 9.2: Reference

The DQMATCH Procedure

DQMATCH Procedure Syntax


Requirements: At least one CRITERIA statement is required. See CRITERIA Statement.

PROC DQMATCH
<DATA=input-data-set>
<DELIMITER | NODELIMITER>
<CLUSTER=output-variable-name>
<CLUSTER_BLANKS | NO_CLUSTER_BLANKS>
<CLUSTERS_ONLY>
<LOCALE=locale-name>
<MATCHCODE=output-variable-name>
<OUT=data-set-name>
<CRITERIA1 options
...
CRITERIAn options
>;

The DQMATCH procedure offers the following options:

DATA=input-data-set

identifies the input SAS data set. The default input data set is the most recently created data set in the current SAS session.

CLUSTER=output-variable-name

specifies the name of the numeric variable in the output data set that contains the cluster number. If the CLUSTER= option is not specified and if the CLUSTERS_ONLY option is specified, then an output variable named CLUSTER is created.

CLUSTER_BLANKS | NO_CLUSTER_BLANKS

controls how blank input values are handled. The default, CLUSTER_BLANKS, writes blank values into the output data set without an accompanying match code. NO_CLUSTER_BLANKS removes blank values from the output data set.

CLUSTERS_ONLY

excludes from the output data set any input character values that are not part of a cluster. A cluster number is assigned only when two or more input values produce the same match code, so specifying CLUSTERS_ONLY excludes nonblank input values that have unique match codes. This option is off by default; normally, all input values are included in the output data set.
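The following is a minimal sketch of CLUSTERS_ONLY combined with CLUSTER=. It assumes that the ENUSA locale has already been loaded into memory and that it provides a match definition named 'Name'. The data set names (work.customers, work.clusters) and the variable names (name, cluster_id) are hypothetical. Whether two rows fall into the same cluster depends on the match definition.

```sas
/* Hypothetical input: two spellings of one customer plus an unrelated name. */
data work.customers;
   length name $ 30;
   infile datalines dsd dlm='|';
   input name $;
   datalines;
Bob Beckett
Robert Beckett
Marie Curie
;
run;

/* Keep only rows whose match code is shared with at least one other row. */
/* Assumes the ENUSA locale is loaded and defines the 'Name' match definition. */
proc dqmatch data=work.customers out=work.clusters
             cluster=cluster_id clusters_only locale='ENUSA';
   criteria matchdef='Name' var=name;
run;

/* Inspect which rows were kept and the cluster numbers they received. */
proc print data=work.clusters;
run;
```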

DELIMITER | NODELIMITER

when multiple CRITERIA statements are specified, the default value DELIMITER specifies that exclamation points (!) separate the individual match codes that make up the concatenated match code. Match codes are concatenated in the order of appearance of CRITERIA statements in the DQMATCH procedure.

The NODELIMITER option specifies that multiple match codes are concatenated without the exclamation points.

Note: The default in SAS differs from the default in the dfPower Studio software from DataFlux (a SAS company). SAS uses a delimiter by default; DataFlux does not. Be sure to use delimiters consistently if you plan to analyze, compare, or combine match codes created in SAS and dfPower Studio.
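To compare the two settings, the sketch below runs the same two CRITERIA statements twice, once with the default delimiter and once with NODELIMITER. It assumes that the ENUSA locale is loaded and that it provides match definitions named 'Name' and 'Address'. The data set and variable names are hypothetical, and the match code values themselves are not shown because they depend on the installed locale.

```sas
/* Hypothetical input with a name and an address on each row. */
data work.customers;
   length name $ 30 address $ 40;
   infile datalines dsd dlm='|';
   input name $ address $;
   datalines;
Bob Beckett|392 S. Main St.
Robert Beckett|392 South Main Street
;
run;

/* Default (DELIMITER): the name and address match codes are joined */
/* with an exclamation point, in the order of the CRITERIA statements. */
proc dqmatch data=work.customers out=work.mc_delim
             matchcode=match_cd locale='ENUSA';
   criteria matchdef='Name'    var=name;
   criteria matchdef='Address' var=address;
run;

/* NODELIMITER: the same match codes are concatenated directly. */
proc dqmatch data=work.customers out=work.mc_nodelim
             matchcode=match_cd nodelimiter locale='ENUSA';
   criteria matchdef='Name'    var=name;
   criteria matchdef='Address' var=address;
run;

proc print data=work.mc_delim;   run;
proc print data=work.mc_nodelim; run;
```

If the match codes will later be compared with codes produced in dfPower Studio, the NODELIMITER form is the one that corresponds to the DataFlux default described in the note.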

LOCALE=locale-name

(optional) specifies the locale that is used to create match codes. The value can be a locale name in quotation marks, the name of a variable whose value is a locale name, or an expression that evaluates to a locale name.

The specified locale must be loaded into memory as part of the locale list (see Load and Unload Locales). If no value is specified, the default locale is used. The default locale is the first locale in the locale list.

The match definition, which is part of the specified locale, is named in the CRITERIA statement rather than in the PROC DQMATCH statement. As a result, different match definitions can be applied to different variables in the same procedure.
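As an illustration, the sketch below loads a locale and then applies a different match definition to each variable. It assumes that the %DQLOAD autocall macro is available at your site, that the ENUSA locale is installed, and that it provides match definitions named 'Name' and 'Address'. The setup-file location is a placeholder that must be replaced, and the data set and variable names are hypothetical.

```sas
/* Load the ENUSA locale into memory. The DQSETUPLOC= value is a */
/* placeholder; substitute the setup location used at your site. */
%dqload(dqlocale=(ENUSA), dqsetuploc='your-dqsetup-file-here');

/* Hypothetical input data. */
data work.customers;
   length name $ 30 address $ 40;
   infile datalines dsd dlm='|';
   input name $ address $;
   datalines;
Bob Beckett|392 S. Main St.
Marie Curie|12 Elm Avenue
;
run;

/* One locale on the PROC statement, one match definition per variable. */
proc dqmatch data=work.customers out=work.matched
             matchcode=match_cd locale='ENUSA';
   criteria matchdef='Name'    var=name;
   criteria matchdef='Address' var=address;
run;
```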

MATCHCODE=output-variable-name

specifies a name for the output character variable that stores the match codes. The DQMATCH procedure defines a sufficient length for this variable, even if a variable with the same name already exists in the input data set.

A default match code variable named MATCH_CD is generated if the following statements are all true:

  • No value is specified for the MATCHCODE= option in the PROC DQMATCH statement, and no values are specified for the MATCHCODE= option in subsequent CRITERIA statements.

  • No value is specified for the CLUSTER= option.

  • The CLUSTERS_ONLY option is not specified.

If the MATCHCODE= option is not specified in the PROC DQMATCH statement or in any CRITERIA statement, and if CLUSTER= or CLUSTERS_ONLY is specified, then no match code output variable is created and no match codes are written into the output data set.
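The two cases can be compared with a sketch like the following, which runs the procedure once with none of the three options and once with CLUSTER= alone, and then lists the variables in each output data set. It assumes that the ENUSA locale is loaded and that it provides a 'Name' match definition. All data set and variable names are hypothetical.

```sas
/* Hypothetical input data. */
data work.customers;
   length name $ 30;
   infile datalines dsd dlm='|';
   input name $;
   datalines;
Bob Beckett
Robert Beckett
;
run;

/* Case 1: no MATCHCODE=, CLUSTER=, or CLUSTERS_ONLY.            */
/* Per the rules above, the output should contain MATCH_CD.      */
proc dqmatch data=work.customers out=work.default_mc locale='ENUSA';
   criteria matchdef='Name' var=name;
run;

/* Case 2: CLUSTER= only. Per the rules above, no match code     */
/* variable should be created, only the cluster number variable. */
proc dqmatch data=work.customers out=work.cluster_only
             cluster=cluster_id locale='ENUSA';
   criteria matchdef='Name' var=name;
run;

/* List the variables in each output data set to compare. */
proc contents data=work.default_mc;   run;
proc contents data=work.cluster_only; run;
```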

For further information on match codes, see Create Match Codes.

OUT=data-set-name

specifies the name of the output data set. If the specified data set does not exist, PROC DQMATCH creates it. The default output data set is the input data set.
