DQMATCH Procedure

CRITERIA Statement

Creates match codes and optional cluster numbers for an input variable.

Syntax

Optional Arguments

CONDITION=integer
groups CRITERIA statements to constrain the assignment of cluster numbers.
  • An observation receives the number of an existing cluster only if it matches the values of that cluster on every CRITERIA statement that has the same CONDITION= value.
  • In other words, CRITERIA statements that share a CONDITION= value are combined with a logical AND.
  • If more than one CONDITION= value is defined in a series of CRITERIA statements, then a logical OR is applied across the CONDITION= values.
  • For example, in a table of customer information, one condition can assign cluster numbers based on matches between the customer name AND the home address, while a second condition assigns cluster numbers based on matches between the customer name AND the organization address.
  • CRITERIA statements that lack a CONDITION= option are treated as a single group: cluster numbers are assigned based on a logical AND of all such CRITERIA statements.
Default: 1
Restriction: If you specify a value for the MATCHCODE= option in the DQMATCH procedure, and you specify more than one CONDITION= value, SAS generates an error. To prevent the error, specify the MATCHCODE= option in CRITERIA statements only.
Note: If you have not assigned a value to the CLUSTER= option in the DQMATCH procedure, cluster numbers are assigned to a variable named CLUSTER by default.
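The AND/OR grouping described above can be sketched as follows. This is an illustrative example, not a definitive one: the table WORK.CUSTOMERS, its columns, and the output names are hypothetical, and the step assumes that the ENUSA locale is already loaded (for example, with the %DQLOAD macro) and that it provides match definitions named 'Name' and 'Address'.

```sas
/* Hypothetical input table; names and values are illustrative only. */
data work.customers;
   length name $ 40 home_addr $ 60 org_addr $ 60;
   infile datalines dsd dlm='|' truncover;
   input name $ home_addr $ org_addr $;
   datalines;
Robert Smith|12 Oak St|100 Main St
Bob Smith|12 Oak Street|500 Elm Ave
Rob Smith|77 Pine Rd|100 Main Street
;
run;

/* MATCHCODE= is deliberately not set in the PROC statement, because */
/* more than one CONDITION= value is used.                           */
proc dqmatch data=work.customers out=work.customers_clustered
             cluster=cluster_id locale='ENUSA';
   /* Condition 1: name AND home address must both match. */
   criteria condition=1 var=name      matchdef='Name'    sensitivity=85;
   criteria condition=1 var=home_addr matchdef='Address' sensitivity=85;
   /* Condition 2: name AND organization address must both match. */
   criteria condition=2 var=name      matchdef='Name'    sensitivity=85;
   criteria condition=2 var=org_addr  matchdef='Address' sensitivity=85;
run;
```

An observation joins a cluster when it satisfies condition 1 OR condition 2. The cluster numbers are written to the variable CLUSTER_ID because CLUSTER= is set in the PROC statement.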
DELIMSTR | VAR
specifies the name of a variable.
DELIMSTR=variable-name
specifies the name of a variable that has been parsed by the DQPARSE function, or that contains tokens added with the DQPARSETOKENPUT function.
VAR=variable-name
specifies the name of the character variable that is used to create match codes. If the variable contains delimited values, use the DELIMSTR= option.
Restrictions: The values of this variable cannot contain delimiters that were added with the DQPARSE function or the DQPARSETOKENPUT function.

You cannot specify the DELIMSTR= option and the VAR= option in the same CRITERIA statement.

See: DQPARSE Function for additional information.

DQPARSETOKENPUT Function for additional information.

EXACT | MATCHDEF
assigns a cluster number.
EXACT
assigns a cluster number based on an exact character match between values.
Restriction: If you specify the EXACT option, you cannot specify the MATCHDEF= option, the MATCHCODE= option, or the SENSITIVITY= option.
MATCHDEF= match-definition
specifies the match-definition that is used to create the match code for the specified variable.
Restrictions: The match-definition must exist in the locale that is specified in the LOCALE= option of the DQMATCH procedure.

If you specify the MATCHDEF= option, you cannot specify the EXACT option.

Default: If the CLUSTER= option has not been assigned a variable in the DQMATCH procedure, then cluster numbers are assigned to the variable named CLUSTER.
Restriction: If you specify the MATCHCODE= option in the DQMATCH procedure, the match code is a composite of the exact character value and the match code that is generated by the match-definition.
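As a minimal sketch of how EXACT and MATCHDEF= can appear in separate CRITERIA statements of the same step, the following example clusters on a fuzzy name match combined with an exact postal code match. The table WORK.CONTACTS and its columns are hypothetical, and the step assumes that the ENUSA locale is loaded and contains a match definition named 'Name'.

```sas
/* Hypothetical input table; names and values are illustrative only. */
data work.contacts;
   length name $ 40 postal_code $ 10;
   infile datalines dsd dlm='|' truncover;
   input name $ postal_code $;
   datalines;
Robert Smith|27513
Bob Smith|27513
Bob Smith|27514
;
run;

proc dqmatch data=work.contacts out=work.contacts_clustered
             cluster=cluster_id locale='ENUSA';
   /* Match-code comparison on the name. */
   criteria var=name matchdef='Name' sensitivity=85;
   /* Character-for-character comparison on the postal code;   */
   /* EXACT takes no MATCHDEF=, MATCHCODE=, or SENSITIVITY=.   */
   criteria var=postal_code exact;
run;
```

Neither CRITERIA statement sets CONDITION=, so both criteria must be satisfied for two observations to share a cluster number.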
MATCHCODE= character-variable
specifies the name of the variable that receives the match codes for the character variable that is specified in the VAR= option or the DELIMSTR= option.
Restrictions: The MATCHCODE= option in the CRITERIA statement is not valid if you also specify the MATCHCODE= option in the DQMATCH procedure.

If you are using multiple CRITERIA statements in a single procedure step, either specify MATCHCODE=character-variable in each CRITERIA statement or generate composite match codes by specifying the MATCHCODE= option only in the DQMATCH procedure.

SENSITIVITY= sensitivity-level
determines the amount of information in the resulting match codes. Higher sensitivity values create match codes that contain more information about the input values. Higher sensitivity levels result in a greater number of clusters, with fewer values in each cluster.
Default: 85
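One way to see the effect of SENSITIVITY= is to cluster the same input twice at different levels and compare the results. The sketch below does that under stated assumptions: the table WORK.NAMES is hypothetical, the ENUSA locale is assumed to be loaded with a 'Name' match definition, and the levels 50 and 95 are assumed to be valid for that definition.

```sas
/* Hypothetical input table; values are illustrative only. */
data work.names;
   length name $ 40;
   infile datalines dsd dlm='|' truncover;
   input name $;
   datalines;
Robert T. Smith
Bob Smith
Roberta Smith
R. Smith
;
run;

/* Lower sensitivity: less information in each match code, */
/* so more values are expected to share a cluster.         */
proc dqmatch data=work.names out=work.names_s50
             cluster=cluster_id locale='ENUSA';
   criteria var=name matchdef='Name' sensitivity=50;
run;

/* Higher sensitivity: more information in each match code, */
/* so more clusters with fewer values each are expected.    */
proc dqmatch data=work.names out=work.names_s95
             cluster=cluster_id locale='ENUSA';
   criteria var=name matchdef='Name' sensitivity=95;
run;
```

Comparing CLUSTER_ID across WORK.NAMES_S50 and WORK.NAMES_S95 shows how the grouping changes for a given data set.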

Details

Match codes are created for the input variables that are specified in each CRITERIA statement. The resulting match codes are stored in the output variables that are named in the MATCHCODE= option. The MATCHCODE= option can be specified in the DQMATCH procedure or the CRITERIA statement.
Simple match codes are created when the CRITERIA statements specify different values for their respective MATCHCODE= options. Composite match codes are created when two or more CRITERIA statements specify the same value for their respective MATCHCODE= options.
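The difference between simple and composite match codes can be sketched with two steps over the same input. The table WORK.CUST and the output variable names are hypothetical, and both steps assume that the ENUSA locale is loaded with 'Name' and 'Address' match definitions. The composite step uses the MATCHCODE= option of the DQMATCH procedure, which is the other documented way to produce a composite.

```sas
/* Hypothetical input table; names and values are illustrative only. */
data work.cust;
   length name $ 40 address $ 60;
   infile datalines dsd dlm='|' truncover;
   input name $ address $;
   datalines;
Robert Smith|12 Oak St
Bob Smith|12 Oak Street
;
run;

/* Simple match codes: each CRITERIA statement names its own output variable. */
proc dqmatch data=work.cust out=work.cust_simple locale='ENUSA';
   criteria var=name    matchdef='Name'    matchcode=mc_name;
   criteria var=address matchdef='Address' matchcode=mc_address;
run;

/* Composite match code: MATCHCODE= is set once in the PROC statement */
/* and omitted from every CRITERIA statement.                         */
proc dqmatch data=work.cust out=work.cust_composite
             matchcode=mc_all locale='ENUSA';
   criteria var=name    matchdef='Name';
   criteria var=address matchdef='Address';
run;
```

The first step writes one match code per input variable (MC_NAME and MC_ADDRESS). The second writes a single variable, MC_ALL, that combines both.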
To create match codes for a parsed character variable, specify the DELIMSTR= option instead of the VAR= option. In the MATCHDEF= option, be sure to specify the match definition that is associated with the parse definition that was used to add delimiters to the character variable. To determine the parse definition that is associated with a match definition, use the DQMATCHINFOGET function.
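The parsed-variable workflow might look like the sketch below. It is illustrative only: the table WORK.RAW_NAMES and all variable names are hypothetical, the ENUSA locale is assumed to be loaded with a 'Name' match definition, and passing the looked-up parse definition to DQPARSE through a variable is an assumption about how the function accepts its arguments.

```sas
/* Hypothetical input table; values are illustrative only. */
data work.raw_names;
   length name $ 40;
   infile datalines dsd dlm='|' truncover;
   input name $;
   datalines;
Mr. Robert T. Smith
Dr. Roberta Jones
;
run;

data work.parsed_names;
   set work.raw_names;
   length parse_def $ 50 parsed_name $ 200;
   /* Look up the parse definition that is associated with the */
   /* 'Name' match definition in the ENUSA locale.             */
   parse_def = dqMatchInfoGet('Name', 'ENUSA');
   /* Add delimiters to the value by using that parse definition. */
   parsed_name = dqParse(name, parse_def, 'ENUSA');
run;

/* DELIMSTR= replaces VAR= because PARSED_NAME contains delimiters. */
proc dqmatch data=work.parsed_names out=work.parsed_mc
             matchcode=mc_name locale='ENUSA';
   criteria delimstr=parsed_name matchdef='Name' sensitivity=85;
run;
```

Because the same match definition drives both the lookup and the CRITERIA statement, the delimiters in PARSED_NAME come from the parse definition that the match definition expects.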