
The DQMATCH Procedure

CRITERIA Statement


Creates match codes and optional cluster numbers for an input variable.
Requirement: Each DQMATCH procedure step requires at least one CRITERIA statement.

CRITERIA CONDITION=<integer>
DELIMSTR=<variable-name>|VAR=<variable-name>
EXACT|MATCHDEF
MATCHCODE=<output-character-variable>
SENSITIVITY=<sensitivity-level>;

Options

CONDITION=integer

groups CRITERIA statements to constrain the assignment of cluster numbers.

  • An observation receives the number of an existing cluster only if it matches the values of that cluster for every CRITERIA statement that shares the same CONDITION= value.

  • CRITERIA statements that share a CONDITION= value are therefore applied as a logical AND.

  • If a series of CRITERIA statements specifies more than one CONDITION= value, then a logical OR is applied across the CONDITION= values.

  • For example, in a table of customer information, one CONDITION= value can assign cluster numbers based on matches of both the customer name AND the home address, and a second CONDITION= value can assign cluster numbers based on matches of both the customer name AND the organization address.

  • CRITERIA statements that lack a CONDITION= option are grouped together, and cluster numbers are assigned based on a logical AND of all such CRITERIA statements.

Default: 1
Restriction: If you specify a value for the MATCHCODE= option in the DQMATCH procedure, and you specify more than one CONDITION= value, SAS generates an error. To prevent the error, specify the MATCHCODE= option in CRITERIA statements only.
Note: If you have not assigned a value to the CLUSTER= option in the DQMATCH procedure, cluster numbers are assigned to a variable named CLUSTER by default.
See: The DQMATCHINFOGET Function
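
The name-and-address grouping described for the CONDITION= option might be sketched as follows. This is an illustration under assumptions, not a tested program: the data set WORK.CUSTOMERS, its variables (NAME, HOME_ADDRESS, ORG_ADDRESS), the output names, and the 'Name' and 'Address' match definitions in the ENUSA locale are all hypothetical choices, and the locale is assumed to be loaded into memory already.

```sas
/* Hypothetical input: work.customers with character variables
   name, home_address, and org_address. */
proc dqmatch data=work.customers out=work.customers_clustered
             cluster=cluster_id locale='ENUSA';
   /* Condition 1: name AND home address must both match */
   criteria condition=1 var=name         matchdef='Name';
   criteria condition=1 var=home_address matchdef='Address';
   /* Condition 2: name AND organization address must both match */
   criteria condition=2 var=name         matchdef='Name';
   criteria condition=2 var=org_address  matchdef='Address';
run;
```

Because more than one CONDITION= value is used, the sketch leaves the MATCHCODE= option off the PROC DQMATCH statement, in line with the restriction above.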
DELIMSTR | VAR

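specifies the name of the input variable that the CRITERIA statement evaluates. Use VAR= for a variable without delimiters and DELIMSTR= for a variable that contains delimited (parsed) values.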

Restriction: You cannot specify the DELIMSTR= option and the VAR= option in the same CRITERIA statement.
See: The DQPARSE Function and the DQPARSETOKENPUT Function.
DELIMSTR=variable-name

specifies the name of a variable that has been parsed by the DQPARSE function, or contains tokens added with the DQPARSETOKENPUT function.

VAR=variable-name

specifies the name of the character variable that is used to create match-codes. If the variable contains delimited values, use the DELIMSTR= option.

Restriction: The values of this variable cannot contain delimiters added with the DQPARSE function or the DQPARSETOKENPUT function.
EXACT | MATCHDEF

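determines how cluster numbers are assigned: by an exact character match between values (EXACT) or by a match between match codes that are generated from a match definition (MATCHDEF=).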

Default: If the CLUSTER= option has not been assigned a variable in the DQMATCH procedure, then cluster numbers are assigned to the variable named CLUSTER.
Restriction: If you specify the MATCHCODE= option in the DQMATCH procedure, the match-code is a composite of the exact character-value and the match-code that is generated by the match-definition.
EXACT

assigns a cluster number based on an exact character match between values.

Restriction: If you specify the EXACT option, you cannot specify the MATCHDEF= option, the MATCHCODE= option, or the SENSITIVITY= option.
MATCHDEF=match-definition

specifies the match-definition that is used to create the match-code for the specified variable.

Restriction: The match-definition must exist in the locale that is specified in the LOCALE= option of the DQMATCH procedure.
Restriction: If you specify the MATCHDEF= option, you cannot specify the EXACT option in the same CRITERIA statement.
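
As a hedged illustration of the two alternatives, the step below clusters on a match definition for one variable and on an exact character match for another. The data set, the variable names (NAME, POSTAL_CODE), and the 'Name' match definition in the ENUSA locale are assumptions made for this sketch.

```sas
/* Hypothetical input: work.customers with character variables
   name and postal_code. Locale ENUSA is assumed to be loaded. */
proc dqmatch data=work.customers out=work.customers_clustered
             cluster=cluster_id locale='ENUSA';
   /* Fuzzy comparison: match codes from the 'Name' match definition */
   criteria var=name matchdef='Name';
   /* Exact comparison: character values must be identical;
      no MATCHDEF=, MATCHCODE=, or SENSITIVITY= here */
   criteria var=postal_code exact;
run;
```

Both CRITERIA statements lack a CONDITION= option, so under the rules above an observation would join a cluster only when the name match code and the exact postal code both agree.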
MATCHCODE=character-variable

specifies the name of the variable that receives the match-codes for the character variable that is specified in the VAR= option or the DELIMSTR= option.

Restriction: The MATCHCODE= option is not valid if you also specify the MATCHCODE= option in the DQMATCH procedure.
Restriction: If you use multiple CRITERIA statements in a single procedure step, do one of the following:
  • specify MATCHCODE=character-variable in each CRITERIA statement

  • generate composite match-codes by specifying the MATCHCODE= option only in the DQMATCH procedure.

SENSITIVITY=sensitivity-level

determines the amount of information in the resulting match codes. Higher sensitivity values create match codes that contain more information about the input values. Higher sensitivity levels result in a greater number of clusters, with fewer values in each cluster.

Default: 85
Valid values: 50 to 95
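
A sketch of mixed sensitivity levels follows, assuming a hypothetical data set WORK.CUSTOMERS with NAME and ADDRESS variables and the 'Name' and 'Address' match definitions in a loaded ENUSA locale. The specific levels are arbitrary values inside the valid range, chosen only to show the syntax.

```sas
/* Hypothetical input: work.customers with character variables
   name and address. */
proc dqmatch data=work.customers out=work.customers_clustered
             cluster=cluster_id locale='ENUSA';
   /* Higher sensitivity: match codes keep more detail, so names
      must be more alike to share a match code */
   criteria var=name    matchdef='Name'    sensitivity=95;
   /* Lower sensitivity: match codes keep less detail, so address
      variations are more likely to share a match code */
   criteria var=address matchdef='Address' sensitivity=50;
run;
```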

Details

Match codes are created for the input variables that are specified in each CRITERIA statement. The resulting match codes are stored in the output variables that are named in the MATCHCODE= option. The MATCHCODE= option can be specified either in the DQMATCH procedure or in the CRITERIA statements, but not in both.

Simple match-codes are created when the CRITERIA statements specify different values for their respective MATCHCODE= options. Composite match codes are created when two or more CRITERIA statements specify the same value for their respective MATCHCODE= options.
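
The difference between simple and composite match codes might look like the following two steps. This is a sketch based on the description above; the data set, input variables, output variable names (NAME_MC, ADDRESS_MC, COMBINED_MC), and match definitions are hypothetical.

```sas
/* Simple match codes: each CRITERIA statement names a different
   MATCHCODE= variable, so each input variable gets its own code. */
proc dqmatch data=work.customers out=work.simple_codes locale='ENUSA';
   criteria var=name    matchdef='Name'    matchcode=name_mc;
   criteria var=address matchdef='Address' matchcode=address_mc;
run;

/* Composite match code: both CRITERIA statements name the same
   MATCHCODE= variable, so the codes are combined into one value. */
proc dqmatch data=work.customers out=work.composite_codes locale='ENUSA';
   criteria var=name    matchdef='Name'    matchcode=combined_mc;
   criteria var=address matchdef='Address' matchcode=combined_mc;
run;
```

In neither step is MATCHCODE= also specified in the PROC DQMATCH statement, because the option is not valid in both places at once.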

To create match codes for a parsed character variable, specify the DELIMSTR= option instead of the VAR= option. In the MATCHDEF= option, specify the match definition that is associated with the parse definition that was used to add delimiters to the character variable. To determine the parse definition that is associated with a match definition, use the DQMATCHINFOGET function.
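
One way to put the parsed-variable workflow together is sketched below. It assumes a hypothetical data set WORK.CUSTOMERS with a NAME variable, a loaded ENUSA locale that contains a 'Name' match definition, and arbitrary variable lengths; passing the looked-up parse definition to DQPARSE through a variable is also an assumption of this sketch.

```sas
data work.parsed;
   set work.customers;
   length parse_def $ 40 parsed_name $ 200;   /* lengths are arbitrary */
   /* Look up the parse definition tied to the 'Name' match definition */
   parse_def = dqMatchInfoGet('Name', 'ENUSA');
   /* Add delimiters to the name by using that parse definition */
   parsed_name = dqParse(name, strip(parse_def), 'ENUSA');
run;

proc dqmatch data=work.parsed out=work.matched locale='ENUSA';
   /* DELIMSTR= (not VAR=) because parsed_name contains delimiters */
   criteria delimstr=parsed_name matchdef='Name' matchcode=name_mc;
run;
```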
