You can use the DQMATCH procedure to generate
cluster numbers as it generates
match codes. An important application for clustering is commonly referred to as householding.
Members of a family or household are identified in clusters that are based on multiple
criteria and conditions.
To establish the criteria
and conditions for householding, use multiple CRITERIA statements
and CONDITION= options within those statements.
- The integer values of the CONDITION= options are reused across multiple CRITERIA statements to establish groups of criteria.
- Within each group, a match code is created for each criterion.
- For a source row to receive a cluster number, all of the match codes in a group must match the corresponding match codes in another source row.
- The match codes within a group are therefore evaluated with a logical AND.
If more than one condition number is specified across multiple CRITERIA statements,
there are multiple groups of criteria and therefore multiple groups of match codes.
In this case, a source row receives a cluster number when any one of its groups
matches the same group in another source row. The groups are
therefore evaluated with a logical OR.
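The AND-within-a-group, OR-across-groups rule described above can be sketched in Python. This is only an illustration of the logic, not a description of how the DQMATCH procedure is implemented. The dictionary layout, the function names `group_matches` and `rows_match`, and the sample match-code strings are all hypothetical.

```python
# Hypothetical layout: each row maps a condition number to the tuple of
# match codes generated by the CRITERIA statements that share that number.

def group_matches(codes_a, codes_b):
    # Logical AND: every match code in the group must equal its counterpart.
    return len(codes_a) == len(codes_b) and all(
        a == b for a, b in zip(codes_a, codes_b)
    )

def rows_match(row_a, row_b):
    # Logical OR: one fully matching condition group is enough.
    return any(
        group_matches(row_a[cond], row_b[cond])
        for cond in row_a.keys() & row_b.keys()
    )

# Made-up match codes, for illustration only.
row_1 = {1: ("NAME_X", "ADDR_P"), 2: ("ORG_M", "ADDR_P")}
row_2 = {1: ("NAME_Y", "ADDR_P"), 2: ("ORG_M", "ADDR_P")}
row_3 = {1: ("NAME_Y", "ADDR_Q"), 2: ("ORG_N", "ADDR_Q")}

print(rows_match(row_1, row_2))  # True: condition 2 matches in full
print(rows_match(row_1, row_3))  # False: no condition matches in full
print(rows_match(row_2, row_3))  # False: the name matches, the address does not
```

Note that `row_2` and `row_3` share a name match code but still do not match, because the address match code in the same group differs.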
For an example of householding, assume that a data set contains customer information.
To assign cluster numbers, you use two groups of two CRITERIA statements. One group
(condition 1) uses
two CRITERIA statements to generate match codes based on the names of individuals
and an address. The other group (condition 2) generates match codes based on
organization name and address. A cluster number is assigned to a source row when
either of its pairs of match codes matches the corresponding pair of match codes
from another source row. The code and output for this example are provided in
Clustering with Multiple CRITERIA Statements.
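As a rough illustration of how cluster numbers could fall out of these two conditions, the following Python sketch groups a few made-up customer rows. It is not the procedure's algorithm. It assumes that lowercased field values can stand in for real match codes, that rows linked through different conditions are merged into one cluster, and that a row matching no other row receives no cluster number. The data, the field names, and the function `assign_clusters` are hypothetical.

```python
from collections import Counter

# Hypothetical customer rows; lowercased values stand in for match codes.
customers = [
    {"name": "Ann Lee", "org": "Acme", "address": "1 Elm St"},
    {"name": "Ann Lee", "org": "Bolt", "address": "1 Elm St"},
    {"name": "Raj Rao", "org": "Bolt", "address": "1 Elm St"},
    {"name": "Raj Rao", "org": "Core", "address": "9 Oak Ave"},
]

def assign_clusters(rows):
    # Condition 1 pairs name with address; condition 2 pairs org with address.
    conditions = {1: ("name", "address"), 2: ("org", "address")}
    parent = list(range(len(rows)))  # union-find forest over row indexes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}
    for i, row in enumerate(rows):
        for cond, fields in conditions.items():
            # All codes in a group must agree (AND), so the whole tuple is the key.
            key = (cond,) + tuple(row[f].lower() for f in fields)
            if key in first_seen:
                # Any one matching group links the rows (OR).
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    # Number clusters from 1 in row order; unmatched rows get None.
    sizes = Counter(find(i) for i in range(len(rows)))
    numbers, result = {}, []
    for i in range(len(rows)):
        root = find(i)
        if sizes[root] < 2:
            result.append(None)
        else:
            result.append(numbers.setdefault(root, len(numbers) + 1))
    return result

print(assign_clusters(customers))  # [1, 1, 1, None]
```

The first two rows match on condition 1 and the second and third rows match on condition 2, so under the merging assumption all three share cluster 1. The last row shares a name with the third row but has a different address, so it matches on neither condition.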