You can use the DQMATCH
procedure to generate cluster numbers as it generates match codes.
An important application for clustering is commonly referred to as
householding. Members of a family or household are identified in clusters
that are based on multiple criteria and conditions.
To establish the criteria
and conditions for householding, use multiple CRITERIA statements
and CONDITION= options within those statements.
- The integer values of the CONDITION= options are reused across
  multiple CRITERIA statements to establish groups of criteria.
- Within each group, match codes are created for each criterion.
- For a source row to receive a cluster number, all of the match codes
  in the group must match the corresponding match codes in another
  source row.
- The match codes within a group are therefore evaluated with a
  logical AND.
If more than one condition
number is specified across multiple CRITERIA statements, there are
multiple groups of match codes. In this case, a source row receives a
cluster number when all of the match codes in any one group match the
match codes of the same group in another source row. The groups are
therefore evaluated with a logical OR.
For an example of householding,
assume that a data set contains customer information. To assign cluster
numbers, you use two groups of two CRITERIA statements. One group
(condition 1) uses two CRITERIA statements to generate match codes
based on the names of individuals and an address. The other group
(condition 2) generates match codes based on organization name and
address. A cluster number is assigned to a source row when either
pair of match codes matches the corresponding pair of match codes
from another source row. The code and output for this example
are provided in
Clustering with Multiple CRITERIA Statements.
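
The AND/OR evaluation described above can be modeled outside of SAS. The following Python sketch is a conceptual illustration only, not the DQMATCH implementation. The function names (toy_match_code, assign_clusters), the column names, and the sample rows are all hypothetical. The toy match code only lowercases a value and strips punctuation and spaces, whereas real match codes come from the locale's match definitions and a sensitivity level. The sketch also assumes that rows linked through different conditions end up in one cluster, and it does not model the treatment of blank values.

```python
from collections import defaultdict


def toy_match_code(value):
    """Hypothetical stand-in for a match code: lowercase letters and digits only."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


# Condition number -> columns whose match codes must ALL agree (logical AND).
CONDITIONS = {
    1: ("name", "address"),  # individuals at the same address
    2: ("org", "address"),   # organizations at the same address
}


def assign_clusters(rows, conditions=CONDITIONS):
    """Return one cluster number per row, or None for rows that match nothing."""
    parent = list(range(len(rows)))  # union-find forest over row indexes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # A match on ANY condition links two rows (logical OR across groups).
    for cond, columns in conditions.items():
        first_seen = {}
        for i, row in enumerate(rows):
            # Every code in the group is part of the key (logical AND).
            key = (cond, tuple(toy_match_code(row[c]) for c in columns))
            if key in first_seen:
                union(first_seen[key], i)
            else:
                first_seen[key] = i

    members = defaultdict(list)
    for i in range(len(rows)):
        members[find(i)].append(i)

    # Number only the groups that contain at least two rows.
    clusters = [None] * len(rows)
    next_id = 1
    for root in sorted(members):
        if len(members[root]) > 1:
            for i in members[root]:
                clusters[i] = next_id
            next_id += 1
    return clusters


rows = [
    {"name": "Bob Beckett", "org": "Acme Corp", "address": "392 S Main St"},
    {"name": "bob beckett", "org": "Widgets Inc", "address": "392 S. Main St."},
    {"name": "Ann Lee", "org": "ACME Corp.", "address": "392 S Main St"},
    {"name": "Bob Beckett", "org": "Acme Corp", "address": "17 Oak Ave"},
    {"name": "Cy Young", "org": "Globex", "address": "5 Elm Rd"},
    {"name": "Cy Young", "org": "Initech", "address": "5 Elm Rd."},
]

print(assign_clusters(rows))  # [1, 1, 1, None, 2, 2]
```

In this sample, the second row joins the first through condition 1 (name and address), and the third row joins the first through condition 2 (organization and address). The fourth row shares a name and an organization with the first row but not an address, so neither group matches in full and the row receives no cluster number.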