E-mail (with Combinations)

Match Definition

Description

The E-mail (with Combinations) match definition generates match codes that can be used to cluster records containing e-mail addresses.

Max Length of Match Code: 54 characters

Example 1

ID  Data             Cluster ID  Score
1   info@acme1.com   0           85.00
2   info1@acme1.com  0           85.00
3   info2@acme1.com  0           85.00
5   infog@acme1.com  0           42.50
6   info9@acme1.com  0           85.00
7   infol@acme1.com  0           42.50
1   info@acme1.com   1           21.25
2   info1@acme1.com  1           21.25
3   info2@acme1.com  1           21.25
6   info9@acme1.com  1           21.25

The same records can appear in more than one cluster due to match codes produced by different token combination rules.
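
A minimal Python sketch of how one record can end up in several clusters, assuming each combination rule contributes its own match code and that records sharing a code form a cluster. The two rules (`rule_strip_digit`, `rule_lookalike`) and the `cluster` helper are hypothetical simplifications for illustration; they are not the rules or the scoring that the definition actually uses.

```python
import re
from collections import defaultdict


def rule_strip_digit(mailbox: str) -> str:
    # Hypothetical rule A: drop a single trailing digit (not a run of digits).
    return re.sub(r"(?<=\D)\d$", "", mailbox)


def rule_lookalike(mailbox: str) -> str:
    # Hypothetical rule B: replace the digits 0, 1 and 9 with the letters
    # they resemble ("o", "l", "g").
    return mailbox.translate(str.maketrans("019", "olg"))


RULES = (rule_strip_digit, rule_lookalike)


def cluster(records: dict[int, str]) -> dict[str, list[int]]:
    groups: dict[str, list[int]] = defaultdict(list)
    for rec_id, address in sorted(records.items()):
        mailbox, _, domain = address.lower().rpartition("@")
        for rule in RULES:
            # Each rule yields its own match code for the same record, so a
            # record can join one group per rule.
            groups[f"{rule.__name__}:{rule(mailbox)}@{domain}"].append(rec_id)
    # Only codes shared by two or more records form a cluster.
    return {code: ids for code, ids in groups.items() if len(ids) > 1}
```

Run on IDs 1 through 7, record 1 ends up in two groups ([1, 2, 3, 6] and [1, 4]), loosely mirroring clusters 1 and 3 in the examples.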

Example 2

ID  Data             Cluster ID  Score
1   info@acme1.com   3           42.50
4   inf0@acme1.com   3           85.00

In otherwise identical mailboxes, the letter "O" matches the digit 0.

ID  Data             Cluster ID  Score
2   info1@acme1.com  9           42.50
7   infol@acme1.com  9           42.50

In otherwise identical mailboxes, the lowercase "L" matches the digit 1.

ID  Data             Cluster ID  Score
5   infog@acme1.com  22          42.50
6   info9@acme1.com  22          42.50

In otherwise identical mailboxes, the lowercase "G" matches the digit 9.
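
The "otherwise identical" condition can be illustrated with a position-by-position comparison, assuming the only lookalike pairs are the three shown in Example 2. `LOOKALIKES` and `lookalike_match` are hypothetical names; the definition itself works through match codes rather than by comparing pairs of addresses.

```python
# Lookalike pairs from Example 2 (assumed to be the complete set here):
# letter -> the digit it may stand in for.
LOOKALIKES = {"o": "0", "l": "1", "g": "9"}


def chars_match(a: str, b: str) -> bool:
    # Two characters match if they are equal or one is the other's lookalike.
    return a == b or LOOKALIKES.get(a) == b or LOOKALIKES.get(b) == a


def lookalike_match(mailbox_a: str, mailbox_b: str) -> bool:
    # "Otherwise identical": same length, and every position matches either
    # exactly or through a lookalike pair. Case is ignored.
    a, b = mailbox_a.lower(), mailbox_b.lower()
    return len(a) == len(b) and all(chars_match(x, y) for x, y in zip(a, b))
```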

Example 3

ID  Data                   Cluster ID  Score
8   dave.wagner@acme3.com  38          21.25
9   wagner.dave@acme3.com  38          21.25
10  BillSmith@acme3.com    53          85.00
11  SmithBill@acme3.com    53          85.00

As long as given and family names are delimited, they can occur in either order in the mailbox and still match. Casing can be used as a method of delimiting the names.
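
A sketch of order-independent name matching, assuming a mailbox is split on the documented delimiters and that an uppercase letter also starts a new token. `name_tokens` and `order_insensitive_key` are hypothetical helpers, and sorting the tokens is a simplification chosen for illustration.

```python
import re


def name_tokens(mailbox: str) -> list[str]:
    tokens: list[str] = []
    # Split on the documented delimiters: hyphen, underscore, and period.
    for part in re.split(r"[-_.]", mailbox):
        # Within each part, an uppercase letter starts a new token, so casing
        # can delimit names ("BillSmith" -> "Bill", "Smith").
        tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [t.lower() for t in tokens]


def order_insensitive_key(mailbox: str) -> tuple[str, ...]:
    # Sorting the tokens lets the names appear in either order.
    return tuple(sorted(name_tokens(mailbox)))
```

An all-lowercase `billsmith` stays a single token and does not match `BillSmith`, which reflects the requirement that the names be delimited.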

Example 4

ID  Data                               Cluster ID  Score
12  john.doe@acme4.com                 63          85.00
13  john.doe+spam_tracker@acme4.com    63          85.00
14  john.doe+spam_tracker_2@acme4.com  63          85.00

An address tag (a sub-part of the mailbox delimited by the plus sign) does not affect the match.
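
Ignoring an address tag amounts to discarding everything from the plus sign to the end of the mailbox. A minimal sketch, assuming the input contains an "@" and that the first "+" in the mailbox starts the tag; `strip_address_tag` is a hypothetical helper.

```python
def strip_address_tag(address: str) -> str:
    # Separate the mailbox from the domain at the last "@".
    mailbox, _, domain = address.rpartition("@")
    # Keep only the part of the mailbox before the first "+" (the address tag).
    base = mailbox.partition("+")[0]
    return f"{base}@{domain}"
```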

Example 5

ID  Data                    Cluster ID  Score
15  george.brown@acme5.com  80          42.50
16  gbrown@acme5.com        80          85.00
17  g-brown@acme5.com       80          85.00

A letter preceding a family name in the mailbox does not affect the match.

One letter matches a full given name starting with that letter as long as the given name is delimited from the family name.
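
A sketch of the initial-plus-family-name comparison, assuming a delimited mailbox has exactly two tokens (given name, family name) and an undelimited mailbox consists of the initial(s) followed by the family name. `initial_key` and its `prefix_len` parameter are hypothetical; with `prefix_len=2` the same check covers the two-letter rule in Example 7.

```python
import re
from typing import Optional, Tuple


def initial_key(mailbox: str, prefix_len: int = 1) -> Optional[Tuple[str, str]]:
    # Split on the documented delimiters: hyphen, underscore, and period.
    tokens = [t for t in re.split(r"[-_.]", mailbox.lower()) if t]
    if len(tokens) == 2:
        given, family = tokens
        # Delimited form: keep only the first prefix_len letters of the given name.
        if len(given) >= prefix_len:
            return (given[:prefix_len], family)
        return None
    if len(tokens) == 1 and len(tokens[0]) > prefix_len:
        # Undelimited form: assume the leading prefix_len letters are the
        # initial(s) and the remainder is the family name.
        word = tokens[0]
        return (word[:prefix_len], word[prefix_len:])
    # Other mailbox shapes are outside the scope of this sketch.
    return None
```

`georgebrown` yields `("g", "eorgebrown")` and so does not match `george.brown`, consistent with the requirement that the given name be delimited.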

Example 6

ID  Data              Cluster ID  Score
18  scott@acme6.com   91          85.00
19  scottr@acme6.com  91          42.50
20  scottw@acme6.com  91          42.50

A letter following a given name in the mailbox does not affect the match.
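
A sketch of ignoring a single letter after a given name, assuming the definition has some way to recognize given names (the rule implies this, but how it does so is not documented here). The `GIVEN_NAMES` set is a small hypothetical stand-in for that vocabulary, and `given_name_key` is a hypothetical helper.

```python
# Hypothetical stand-in for the given-name vocabulary the definition relies on.
GIVEN_NAMES = {"george", "mary", "scott", "susan"}


def given_name_key(mailbox: str) -> str:
    m = mailbox.lower()
    # A known given name followed by exactly one letter reduces to the given
    # name alone; anything else is returned unchanged.
    if len(m) > 1 and m[-1].isalpha() and m[:-1] in GIVEN_NAMES:
        return m[:-1]
    return m
```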

Example 7

ID  Data                   Cluster ID  Score
21  suwhite@acme7.com      109         85.00
22  susan.white@acme7.com  109         63.75

Two letters match a full given name starting with those two letters as long as the full given name is delimited from the family name.

Example 8

ID  Data                                Cluster ID  Score
23  mary@acme8.com                      123         85.00
24  mary.johnson@acme8.com              123         21.25
25  mary.queen.of.the.scotts@acme8.com  123         21.25

Mailboxes that start with the same given name match, even when the given name is followed by additional delimited words.

Example 9

ID  Data                                Cluster ID  Score
25  mary.queen.of.the.scotts@acme8.com  144         21.25
27  Phil-Scotts@acme8.com               144         42.50
26  tom-black@acme9.com                 153         21.25
28  russ-william.black@acme9.com        153         21.25
29  russ.c.black@acme9.com              153         21.25
30  russ_black@acme9.com                153         21.25

Mailboxes that end with the same family name match, even when the family name is preceded by additional delimited words.
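
Examples 8 and 9 can be approximated by deriving two keys from each mailbox: the first delimited word as a given-name key and, when there is more than one word, the last word as a family-name key, ignoring any words in between. This sketch also includes the domain in each key, which is consistent with every cluster in these examples but is an assumption. `name_keys` and `group_by_keys` are hypothetical helpers.

```python
import re
from collections import defaultdict


def name_keys(address: str) -> set[str]:
    mailbox, _, domain = address.lower().rpartition("@")
    # Split on the documented delimiters: hyphen, underscore, and period.
    tokens = [t for t in re.split(r"[-_.]", mailbox) if t]
    keys: set[str] = set()
    if tokens:
        # The first delimited word is treated as the given name.
        keys.add(f"given:{tokens[0]}@{domain}")
    if len(tokens) > 1:
        # The last delimited word is treated as the family name; any words in
        # between are ignored.
        keys.add(f"family:{tokens[-1]}@{domain}")
    return keys


def group_by_keys(addresses: dict[int, str]) -> dict[str, list[int]]:
    groups: dict[str, list[int]] = defaultdict(list)
    for rec_id, address in sorted(addresses.items()):
        for key in name_keys(address):
            groups[key].append(rec_id)
    # Keep only keys shared by two or more records.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

For IDs 23 through 30 this yields the groups [23, 24, 25] (given name mary), [25, 27] (family name scotts), and [26, 28, 29, 30] (family name black), which correspond to clusters 123, 144, and 153. It also yields a [28, 29, 30] group for the given name russ, which the examples do not list.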
Remarks

Note: The results listed above reflect the default match sensitivity (85).

This definition applies many rules, so similar records can be clustered in many different ways. The examples show only some of the ways the data will be clustered.

Single trailing digits in the mailbox are excluded from the match code.

The delimiter characters are the hyphen, underscore, and period. Delimiters do not need to match as long as the other matching conditions are satisfied.

For more information about combination-based match definitions, see the Multiple Match Codes section in the Match Definitions documentation.