E-mail (with Combinations)
Match Definition
E-mail (with Combinations) | ||||
---|---|---|---|---|
Description | The E-mail (with Combinations) match definition generates match codes which can be used to cluster records containing e-mail addresses. | |||
Max Length of Match Code | 54 characters | |||
Example 1 | ID | Data | Cluster ID | Score |
| | 1 | info@acme1.com | 0 | 85.00 |
| | 2 | info1@acme1.com | 0 | 85.00 |
| | 3 | info2@acme1.com | 0 | 85.00 |
| | 5 | infog@acme1.com | 0 | 42.50 |
| | 6 | info9@acme1.com | 0 | 85.00 |
| | 7 | infol@acme1.com | 0 | 42.50 |
| | 1 | info@acme1.com | 1 | 21.25 |
| | 2 | info1@acme1.com | 1 | 21.25 |
| | 3 | info2@acme1.com | 1 | 21.25 |
| | 6 | info9@acme1.com | 1 | 21.25 |
The same records can appear in more than one cluster due to match codes produced by different token combination rules. | ||||
Example 2 | 1 | info@acme1.com | 3 | 42.50 |
| | 4 | inf0@acme1.com | 3 | 85.00 |
In otherwise identical mailboxes, the letter "O" matches the digit 0. | ||||
| | 2 | info1@acme1.com | 9 | 42.50 |
| | 7 | infol@acme1.com | 9 | 42.50 |
In otherwise identical mailboxes, the lowercase "L" matches the digit 1. | ||||
| | 5 | infog@acme1.com | 22 | 42.50 |
| | 6 | info9@acme1.com | 22 | 42.50 |
In otherwise identical mailboxes, the lowercase "G" matches the digit 9. | ||||
Example 3 | 8 | dave.wagner@acme3.com | 38 | 21.25 |
| | 9 | wagner.dave@acme3.com | 38 | 21.25 |
| | 10 | BillSmith@acme3.com | 53 | 85.00 |
| | 11 | SmithBill@acme3.com | 53 | 85.00 |
As long as given and family names are delimited, they can occur in either order in the mailbox and still match. Casing can be used as a method of delimiting the names. | ||||
Example 4 | 12 | john.doe@acme4.com | 63 | 85.00 |
| | 13 | john.doe+spam_tracker@acme4.com | 63 | 85.00 |
| | 14 | john.doe+spam_tracker_2@acme4.com | 63 | 85.00 |
An address tag (a sub-part of the mailbox delimited by the plus sign) does not affect the match. | ||||
Example 5 | 15 | george.brown@acme5.com | 80 | 42.50 |
| | 16 | gbrown@acme5.com | 80 | 85.00 |
| | 17 | g-brown@acme5.com | 80 | 85.00 |
A letter preceding a family name in the mailbox does not affect the match. One letter matches a full given name starting with that letter as long as the given name is delimited from the family name. | ||||
Example 6 | 18 | scott@acme6.com | 91 | 85.00 |
| | 19 | scottr@acme6.com | 91 | 42.50 |
| | 20 | scottw@acme6.com | 91 | 42.50 |
A letter following a given name in the mailbox does not affect the match. | ||||
Example 7 | 21 | suwhite@acme7.com | 109 | 85.00 |
| | 22 | susan.white@acme7.com | 109 | 63.75 |
Two letters match a full given name starting with those two letters as long as the full given name is delimited from the family name. | ||||
Example 8 | 23 | mary@acme8.com | 123 | 85.00 |
| | 24 | mary.johnson@acme8.com | 123 | 21.25 |
| | 25 | mary.queen.of.the.scotts@acme8.com | 123 | 21.25 |
Mailboxes that begin with the same given name match, even when the name is followed by additional delimited words. | ||||
Example 9 | 25 | mary.queen.of.the.scotts@acme8.com | 144 | 21.25 |
| | 27 | Phil-Scotts@acme8.com | 144 | 42.50 |
| | 26 | tom-black@acme9.com | 153 | 21.25 |
| | 28 | russ-william.black@acme9.com | 153 | 21.25 |
| | 29 | russ.c.black@acme9.com | 153 | 21.25 |
| | 30 | russ_black@acme9.com | 153 | 21.25 |
Mailboxes that end with the same family name match, even when the name is preceded by additional delimited words. | ||||
Remarks | Note: The results listed above reflect the default match sensitivity (85). | |||
| | This definition has many rules, so similar records can be clustered in many ways. The examples show only some of them. A single trailing digit in the mailbox is excluded from the match code. The delimiter characters are the hyphen, underscore, and period. The delimiters do not need to match as long as the rest of the conditions are satisfied. | | | |
| | For more information about combination-based match definitions, see the Multiple Match Codes section in the Match Definitions documentation. | | | |
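
The mailbox rules described in the examples and remarks (address tags ignored, a single trailing digit excluded, hyphen, underscore, and period treated as interchangeable delimiters, casing used to delimit names) can be sketched in Python. This is only an illustration under stated assumptions, not the match definition's actual implementation: the function name `mailbox_tokens` is hypothetical, the order in which the steps are applied is assumed, and the sketch does not produce match codes, scores, or the look-alike character matches shown in Example 2.

```python
import re

# Delimiter characters named in the Remarks: hyphen, underscore, period.
DELIMITERS = re.compile(r"[-_.]")


def mailbox_tokens(address: str) -> list[str]:
    """Hypothetical helper: split the mailbox of an e-mail address into tokens.

    Illustrative only; the real match definition applies many more rules.
    """
    # Keep only the mailbox (the part before the "@").
    mailbox = address.split("@", 1)[0]
    # Ignore an address tag: everything from the first plus sign onward.
    mailbox = mailbox.split("+", 1)[0]
    # Exclude a single trailing digit (assumption: longer digit runs are kept).
    mailbox = re.sub(r"(?<!\d)\d$", "", mailbox)
    # Treat a lowercase-to-uppercase transition as a delimiter (BillSmith).
    mailbox = re.sub(r"(?<=[a-z])(?=[A-Z])", ".", mailbox)
    # Split on any delimiter, so the delimiters themselves never need to match.
    return [token.lower() for token in DELIMITERS.split(mailbox) if token]


print(mailbox_tokens("info1@acme1.com"))                    # ['info']
print(mailbox_tokens("john.doe+spam_tracker_2@acme4.com"))  # ['john', 'doe']
print(mailbox_tokens("SmithBill@acme3.com"))                # ['smith', 'bill']
```

Comparing the tokens as a sorted list, for example `sorted(mailbox_tokens(a)) == sorted(mailbox_tokens(b))`, gives a rough analogue of the order-independent name matching in Example 3.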