SAS Quality Knowledge Base for Contact Information 25
In the Quality Knowledge Base, definitions designated as Global are shared by all locales. The Global definitions are described below.
Case Definitions
Gender Analysis Definitions
Identification Analysis Definitions
Match Definitions
Parse Definitions
Pattern Analysis Definitions
Standardization Definitions
Lower | ||
---|---|---|
Description | The Lower case definition lowercases general text. | |
Example | Input | Output |
Lowercase General Text | lowercase general text | |
Remarks |
Proper | ||
---|---|---|
Description | The Proper case definition performs generic propercasing. | |
Example | Input | Output |
propercase general text | Propercase General Text | |
Remarks |
Proper (Address Number) | ||
---|---|---|
Description | The Proper (Address Number) case definition propercases address numbers. | |
Examples | Input | Output |
box 123a | Box 123A | |
4201a | 4201A | |
Remarks |
Upper | ||
---|---|---|
Description | The Upper case definition uppercases general text. | |
Example | Input | Output |
uppercase general text | UPPERCASE GENERAL TEXT | |
Remarks |
Gender Analysis Definitions: None.
Date (DMY Validation - Numeric Only) | ||
---|---|---|
Description | The Date (DMY Validation - Numeric Only) identification analysis definition identifies date strings as valid or invalid. | |
Examples | Input String | Identity |
10/18/2007 | INVALID | |
10/5/07 | VALID | |
October 5, 2007 | INVALID | |
07/10/05 | VALID | |
5/10/07 | VALID | |
Remarks |
Date (MDY Validation - Numeric Only) | ||
---|---|---|
Description | The Date (MDY Validation - Numeric Only) identification analysis definition identifies date strings as valid or invalid. | |
Examples | Input String | Identity |
10/18/2007 | VALID | |
10/5/07 | VALID | |
October 5, 2007 | INVALID | |
07/10/05 | VALID | |
5/10/07 | VALID | |
Remarks |
Date (YMD Validation - Numeric Only) | ||
---|---|---|
Description | The Date (YMD Validation - Numeric Only) identification analysis definition identifies date strings as valid or invalid. | |
Examples | Input String | Identity |
10/18/2007 | INVALID | |
10/5/07 | VALID | |
October 5, 2007 | INVALID | |
07/10/05 | VALID | |
5/10/07 | INVALID | |
Remarks |
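The three date validation tables above differ only in the order in which the numeric fields are read. The following is a minimal Python sketch of that idea, not the QKB implementation. The function name `validate_numeric_date` is hypothetical. The field-length rule and the century chosen for two-digit years are assumptions inferred from the examples, not documented behavior.

```python
import re
from datetime import date


def validate_numeric_date(text: str, order: str) -> str:
    """Return "VALID" or "INVALID" for a slash-separated numeric date.

    `order` names the field order: "DMY", "MDY", or "YMD".
    """
    match = re.fullmatch(r"(\d+)/(\d+)/(\d+)", text.strip())
    if match is None:
        return "INVALID"  # non-numeric input such as "October 5, 2007"
    fields = dict(zip(order, match.groups()))
    day, month, year = fields["D"], fields["M"], fields["Y"]
    # Assumption inferred from the examples: day and month have one or two
    # digits, and the year has two or four digits.
    if len(day) > 2 or len(month) > 2 or len(year) not in (2, 4):
        return "INVALID"
    # Assumption: two-digit years are read as 2000-2099.
    full_year = 2000 + int(year) if len(year) == 2 else int(year)
    try:
        date(full_year, int(month), int(day))  # rejects month 18, day 0, ...
    except ValueError:
        return "INVALID"
    return "VALID"
```

With these assumptions, `validate_numeric_date("10/18/2007", "DMY")` is rejected because 18 is read as the month, while the same string passes under `"MDY"`.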
E-mail (Country Identification) | ||
---|---|---|
Description | The E-mail (Country Identification) identification analysis definition identifies the country of an e-mail address. | |
Examples | Input | Output |
john.smith@hotmail.co.uk | UNITED KINGDOM | |
john.smith@example.com | UNKNOWN | |
john.smith@harvard.edu | UNITED STATES | |
john.smith@example.co.jp | JAPAN | |
Remarks |
E-mail (Validation) | ||
---|---|---|
Description | The E-mail (Validation) identification analysis definition identifies whether or not an e-mail address is syntactically correct. | |
Examples | Input | Output |
VIRGINIE.MAZOUINGMAIL.COM | INVALID | |
WBARBER@EARTHLINK.NCT | INVALID | |
wrobinson@helixtechnology | INVALID | |
Ajari@evolvingedge.net | VALID | |
al@crestnational.com | VALID | |
ALIEMARATI@YAHOO.COM | VALID | |
Remarks | This definition performs a simple validation of the syntax of the e-mail input. It is not meant to be 100% correct according to international standards, but it is a practical approach to screen your e-mail addresses. Addresses with embedded comments are recognized as invalid, even though the address may be correct. Certain rarely-occurring but otherwise valid constructions (such as valid control characters in a quoted mailbox) are recognized as invalid. Certain rarely-occurring but otherwise invalid constructions (such as double-hyphens in a sub-domain name) are recognized as valid. |
Account Number | ||
---|---|---|
Description | The Account Number match definition generates match codes which can be used to cluster records containing account numbers. Data values are rendered right-to-left in the match codes, so matches are found based on the least-significant positions in the input data. | |
Max Length of Match Code | 16 characters | |
Examples | Input | Cluster ID |
111-AB-3333-48-26656 | 0 | |
211-AB-3333-48-26656 | 0 | |
111-AB-3338-48-26651 | 1 | |
Remarks |
Note: The results listed above reflect the default match sensitivity (85). |
E-mail | ||
---|---|---|
Description | The E-mail match definition generates match codes which can be used to cluster records containing e-mail addresses. | |
Max Length of Match Code | 38 characters | |
Examples | Input | Cluster ID |
jane.smith@gmail.com | 0 | |
jane-smith@gmail.com | 0 | |
janesmith@gmail.com | 0 | |
<mailto:janesmith@gmail.com> | 0 | |
Remarks | Note: The results listed above reflect the default match sensitivity (85). The E-mail (v23) match definition is now deprecated and will be removed in a future release of the QKB. The E-mail match definition has been replaced with a copy of the E-mail (v23) definition which takes advantage of updated processing. If you changed your jobs to use the E-mail (v23) definition, change them back to use the E-mail definition. |
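The four e-mail examples above fall into one cluster because they differ only in mailbox delimiters and in a `mailto:` wrapper. The Python sketch below builds a crude comparison key that behaves the same way for these inputs. It is not a QKB match code and does not model match sensitivity. The function name `email_match_key` and the set of delimiters removed are assumptions made for illustration.

```python
import re


def email_match_key(raw: str) -> str:
    """Build a simple comparison key for an e-mail address."""
    # Drop surrounding angle brackets and a leading "mailto:" scheme.
    address = raw.strip().strip("<>")
    address = re.sub(r"^mailto:", "", address, flags=re.IGNORECASE)
    # Split at the last "@" and compare case-insensitively.
    mailbox, _, domain = address.lower().rpartition("@")
    # Assumption: full stops, underscores, and hyphens in the mailbox
    # are not significant for matching.
    mailbox = re.sub(r"[._-]", "", mailbox)
    return f"{mailbox}@{domain}"
```

Records whose keys are equal would land in the same cluster in this simplified model.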
E-mail (v23) | ||
---|---|---|
Description | The E-mail (v23) match definition generates match codes which can be used to cluster records containing e-mail addresses. | |
Max Length of Match Code | 38 characters | |
Examples | Input | Cluster ID |
jane.smith@gmail.com | 0 | |
jane-smith@gmail.com | 0 | |
janesmith@gmail.com | 0 | |
<mailto:janesmith@gmail.com> | 0 | |
Remarks | Note: The results listed above reflect the default match sensitivity (85). The E-mail (v23) match definition is now deprecated and will be removed in a future release of the QKB. The E-mail match definition has been replaced with a copy of the E-mail (v23) definition which takes advantage of updated processing. If you changed your jobs to use the E-mail (v23) definition, change them back to use the E-mail definition. |
E-mail (with Combinations) | ||
---|---|---|
Description | The E-mail (with Combinations) match definition generates match codes which can be used to cluster records containing e-mail addresses, with a score for each match code. The types of matches produced by this definition include, but are not limited to, those shown in the examples below. | |
Max Length of Match Code | 54 characters | |
Example 1 | Input | Cluster ID |
Sensitivities 50 - 100 Weight 100 |
info@dataflux.com | 0 |
info1@dataflux.com | 0 | |
info2@dataflux.com | 0 | |
Remarks | An e-mail address with a single trailing digit in the mailbox shall match the same e-mail address with no trailing digit in the mailbox. | |
Example 2 | Input | Cluster ID |
Sensitivities 50 - 89 Weight 100 |
dave.wagner@acme.com | 1 |
wagner.dave@acme.com | 1 | |
DaveWagner@acme.com | 1 | |
WagnerDave@acme.com | 1 | |
Remarks | Two e-mail addresses shall match if their sub-domains match and their mailboxes contain a two-word name (delimited by hyphen, underscore, full stop, or change of case) in the same order or opposite order. | |
Example 3 | Input | Cluster ID |
Sensitivities 50 - 89 Weight 100 |
john.doe@mailbox.com | 2 |
john.doe+spam_tracker@mailbox.com | 2 | |
john.doe+spam_tracker_2@mailbox.com | 2 | |
Remarks | Two e-mail addresses shall match if their sub-domains match and their mailboxes differ by the inclusion, omission, or value of a sub-part of the e-mail address (also called an address tag) delimited by the plus sign. | |
Example 4 | Input | Cluster ID |
Sensitivities 50 - 89 Weight 25 |
bepstein@acme.com | 3 |
epstein@acme.com | 3 | |
Remarks | Two e-mail addresses shall match if their sub-domains match and one mailbox contains a family name while the other contains the same family name preceded by the first letter of a given name. | |
Example 5 | Input | Cluster ID |
Sensitivities 50 - 89 Weight 50 |
davidw@acme.com | 4 |
david@acme.com | 4 | |
Remarks | Two e-mail addresses shall match if their sub-domains match and one mailbox contains a given name while the other contains the same given name followed by the first letter of a family name. | |
Example 6 | Input | Cluster ID |
Sensitivities 50 - 89 Weight 75 |
br-epstein@acme.com | 5 |
brian.epstein@acme.com | 5 | |
Remarks | Two e-mail addresses shall match if their sub-domains match and the first two letters of the given name and the delimited family name (delimited by hyphen, underscore, or full stop) match. | |
Example 7 | Input | Cluster ID |
Sensitivities 50 - 89 Weight 50 |
b-epstein@acme.com | 6 |
brian.epstein@acme.com | 6 | |
Remarks | Two e-mail addresses shall match if their sub-domains match and the first letter of the given name and the delimited family name (delimited by hyphen, underscore, or full stop) match. | |
Example 8 | Input | Cluster ID |
Sensitivities 50 - 89 Weight 50 |
dave-william.wagner@acme.com | 7 |
dave.c.wagner@acme.com | 7 | |
dave_wagner@acme.com | 7 | |
Remarks | Two e-mail addresses shall match if their sub-domains match and the first and last parts of the mailbox (delimited by hyphen, underscore, or full stop) match. | |
Example 9 | Input | Cluster ID |
Sensitivities 50 - 89 Weight 25 |
dave-william.wagner@acme.com | 8 |
dave@acme.com | 8 | |
Remarks | Two e-mail addresses shall match if their sub-domains match and the first parts of their mailboxes (delimited by hyphen, underscore, or full stop and recognized as a given name) match. | |
Example 10 | Input | Cluster ID |
Sensitivities 50 - 89 Weight 25 |
andersen@acme.com | 9 |
n_rask_andersen@acme.com | 9 | |
Remarks | Two e-mail addresses shall match if their sub-domains match and the last parts of their mailboxes (delimited by hyphen, underscore, or full stop and recognized as a family name) match. | |
Example 11 | Input | Cluster ID |
Sensitivities 50 - 89 Weight 50 |
soon1923@lgphilips-lcd.com | 10 |
soon1g23@lgphilips-lcd.com | 10 | |
Remarks | Two e-mail addresses shall match if one contains lowercase "G" and another contains the digit 9 at the corresponding position. | |
Example 12 | Input | Cluster ID |
Sensitivities 50 - 89 Weight 50 |
soonl923@lgphilips-lcd.com | 11 |
soon1923@lgphilips-lcd.com | 11 | |
Remarks | Two e-mail addresses shall match if one contains lowercase "L" and another contains the digit 1 at the corresponding position. | |
Example 13 | Input | Cluster ID |
Sensitivities 50 - 89 Weight 50 |
abc1O23@lgphilips-lcd.com | 12 |
abc1o23@lgphilips-lcd.com | 12 | |
abc1023@lgphilips-lcd.com | 12 | |
Remarks | Two e-mail addresses shall match if one contains letter "O" and another contains the digit 0 at the corresponding position. | |
Remarks |
This definition generates one or more match codes for each input string. The number of match codes generated for an input string depends on the content of the string. Each match code represents a combination of different parts of the input string; this enables two strings to be matched even when some parts of one or both of the strings differ. See the examples above for an illustration of clusters that may be produced using match codes generated by this definition. Note that a consequence of the generation of multiple match codes is that a record may be placed in more than one cluster by a subsequent clustering operation. Therefore, special attention should be given to the entity resolution process when using this definition. Generation of multiple match codes is achieved through the use of token-combination rules in the Match definition. Each match code generated by the definition is associated with one token-combination rule. There is a weight assigned to each rule; each rule's weight is used to calculate a score that is assigned to the match code that is generated by that rule. The score for a match code is equal to the weight of the rule used to generate the match code times the sensitivity that is selected when the definition is executed. When a record is clustered, the score for the record’s match code represents the confidence with which we can assert that the record belongs in the cluster. Note that when different rules lead to identical clustering results, the scores of the match codes generated by the different rules may be aggregated using the Cluster Aggregation node in a Data Job. The Cluster Aggregation node allows several different methods for aggregating match code scores, such as minimum, maximum, or mean across instances of a record, or minimum, maximum, or mean across all records in a cluster. For information on the Cluster Aggregation node, please refer to your DataFlux Data Management Studio documentation. |
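The scoring rule in the remarks above (rule weight times the selected sensitivity) and the aggregation options can be illustrated with a short Python sketch. It takes the product literally; the source does not state whether the result is rescaled, so treat the raw numbers as an assumption. The function names are hypothetical and are not part of the Cluster Aggregation node or any DataFlux API.

```python
from statistics import mean


def match_code_score(rule_weight: int, sensitivity: int) -> int:
    """Score of a match code: rule weight times the selected sensitivity.

    Assumption: the product is used as-is, with no normalization.
    """
    return rule_weight * sensitivity


def aggregate_scores(scores, method="max"):
    """Aggregate several match code scores by minimum, maximum, or mean."""
    methods = {"min": min, "max": max, "mean": mean}
    return methods[method](scores)
```

For example, at sensitivity 85 a rule with weight 25 and a rule with weight 50 give `match_code_score(25, 85)` and `match_code_score(50, 85)`, which can then be combined with `aggregate_scores`.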
E-mail | |||
---|---|---|---|
Description | The E-mail parse definition parses e-mail addresses into a set of tokens. | ||
Output Tokens | Mailbox, Sub-Domain, Top-Level Domain, Additional Info |
||
Example 1 | Input | Output | |
info@dataflux.com | Mailbox | info | |
Sub-Domain | dataflux | ||
Top-Level Domain | com | ||
Additional Info | |||
Example 2 | Input | Output | |
John Smith <johnsmith@dataflux.com> | Mailbox | johnsmith | |
Sub-Domain | dataflux | ||
Top-Level Domain | com | ||
Additional Info | John Smith | ||
Remarks |
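The token split shown in the two e-mail parse examples above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the QKB parser: the function name `parse_email` is hypothetical, the top-level domain is taken to be everything after the last full stop, and any text outside the address is treated as Additional Info.

```python
import re


def parse_email(raw: str) -> dict:
    """Split an e-mail string into the four tokens named above."""
    match = re.search(r"([^\s<>@]+)@([^\s<>@]+)", raw)
    if match is None:
        raise ValueError(f"no e-mail address found in {raw!r}")
    mailbox, domain = match.groups()
    # Assumption: the top-level domain is the part after the last full stop.
    sub_domain, _, top_level = domain.rpartition(".")
    # Whatever surrounds the address (display name, brackets) is kept as
    # additional information, with spaces and brackets trimmed.
    leftover = raw[:match.start()] + raw[match.end():]
    return {
        "Mailbox": mailbox,
        "Sub-Domain": sub_domain,
        "Top-Level Domain": top_level,
        "Additional Info": leftover.strip(" <>"),
    }
```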
IBAN | |||
---|---|---|---|
Description | The IBAN parse definition parses International Bank Account Numbers into a general set of tokens. | ||
Output Tokens | Country Code, Basic Bank Account Number, Key |
||
Example 1 | Input | Output | |
NL91ABNA0417164300 | Country Code | NL | |
Basic Bank Account Number | ABNA0417164300 | ||
Key | 91 | ||
Example 2 | Input | Output | |
CH6906470016006671002 | Country Code | CH | |
Basic Bank Account Number | 06470016006671002 | ||
Key | 69 | ||
Example 3 | Input | Output | |
FR7030002005500000157845Z02 | Country Code | FR | |
Basic Bank Account Number | 30002005500000157845Z02 | ||
Key | 70 | ||
Remarks | This parse definition has been configured for 59 country codes according to the ECBS definition. The ISO 13616 standard specifies the structure of an ISO-compliant national IBAN format. A copy of the ISO 13616 standard can be obtained through the ISO home page, http://www.iso.org/. |
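In all three IBAN examples above, the tokens sit at fixed positions: two characters of country code, two characters of key, and the remainder as the Basic Bank Account Number. A minimal Python sketch of that positional split follows. The function name `parse_iban` is hypothetical, and the sketch does not check the country code or verify the key.

```python
def parse_iban(iban: str) -> dict:
    """Split an IBAN into the three general tokens by position."""
    compact = "".join(iban.split()).upper()  # ignore embedded spaces
    return {
        "Country Code": compact[:2],
        "Key": compact[2:4],
        "Basic Bank Account Number": compact[4:],
    }
```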
IBAN (Detailed) | |||
---|---|---|---|
Description | The IBAN (Detailed) parse definition parses International Bank Account Numbers into a set of tokens. | ||
Output Tokens | Country Code, Control Key, Basic Bank Account Number, Bank Code, Sort Code, Account Number, Key |
||
Example 1 | Input | Output | |
NL91ABNA0417164300 | Country Code | NL | |
Control Key | 91 | ||
Basic Bank Account Number | |||
Bank Code | ABNA | ||
Sort Code | |||
Account Number | 0417164300 | ||
Key | |||
Example 2 | Input | Output | |
CH6906470016006671002 | Country Code | CH | |
Control Key | 69 | ||
Basic Bank Account Number | |||
Bank Code | 06470 | ||
Sort Code | |||
Account Number | 016006671002 | ||
Key | |||
Example 3 | Input | Output | |
FR7030002005500000157845Z02 | Country Code | FR | |
Control Key | 70 | ||
Basic Bank Account Number | |||
Bank Code | 30002 | ||
Sort Code | 00550 | ||
Account Number | 0000157845Z | ||
Key | 02 | ||
Remarks | This parse definition has been configured for 59 country codes according to the ECBS definition. The ISO 13616 standard specifies the structure of an ISO-compliant national IBAN format. A copy of the ISO 13616 standard can be obtained through the ISO home page, http://www.iso.org/. |
Website | |||
---|---|---|---|
Description | The Website parse definition parses Web sites into a set of tokens. | ||
Output Tokens | Scheme, Hostname, Project |
||
Example | Input | Output | |
http://www.dataflux.com/News-and-Events/ | Scheme | http:// | |
Hostname | www.dataflux.com | ||
Project | News-and-Events | ||
Remarks |
Character | ||
---|---|---|
Description | The Character pattern analysis definition determines the pattern of characters in the input string. | |
Output Symbols | Symbol | Meaning |
A | uppercase letter | |
a | lowercase letter | |
9 | numeric digit | |
* | other (punctuation, and so on) | |
Examples | Input | Output |
1 877-846-Flux | 9 999*999*Aaaa | |
JND 5134 | AAA 9999 | |
Remarks | Whitespace in the input string is represented as whitespace in the output. |
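The symbol table for the Character pattern analysis definition above maps each input character to one output symbol. The Python sketch below reproduces the two documented examples. It is an illustration only: the function name `character_pattern` is hypothetical, and treating every Unicode uppercase or lowercase letter as `A` or `a` is an assumption.

```python
def character_pattern(text: str) -> str:
    """Map each character to A, a, 9, or *, keeping whitespace as-is."""
    symbols = []
    for ch in text:
        if ch.isspace():
            symbols.append(ch)   # whitespace is passed through unchanged
        elif ch.isdigit():
            symbols.append("9")
        elif ch.isupper():
            symbols.append("A")
        elif ch.islower():
            symbols.append("a")
        else:
            symbols.append("*")  # punctuation and everything else
    return "".join(symbols)
```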
Character (Script Identification) | ||
---|---|---|
Description | The Character (Script Identification) pattern analysis definition determines the Unicode script of each character in the input, and outputs a character representing that script. | |
Output Symbols | Symbol | Meaning |
L | Uppercase Latin character | |
l | Lowercase Latin character | |
漢 | Kanji/Han | |
ア | Katakana | |
あ | Hiragana | |
가 | Hangul | |
Я | Uppercase Cyrillic character | |
я | Lowercase Cyrillic character | |
Θ | Uppercase Greek character | |
θ | Lowercase Greek character | |
ก | Thai | |
أ | Arabic character | |
א | Hebrew character | |
9 | Numeric digit | |
* | other (punctuation, and so on) | |
Examples | Input | Output |
1ー13ー1 イヌイビル・カチドキ8F 501号室 | 9*99*9 アアアアア*アアアア9L 999漢漢 | |
JOHN DOE | LLLL LLL | |
(7F, SAS Institute)スズキイチロウ | *9L* LLL Lllllllll*アアアアアアア | |
李大伟 赛仕(北京) | 漢漢漢 漢漢*漢漢* | |
爱新觉罗·溥仪 | 漢漢漢漢*漢漢 | |
陈耀昌(Chan,Ed Yiu-Cheong) | 漢漢漢*Llll*Ll Lll*Llllll* | |
星光大道62号海王星科技大厦A座6楼 | 漢漢漢漢99漢漢漢漢漢漢漢漢L漢9漢 | |
珠海市 245400(玫瑰楼) | 漢漢漢 999999*漢漢漢* | |
二零零九年十月二十一日 | 漢漢漢漢漢漢漢漢漢漢漢 | |
14Mar, 2001 | 99Lll* 9999 | |
2009/10/21 | 9999*99*99 | |
H134981(5)------ | L999999*9******* | |
0174685503(D) | 9999999999*L* | |
22020319691106184X | 99999999999999999L | |
碧丽服装(北京)有限公司 | 漢漢漢漢*漢漢*漢漢漢漢 | |
电话(+86)10-12345678 | 漢漢**99*99*99999999 | |
Fax:01082741510 | Lll*99999999999 | |
(010)82741510-345 | *999*99999999*999 | |
Αθήνα | Θθθθθ | |
Банк | Яяяя | |
רודיה סקאלה כשאני אוהב (הערות Liner) Sonotone (1990) | אאאאא אאאאא אאאאא אאאא אאאאא Lllll Llllllll 9999 | |
Remarks |
Word | ||
---|---|---|
Description | The Word pattern analysis definition determines the pattern of words in the input string. | |
Output Symbols | Symbol | Meaning |
A | alphabetic | |
9 | numeric digit | |
M | mixed alphabetic/numeric | |
* | other (punctuation, and so on) | |
Examples | Input | Output |
1 877-846-Flux | 9 9*9*A | |
JND 5134 | A 9 | |
216 E 116th St | 9 A M A | |
Remarks | Whitespace in the input string is represented as whitespace in the output. |
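The Word pattern analysis definition above classifies whole words instead of single characters. A short Python sketch that reproduces the three documented examples is shown here. The function name `word_pattern` is hypothetical, and collapsing a run of adjacent punctuation characters into a single `*` is an assumption, since the examples contain only isolated punctuation.

```python
import re

# Tokens: a whitespace run, an alphanumeric run, or a punctuation run.
_TOKEN = re.compile(r"\s+|[^\W_]+|(?:[^\w\s]|_)+")


def word_pattern(text: str) -> str:
    """Map each word to A, 9, or M and other tokens to *."""
    symbols = []
    for token in _TOKEN.findall(text):
        if token.isspace():
            symbols.append(token)  # whitespace is passed through unchanged
        elif token.isdigit():
            symbols.append("9")
        elif token.isalpha():
            symbols.append("A")
        elif token.isalnum():
            symbols.append("M")    # mixed letters and digits, e.g. "116th"
        else:
            symbols.append("*")    # assumption: one * per punctuation run
    return "".join(symbols)
```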
Word (Script Identification) | ||
---|---|---|
Description | The Word (Script Identification) pattern analysis definition determines the Unicode script of each word in the input, and outputs a character representing that script. | |
Output Symbols | Symbol | Meaning |
L | Latin character | |
漢 | Kanji/Han | |
ア | Katakana | |
あ | Hiragana | |
가 | Hangul | |
Я | Cyrillic | |
Θ | Greek | |
ก | Thai | |
أ | Arabic | |
א | Hebrew | |
9 | Numeric digit | |
* | other (punctuation, and so on) | |
Examples | Input | Output |
1ー13ー1 イヌイビル・カチドキ8F 501号室 | 9*9*9 ア*ア9L 9漢 | |
JOHN DOE | L L | |
(7F, SAS Institute)スズキイチロウ | *9L* L L*ア | |
ΑNDREΑS ZIΑKΑS | W W | |
李大伟 赛仕(北京) | 漢 漢*漢* | |
爱新觉罗·溥仪 | 漢*漢 | |
陈耀昌(Chan,Ed Yiu-Cheong) | 漢*L*L L*L* | |
星光大道62号海王星科技大厦A座6楼 | 漢9漢L漢9漢 | |
珠海市 245400(玫瑰楼) | 漢 9*漢* | |
二零零九年十月二十一日 | 漢 | |
14Mar, 2001 | 9L* 9 | |
2009/10/21 | 9*9*9 | |
H134981(5)------ | L9*9* | |
0174685503(D) | 9*L* | |
22020319691106184X | 9L | |
碧丽服装(北京)有限公司 | 漢*漢*漢 | |
电话(+86)10-12345678 | 漢*9*9*9 | |
Fax:01082741510 | L*9 | |
(010)82741510-345 | *9*9*9 | |
רודיה סקאלה כשאני אוהב (הערות Liner) Sonotone (1990) | א א א א א L L 9 | |
Remarks | If a word contains a mix of Greek and Cyrillic, Latin and Cyrillic, or Latin and Greek glyphs (as in the ΑNDREΑS ZIΑKΑS example, in which the character Α is the Greek "Alpha" glyph), this definition outputs a W to warn of potentially fraudulent data. |
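The mixed-script warning described in the remark above can be sketched in Python with the standard `unicodedata` module. This is a simplified illustration limited to Latin, Greek, and Cyrillic, not the QKB logic. The function name `word_script_symbol` is hypothetical, and deriving the script from the first word of the Unicode character name is an assumption that holds for these three scripts.

```python
import unicodedata

# Output symbols for the three scripts named in the remark above.
SCRIPT_SYMBOLS = {
    "LATIN": "L",
    "GREEK": "\u0398",     # Greek capital letter theta
    "CYRILLIC": "\u042f",  # Cyrillic capital letter ya
}


def word_script_symbol(word: str) -> str:
    """Return the script symbol for a word, or W if scripts are mixed."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # Unicode names start with the script name,
            # e.g. "GREEK CAPITAL LETTER ALPHA".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    if len(scripts) == 1:
        (script,) = scripts
        return SCRIPT_SYMBOLS.get(script, "*")
    if len(scripts) > 1 and scripts.issubset(SCRIPT_SYMBOLS):
        return "W"  # Latin, Greek, and Cyrillic letters mixed in one word
    return "*"      # no letters, or scripts outside this sketch
```

Calling `word_script_symbol("\u0391NDRE\u0391S")`, where `\u0391` is the Greek capital alpha, yields the warning symbol because the word combines Greek and Latin letters.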
ASCII Non-Printable Character Removal | ||
---|---|---|
Description | The ASCII Non-Printable Character Removal standardization definition removes control characters and other non-printable characters. | |
Examples | Input | Output |
Mr. John Smith[DELETE] | Mr. John Smith | |
2004 Honda Accord[NEXT LINE] | 2004 Honda Accord | |
Remarks |
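A rough Python equivalent of the non-printable character removal shown above is to drop every character in the Unicode control category. This is a sketch, not the QKB definition. It assumes that `[DELETE]` and `[NEXT LINE]` in the examples stand for U+007F and U+0085, and the function name `remove_non_printable` is hypothetical.

```python
import unicodedata


def remove_non_printable(text: str) -> str:
    """Remove control characters (Unicode general category Cc)."""
    # Note: category Cc also covers tab, carriage return, and line feed,
    # so those are removed as well in this sketch.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")
```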
E-mail | ||
---|---|---|
Description | The E-mail standardization definition standardizes e-mail addresses. | |
Examples | Input | Output |
John Smith <john.smith@dataflux.com> | john.smith@dataflux.com | |
JOHN.SMITH@DATAFLUX.COM | john.smith@dataflux.com | |
mail: john.smith@dataflux.com | john.smith@dataflux.com | |
"john.smith@dataflux.com" | john.smith@dataflux.com | |
"john.smith@hotmail.com" | john.smith@hotmail.com | |
john.Smith.@hotmail ..com. | john.smith@hotmail.com | |
Remarks | The E-mail standardization definition removes unnecessary additional information. In some cases, it is also able to correct typos. |
Hyphen/Dash Removal | ||
---|---|---|
Description | The Hyphen/Dash Removal standardization definition removes hyphen and dash characters. | |
Examples | Input | Output |
Mary-Ann | MaryAnn | |
12-12-2000 | 12122000 | |
Remarks |
Hyphen/Dash Space Replacement | ||
---|---|---|
Description | The Hyphen/Dash Space Replacement standardization definition replaces hyphen and dash characters with a space character. | |
Examples | Input | Output |
North-Carolina | North Carolina | |
12-12-2000 | 12 12 2000 | |
Remarks |
IBAN (Electronic) | ||
---|---|---|
Description | The IBAN (Electronic) standardization definition standardizes International Bank Account Numbers for electronic storage. | |
Examples | Input | Output |
NL91ABNA0417164300 | NL91ABNA0417164300 | |
CH6906470016006671002 | CH6906470016006671002 | |
FR7030002005500000157845Z02 | FR7030002005500000157845Z02 | |
MT84 MALT 0110 0001 2345 MTLC AST0 01S | MT84MALT011000012345MTLCAST001S | |
Remarks |
IBAN (Printed) | ||
---|---|---|
Description | The IBAN (Printed) standardization definition standardizes International Bank Account Numbers for printout. | |
Examples | Input | Output |
NL91ABNA0417164300 | IBAN NL91 ABNA 0417 1643 00 | |
CH6906470016006671002 | IBAN CH69 0647 0016 0066 7100 2 | |
FR7030002005500000157845Z02 | IBAN FR70 3000 2005 5000 0015 7845 Z02 | |
MT84 MALT 0110 0001 2345 MTLC AST0 01S | IBAN MT84 MALT 0110 0001 2345 MTLC AST0 01S | |
Remarks |
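The electronic and printed IBAN forms shown in the two tables above differ only in spacing and in the `IBAN` label. The Python sketch below reproduces the documented examples. The function names are hypothetical, and stripping a leading `IBAN` label from the input is an assumption added so that printed input can be converted back.

```python
def iban_electronic(iban: str) -> str:
    """Return the IBAN with no spaces, as used for electronic storage."""
    compact = "".join(iban.split()).upper()
    # Assumption: a leading "IBAN" label from a printed form is dropped.
    if compact.startswith("IBAN"):
        compact = compact[4:]
    return compact


def iban_printed(iban: str) -> str:
    """Return the IBAN in groups of four characters, prefixed with "IBAN"."""
    compact = iban_electronic(iban)
    groups = [compact[i:i + 4] for i in range(0, len(compact), 4)]
    return "IBAN " + " ".join(groups)
```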
Multiple Space Collapse | ||
---|---|---|
Description | The Multiple Space Collapse standardization definition collapses multiple space characters to one space character. | |
Examples | Input | Output |
Jack Miller | Jack Miller | |
12 12 12 | 12 12 12 | |
Remarks |
Non-Alphanumeric Removal | ||
---|---|---|
Description | The Non-Alphanumeric Removal standardization definition removes all non-alphanumeric characters including spaces. | |
Examples | Input | Output |
Cary.NC.27513 | CARYNC27513 | |
#AA-456-A12 | AA456A12 | |
Remarks | Output will be in uppercase. |
Non-Number Removal | ||
---|---|---|
Description | The Non-Number Removal standardization definition removes all non-number characters. | |
Examples | Input | Output |
John Smith 123 | 123 | |
AAA111 111 | 111111 | |
Remarks |
Number Removal | ||
---|---|---|
Description | The Number Removal standardization definition removes all number characters. | |
Examples | Input | Output |
John Smith 123 | John Smith | |
AAA111 111 | AAA | |
Remarks |
Phone Country Code to Country Name | ||
---|---|---|
Description | The Phone Country Code to Country Name standardization definition transforms a phone country code into its corresponding country name. | |
Examples | Input | Output |
+1 | United States/Canada | |
+49 | Germany | |
33 | France | |
0034 | Spain | |
Remarks |
Punctuation Removal | ||
---|---|---|
Description | The Punctuation Removal standardization definition removes all punctuation except hyphen/dash characters. | |
Examples | Input | Output |
100 Main St. Apt. #100 | 100 Main St Apt 100 | |
Joan Allen:Steve Allen | Joan AllenSteve Allen | |
Remarks |
Punctuation Space Replacement | ||
---|---|---|
Description | The Punctuation Space Replacement standardization definition replaces all punctuation except hyphen/dash characters with a space character. | |
Examples | Input | Output |
100 Main St. Apt. #100 | 100 Main St Apt 100 | |
Joan Allen:Steve Allen | Joan Allen Steve Allen | |
Remarks |
Space Removal | ||
---|---|---|
Description | The Space Removal standardization definition removes all space characters. | |
Examples | Input | Output |
10 : 10 | 10:10 | |
N A | NA | |
Remarks |
Surrounding Quote Removal | ||
---|---|---|
Description | The Surrounding Quote Removal standardization definition removes quote characters surrounding an entire string. | |
Examples | Input | Output |
"1" Steel Tube" | 1" Steel Tube | |
"John O'Malley" | John O'Malley | |
Remarks |
URL | ||
---|---|---|
Description | The URL standardization definition standardizes URLs. | |
Examples | Input | Output |
http://www.dataflux.com/News-and-Events/ | http://www.dataflux.com/news-and-events | |
ftp:/file.txt | ftp://file.txt | |
Remarks |
Website | ||
---|---|---|
Description | The Website standardization definition standardizes Web sites. | |
Examples | Input | Output |
WWW.DATAFLUX.COM | www.dataflux.com | |
http://www.dataflux.com/News-and-Events/ | www.dataflux.com/news-and-events | |
Remarks |
Doc ID: QKBCI_global_defs.html |