DataFlux Data Management Studio 2.5: User Guide
A phonetics rule specifies a pattern you want to locate, the replacement text (if any), and the action you want taken after the replacement is made.
Rule text identifies the pattern you want to locate. Rule text can consist of literal characters and a small set of meta-characters:
The dot meta-character (".")
A single period/dot in the rule text matches any single character in the input string. For example, the rule text "CA." would match any phrase that has a "C" followed by an "A" followed by any other character.
Character class ("[" and "]")
Enclose a set of literal characters in square brackets to match any one of the specified characters. For example, the rule text "CA[TB]" would match "CAT" and "CAB," but would not match "CAP."
If the first character within a character class is a circumflex ("^"), it will match only characters that are not listed in the character class. For example, the rule text "CA[^TB]" would not match "CAT" and "CAB," but it would match "CAP," "CAN," or any other phrase that has a "CA" followed by any character except a "T" or "B."
Beginning of word ("^")
Use a circumflex to indicate that the following pattern must be found at the beginning of a word. For example, the rule text "^SAND" would match "SANDWICH," but not "STREISAND."
End of word ("$")
Use a dollar sign to indicate that the preceding pattern must be found at the end of a word. For example, the rule text "SAND$" would match "STREISAND" as well as the word "SAND" (because it is at the end of the word) but not "SANDWICH."
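The meta-characters described above behave much like their regular-expression counterparts. The following Python sketch approximates them with the standard re module so that the examples can be tried out. It is an illustration only, not how DataFlux Data Management Studio evaluates rules: the helper name rule_matches is hypothetical, and mapping "^" and "$" to the regex word boundary \b is an assumption.

```python
import re

def rule_matches(rule_text: str, phrase: str) -> bool:
    """Approximate a Phonetics rule-text match with a Python regex (sketch)."""
    pattern = rule_text
    # Assumption: a leading "^" means "beginning of a word", modeled as \b.
    if pattern.startswith("^"):
        pattern = r"\b" + pattern[1:]
    # Assumption: an unescaped trailing "$" means "end of a word", also \b.
    if pattern.endswith("$") and not pattern.endswith("\\$"):
        pattern = pattern[:-1] + r"\b"
    # The dot and character classes already mean the same thing in re.
    return re.search(pattern, phrase) is not None

print(rule_matches("CA[TB]", "CAB"))       # True
print(rule_matches("CA[^TB]", "CAT"))      # False
print(rule_matches("^SAND", "STREISAND"))  # False
print(rule_matches("SAND$", "STREISAND"))  # True
```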
Replace the first n characters ("/")
Use an embedded forward slash to search on an entire pattern but then replace only the characters prior to the slash. For example, the rule text "SCH/OOL" with the replacement text "SK" will match the word "SCHOOL" and produce an output string of "SKOOL."
This is different from the rule "SCH", which will match the pattern "SCH" anywhere within a string. It is also different from "^SCH", which will match the pattern "SCH" if it is at the beginning of any word, regardless of what follows it.
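The forward slash works like a lookahead: the text after the slash must be present, but it is not replaced. The Python sketch below shows this for rule text made of literal characters only. The function name apply_slash_rule is hypothetical, and the regex lookahead is a stand-in for whatever the product does internally.

```python
import re

def apply_slash_rule(rule_text: str, replacement: str, text: str) -> str:
    """Replace only the part of the pattern that precedes the "/" (sketch)."""
    head, _, tail = rule_text.partition("/")
    # The lookahead (?=...) requires the tail to follow without consuming it.
    pattern = re.escape(head) + "(?=" + re.escape(tail) + ")"
    # A function is used as the replacement so the text is inserted literally.
    return re.sub(pattern, lambda match: replacement, text)

print(apply_slash_rule("SCH/OOL", "SK", "SCHOOL"))  # SKOOL
print(apply_slash_rule("SCH/OOL", "SK", "SCHEME"))  # SCHEME (no "OOL" follows)
print(apply_slash_rule("SCH", "SK", "SCHEME"))      # SKEME (plain "SCH" rule)
```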
Literal characters
Specify all literal characters in uppercase. If you want to use a literal that is also used as a meta-character (the dollar sign, dot/period, circumflex, forward slash, backslash, or square bracket), escape that character by preceding it with a backslash. For example, to look for a dollar sign, use the rule text "\$". To match a backslash, use "\\".
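If rule text is generated from a list of literal strings, the escaping described above can be automated. The following Python sketch is one way to do that. The helper name escape_rule_literal is hypothetical and is not part of DataFlux Data Management Studio.

```python
# The meta-characters listed above: $ . ^ / \ [ ]
META_CHARACTERS = set("$.^/\\[]")

def escape_rule_literal(literal: str) -> str:
    """Prefix every meta-character in a literal with a backslash (sketch)."""
    return "".join(
        "\\" + ch if ch in META_CHARACTERS else ch
        for ch in literal
    )

print(escape_rule_literal("$"))     # \$
print(escape_rule_literal("\\"))    # \\
print(escape_rule_literal("U.S."))  # U\.S\.
```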
To match against a space
You cannot match against a space unless the space is the only thing in the rule. For example, to replace a space with an underscore, use the rule text "\ " with the replacement text "_". Other than this one purpose, you should not use white space in Phonetics rules.
Replacement text replaces the entire pattern found by the rule text, except in the case of a forward slash. If you use the forward slash in your rule text, the replacement text replaces the pattern up to the slash.
Replacement text can contain only literal text, no meta-characters.
Priority
There are two factors that determine the order in which Phonetics rules are processed: priority and order.
Phonetics rules are processed in order of decreasing priority, so the rule with the highest priority value is processed first. In the case of multiple rules with the same priority, rules are processed in the order in which they appear (top to bottom). Priority must be a numeric value greater than 0 and less than 100.
For more information, see Changing Rule Order.
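The ordering described above amounts to a stable sort on priority. The Python sketch below models it. The Rule class and the processing_order function are hypothetical names used for illustration, not part of the product.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    text: str          # rule text
    replacement: str   # replacement text
    priority: float    # must be greater than 0 and less than 100

def processing_order(rules: list[Rule]) -> list[Rule]:
    """Return rules in the order they would be processed (sketch)."""
    for rule in rules:
        if not 0 < rule.priority < 100:
            raise ValueError(f"priority out of range for rule {rule.text!r}")
    # sorted() is stable, even with reverse=True, so rules with equal
    # priority keep their original top-to-bottom order.
    return sorted(rules, key=lambda rule: rule.priority, reverse=True)

library = [
    Rule("Y", "I", 50),
    Rule("Y[AEIOU]", "", 60),
    Rule("C", "K", 50),
]
print([rule.text for rule in processing_order(library)])
# ['Y[AEIOU]', 'Y', 'C']
```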
Reset flag
When you select the Reset flag, the character following the replacement is considered the beginning of a word, potentially allowing it to match rules using the "^" meta-character.
For example, suppose the input string is "MCKNIGHT" and a rule with the rule text "MC" has the Reset flag selected. After that rule is applied, the "K" is treated as the beginning of a word, so a beginning-of-word rule for "KN" (for example, "^KN" with the replacement text "N") can then replace "KN" with "N".
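The effect of the Reset flag can be sketched with a toy rule engine in Python. Everything here is an assumption made for illustration: the engine itself, the replacement text "MAC" for the "MC" rule (the example above does not specify one), and the form "^KN" for the "KN" rule. Rules are literal text with an optional leading "^".

```python
def apply_rules(text, rules):
    """rules: (rule_text, replacement, reset_flag) tuples in processing order."""
    pos = 0
    word_start = True  # the first character begins a word
    while pos < len(text):
        for rule_text, replacement, reset in rules:
            anchored = rule_text.startswith("^")
            literal = rule_text.lstrip("^")
            if anchored and not word_start:
                continue  # "^" rules only apply at the beginning of a word
            if text.startswith(literal, pos):
                text = text[:pos] + replacement + text[pos + len(literal):]
                pos += len(replacement)
                # Reset: treat the next character as the beginning of a word.
                word_start = reset or pos == 0 or text[pos - 1] == " "
                break
        else:
            pos += 1  # no rule matched here, so move on
            word_start = text[pos - 1] == " "
    return text

# "MAC" is a hypothetical replacement chosen only for this sketch.
print(apply_rules("MCKNIGHT", [("MC", "MAC", True), ("^KN", "N", False)]))
# MACNIGHT
print(apply_rules("MCKNIGHT", [("MC", "MAC", False), ("^KN", "N", False)]))
# MACKNIGHT
```

Without the Reset flag, the "K" is in the middle of a word, so the "^KN" rule never applies.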
Rewind flag
Phonetics maintains a search pointer that references a position in the input string. When processing begins, the search pointer points to the beginning of the string. When a Phonetics rule is applied to the string, the search pointer moves forward through the string to point to the position just after the text that was just replaced.
For example, suppose the input string is "ISAACS", and your Phonetics library has rule "C" with replacement text "K". After this rule is applied, the string would look like "ISAAKS," and the search pointer would point to the letter "S."
But suppose you also have the rule "KS" with the replacement text "X". If the search pointer points to the letter "S" in "ISAAKS", there is no way to match the text "KS" and make the replacement. This is where the Rewind flag is helpful. If you use the Rewind flag on the first rule, then after "C" is replaced with "K", the search pointer will be rewound to point to the letter "K" in "ISAAKS". Now your rule "KS" will be applied, and the string will be changed to "ISAAX".
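The search pointer and the Rewind flag can be modeled in a few lines of Python. This is a minimal sketch for literal rule text only, written under the assumption that the pointer moves as described above. It is not the product's actual algorithm, and apply_rules is a hypothetical name.

```python
def apply_rules(text, rules):
    """rules: (rule_text, replacement, rewind_flag) tuples in processing order."""
    pos = 0  # the search pointer starts at the beginning of the string
    while pos < len(text):
        for rule_text, replacement, rewind in rules:
            if text.startswith(rule_text, pos):
                text = text[:pos] + replacement + text[pos + len(rule_text):]
                if not rewind:
                    # Default: point just after the text that was replaced.
                    pos += len(replacement)
                # With Rewind, the pointer stays at the start of the
                # replacement, so other rules can match the new text.
                # (A rewinding rule whose replacement matches its own
                # rule text would loop forever in this sketch.)
                break
        else:
            pos += 1  # no rule matched at this position
    return text

print(apply_rules("ISAACS", [("C", "K", False), ("KS", "X", False)]))  # ISAAKS
print(apply_rules("ISAACS", [("C", "K", True), ("KS", "X", False)]))   # ISAAX
```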
Ignore Replace flag
If you select the Ignore Replace flag, when a rule is matched, DataFlux Data Management Studio will not make any replacements, even if replacement text is specified. DataFlux Data Management Studio will continue processing rules from the end of the matched pattern forward.
This is useful when you want to make certain that a specific pattern will never be replaced by other Phonetics rules. For example, suppose you have a rule "Y" with replacement text "I," but you do not want to change "Y" to "I" if the "Y" is followed by another vowel. You could use a rule such as "Y[AEIOU]" and set the Ignore Replace flag. As long as this rule has a higher priority than the first rule, the "Y" will be preserved when it is followed by another vowel. There is no need to write any replacement text for the "Y[AEIOU]" rule; the replacement text field is ignored when the Ignore Replace flag is set.
Note: It does not make sense to use the Ignore Replace flag and the Rewind flag in the same rule. These two flags are mutually exclusive.
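The "Y" example above can be sketched in Python as follows. The toy engine is an assumption made for illustration, not the product's implementation: it uses the standard re module to match rule text at the search pointer and expects the rules to be listed in processing order (highest priority first).

```python
import re

def apply_rules(text, rules):
    """rules: (rule_text, replacement, ignore_replace) in processing order."""
    pos = 0
    while pos < len(text):
        for rule_text, replacement, ignore_replace in rules:
            match = re.compile(rule_text).match(text, pos)
            if match is None:
                continue
            if ignore_replace:
                # Make no replacement; continue from the end of the match.
                pos = match.end()
            else:
                text = text[:pos] + replacement + text[match.end():]
                pos += len(replacement)
            break
        else:
            pos += 1  # no rule matched at this position
    return text

rules = [
    ("Y[AEIOU]", "", True),  # higher priority: protect "Y" before a vowel
    ("Y", "I", False),       # lower priority: otherwise change "Y" to "I"
]
print(apply_rules("MAYOR", rules))              # MAYOR
print(apply_rules("KELLY", rules))              # KELLI
print(apply_rules("MAYOR", [("Y", "I", False)]))  # MAIOR
```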
Doc ID: dfU_Cstm_Phon_17002.html