DataFlux Data Management Studio 2.6: User Guide
Regex Library Editor
Regular expressions are powerful
tools you can use to transform data. In the DataFlux Data Management Studio implementation, regular
expressions are organized into libraries that you can use for parsing,
standardization, and matching. To build and test these libraries, use
the Customize Regex Library Editor. ("Regex" is short for "regular expression.")
In the context of parse definitions, standardization definitions, and match
definitions, regular expressions are primarily intended for character-level
cleansing and transformations. For word- and phrase-level transformations,
you should instead use standardization data libraries.
Regular expressions you create in the Regex Library Editor must adhere
to the syntax defined for Perl regular expressions. For information on
writing Perl regular expressions, see Mastering Regular
Expressions by Jeffrey E.F. Friedl or other readily available
reference material on the subject.
Building Regex Libraries
- Open the Regex Library Editor. On the Customize main screen, select Tools > Other QKB Editors > Regex Library Editor. The Regex Library Editor dialog appears.
- Set a QKB. Select Options > Set QKB. The Regex Library Editor window appears.
Select the appropriate QKB and locale and click Open. The QKB and locale settings are saved from session to session, so you do not need to specify them again unless you need to build a Regex Library file for a different locale.
- Create a New Regex Library File. Select File > New.
- Add Your Regular Expressions. By default, when you create a new Regex Library file, the Regex Library Editor asks you to supply the first Regular Expression/Substitution pair. Type the Regular Expression and the Substitution in the appropriate fields and click OK.
Before you add more regular expressions, be aware that regular expressions are applied sequentially, in the order in which they appear in the Regex Library.
A value modified by one regular expression can therefore be modified again by any regular expression that appears further down the list. This can happen several times as DataFlux Data Management Studio applies each regular expression from top to bottom, so be careful not to undo the effect of one regular expression with another.
New regular expression rules are always added to the end of the list. To position a rule, click the newly created row and drag it to the desired position in the list. Repeat this process until you have defined all your regular expressions.
- Save Your Regex Library. Select File > Save. Because this is a new Regex Library, the Regex Library Editor will prompt you for a file name.
- Test Your Regex Library. After creating a Regex Library, you can use the Regex Library Editor to test your expressions. At the bottom of the Regex Library Editor is the Test Area, where you can type sample input strings to verify that you have written your regular expressions correctly. Type an input string and observe the result. If the result is not what you intended, modify your regular expressions or their order and test again. When you are satisfied with the results, be certain to save your changes. Note that your input string in the Test Area is highlighted where it appears in your regular expressions.
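The top-to-bottom behavior described in the steps above can be approximated outside the editor. The following is a minimal sketch, assuming Python's `re` module as a stand-in for the Perl-style engine. The rule list, the `apply_library` helper, and the sample input are hypothetical and are not part of DataFlux Data Management Studio.

```python
import re

# Hypothetical Regex Library: (regular expression, substitution) pairs,
# listed in the order in which they would be applied.
RULES = [
    (r"[.,]", ""),        # remove periods and commas
    (r"\s+", " "),        # collapse runs of whitespace into one space
    (r"^\s+|\s+$", ""),   # trim leading and trailing whitespace
]


def apply_library(rules, value):
    """Apply each rule in order and record the value after every rule."""
    trace = []
    for pattern, substitution in rules:
        value = re.sub(pattern, substitution, value)
        trace.append((pattern, value))
    return value, trace


# Roughly what typing a sample string into a test area does:
# run the whole list and inspect the outcome.
result, trace = apply_library(RULES, "  Acme,  Inc.  ")
for pattern, intermediate in trace:
    print(f"{pattern!r:>14} -> {intermediate!r}")
print(result)
```

Printing the intermediate value after each rule makes it easier to see which rule produced an unexpected result before you change a rule or its position.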
Modifying Regex Library Files
After using the Regex Library Editor Test Area to test your regular
expressions, you might find some unintended effects because of the order
of your expressions.
To alter the expression order in a Regex Library:
- In the Regex Library Editor, select the row you want to move.
- Drag the row to the desired location, and then release the mouse button. The row appears in its new position.
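To see why the order of rows matters, consider this minimal sketch, again assuming Python's `re` module as a stand-in for the Perl-style engine. Both rules and the `apply_in_order` helper are hypothetical. Python writes backreferences in the substitution as `\1`, which may differ from the syntax the editor expects.

```python
import re


def apply_in_order(rules, value):
    """Apply (pattern, substitution) pairs from first to last."""
    for pattern, substitution in rules:
        value = re.sub(pattern, substitution, value)
    return value


# Hypothetical rules for cleaning a ten-digit phone number.
STRIP_NON_DIGITS = (r"\D", "")
FORMAT_PHONE = (r"^(\d{3})(\d{3})(\d{4})$", r"\1-\2-\3")

sample = "(919) 555-0100"

# Strip first, then format: the second rule sees ten bare digits and matches.
original_order = apply_in_order([STRIP_NON_DIGITS, FORMAT_PHONE], sample)

# Format first, then strip: the format rule does not match the raw input,
# and the strip rule then leaves only the digits.
reordered = apply_in_order([FORMAT_PHONE, STRIP_NON_DIGITS], sample)

print(original_order)
print(reordered)
```

The same two rules give different results depending only on their position, which is the kind of unintended effect that moving a row is meant to correct.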
Tips:
- If your substitution string is empty, the matched pattern is removed because it is replaced with nothing.
- During parse normalization and matching, strings are capitalized (converted to uppercase) before regular expressions are applied. You should therefore make regular expressions case-insensitive in libraries that will be used for these purposes, or use only capital letters when representing literal text in the expressions. During parse pre-processing, strings are not capitalized before regular expressions are applied, so you can use case-sensitive regular expressions in libraries used for pre-processing if there is some benefit to doing this.
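Both tips can be illustrated with a minimal sketch, assuming Python's `re` module as a stand-in for the Perl-style engine and `str.upper()` as a stand-in for the capitalization step. The patterns and sample strings are hypothetical, and the `\1` backreference is Python syntax that may differ from what the editor expects.

```python
import re

# Tip 1: an empty substitution deletes whatever the pattern matches.
no_hyphens = re.sub(r"-", "", "555-01-00")

# Hypothetical rule: drop the ordinal suffix that follows a number.
ORDINAL_LOWER = r"(\d+)(?:st|nd|rd|th)\b"        # lowercase literals only
ORDINAL_ANYCASE = r"(?i)(\d+)(?:st|nd|rd|th)\b"  # case-insensitive
ORDINAL_UPPER = r"(\d+)(?:ST|ND|RD|TH)\b"        # capital literals only

# Tip 2: simulate a stage that capitalizes the string before the rules run.
capitalized = "21st Street".upper()

missed = re.sub(ORDINAL_LOWER, r"\1", capitalized)           # no match
matched_anycase = re.sub(ORDINAL_ANYCASE, r"\1", capitalized)
matched_upper = re.sub(ORDINAL_UPPER, r"\1", capitalized)

# A stage that does not capitalize can use the case-sensitive lowercase rule.
preprocessed = re.sub(ORDINAL_LOWER, r"\1", "21st Street")

print(no_hyphens, missed, matched_anycase, matched_upper, preprocessed, sep="\n")
```

The lowercase-only pattern silently does nothing once the input has been capitalized, which is why the case-insensitive or capital-letter form is the safer choice for libraries used in normalization and matching.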
Documentation Feedback: yourturn@sas.com
Note: Always include the Doc ID when providing documentation feedback.
Doc ID: dfU_Cstm_RegEx_16000.html