New Argument for Use with the COMPRESS Function
by Ron Cody
You might be familiar with the COMPRESS function, which has been around for quite a while. It removes selected characters from a character value. If you call the function with only one argument (a character value), the default action is to remove blanks from the string. A second argument lets you list the characters that you want to remove. Here is an example of how the previous (SAS 8) version of this function is used to remove all uppercase and lowercase letters, periods, dashes, and blanks from a character value. The LOWCASE function converts the value to lowercase first, so only the lowercase letters need to be listed:
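data numbers_only;
   input ID $11.;   /* each data line is 11 characters long */
   /* lowercase the value so that only lowercase letters need to be listed */
   number_part = compress(lowcase(ID), 'abcdefghijklmnopqrstuvwxyz.- ');
datalines;
12 ab c.-34
abc1 2 -.89
;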
The SAS 9.1 upgrade to this function allows a third argument (modifiers) that lets you specify character classes (that is, letters, digits, punctuation, white space, and so on) to remove from the character value. Other modifiers change the way the function works: one tells the function to remove all characters except the ones specified, another trims trailing blanks, and another ignores case. Following are just a few of the more useful modifiers.
Selected list of COMPRESS modifiers (uppercase or lowercase)
a adds uppercase and lowercase letters
d adds numerals (digits)
i ignores case
k keeps listed characters instead of removing them
s adds space (blank, tabs, lf, cr) to the list
p adds punctuation
For these examples, char = "A C123XYZ" and phone = "(908) 777-1234".
| Function | Value Returned |
|---|---|
| COMPRESS("A C XYZ") | ACXYZ |
| COMPRESS(phone," (-)") | 9087771234 |
| COMPRESS(CHAR,"0123456789") | A CXYZ |
| COMPRESS(CHAR,,"ds") | ACXYZ |
| COMPRESS(CHAR,"12345","k") | 123 |
| COMPRESS(PHONE,,"ps") | 9087771234 |
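If you want to try the examples in the table yourself, a short DATA _NULL_ step along the following lines should write each result to the SAS log. This is only a sketch: the variable names r1 through r6 are made up for illustration, and the LENGTH statement simply sizes char and phone to fit the sample values.

```sas
data _null_;
   length char $9 phone $14;
   char  = "A C123XYZ";
   phone = "(908) 777-1234";

   r1 = compress("A C XYZ");            /* one argument: remove blanks */
   r2 = compress(phone, " (-)");        /* remove blanks, parentheses, and dashes */
   r3 = compress(char, "0123456789");   /* remove the listed digits */
   r4 = compress(char, , "ds");         /* remove digits and white space */
   r5 = compress(char, "12345", "k");   /* keep only the characters 1 through 5 */
   r6 = compress(phone, , "ps");        /* remove punctuation and white space */

   /* named output, one result per log line */
   put r1= / r2= / r3= / r4= / r5= / r6=;
run;
```

The values in the log should match the Value Returned column above.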
To demonstrate the power of the COMPRESS modifiers, let's rewrite the program above so that only digits remain in the ID. It looks like this:
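data numbers_only;
   input ID $11.;   /* each data line is 11 characters long */
   /* 'k' keeps and 'd' selects digits, so only digits remain */
   number_part = compress(ID,,'kd');
datalines;
12 ab c.-34
abc1 2 -.89
;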
In this example, the 'd' modifier specifies all digits and the 'k' modifier says to remove all characters except the ones indicated.
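As a rough side-by-side comparison, the sketch below applies three variations to the same two data lines: the SAS 8 style character list combined with the 'i' modifier (so LOWCASE is no longer needed), the 'a' modifier with a short list of the remaining characters, and the 'kd' approach. The data set name compare_methods and the variable names by_list, by_class, and keep_only are hypothetical, and the informat width of 11 is chosen to cover every character of the sample values.

```sas
data compare_methods;
   input ID $11.;   /* width 11 covers every character of the sample values */

   /* explicit list; 'i' ignores case, so uppercase letters are removed too */
   by_list   = compress(ID, 'abcdefghijklmnopqrstuvwxyz.- ', 'i');

   /* 'a' adds all letters; only the period, dash, and blank are listed */
   by_class  = compress(ID, '.- ', 'a');

   /* keep digits and remove everything else */
   keep_only = compress(ID, , 'kd');
datalines;
12 ab c.-34
abc1 2 -.89
;

proc print data=compare_methods noobs;
run;
```

For these data lines, all three variables should hold the same digits. The difference shows up with characters you did not anticipate: the first two forms remove only what is listed or covered by a class, so a character such as # would survive, whereas 'kd' removes anything that is not a digit.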
Take a moment to look at the SAS documentation on the COMPRESS function to see the power of these newly added modifiers.