Sample 25977: Determine the closest match for a target string when values of varying lengths, or values containing wildcards
Perform an efficient table look up for a specified string against a second table of values that vary in length due to 'wildcarding' on the end of the string.
Note:
For detailed information regarding object dot programming in the DATA step, please refer to SAS 9.2 Language Reference: Concepts, Using DATA Step Component Objects.
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
/* Create a 'source' data set. */
data source;
input x $ y $;
datalines;
12345678 xyz
123456** abc
12345*** efg
1234**** hij
;
/* Create a data set of numbers to look up. Use the value from SOURCE that is */
/* the closest, best match and obtain the value of Y from that observation. */
data target;
input x $ ;
datalines;
12345678
12345601
12345602
12345001
12345002
12340001
12340002
;
data out;
length x y $ 8;
/* On the first iteration only, create a hash object named H and load it */
/* with the values from SOURCE. The KEY used for the look up is X. */
/* Load the hash object with the values of X in descending order. This is */
/* crucial for the retrieval of the first, closest match. The data in */
/* the hash object associated with the KEY are X and Y. Declare a hash */
/* iterator named h_iter that will iterate over each KEY value in the */
/* hash object. */
if _n_=1 then do;
declare hash h(dataset:'source', ordered:'descending');
h.definekey('x');
h.definedata('x','y');
declare hiter h_iter('h');
h.definedone();
call missing(x, y);
end;
/* Set TARGET, renaming X to TX so its value is not overwritten by X */
/* when the data values are returned from the hash object. */
set target(rename=(x=tx));
rc=h_iter.first();
do while (rc=0);
if substr(tx,1,length(compress(x,'*')))=compress(x,'*') then leave;
/* If the 'non-wildcard' bytes do not match then iterate to the next */
/* KEY value to see if it is the best match. */
rc1=h_iter.next();
if rc1 ne 0 then do;
y='not';
leave;
end;
end;
run;
proc print;
run;
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
Obs x y tx rc rc1
1 12345678 xyz 12345678 0 .
2 123456** abc 12345601 0 0
3 123456** abc 12345602 0 0
4 12345*** efg 12345001 0 0
5 12345*** efg 12345002 0 0
6 1234**** hij 12340001 0 0
7 1234**** hij 12340002 0 0
Perform an efficient table look up for a specified string against a second table of values that vary in length due to 'wildcarding' on the end of the string.
| Type: | Sample |
| Topic: | SAS Reference ==> DATA Step SAS Reference ==> Component Objects SAS Reference ==> Component Objects ==> hash iterator object
|
| Date Modified: | 2010-02-15 14:14:11 |
| Date Created: | 2005-10-05 09:59:04 |
Operating System and Release Information
| SAS System | Base SAS | OpenVMS Alpha | 9.1 TS1M0 | n/a |
| z/OS | 9.1 TS1M0 | n/a |
| Microsoft® Windows® for 64-Bit Itanium-based Systems | 9.1 TS1M0 | n/a |
| 64-bit Enabled HP-UX | 9.1 TS1M0 | n/a |
| 64-bit Enabled AIX | 9.1 TS1M0 | n/a |
| 64-bit Enabled Solaris | 9.1 TS1M0 | n/a |
| HP-UX IPF | 9.1 TS1M0 | n/a |
| Linux | 9.1 TS1M0 | n/a |
| Tru64 UNIX | 9.1 TS1M0 | n/a |