The GEOCODE Procedure
Overview of Street Geocoding
Note: This feature is available in the third maintenance release of SAS 9.2 and later.
The STREET geocoding method matches street addresses to coordinates on a map. The GEOCODE procedure attempts to match the street name and ZIP code. If no match is found, then the GEOCODE procedure attempts to match the street name, city name, and two-character postal code. If the second match fails, then the ZIP method and the CITY method are used in succession.
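The fallback order described above can be pictured as a chain of lookups that stops at the first success. The following Python sketch is illustrative only and is not how PROC GEOCODE is implemented: the function `geocode_with_fallback`, the four stand-in matchers, the stage labels, and the placeholder ZIP code, city name, and coordinates are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Coords = Tuple[float, float]
Matcher = Callable[[dict], Optional[Coords]]


def geocode_with_fallback(
    address: dict, stages: Sequence[Tuple[str, Matcher]]
) -> Tuple[str, Optional[Coords]]:
    """Try each matching stage in order and return the first one that succeeds."""
    for name, matcher in stages:
        coords = matcher(address)
        if coords is not None:
            return name, coords
    return "no match", None


# Hypothetical stand-in matchers. Real lookups would query the lookup data sets.
def street_and_zip(addr: dict) -> Optional[Coords]:
    return None  # pretend the street name + ZIP code lookup fails


def street_city_state(addr: dict) -> Optional[Coords]:
    return None  # pretend the street name + city + postal code lookup fails


def zip_only(addr: dict) -> Optional[Coords]:
    # Placeholder ZIP code and coordinates, not real data.
    return (10.0, 20.0) if addr.get("zip") == "00000" else None


def city_only(addr: dict) -> Optional[Coords]:
    # Placeholder city name and coordinates, not real data.
    return (30.0, 40.0) if addr.get("city") == "Anytown" else None


STAGES = [
    ("street+zip", street_and_zip),
    ("street+city+state", street_city_state),
    ("zip", zip_only),
    ("city", city_only),
]

stage, coords = geocode_with_fallback({"street": "Main St", "zip": "00000"}, STAGES)
print(stage, coords)
```

Because both street-level stand-ins return nothing here, the sample address falls through to the ZIP stage, which mirrors the "used in succession" behavior.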
If a street match is found, X and Y coordinate values are interpolated by using the house number, street type suffix, directional prefix, and directional suffix from the input address.
Note: To obtain the best results from STREET geocoding, use the most complete street addresses possible in your input data set. For example, "111 North Main Street" might produce a more accurate result than "111 Main Street" or "111 North Main." Also include ZIP code values in the input data set to improve the accuracy of your results.
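To make the idea of interpolating coordinates from a house number concrete, here is a minimal Python sketch of linear interpolation along a single straight segment. It is an assumption-laden illustration, not the procedure's actual algorithm: the function name `interpolate_on_segment`, its parameters, and the sample numbers are hypothetical, and the street type and direction values are assumed to have already been used to select the segment.

```python
from typing import Tuple

Point = Tuple[float, float]


def interpolate_on_segment(
    house_number: int, from_add: int, to_add: int, start_xy: Point, end_xy: Point
) -> Point:
    """Place a house number proportionally between a segment's two end points."""
    if to_add == from_add:
        return start_xy  # a one-number range has nothing to interpolate
    # Fraction of the way along the address range, clamped to the segment ends.
    t = (house_number - from_add) / (to_add - from_add)
    t = min(1.0, max(0.0, t))
    x = start_xy[0] + t * (end_xy[0] - start_xy[0])
    y = start_xy[1] + t * (end_xy[1] - start_xy[1])
    return (x, y)


# House number 150 lies halfway through the range 100-200.
print(interpolate_on_segment(150, 100, 200, (0.0, 0.0), (10.0, 20.0)))  # (5.0, 10.0)
```

Clamping the fraction means that a house number beyond the range lands on the nearest end of the segment rather than past it.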
Data Sets for Street Geocoding
The STREET geocoding method requires five different lookup data sets:
street matching data set
contains street names, ZIP codes, FIPS codes, and references to observation numbers for the street segment data set. The name of this data set must end with the letter M.
The STREET geocoding method uses the street matching data set to match the street name and to determine which observations in the street segment data set are associated with the matching street.
The FIRST variable identifies the first observation in the street segment data set that is associated with the street match, and the LAST variable identifies the last such observation.
Default: the SASHELP.USM data set
Tip: The street matching data set is specified by the LOOKUPSTREET= option. See the documentation for the LOOKUPSTREET= option for more information.
street segment data set
contains variables to identify the street type, street direction prefix, and street direction suffix. Each street segment is associated with a range of house numbers, which is specified by the FROMADD and TOADD variables. The START variable identifies the first observation in the street coordinate data set that is associated with the street segment. The N variable specifies the number of observations in the street coordinate data set that are associated with the street segment.
The street segment data sets that are provided by SAS contain attribute variables with additional information, such as census tracts and county FIPS codes. The name of this data set must end with the letter S.
Default: the SASHELP.USS data set
Tip: The street segment data set is specified indirectly through the LOOKUPSTREET= option. See the documentation for the LOOKUPSTREET= option.
street coordinate data set
contains X and Y coordinates. The name of this data set must end with the letter P.
Default: the SASHELP.USP data set
Tip: The street coordinate data set is specified indirectly through the LOOKUPSTREET= option. See the documentation for the LOOKUPSTREET= option.
street FIPS data set
contains city names, two-character postal codes, and FIPS codes. If a match cannot be found by using the street name and ZIP code, then the STREET geocoding method uses the FIPS data set to determine the FIPS code for the city name and two-character postal code of the input address.
Note: If you choose to create a customized version of this data set, then you must use the same variable names, data types, order, and index that are used in the SASHELP.PLFIPS data set.
Default: the SASHELP.PLFIPS data set
Tip: The street FIPS data set is specified by the FIPS= option.
street type data set
contains street type suffixes. The STREET geocoding method uses the street type data set to convert street type suffixes to standardized forms.
Note: If you choose to create a customized version of this data set, then you must use the same variable names, data types, order, and index that are used in the SASHELP.GCTYPE data set.
Default: the SASHELP.GCTYPE data set
Tip: The street type data set is specified by the TYPE= option.
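The way the street matching, street segment, and street coordinate data sets point into one another (FIRST and LAST, then START and N) can be traced with a small Python sketch. Everything in it is a toy stand-in: the three lists, their lowercase keys, the sample street, and all values are made up for illustration, and the function `segment_points` is hypothetical. The sketch assumes 1-based observation numbers, as in SAS data sets.

```python
from typing import List, Optional, Tuple

# Toy stand-in for a street matching ("M") data set; values are made up.
street_match = [
    {"name": "MAIN", "zip": "00000", "first": 1, "last": 2},
]

# Toy stand-in for a street segment ("S") data set; values are made up.
street_segments = [
    {"fromadd": 100, "toadd": 198, "start": 1, "n": 2},
    {"fromadd": 200, "toadd": 298, "start": 3, "n": 3},
]

# Toy stand-in for a street coordinate ("P") data set; values are made up.
street_coords = [
    (0.0, 0.0), (1.0, 0.0),               # observations 1-2
    (1.0, 0.0), (2.0, 0.5), (3.0, 1.0),   # observations 3-5
]


def segment_points(
    name: str, zip_code: str, house_number: int
) -> Optional[List[Tuple[float, float]]]:
    """Follow FIRST/LAST and START/N to the coordinates for one address."""
    for match in street_match:
        if match["name"] == name and match["zip"] == zip_code:
            # FIRST..LAST is an inclusive, 1-based range of segment observations.
            for seg in street_segments[match["first"] - 1 : match["last"]]:
                low, high = sorted((seg["fromadd"], seg["toadd"]))
                if low <= house_number <= high:
                    # START is 1-based; N counts the coordinate observations.
                    begin = seg["start"] - 1
                    return street_coords[begin : begin + seg["n"]]
    return None


print(segment_points("MAIN", "00000", 250))
```

In this toy data, house number 250 falls in the second segment, so the lookup returns coordinate observations 3 through 5.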
The default street lookup data sets (USM, USS, and USP) are not installed with SAS/GRAPH. These data sets contain address lookup data for the entire United States. You can download these data sets from the SAS Maps Online Web site at www.sas.com/mapsonline.
The USM, USS, and USP data sets are created from US Census Bureau TIGER/Line files. As new TIGER/Line data is released, updated versions of the USM, USS, and USP data sets will be made available.
The GEOEXM, GEOEXS, and GEOEXP data sets in the SASHELP library are installed with SAS by default. These data sets contain data for Wake County in North Carolina in the United States.
Output Variables for Street Geocoding
In addition to the default output variables, the STREET geocoding method creates the following variables in the output data set:
M_ADDR |
contains the street address for the match. The M_ADDR value is the match value from the lookup data set. |
M_CITY |
contains the city name for the match. The M_CITY value is the match value from the lookup data set. |
M_STATE |
contains the two-character postal code for the match. The M_STATE value is the match value from the lookup data set. |
M_ZIP |
contains the ZIP code value from the lookup data set. |
M_OBS |
contains the row number for the match in the lookup data set. |
_STATUS_ |
indicates the type of match that was found. The following values are used with the _STATUS_ variable:
|
_NOTES_ |
contains tokens that provide additional information about the match. For more information, see Street Geocoding Note Values. |
_SCORE_ |
contains a numeric value that estimates the relative accuracy of the match. |
Street Geocoding Note Values
The STREET geocoding method creates a _NOTES_ variable in the output data set. This variable provides details about the quality of the address match by using token strings. For example, the value "AD ZC NM" contains three tokens that indicate that the street name, ZIP code, and house number matched.
Each token in the _NOTES_ value has an associated score, and the sum of the scores makes up the value of the _SCORE_ variable.
The following table displays the tokens and their scores:
Token | Score | Description |
---|---|---|
AD | 20 | The street name matched. |
CT | 5 | The city name matched. |
DP | 10 | The street direction prefix matched. |
DS | 10 | The street direction suffix matched. |
ENDNM | 0 | The house number was outside the ranges of values in the lookup data set for the matching street. The geocoded coordinates for the nearest end of the street were used. |
MZC | 0 | Multiple matches were found for the street address and ZIP code. |
MZS | 0 | Multiple matches were found for the street address and city-state pair. |
NM | 20 | The house number matched. |
NMOS | 15 | The house number matched an address range in the lookup data set, but is on the opposite side of the street from the matched range. |
NOCT | -5 | The city name and postal code could not be matched in the FIPS data set. |
NODPA | -10 | The input address had no direction prefix but the matching street did have a direction prefix. For example, the input street name was "Main St." but the matching street was "N Main St." |
NODPM | -10 | The input address had a direction prefix but it was not the same as the direction prefix of the matching street. For example, the input street name was "North Main St." but the matching street was "Main St." |
NODSA | -10 | The input address had no direction suffix but the matching street did have a direction suffix. For example, the input street name was "Johnson Ave" but the matching street was "Johnson Ave S." |
NODSM | -10 | The input address had a direction suffix but it was not the same as the direction suffix of the matching street. For example, the input street name was "Johnson Ave South" but the matching street was "Johnson Ave." |
NOLNM | 0 | The lookup data set contains missing values for the house numbers of the matching street. The geocoded coordinates for the center of the matching street were used. |
NONM | 0 | The input address has no house number. The geocoded coordinates for the center of the matching street were used. |
NOTYA | -5 | The input address had no street type suffix, but the matching address did have a street type suffix. For example, the input address was "110 Main" but the matching address was "110 Main St." |
NOTYM | -5 | The street type suffix of the input address was not the same as the type suffix of the matching street. For example, the input street name was "Park St." but the matching street name was "Park Ave." |
ST | 5 | The two-character postal code matched. |
TY | 5 | The street type suffix matched. |
ZC | 15 | The ZIP code matched. |
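The relationship between the _NOTES_ tokens and the _SCORE_ value can be checked by hand with a short Python sketch. The scores are copied from the table above. The dictionary `TOKEN_SCORES` and the function `score_from_notes` are hypothetical names, and the sketch assumes that tokens are separated by blanks, as in the "AD ZC NM" example.

```python
# Token scores copied from the table of street geocoding note values.
TOKEN_SCORES = {
    "AD": 20, "CT": 5, "DP": 10, "DS": 10, "ENDNM": 0, "MZC": 0, "MZS": 0,
    "NM": 20, "NMOS": 15, "NOCT": -5, "NODPA": -10, "NODPM": -10,
    "NODSA": -10, "NODSM": -10, "NOLNM": 0, "NONM": 0, "NOTYA": -5,
    "NOTYM": -5, "ST": 5, "TY": 5, "ZC": 15,
}


def score_from_notes(notes: str) -> int:
    """Sum the score of every blank-separated token in a _NOTES_ value."""
    return sum(TOKEN_SCORES[token] for token in notes.split())


# Street name (20) + ZIP code (15) + house number (20).
print(score_from_notes("AD ZC NM"))  # 55
```

A token that is not in the table raises a `KeyError` in this sketch, which makes unexpected _NOTES_ values easy to spot.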
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.