GEOCODE Procedure

Understanding Street Geocoding

Overview of Street Geocoding

The street geocoding method computes geographical coordinates for specified street addresses. This method converts a full street address that includes a house or building number, street name, city, state, and ZIP code to a map location. This method requires additional lookup data sets and additional options.
The GEOCODE procedure attempts to match the street name and ZIP code. If no match is found, then the GEOCODE procedure attempts to match the street name, city name, and two-character postal code. If the second match fails, then the ZIP code and the city methods are used in succession. You can disable this behavior by using the NOZIP and the NOCITY options in the GEOCODE statement.
If a street match is found, X and Y coordinate values are interpolated by using the house number, street type suffix, directional prefix, and directional suffix from the input address.
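The fallback order described above can be sketched in a few lines of Python. This is only an illustration of the documented sequence, not the procedure's actual implementation. The function name and the four matcher callables are hypothetical, and the `nozip` and `nocity` flags merely mimic the effect of the NOZIP and NOCITY options.

```python
from typing import Callable, Optional, Tuple

Coords = Tuple[float, float]
# Hypothetical matcher type: takes an address record, returns (x, y) or None.
Matcher = Callable[[dict], Optional[Coords]]


def geocode_with_fallback(
    address: dict,
    match_street_zip: Matcher,
    match_street_city_state: Matcher,
    match_zip: Matcher,
    match_city: Matcher,
    nozip: bool = False,
    nocity: bool = False,
) -> Tuple[str, Optional[Coords]]:
    """Try each match level in the documented order and stop at the first hit."""
    steps = [
        ("Street", match_street_zip),         # street name + ZIP code
        ("Street", match_street_city_state),  # street name + city + postal code
    ]
    if not nozip:
        steps.append(("ZIP", match_zip))      # skipped when nozip is set
    if not nocity:
        steps.append(("City", match_city))    # skipped when nocity is set
    for level, matcher in steps:
        coords = matcher(address)
        if coords is not None:
            return level, coords
    return "None", None
```

The returned label loosely mirrors the _MATCHED_ output values Street, ZIP, City, and None.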

About Street Input Data

The street input data set should minimally contain the street address and ZIP code for each observation. Data that also includes the city and state provides a fallback way of finding a street level match if the initial attempt fails.
For best results, use the most complete street addresses possible in your input data set. For example, “111 North Main Street” is more likely to find a match than “111 Main Street” or “111 North Main.”

About Street Lookup Data

Overview of the Required Data Sets

The STREET geocoding method requires five different lookup data sets:
street matching data set
contains street names, ZIP codes, FIPS codes, and references to observation numbers for the street segment data set. The name of this data set can be any valid SAS name, but it must end with the letter M.
The STREET geocoding method searches this street matching data set for the current street name and the ZIP code, city, or state values. The FIRST and LAST variables are then used to read the observations in the street segment data set that apply to this street.
The FIRST variable identifies the starting observation in the street segment data set. The LAST variable identifies the ending observation in the street segment data set that is associated with the street match.
Default: the SASHELP.USM data set
Notes: The default data set is not installed with SAS/GRAPH. You can download the data set from the SAS Maps Online Web site. For more information, see Obtaining Street Lookup Data Sets.

The SASHELP.GEOEXM data set is installed with SAS. This data set is a street matching data set for Wake County, N.C. For more information, see Obtaining Street Lookup Data Sets.

Tip: Any non-default street matching data set is specified by the LOOKUPSTREET= option. When you specify this option for the data set, the library must also contain two corresponding data sets whose names end with S (segment) and P (coordinate) respectively. For more information, see LOOKUPSTREET=street-matching-data-set.
street segment data set
contains variables to identify the street type, street direction prefix, and street direction suffix. The name of this data set must be the same as the street matching data set name, except the last character must be S instead of M.
Each street segment is associated with a range of house numbers along one side of the street. The range is specified by the SIDE, FROMADD, and TOADD variables. The START variable identifies the first observation in the street coordinate data set that is associated with the street segment. The N variable specifies the number of observations in the street coordinate data set containing the coordinates that delineate the street segment.
The street segment data sets that are provided by SAS contain attribute variables with additional information pertaining to the street segment and side. U.S. census tracts and county FIPS codes are examples of such attribute variables.
You can specify that non-geocoding variables from the street segment lookup data set be added to the output data set by using the ATTRIBUTEVAR= option in the PROC GEOCODE statement. For more information, see Adding Variables to the Output Data Set.
Default: the SASHELP.USS data set
Notes: The default data set is not installed with SAS/GRAPH. You can download the data set from the SAS Maps Online Web site. For more information, see Obtaining Street Lookup Data Sets.

The SASHELP.GEOEXS data set is installed with SAS. This data set is a street segment data set for Wake County, N.C. For more information, see Obtaining Street Lookup Data Sets.

Tip: Any non-default street segment data set is indirectly referenced through the LOOKUPSTREET= option. For more information, see LOOKUPSTREET=street-matching-data-set.
street coordinate data set
contains X and Y coordinates. The name of this data set must be the same as the street matching data set name, except the last character must be P instead of M.
Default: the SASHELP.USP data set
Notes: The default data set is not installed with SAS/GRAPH. You can download the data set from the SAS Maps Online Web site. For more information, see Obtaining Street Lookup Data Sets.

The SASHELP.GEOEXP data set is installed with SAS. This data set is a street coordinate data set for Wake County, N.C. For more information, see Obtaining Street Lookup Data Sets.

Tip: Any non-default street coordinate data set is specified indirectly through the LOOKUPSTREET= option. For more information, see LOOKUPSTREET=street-matching-data-set.
street FIPS data set
contains city names, two-character postal codes, and FIPS codes. In some instances a match cannot be found by using the street name and ZIP code. In this case the STREET geocoding method uses the FIPS data set to determine the FIPS code for the city name. It also uses the two-character postal code of the input address.
Note: When creating a customized version of this data set you must use the same variable names, data types, order, and composite index that are used in the SASHELP.PLFIPS data set.
Default: the SASHELP.PLFIPS data set, which is installed with SAS/GRAPH
Tip: Any non-default street FIPS data set is specified by the FIPS= option.
street type data set
contains street type suffixes. The STREET geocoding method uses the street type data set to convert street type suffixes from the input address observation to standardized forms.
Default: the SASHELP.GCTYPE data set, which is installed with SAS/GRAPH
Notes: If you create a customized version of the data set, then you must use the same variable names, data types, order, and simple index that are used in the SASHELP.GCTYPE data set.

The SASHELP.GCTYPE data set contains standard U.S. Postal Service street type suffixes, such as Avenue and Drive. SASHELP.GCTYPE includes some suffixes that are not in the USPS table but are found in U.S. Census Bureau TIGER/Line data for various U.S. localities. If addresses in your geocoding input data contain unusual or nonstandard suffixes, then create a custom version of this data set that adds those suffixes. The complete list of USPS street types can be found in Appendix C of the USPS Publication 28, Postal Addressing Standards.

Tips: Any non-default street type data set is specified by the TYPE= option.

The %TIGER2GEOCODE macro program imports TIGER/Line shapefiles from the US Census Bureau and creates the lookup data sets used by PROC GEOCODE’s STREET method. Download this macro program from the SAS Maps Online Web site. For more information, see SAS Maps Online Web Site.
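The following Python sketch illustrates how the street matching, street segment, and street coordinate data sets are linked through the FIRST, LAST, START, and N variables, and how the M, S, and P naming rule works. It is an illustration only, not how the GEOCODE procedure is implemented. It assumes that observation numbers are 1-based, as SAS observation numbers are. The NAME, ZIP, X, and Y keys and all sample values are made up for the example.

```python
def companion_names(matching_name):
    """Derive the segment (S) and coordinate (P) data set names from the M name."""
    if not matching_name.upper().endswith("M"):
        raise ValueError("street matching data set name must end with M")
    stem = matching_name[:-1]
    return stem + "S", stem + "P"


def segments_for_street(match_row, segment_rows):
    """Return segment observations FIRST through LAST (1-based, inclusive)."""
    return segment_rows[match_row["FIRST"] - 1 : match_row["LAST"]]


def coordinates_for_segment(segment_row, coordinate_rows):
    """Return the N coordinate observations that begin at observation START."""
    start = segment_row["START"] - 1
    return coordinate_rows[start : start + segment_row["N"]]


# Toy lookup data (hypothetical values, not taken from any SAS data set).
street_match = [
    {"NAME": "MAIN", "ZIP": 27513, "FIRST": 1, "LAST": 2},
    {"NAME": "OAK", "ZIP": 27513, "FIRST": 3, "LAST": 3},
]
street_segment = [
    {"SIDE": "L", "FROMADD": 101, "TOADD": 199, "START": 1, "N": 2},
    {"SIDE": "R", "FROMADD": 100, "TOADD": 198, "START": 1, "N": 2},
    {"SIDE": "L", "FROMADD": 1, "TOADD": 99, "START": 3, "N": 3},
]
street_coord = [
    {"X": -78.80, "Y": 35.78},
    {"X": -78.79, "Y": 35.78},
    {"X": -78.70, "Y": 35.70},
    {"X": -78.69, "Y": 35.71},
    {"X": -78.68, "Y": 35.72},
]

oak_segments = segments_for_street(street_match[1], street_segment)
oak_points = coordinates_for_segment(oak_segments[0], street_coord)
```

In this toy data, both sides of MAIN share the same two coordinate observations, while the single OAK segment is delineated by three.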

Obtaining Street Lookup Data Sets

The default street lookup data sets (USM, USS, and USP) are not installed with SAS/GRAPH. These data sets contain address lookup data for the entire United States. You can download these data sets from the SAS Maps Online Web site. For more information, see SAS Maps Online Web Site.
The USM, USS, and USP data sets are created from US Census Bureau TIGER/Line files. As new TIGER/Line data is released, updated versions of the USM, USS, and USP data sets will be made available.
The GEOEXM, GEOEXS, and GEOEXP data sets in the SASHELP library are installed with SAS/GRAPH by default. These data sets contain street method lookup data for Wake County, N.C., USA. They can be used for trial geocoding tests to determine whether to download and install the nationwide lookup data sets from the SAS Maps Online Web site.
A SAS %TIGER2GEOCODE macro program can be used to import TIGER/Line shapefiles for specific states and counties. This program is available from the SAS Maps Online Web site. For more information, see SAS Maps Online Web Site.

Output Variables for Street Geocoding

In addition to the default output variables, the STREET geocoding method creates the following variables in the output data set:
M_ADDR
contains the street address for a street match. The M_ADDR value is the match value from the lookup data.
_MATCHED_
indicates how the coordinates were found. The following values are used with the _MATCHED_ variable:
Street
A match was found for either the street address and ZIP code or the street address, city, and state.
ZIP
A match was found for the ZIP code. The GEOCODE procedure uses the ZIP method when a ZIP+4 method finds no match between ZIP+4 values and when a single ZIP code match is found.
ZIP+4
A match was found for the ZIP code and ZIP+4 extension. The location is the center of the street segment.
ZIP mean
Multiple observations in the lookup data set specified with the PLUS4 geocoding method matched the five-digit ZIP code and the matching latitude and longitude coordinate values were averaged. The GEOCODE procedure uses this method when no matches are located with the ZIP+4 values.
City
A match was found for the city and state. The GEOCODE procedure uses this method when either a specified ZIP or ZIP+4 method fails to return a match.
City mean
Multiple observations in the SASHELP.ZIPCODE lookup data set matched the city and state and the matching latitude and longitude coordinate values were averaged.
variable-name
For CUSTOM and RANGE geocoding, a variable name indicates that a match was found for that variable.
None
No match was found for the address.
M_CITY
contains the city name for a city and state match. The M_CITY value is the match value from the lookup data.
M_STATE
contains the two-character postal code for a city and state match. The M_STATE value is the match value from the lookup data.
M_ZIP
contains the ZIP code value for a ZIP level match from the lookup data.
M_OBS
contains the row number for the matching observation in the street matching lookup data set.
_STATUS_
indicates the type of match that was found. The following values are used with the _STATUS_ variable:
City and State Match
The street address did not match but a match was found for the city name and two-character postal code.
Found
The street address matched.
ZIP Match
The street address did not match but a match was found for the ZIP code.
(Blank)
No match was found.
_NOTES_
contains tokens that provide additional information about the match. For more information, see Street Geocoding Note Values.
_SCORE_
contains a numeric value indicating the relative accuracy of the match.

Street Geocoding Note Values

The STREET geocoding method creates a _NOTES_ variable in the output data set. This variable provides details about the quality of the address match by using token strings. For example, the value "AD ZC NM" contains three tokens that indicate that the street name, ZIP code, and house number matched.
Each token in the _NOTES_ value has an associated value, and the sum of these values makes up the value of the _SCORE_ variable.
The following table displays the tokens and their scores:
Tokens for the _NOTES_ Variable
Token   Score   Description
AD      20      The street name matched.
CT      5       The city name matched.
DP      15      The street direction prefix matched.
DS      15      The street direction suffix matched.
ENDNM   0       The house number was outside the ranges of values in the lookup data set for the matching street. The geocoded coordinates for the nearest end of the street were used.
MZC     0       Multiple matches were found for the street address and ZIP code.
MZS     0       Multiple matches were found for the street address and city-state pair.
NM      10      The house number matched.
NMOS    5       The house number matched an address range in the lookup data set, but is on the opposite side of the street from the matched range.
NOCT    -5      The city name and postal code could not be matched in the FIPS data set.
NODPA   -10     The input address had no direction prefix but the matching street did have a direction prefix. For example, the input street name was "Main St." but the matching street was "N Main St."
NODPM   -15     The input address had a direction prefix but it was not the same as the direction prefix of the matching street. For example, the input street name was "North Main St." but the matching street was "Main St."
NODSA   -10     The input address had no direction suffix but the matching street did have a direction suffix. For example, the input street name was "Johnson Ave" but the matching street was "Johnson Ave S."
NODSM   -15     The input address had a direction suffix but it was not the same as the direction suffix of the matching street. For example, the input street name was "Johnson Ave South" but the matching street was "Johnson Ave."
NOLNM   0       The lookup data set contains missing values for the house numbers of the matching street. The geocoded coordinates for the center of the matching street were used.
NONM    0       The input address has no house number. The geocoded coordinates for the center of the matching street were used.
NOTYA   -10     The input address had no street type suffix, but the matching address did have a street type suffix. For example, the input address was "110 Main." but the matching address was "110 Main St."
NOTYM   -20     The street type suffix of the input address was not the same as the type suffix of the matching street. For example, the input street name was "Park St." but the matching street name was "Park Ave."
ST      5       The two-character postal code matched.
TY      20      The street type suffix matched.
ZC      15      The ZIP code matched.
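As a worked illustration of how the _SCORE_ value relates to the _NOTES_ tokens, the Python sketch below sums the per-token scores from the table above. The function name is hypothetical. The sketch assumes that tokens in a _NOTES_ value are separated by blanks, as in the "AD ZC NM" example.

```python
# Token scores copied from the table of _NOTES_ tokens.
TOKEN_SCORES = {
    "AD": 20, "CT": 5, "DP": 15, "DS": 15,
    "ENDNM": 0, "MZC": 0, "MZS": 0,
    "NM": 10, "NMOS": 5, "NOCT": -5,
    "NODPA": -10, "NODPM": -15, "NODSA": -10, "NODSM": -15,
    "NOLNM": 0, "NONM": 0, "NOTYA": -10, "NOTYM": -20,
    "ST": 5, "TY": 20, "ZC": 15,
}


def score_from_notes(notes):
    """Sum the scores of the blank-separated tokens in a _NOTES_ value."""
    return sum(TOKEN_SCORES[token] for token in notes.split())


example_score = score_from_notes("AD ZC NM")  # 20 + 15 + 10 = 45
```

A token that is not in the table raises a KeyError in this sketch, which makes unexpected tokens easy to spot.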

Tips for Street Geocoding

The following table contains suggestions and comments for the STREET geocoding method.
Category
Suggestions and Comments
Most recent software and lookup data
Install the most recent SAS release. The STREET method is relatively new and each release will contain improvements.
Installing the most recent SAS release also updates the SASHELP.PLFIPS and SASHELP.GCTYPE data sets. Both are used in STREET geocoding.
Obtain the most recent street lookup data (including segments and coordinates). The nationwide U.S. lookup data available on SAS Maps Online is updated when the U.S. Census Bureau releases new TIGER/Line files.
The %TIGER2GEOCODE macro program imports TIGER/Line shapefiles from the US Census Bureau and creates the lookup data sets used by PROC GEOCODE’s STREET method. Download this macro program from the SAS Maps Online Web site.
For more information, see SAS Maps Online Web Site.
Correct data
Examine the input address values for observations that do not get a street match. Here are some common reasons why street matches are not found:
  • The address is a P.O. box or contains apartment information. (See the text immediately below this bulleted list for details.)
  • The direction suffix is positioned such that it is not detected. For example, the input address might be ‘Green Level West Road’ instead of ‘Green Level Road West.’
  • The street name is not in the TIGER data that is being used.
  • The city or street name in the lookup data uses an alternate spelling (for example, Hillsboro versus Hillsborough).
  • Data is misspelled for street or city names.
  • Digits are incorrect or transposed for house numbers or ZIP codes.
The address processing code attempts to strip apartment numbers, suite numbers, floor numbers, mailbox numbers, P.O. box numbers, and other non-street related address elements. However, depending on how these elements are interspersed in the address string, the GEOCODE procedure might not remove them. You might need to remove them yourself.
Note: Do not remove house numbers.
Adequate amount of data
Include as many elements of a mailing address as possible. The street name is required. Include the house number, the ZIP code, the city, and the state values whenever possible.
ZIP codes
Always include the ZIP code with the input address, if available. If the ZIP code is omitted, the city and state values must be used, and they can be misleading.
The city value of a street segment from the original TIGER data is not the mailing address city. Instead, it denotes whether that segment is physically within the corporate limits of a city. A missing city value means that the street segment is not within any incorporated city.
For example, a house with a Leon, Iowa mailing address is in an unincorporated portion of Decatur County, Iowa. The TIGER data for this house might have a missing city value despite the house having a valid mailing address.
Also, it is not unusual for a house to be inside one city’s limits but have its mail delivered from another city’s post office branch. For example, there are streets near SAS that are within Apex, North Carolina, but have a Cary, North Carolina mailing address. Specifying the ZIP code for all input addresses can prevent geocoding problems that result from this situation.
The STREET method locates an address using the street name and ZIP code or by using the street name with the city and state. Because a ZIP code represents less data than a city, when you use ZIP codes the procedure runs faster and provides more reliable locations.
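To act on the "Correct data" suggestion in the table above, it can help to pull out the observations that did not get a street match and group them by how they were matched. The Python sketch below assumes that the geocoded output has been exported to a list of dictionaries keyed by variable name. The function name, the ADDRESS key, and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_non_street_matches(rows):
    """Group output rows whose _MATCHED_ value is not "Street" by that value."""
    groups = defaultdict(list)
    for row in rows:
        if row["_MATCHED_"] != "Street":
            groups[row["_MATCHED_"]].append(row)
    return dict(groups)


# Made-up output rows for illustration.
sample_rows = [
    {"ADDRESS": "111 North Main Street", "_MATCHED_": "Street"},
    {"ADDRESS": "PO Box 12", "_MATCHED_": "ZIP"},
    {"ADDRESS": "Green Level West Road", "_MATCHED_": "ZIP"},
    {"ADDRESS": "9 Unknown Lane", "_MATCHED_": "None"},
]

to_review = group_non_street_matches(sample_rows)
```

Each group can then be checked against the common causes listed in the table, such as P.O. boxes, misplaced direction suffixes, and misspellings.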