GEOCODE Procedure

Understanding ZIP Code Geocoding

Overview of ZIP Code Geocoding

The terms ZIP code and postal code are used interchangeably. Both terms refer to a group of linear mail delivery routes. ZIP code is the United States Postal Service's designation, while postal code is the term used by other national post offices. A ZIP code can also be assigned to a single building or to a post office. Unlike a county, state, or country, a ZIP code is not considered a polygonal area.
Generally, ZIP code address data specifies a center location for each ZIP code that approximates the geographic center of the ZIP code. The government does not create ZIP code boundaries. Each data vendor draws its own boundaries, so ZIP code center locations vary among different vendors' products.
The basic ZIP code in the United States is five digits. ZIP code data provides the general vicinity of an address, but usually not the actual street. If you want to locate the street, you can try using ZIP+4 geocoding. For more information, see Understanding ZIP+4 Geocoding.

About ZIP Code Input Data

Your input data must contain a valid ZIP or postal code for each observation. If a ZIP code is not found in the lookup data, then the GEOCODE procedure attempts to find the city center location instead. If you want ZIP code locations only, you can turn off this behavior by using the NOCITY option in the PROC GEOCODE statement.
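The fallback order described above can be modeled outside SAS as a short lookup function. This is a hypothetical sketch: the dictionaries, the (city, state) key layout, and the `nocity` flag are illustrative assumptions that mimic the documented behavior, not the GEOCODE procedure's implementation.

```python
# Hypothetical sketch of the ZIP-then-city fallback described above.
# The dictionaries, key layout, and nocity flag are illustrative assumptions;
# this is not how PROC GEOCODE is implemented internally.
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]  # (X = longitude, Y = latitude)


def locate(
    zip_code: str,
    city: str,
    state: str,
    zip_centers: Dict[str, Coord],
    city_centers: Dict[Tuple[str, str], Coord],
    nocity: bool = False,
) -> Optional[Coord]:
    """Return the ZIP code center, else the city center unless nocity is True."""
    center = zip_centers.get(zip_code.strip())
    if center is not None:
        return center
    if nocity:
        # Only ZIP code locations were requested; leave the observation unmatched.
        return None
    # City keys are assumed to be stored uppercased so the match ignores case.
    return city_centers.get((city.strip().upper(), state.strip().upper()))
```

Passing `nocity=True` plays the role of the NOCITY option in this sketch: an observation whose ZIP code is not in the lookup data stays unmatched instead of receiving a city center location.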

About ZIP Code Lookup Data

ZIP code geocoding uses the SASHELP.ZIPCODE lookup data set by default. You can use the LOOKUP= option to specify alternate data sources. The CITY method has no default lookup data set, but SASHELP.ZIPCODE can be used for U.S. cities. You can also import postal codes outside the U.S. from other sources of data. To import postal codes, you can use the IMPORT procedure or a DATA step, depending on the format of the third-party data.
After you import non-U.S. postal codes with centroids into a SAS data set, you can use this data set as the lookup data set with the ZIP method. For an explanation of the sources of free Australian and British postal code locations, see Non-U.S. Postcodes.
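In SAS, this import step is done with PROC IMPORT or a DATA step. As a language-neutral illustration of the same idea, the following hypothetical sketch reads a third-party CSV file into a lookup keyed by postal code. The column names `postcode`, `longitude`, and `latitude` are assumptions about the source file; real vendor files use their own layouts.

```python
# Hypothetical sketch: load a third-party postal code file into a lookup
# keyed by postal code, analogous to building a lookup data set with
# PROC IMPORT or a DATA step. The column names (postcode, longitude,
# latitude) are assumptions about the source file, not a SAS convention.
import csv
import io
from typing import Dict, TextIO, Tuple


def load_postcode_centroids(stream: TextIO) -> Dict[str, Tuple[float, float]]:
    """Map each postal code to its (X, Y) = (longitude, latitude) centroid."""
    lookup: Dict[str, Tuple[float, float]] = {}
    for row in csv.DictReader(stream):
        # Trim surrounding blanks and uppercase letters so keys compare cleanly.
        code = row["postcode"].strip().upper()
        lookup[code] = (float(row["longitude"]), float(row["latitude"]))
    return lookup


# Made-up sample values, for illustration only.
sample = io.StringIO("postcode,longitude,latitude\nA1,0.5,1.5\n")
print(load_postcode_centroids(sample))
```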
You can specify that non-geocoding variables from the lookup data set be added to the output data set by using the ATTRIBUTEVAR= option in the PROC GEOCODE statement.

About the SASHELP.ZIPCODE Lookup Data Set

The default lookup data set for ZIP code geocoding is SASHELP.ZIPCODE. This data set can also be used as an alternate lookup data set with the CITY method for U.S. locations. SASHELP.ZIPCODE is provided with Base SAS (in the SASHELP library) and is updated for each SAS release.
You can download quarterly updates of the SASHELP.ZIPCODE data set from the SAS Maps Online Web site. For more information, see SAS Maps Online Web Site.
SASHELP.ZIPCODE contains the following variables:
Name         Label
ZIP          The 5-digit ZIP code
Y            Latitude (decimal degrees) of the center of the ZIP code; 0.0 for APO and FPO
X            Longitude (decimal degrees) of the center of the ZIP code; 0.0 for APO and FPO
ZIP_CLASS    ZIP code classification: M = APO and FPO; P = Post office box; U = Unique ZIP code used for large organizations, businesses, or buildings; blank = Standard/non-unique
CITY         Name of the city or organization
STATE        Two-digit number (FIPS code) for the state or territory
STATECODE    Two-letter abbreviation for the state name
STATENAME    Full name of the state or territory
COUNTY       FIPS county code; blank for APO and FPO
COUNTYNM     Name of the county or parish; no county for APO and FPO
MSA          Metropolitan Statistical Area code by common population; no MSA for rural areas
AREACODE     Area code for the ZIP code; none for APO and FPO
AREACODES    Multiple area codes for the ZIP code; none for APO and FPO
ALIAS_CITY   Alternate names for the city, separated by "||"
TIMEZONE     Time zone for the ZIP code; none for APO and FPO
GMTOFFSET    Difference (hours) between GMT and the time zone for the ZIP code
DST          Whether the ZIP code observes Daylight Saving Time: Y = Yes, N = No
PONAME       USPS Post Office name
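Two of these variables pack several values into one field: ALIAS_CITY joins names with "||", and ZIP_CLASS uses one-letter codes. The hypothetical helpers below decode those values after the data has been exported from SAS. The helper names are illustrative; the code meanings are taken from the variable list above.

```python
# Hypothetical sketch for post-processing SASHELP.ZIPCODE values after they
# are exported from SAS. The helper names are illustrative; the code meanings
# come from the variable list above.
from typing import List

ZIP_CLASS_LABELS = {
    "M": "APO and FPO",
    "P": "Post office box",
    "U": "Unique ZIP code (large organization, business, or building)",
    "": "Standard/non-unique",
}


def split_alias_cities(alias_city: str) -> List[str]:
    """Split an ALIAS_CITY value on the '||' separator, dropping empty names."""
    return [name.strip() for name in alias_city.split("||") if name.strip()]


def describe_zip_class(zip_class: str) -> str:
    """Translate a ZIP_CLASS code; a blank value means a standard ZIP code."""
    return ZIP_CLASS_LABELS.get(zip_class.strip().upper(), "Unknown")
```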

About Alternate ZIP Code Lookup Data

The SASHELP.ZIPCODE data set is the default lookup data set for the GEOCODE procedure. However, data from other sources can be used as long as it is read into a SAS data set.
For ZIP code geocoding, any lookup data set must contain the following variables:
Default Name   Description
ZIP            Five-digit ZIP code
X              Longitude of the center coordinate
Y              Latitude of the center coordinate
Note: The character values in your input and lookup data sets do not need to be a case-sensitive match. Character value matching in the GEOCODE procedure is not case sensitive.
Additional attribute variables can also be in the alternate lookup data set. You can specify that non-geocoding variables from the lookup data set be added to the output data set by using the ATTRIBUTEVAR= option in the PROC GEOCODE statement.
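Before pointing LOOKUP= at an alternate data set, it can help to confirm that the required variables exist. The following hypothetical sketch checks for ZIP, X, and Y and then copies requested attribute variables onto a matched output row, loosely analogous to ATTRIBUTEVAR=. Comparing column names case-insensitively mirrors SAS variable names; the function names and row layout are assumptions for illustration.

```python
# Hypothetical sketch: check that an alternate lookup table has the required
# ZIP, X, and Y columns, then copy requested attribute columns to a matched
# output row, loosely analogous to ATTRIBUTEVAR=. Column names are compared
# case-insensitively here, like SAS variable names; everything else is an
# illustrative assumption.
from typing import Dict, Iterable, List

REQUIRED = ("ZIP", "X", "Y")


def missing_required(columns: Iterable[str]) -> List[str]:
    """Return the required lookup variables that are absent."""
    present = {c.upper() for c in columns}
    return [name for name in REQUIRED if name not in present]


def attach_attributes(
    out_row: Dict[str, object],
    lookup_row: Dict[str, object],
    attribute_vars: Iterable[str],
) -> Dict[str, object]:
    """Copy the requested non-geocoding variables from the matched lookup row."""
    by_upper = {k.upper(): k for k in lookup_row}
    result = dict(out_row)
    for var in attribute_vars:
        key = by_upper.get(var.upper())
        if key is not None:
            result[key] = lookup_row[key]
    return result
```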

Non-U.S. Postcodes

Postcode data from other countries can be used with the ZIP method if it includes either longitude and latitude or X and Y coordinates. You must import the postcode data into a SAS data set, which becomes the lookup data set for the ZIP method. If the postcodes contain letters, convert them to uppercase, and remove any spaces and punctuation from the postcodes.
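The clean-up rules above can be expressed as one normalization function. Applying the same function to both the input postcodes and the lookup postcodes keeps their keys comparable. This is a sketch; the function name is hypothetical.

```python
# Hypothetical sketch of the clean-up described above: uppercase letters and
# drop spaces and punctuation. Applying the same function to both the input
# and the lookup postcodes keeps their keys comparable. The function name is
# illustrative.
import re

_NON_ALNUM = re.compile(r"[^0-9A-Z]")


def normalize_postcode(raw: str) -> str:
    """Uppercase a postcode, then remove every character that is not A-Z or 0-9."""
    return _NON_ALNUM.sub("", raw.upper())


print(normalize_postcode("sw1a 1aa"))  # prints SW1A1AA
```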
In addition, make sure the coordinate system of the imported data is compatible with your needs. For example, if you want geocoded locations in the World Geodetic System 1984 (WGS84) geodetic datum, your imported values should be in that datum. If they are not, then you must apply the appropriate coordinate conversions, datum transformations, or both.
Great Britain’s national mapping agency, the Ordnance Survey (OS), provides location data for 1.7 million Royal Mail postcodes in its free Code-Point Open product. The SAS macro program %CODEPOINT2GEOCODE imports the OS files into a lookup data set that the GEOCODE procedure can use to locate British mailing addresses by postcode. The macro also converts the British National Grid coordinates to World Geodetic System 1984 (WGS84) longitude and latitude.
The Australian Bureau of Statistics (ABS) produces a generalized map of Australian postal areas. The SAS macro program %ABS2GEOCODE imports those postcode areas, computes the centroids, and creates a lookup data set for use by the GEOCODE procedure. The macro also creates a SAS/GRAPH map data set of the postal area polygons.
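Reducing a postal area polygon to a single lookup point requires a centroid calculation. The %ABS2GEOCODE macro's own method is not described here, so the following is only a sketch of one common approach: the area-weighted centroid of a simple polygon, computed with the shoelace formula on planar X/Y coordinates. With raw longitude and latitude it is an approximation that works best for small areas, and for a strongly concave polygon the centroid can fall outside the polygon.

```python
# Hypothetical sketch: area-weighted centroid of a simple (non-self-intersecting)
# polygon using the shoelace formula on planar X/Y coordinates. This is one
# common technique, not necessarily the one %ABS2GEOCODE uses.
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(vertices: List[Point]) -> Point:
    """Return the centroid of a polygon given its vertices in ring order."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0.0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum((x_i + x_{i+1}) * cross_i) / (6 * A), where A = twice_area / 2.
    # The formula works for either vertex order because the signs cancel.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)
```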
The %CODEPOINT2GEOCODE and the %ABS2GEOCODE macro programs and their accompanying documentation are available for download from the SAS Maps Online Web site. For more information, see SAS Maps Online Web Site.

U.S. Military ZIP Codes

ZIP codes for U.S. military post offices are provided in the ZIPMIL data set in the SASHELP library. You can combine this data set with the ZIPCODE data set to support military ZIP codes.
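In SAS, combining the two data sets is typically a concatenation. The hypothetical sketch below shows the same idea on exported records and reports any ZIP code that appears in both sources. The rule that the first table wins on a duplicate ZIP code is an assumption made for illustration, as is padding numeric ZIP values back to five digits.

```python
# Hypothetical sketch of combining two lookup tables, in the spirit of
# concatenating SASHELP.ZIPCODE and SASHELP.ZIPMIL. Records are plain dicts
# keyed by variable name; "first table wins on a duplicate ZIP" is an
# assumption made for illustration.
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def combine_lookups(*tables: Iterable[Record]) -> Tuple[Dict[str, Record], List[str]]:
    """Index records by ZIP; keep the first occurrence and report duplicates."""
    combined: Dict[str, Record] = {}
    duplicates: List[str] = []
    for table in tables:
        for record in table:
            # Pad to five digits in case ZIP was stored as a number.
            zip_code = str(record["ZIP"]).zfill(5)
            if zip_code in combined:
                duplicates.append(zip_code)
            else:
                combined[zip_code] = record
    return combined, duplicates
```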

Tips for ZIP Code Geocoding

The following table contains suggestions and comments for the ZIP geocoding method.
Category                               Suggestions and Comments
Most recent software and lookup data   Installing the most recent SAS release updates the SASHELP.ZIPCODE data set that is used in ZIP geocoding.
                                       You can also download quarterly updates of SASHELP.ZIPCODE from SAS Maps Online. See SAS Maps Online Web Site.
Correct data                           Remove all data entry errors from your addresses, if possible. For example, transposed digits in an input ZIP code either fail to find a match or, worse, return a completely wrong location.
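For input ZIP codes that fail to match, a simple check can flag possible transpositions for manual review. The hypothetical helper below swaps each pair of adjacent digits and keeps the results that exist in the lookup data. A hit only suggests a typing error; it does not prove which ZIP code was intended. A transposition that happens to produce another valid ZIP code still matches, so it cannot be caught this way.

```python
# Hypothetical data-cleaning helper: for a ZIP code that found no match,
# list the codes produced by swapping one pair of adjacent digits that do
# exist in the lookup. A hit suggests a possible transposition to review by
# hand; it is not proof of the intended ZIP code.
from typing import Iterable, List


def transposition_candidates(zip_code: str, known_zips: Iterable[str]) -> List[str]:
    """Return known ZIP codes that differ from zip_code by one adjacent swap."""
    known = set(known_zips)
    candidates: List[str] = []
    for i in range(len(zip_code) - 1):
        if zip_code[i] == zip_code[i + 1]:
            continue  # swapping identical digits changes nothing
        swapped = zip_code[:i] + zip_code[i + 1] + zip_code[i] + zip_code[i + 2:]
        if swapped in known and swapped not in candidates:
            candidates.append(swapped)
    return candidates
```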