GEOCODE Procedure

Overview: GEOCODE Procedure

About the GEOCODE Procedure

Geocoding is the process of adding geographic coordinates (latitude and longitude values) to an address. This process provides a way to convert address data into map locations. The geographic coordinates typically represent the center of a ZIP code, a city, an address, or any geographic region. After geocoding, the coordinates can be used to display a point on a map or to calculate distances. Geocoding also enables you to add attribute values, such as census blocks, to an address.
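As an illustration of the distance calculation mentioned above, the following Python sketch computes the great-circle distance between two geocoded points with the haversine formula. It is not part of the GEOCODE procedure; the function name haversine_km and the mean Earth radius of 6371 km are assumptions chosen for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Return the great-circle distance in kilometers between two points.

    All four arguments are latitude/longitude values in decimal degrees.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Because the formula treats the Earth as a sphere, the result is an approximation. It is usually adequate when the coordinates themselves represent ZIP code or city centers.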
The GEOCODE procedure processes geographic information for the following entities:
  • street addresses
  • cities
  • U.S. ZIP codes and ZIP+4 extension codes
  • foreign postal codes
  • custom variables in the data set, such as sales territories
  • Internet Protocol (IP) addresses or other ranges
    Note: The process of adding coordinates for IP addresses is usually called geolocating. IP data is a form of range data and was not designed to be geographic. For more information, see Understanding Range Geocoding.
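To show what treating IP data as range data can look like, here is a minimal Python sketch of a range lookup. It is a conceptual illustration only and does not describe how the GEOCODE procedure is implemented. The table RANGES, the function locate_ip, and the coordinates are hypothetical; the address blocks are the ranges reserved for documentation.

```python
import bisect
import ipaddress


def _ip(text):
    """Convert a dotted IPv4 string to an integer for range comparison."""
    return int(ipaddress.ip_address(text))


# Hypothetical range lookup rows: (range start, range end, latitude, longitude).
# The coordinates are placeholders, not real locations for these ranges.
RANGES = sorted([
    (_ip("192.0.2.0"), _ip("192.0.2.255"), 35.0, -78.0),
    (_ip("198.51.100.0"), _ip("198.51.100.255"), 40.0, -75.0),
    (_ip("203.0.113.0"), _ip("203.0.113.255"), 51.0, 0.0),
])
STARTS = [row[0] for row in RANGES]


def locate_ip(text):
    """Return (latitude, longitude) for the range containing the address, or None."""
    value = _ip(text)
    # Find the last range whose start is less than or equal to the address.
    i = bisect.bisect_right(STARTS, value) - 1
    if i >= 0 and RANGES[i][0] <= value <= RANGES[i][1]:
        return RANGES[i][2], RANGES[i][3]
    return None
```

The sketch assumes that the ranges do not overlap. An address that falls outside every range is reported as unmatched rather than assigned to the nearest range.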

About the Required Input Data

The GEOCODE procedure requires two types of SAS data sets:
input address data sets
    contain variables that relate to specific geographic locations. For example, a data set might contain mailing address variables such as ZIP codes and street addresses, or custom geographic variables such as sales regions.
lookup data sets
    contain reference variables and geographic coordinates. For example, a lookup data set for the ZIP method contains ZIP codes and the geographic coordinates that are associated with the ZIP codes. Some geocoding methods require multiple lookup data sets. This data is essential to transform address data into location information that can be viewed on a map.
    Lookup data sets can also contain attribute variables containing data about the locations.
For each observation in the input data set, the GEOCODE procedure attempts to match the address variable value to a value in the lookup data set. To increase the chances of a match, default lookup variable values in lookup data sets supplied by SAS are always normalized: they are uppercased, and special characters and spaces are removed. These data sets are either available with the SAS release or available for download from the SAS Maps Online Web site.
If you geocode with a non-default lookup variable from a lookup data set supplied by SAS, the chance of matching your data decreases significantly.
The GEOCODE procedure normalizes the default lookup variable values in lookup data sets that are not supplied by SAS.
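The effect of normalization on matching can be sketched in Python as follows. This is a simplified illustration, not the procedure's actual algorithm: the function normalize, the table CITY_LOOKUP, and the rule that every character other than A-Z and 0-9 counts as a special character are assumptions made for the example, and the coordinates are approximate.

```python
import re


def normalize(value):
    """Uppercase the value and remove spaces and special characters.

    Assumption: anything other than A-Z and 0-9 is treated as special.
    """
    return re.sub(r"[^A-Z0-9]", "", value.upper())


# Hypothetical lookup table keyed by normalized city name.
# Values are approximate (latitude, longitude) pairs for illustration.
CITY_LOOKUP = {
    normalize("Saint Paul"): (44.95, -93.09),
    normalize("Winston-Salem"): (36.10, -80.24),
}


def match_city(raw_city):
    """Return coordinates for a city name, or None when nothing matches."""
    return CITY_LOOKUP.get(normalize(raw_city))
```

With both sides normalized, input values such as "winston salem" and "WINSTON-SALEM" resolve to the same lookup key. Normalization does not reconcile different spellings or abbreviations, so "St. Paul" would not match "Saint Paul" in this sketch.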
The GEOCODE procedure is not shipped with all the lookup data that you might require. In some cases, you must download or purchase the data. You can download lookup data sets from the SAS Maps Online Web site. For more information, see SAS Maps Online Web Site.
SAS provides macro code programs to import some third-party data. These macros can be modified to import data from additional sources. Example macros are %TIGER2GEOCODE, %ABS2GEOCODE, and %CODEPOINT2GEOCODE. The macros and their accompanying documentation are available for download from the SAS Maps Online Web site. For more information, see SAS Maps Online Web Site.

See Also

The %TIGER2GEOCODE macro code program, which imports TIGER shapefiles for specific states and counties, is described in Obtaining Street Lookup Data Sets.
The %ABS2GEOCODE and %CODEPOINT2GEOCODE macro code programs, which import postcode data from other countries, are described in Non-U.S. Postcodes.

SAS Maps Online Web Site

The SAS Maps Online Web site contains map-related information for areas throughout the world. You can easily locate and identify specific regions in each of the following categories: world maps, continents, countries, and maps of political groups.
The Web site contains the following:
  • archived maps from previous releases
  • sample programs
  • recent mapping and geocoding updates
  • geocoding examples, techniques, and lookup data
  • macro code programs
Follow these steps to access this map-related information:
  1. Go to the SAS Maps Online Web site at www.sas.com/mapsonline.
  2. Click the world image on the page to enter Maps Online.
  3. Click the Downloads link in the banner to obtain the geocoding downloads mentioned in this chapter. (To download SAS macro code programs and other tools, click the Resources link in the banner and then click Tools in the left navigation bar.)
    (Figure: Maps Online Banner)
  4. Click the Geocoding link in the left navigation bar on the Downloads page to access the geocoding downloads.
    (Figure: Maps Online Geocoding link)
  5. Click any of the other links in the left navigation bar to access other information, such as archived maps or recent mapping and geocoding updates.

About the Output Data

The GEOCODE procedure adds matching geographic coordinates to the observations in the output data set. In addition, the GEOCODE procedure adds a variable named _MATCHED_ that indicates how the coordinates were found. You can also choose to add variables from the lookup data set to the output data set by using the ATTRIBUTEVAR= option.
For more information, see Understanding Output Data.
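The shape of the output described above can be sketched in Python. This is a conceptual model only, not SAS code or the procedure's implementation. The function geocode_rows, the table ZIP_LOOKUP, the coordinate variable names X and Y, and the _MATCHED_ values "ZIP" and "None" are placeholders assumed for the example; see Understanding Output Data for the values that the procedure actually writes.

```python
# Hypothetical lookup rows keyed by ZIP code. X is longitude and Y is latitude
# (assumed names); CITY is an attribute variable. Coordinates are approximate.
ZIP_LOOKUP = {
    "27513": {"Y": 35.80, "X": -78.80, "CITY": "Cary"},
    "10001": {"Y": 40.75, "X": -74.00, "CITY": "New York"},
}


def geocode_rows(rows, lookup, attribute_vars=()):
    """Return copies of the input rows with coordinates and _MATCHED_ added.

    attribute_vars names extra lookup variables to copy to the output,
    similar in spirit to the ATTRIBUTEVAR= option.
    """
    output = []
    for row in rows:
        result = dict(row)  # leave the input row unchanged
        hit = lookup.get(row.get("ZIP"))
        if hit is not None:
            result["Y"], result["X"] = hit["Y"], hit["X"]
            result["_MATCHED_"] = "ZIP"   # placeholder value: matched by ZIP code
        else:
            result["Y"] = result["X"] = None
            result["_MATCHED_"] = "None"  # placeholder value: no match found
        for var in attribute_vars:
            result[var] = hit.get(var) if hit is not None else None
        output.append(result)
    return output
```

For example, geocode_rows([{"NAME": "A", "ZIP": "27513"}], ZIP_LOOKUP, attribute_vars=("CITY",)) returns one row that keeps NAME and ZIP and adds X, Y, _MATCHED_, and CITY. Unmatched rows are kept with missing coordinates so that they can be reviewed afterward.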

Deciding Which Lookup Data to Use

The type of geocoding you want to do determines the type of lookup data that is required. Granularity of information is an important consideration in determining which geocoding process to use. For example, does the location need to be an actual house location, or is a ZIP code or even a city sufficient? If you are viewing the addresses on a state or U.S. map, then the ZIP code or city location is probably accurate enough.
The age of the lookup data also affects your decision. How current does the data need to be? Street address data frequently changes with the addition of new roads and changes to postal codes. The older your lookup data, the more likely it is that some address matches might be incorrect or missed completely. On the other hand, city and state lookup data do not change as often.
The more up-to-date, accurate, and fine-grained the data, the more it costs to purchase and maintain. Also, higher-resolution data requires more disk storage space and takes longer to geocode. There are free sources for some types of data, but these might not be updated as frequently as the data you purchase.
It is important to remember that both purchased and free lookup data might give incorrect results. There are no guarantees with any geocoding lookup data, so the results should be used with caution.