
The GEOCODE Procedure

Overview of the GEOCODE Procedure

Geocoding is the process of adding geographic coordinates (latitude and longitude values) to an address. The coordinates typically represent the center of a ZIP code, a city, an address, or any geographic region. After geocoding, the coordinates can be used to display a point on a map or to calculate distances. Geocoding also enables you to add attribute values such as census blocks to an address.
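As an illustration of the distance use case, the following minimal Python sketch (not SAS code, and not part of the GEOCODE procedure) computes a great-circle distance between two geocoded points with the haversine formula. The function name `haversine_km` and the Earth radius constant are assumptions made for this example. Arguments follow the X = longitude, Y = latitude convention, in degrees.

```python
import math

# Mean Earth radius in kilometers; an assumed constant for this sketch.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(x1, y1, x2, y2):
    """Great-circle distance in km between (x1, y1) and (x2, y2).

    X values are longitudes and Y values are latitudes, both in degrees.
    """
    lon1, lat1, lon2, lat2 = map(math.radians, (x1, y1, x2, y2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
distance = haversine_km(0.0, 0.0, 1.0, 0.0)
```

The haversine formula treats the Earth as a sphere, so it is an approximation that may be adequate for ZIP-level coordinates but not for survey-grade work.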

The GEOCODE procedure requires two types of SAS data sets:

input address data sets

contain variables that relate to specific geographic locations. Examples include mailing address variables such as ZIP codes and street addresses, and custom geographic variables such as sales regions.

lookup data sets

contain reference variables and geographic coordinates. For example, a lookup data set for the ZIP method contains ZIP codes and the geographic coordinates that are associated with the ZIP codes. Some geocoding methods require multiple lookup data sets.

When the GEOCODE procedure finds a match in the lookup data set, it adds the associated coordinates to the observation in the output data set. Longitude is stored as the X variable, and latitude is stored as the Y variable.

The following image shows how the ZIP geocoding method of the GEOCODE procedure obtains coordinates for the output data set by matching the ZIP code in the input data set:

Geocoding with ZIP Codes

[A diagram showing how ZIP code geocoding works]
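The matching step can be sketched in a few lines of Python. This is only a conceptual model of ZIP lookup, not SAS code and not how the procedure is implemented. The names `LOOKUP` and `geocode_zip`, the ZIP codes, and the coordinate values are all hypothetical placeholders.

```python
# Hypothetical lookup data: ZIP code -> (X = longitude, Y = latitude).
# The values are illustrative placeholders, not real reference data.
LOOKUP = {
    "00001": (-100.0, 40.0),
    "00002": (-90.5, 35.25),
}


def geocode_zip(addresses, lookup):
    """Return copies of the input rows with X and Y added.

    X and Y are None when the ZIP code has no match in the lookup data.
    """
    output = []
    for row in addresses:
        geocoded = dict(row)  # leave the input row unchanged
        x, y = lookup.get(row.get("ZIP"), (None, None))
        geocoded["X"] = x  # longitude
        geocoded["Y"] = y  # latitude
        output.append(geocoded)
    return output


customers = [
    {"NAME": "A", "ZIP": "00001"},
    {"NAME": "B", "ZIP": "99999"},  # not present in the lookup data
]
result = geocode_zip(customers, LOOKUP)
```

In this sketch the first customer receives the coordinates stored for its ZIP code, and the second is left with empty coordinates because its ZIP code is absent from the lookup data.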

The GEOCODE procedure also adds a variable named _MATCHED_ that indicates how the coordinates were found. Possible values for the _MATCHED_ variable are as follows:

Street

A match was found for either the street address and ZIP code or the street address, city, and state.

ZIP

A match was found for the ZIP code.

ZIP+4

A match was found for the ZIP code and ZIP+4 extension.

ZIP mean

Multiple observations in the lookup data set matched the ZIP code, and the coordinate values were averaged.

City

A match was found for the city and state.

City mean

Multiple observations in the lookup data set matched the city and state, and the coordinate values were averaged.

variable-name

For CUSTOM and RANGE geocoding, a variable name indicates that a match was found for that variable.

None

No match was found for the address.

For each observation in the input data set, the GEOCODE procedure attempts to match the address variable value to a value in the lookup data set. For most geocoding methods, the lookup data set is expected to contain only one matching observation. For example, the SASHELP.ZIPCODE data set contains only one observation for each ZIP code. If the lookup data set contains multiple matches, then the first matching observation is returned, except as noted in the following paragraph.

Some geocoding methods do process multiple matches. For example, if you are using ZIP code geocoding and the ZIP code is not found in the lookup data set, then the GEOCODE procedure searches for a matching city-and-state pair. The SASHELP.ZIPCODE data set contains multiple observations for many city-and-state pairs. If one match is found, then the coordinates for the matching pair are used. However, if multiple matches are found, then the coordinate values for those matches are averaged. If you are using the STREET or PLUS4 geocoding method and no match is found for the combined ZIP code and ZIP+4 values, then the GEOCODE procedure searches for the five-digit ZIP code only.
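The ZIP-to-city fallback described above can be modeled roughly as follows. This Python sketch is an assumption-laden illustration of the documented behavior, not SAS code and not the procedure's actual implementation. The function `geocode_with_fallback`, the lookup rows, and the exact-match comparison of city and state values are hypothetical.

```python
from statistics import mean

# Hypothetical lookup rows, loosely modeled on a ZIP code reference table.
LOOKUP_ROWS = [
    {"ZIP": "00001", "CITY": "Springfield", "STATE": "XX", "X": -100.0, "Y": 40.0},
    {"ZIP": "00002", "CITY": "Springfield", "STATE": "XX", "X": -102.0, "Y": 42.0},
    {"ZIP": "00003", "CITY": "Riverton", "STATE": "YY", "X": -90.0, "Y": 35.0},
]


def geocode_with_fallback(address, rows):
    """Match on ZIP code first, then fall back to the city-and-state pair."""
    zip_hits = [r for r in rows if r["ZIP"] == address.get("ZIP")]
    if zip_hits:
        first = zip_hits[0]  # the first matching observation is used
        return {"X": first["X"], "Y": first["Y"], "_MATCHED_": "ZIP"}

    # Exact string comparison is a simplifying assumption of this sketch.
    city_hits = [
        r for r in rows
        if r["CITY"] == address.get("CITY") and r["STATE"] == address.get("STATE")
    ]
    if len(city_hits) == 1:
        hit = city_hits[0]
        return {"X": hit["X"], "Y": hit["Y"], "_MATCHED_": "City"}
    if city_hits:
        # Several observations share the city and state: average them.
        return {
            "X": mean(r["X"] for r in city_hits),
            "Y": mean(r["Y"] for r in city_hits),
            "_MATCHED_": "City mean",
        }
    return {"X": None, "Y": None, "_MATCHED_": "None"}


by_zip = geocode_with_fallback(
    {"ZIP": "00003", "CITY": "Riverton", "STATE": "YY"}, LOOKUP_ROWS)
by_city = geocode_with_fallback(
    {"ZIP": "99999", "CITY": "Riverton", "STATE": "YY"}, LOOKUP_ROWS)
by_city_mean = geocode_with_fallback(
    {"ZIP": "99999", "CITY": "Springfield", "STATE": "XX"}, LOOKUP_ROWS)
unmatched = geocode_with_fallback(
    {"ZIP": "99999", "CITY": "Nowhere", "STATE": "ZZ"}, LOOKUP_ROWS)
```

The four calls exercise each outcome in turn: a direct ZIP match, a single city-and-state match, an averaged city-and-state match, and no match at all.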
