GEOCODE Procedure

Understanding Output Data

How Geocoded Data Is Produced

When the GEOCODE procedure finds a match in the lookup data set, the procedure adds the associated coordinates to the observation in the output data set. Longitude is stored as the X variable, and latitude is stored as the Y variable.
The following image shows how the GEOCODE procedure obtains coordinates for the output data set by matching the ZIP code in the input data set:
Geocoding with ZIP Codes
The GEOCODE procedure also adds a variable named _MATCHED_ that indicates how the coordinates were found. Possible values for the _MATCHED_ variable are as follows:
Street
A match was found for either the street address and ZIP code or the street address, city, and state.
ZIP
A match was found for the ZIP code.
ZIP+4
A match was found for the ZIP code and ZIP+4 extension.
ZIP mean
With the PLUS4 geocoding method, multiple observations in the lookup data set matched the five-digit ZIP code, and their latitude and longitude coordinate values were averaged.
City
A match was found for the city and state.
City mean
Multiple observations in the lookup data set (either SASHELP.ZIPCODE or a user-supplied data set) matched the city and state, and their latitude and longitude coordinate values were averaged.
variable-name
For CUSTOM and RANGE geocoding, a variable name indicates that a match was found for that variable.
None
No match was found for the address.
For each observation in the input data set, the GEOCODE procedure attempts to match the address variable value to a value in the lookup data set. For most geocoding methods, the lookup data set is expected to contain only one matching observation. For example, the SASHELP.ZIPCODE data set contains only one observation for each ZIP code. If the lookup data set contains multiple matches, then the first matching observation is returned, except as noted in the following paragraph.
Some geocoding methods process multiple matches. For example, if you are using ZIP code geocoding and the ZIP code is not found in the lookup data set, then the GEOCODE procedure searches for a matching city-and-state pair instead. The SASHELP.ZIPCODE data set contains multiple observations for many city-and-state pairs. If one match is found, then the coordinates for the matching pair are used. If multiple matches are found, then their coordinates are averaged and the _MATCHED_ value is City mean. Similarly, when the GEOCODE procedure uses either the STREET or PLUS4 geocoding method and no match is found for the combined ZIP code and ZIP+4 values, it then searches for the five-digit ZIP code only.
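The fallback order described above can be modeled with a short Python sketch. This is an illustration only, not SAS code and not how the GEOCODE procedure is implemented. The function name geocode_zip, the dictionary layout, and every lookup value are hypothetical, and the exact-equality comparison of city and state names is an assumption.

```python
from statistics import mean

# Toy lookup rows standing in for a lookup data set such as SASHELP.ZIPCODE.
# Every value below is made up for illustration.
LOOKUP = [
    {"zip": 11111, "city": "Alpha", "state": "AA", "x": -80.0, "y": 35.0},
    {"zip": 11112, "city": "Alpha", "state": "AA", "x": -80.5, "y": 35.5},
    {"zip": 22222, "city": "Beta", "state": "BB", "x": -90.0, "y": 40.0},
]


def geocode_zip(address, lookup=LOOKUP):
    """Return (x, y, matched) for one address dictionary."""
    # Step 1: try the ZIP code. The first matching row is returned.
    for row in lookup:
        if row["zip"] == address.get("zip"):
            return row["x"], row["y"], "ZIP"

    # Step 2: fall back to the city-and-state pair.
    key = (address.get("city"), address.get("state"))
    hits = [r for r in lookup if (r["city"], r["state"]) == key]
    if len(hits) == 1:
        return hits[0]["x"], hits[0]["y"], "City"
    if hits:
        # Several rows share the pair, so average their coordinates.
        return mean(r["x"] for r in hits), mean(r["y"] for r in hits), "City mean"

    # Step 3: nothing matched.
    return None, None, "None"
```

In this model, an address whose ZIP code is absent but whose city and state are "Alpha" and "AA" receives the average of the two Alpha rows and the value City mean.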

Output Data Sets

By default, the GEOCODE procedure produces an output data set that contains all of the variables from the input address data set and the X, Y, and _MATCHED_ variables. The X and Y coordinates use the same system as the lookup data set. The lookup data coordinate system is typically based on world latitudes and longitudes, but X-Y values in a specific map projection can also be used. If you want to use a different coordinate system for the output, you can convert the geocoded coordinates using a projection system application such as the GPROJECT procedure.
The default name for the output data set is DATAn, where n is the smallest integer that makes the name unique. For example, if the DATA1 data set already exists, then the default name for the output data set is DATA2.
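The DATAn naming rule can be expressed as a small Python sketch. It is an illustrative model rather than SAS code. The function name default_output_name is hypothetical, and the sketch assumes that numbering starts at 1.

```python
def default_output_name(existing_names):
    """Return DATAn for the smallest integer n (from 1) that is not taken."""
    # SAS data set names are not case-sensitive, so compare in uppercase.
    taken = {name.upper() for name in existing_names}
    n = 1
    while f"DATA{n}" in taken:
        n += 1
    return f"DATA{n}"
```

For example, when DATA1 and DATA3 already exist, the sketch returns DATA2, because 2 is the smallest unused integer.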
The label of the output data set contains the text "Geocoded date", where date is the date on which the output was created. This text is appended to the label from the input data set, if one exists.
For the STREET geocoding method, additional variables are included in the output data set. See Output Variables for Street Geocoding.

Adding Variables to the Output Data Set

You can specify that non-geocoding variables from the lookup data set be added to the output data set by using the ATTRIBUTEVAR= option in the PROC GEOCODE statement. For example, if you are using SASHELP.ZIPCODE as the lookup data set, then you could assign the county name (COUNTYNM) to each matched observation in the output data set.
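The effect of requesting extra lookup variables can be modeled in Python as follows. This sketch is not SAS code. The function geocode_with_attributes and all lookup values are hypothetical, and the handling of unmatched observations (the extra variables are left missing) is an assumption rather than documented behavior.

```python
# Toy lookup table keyed by ZIP code. Every value below is made up.
LOOKUP = {
    11111: {"X": -80.0, "Y": 35.0, "COUNTYNM": "Example", "AREACODE": 555},
    22222: {"X": -90.0, "Y": 40.0, "COUNTYNM": "Sample", "AREACODE": 556},
}


def geocode_with_attributes(address, attribute_vars=(), lookup=LOOKUP):
    """Geocode one address by ZIP code and copy the requested extra variables."""
    out = dict(address)  # every input variable is carried to the output
    row = lookup.get(address.get("ZIP"))
    out["X"] = row["X"] if row else None
    out["Y"] = row["Y"] if row else None
    out["_MATCHED_"] = "ZIP" if row else "None"
    for var in attribute_vars:
        # Assumption: unmatched observations get a missing value (None here).
        out[var] = row[var] if row else None
    return out
```

Only the variables named in attribute_vars are copied. In the toy table, requesting COUNTYNM alone leaves AREACODE out of the output.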