GEOCODE Procedure

Understanding City Geocoding

Overview of City Geocoding

This section describes geocoding with U.S. or international city names. The requirements for input data and lookup data depend on which of the two you choose. SAS provides a default lookup data set for each type of geocoding, but you can specify an alternate data set instead.

About City Input Data Sets

For U.S. city geocoding, each observation in your input data must contain a city name and its two-character state abbreviation.
For worldwide geocoding, each observation in your input address data must include the city name and a two- or three-character country name abbreviation (for example, GB or GBR for the United Kingdom). You can also specify an optional state, province, or region name.

About U.S. City Lookup Data

Starting with the second maintenance release of SAS 9.3, the default lookup data set for U.S. CITY geocoding is MAPSGFK.USCITY_ALL. This data set is licensed by SAS from GfK GeoMarketing GmbH, which is the single source of this data and all of its updates. The data set is covered by the GfK GeoMarketing copyright, is provided with SAS/GRAPH in the MAPSGFK library, and is updated for each SAS release.
You can download updates of the MAPSGFK.USCITY_ALL data set from the SAS Maps Online Web site. For more information, see SAS Maps Online Web Site.
You can specify that non-geocoding variables from the lookup data set be added to the output data set by using the ATTRIBUTEVAR= option in the PROC GEOCODE statement.
The default lookup data set MAPSGFK.USCITY_ALL contains the following variables:
Default Name  Label
STATE         Numeric state code
COUNTY        Numeric county code
CITY          City name
ID            Region code
ID1           Admin1 code
ID2           Admin2 code
X             Projected longitude using the experimental Miller II projection method
Y             Projected latitude using the experimental Miller II projection method
CITY2         Normalized CITY name for geocoding
              Note: Values are uppercased and stripped of all spaces and characters that are not alphabetic or numeric.
STATECODE     State postal code
COUNTY_NAME   County name
LAT           Unprojected degrees latitude
LONG          Unprojected degrees longitude
Capital       Capital city
POP_TYPE      GfK pop2010 category
FEATURE_ID    Census Government ID
ALT_M         Altitude in meters
ALT_FT        Altitude in feet
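The normalization described for CITY2 (uppercase the value, then strip every space and every character that is not alphabetic or numeric) can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the code that SAS uses to build the data set. The function name normalize_name is hypothetical, and the source does not say how accented or other non-ASCII letters are treated, so their handling here is an assumption.

```python
def normalize_name(name: str) -> str:
    """Uppercase a name and drop every character that is not a letter or digit."""
    # Assumption: str.isalnum() also keeps non-ASCII letters and digits;
    # the source does not say how such characters are handled.
    return "".join(ch for ch in name.upper() if ch.isalnum())


print(normalize_name("St. Louis"))      # STLOUIS
print(normalize_name("Winston-Salem"))  # WINSTONSALEM
```

Comparing normalized values on both sides means that differences in case, spacing, or punctuation between the input data and the lookup data do not prevent a match.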

About World City Lookup Data

Starting with the second maintenance release of SAS 9.3, the default lookup data set for world CITY geocoding is MAPSGFK.WORLD_CITIES. This data set contains more than 200,000 cities worldwide. The data set is licensed by SAS from GfK GeoMarketing GmbH, which is the single source of this data and all of its updates. The data set is covered by the GfK GeoMarketing copyright, is provided with SAS/GRAPH in the MAPSGFK library, and is updated for each SAS release.
You can download updates of the MAPSGFK.WORLD_CITIES data set from the SAS Maps Online Web site. An unabridged data set named MAPSGFK.WORLD_CITIES_ALL, with more than one million observations, is also available for downloading. The unabridged data set is updated annually. For more information, see SAS Maps Online Web Site.
You can specify that non-geocoding variables from the lookup data set be added to the output data set by using the ATTRIBUTEVAR= option in the PROC GEOCODE statement.
The default lookup data set MAPSGFK.WORLD_CITIES and the unabridged data set MAPSGFK.WORLD_CITIES_ALL both contain the following variables:
Name        Label
X           Projected longitude using the experimental Miller II projection method.
Y           Projected latitude using the experimental Miller II projection method.
ID          Country alpha code.
MapIDName2  Normalized state or province name for geocoding.
            Note: Values are uppercased and stripped of all spaces and characters that are not alphabetic or numeric.
CITY2       Normalized CITY name for geocoding.
            Note: Values are uppercased and stripped of all spaces and characters that are not alphabetic or numeric.
CONT        Continent number.
ISONAME     ISO country name.
CITY        City name.
CtType      POP categories for cities where applicable.
Rank        Grouping of CtType high to low.
Vintage     Recorded year of data.
LONG        Unprojected degrees longitude.
LAT         Unprojected degrees latitude.
ISO         ISO country code.
ISOALPHA2   ISO alpha-2 code for country.
ISOALPHA3   ISO alpha-3 code for country.
MapID       ID from MAP data set.
            Note: MAP refers to the map data set that is specified in the SAS/GRAPH GMAP procedure.
MapIDName   IDNAME from MAP data set.
            Note: MAP refers to the map data set that is specified in the SAS/GRAPH GMAP procedure.
MapLevel    Map level.
IDNAME      Country name.
MapIDName1  ID1NAME from MAP data set.

About Alternate U.S. City Lookup Data

The GEOCODE procedure uses MAPSGFK.USCITY_ALL as the default lookup data set for U.S. CITY geocoding. However, you can use the LOOKUPCITY= option to specify an alternate lookup data set. For example, if you want to exclude Puerto Rico and the U.S. Virgin Islands, use the MAPSGFK.USCITY data set. That data set is covered by the GfK GeoMarketing copyright, is provided with SAS/GRAPH in the MAPSGFK library, and is updated for each SAS release. Another alternate data set is SASHELP.ZIPCODE, which is provided with SAS/GRAPH in the SASHELP library. See About the SASHELP.ZIPCODE Lookup Data Set for details.
The SASHELP.ZIPCODE data set can contain multiple observations for a city. When you use it as the lookup data set, the GEOCODE procedure averages the ZIP code data for the city and state and uses the mean location as the city location.
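The averaging step can be pictured with the following Python sketch. It assumes a plain arithmetic mean of latitude and longitude over all rows that share a city and state. The source states only that the ZIP code data is averaged, so the details are an assumption. The rows, the place name, and the function mean_city_locations are hypothetical and are not taken from SASHELP.ZIPCODE.

```python
from collections import defaultdict

# Hypothetical ZIP-level rows: (city, state, latitude, longitude).
zip_rows = [
    ("SAMPLETOWN", "ZZ", 40.0, -80.0),
    ("SAMPLETOWN", "ZZ", 42.0, -82.0),
    ("OTHERVILLE", "ZZ", 35.0, -90.0),
]


def mean_city_locations(rows):
    """Average latitude and longitude over all rows sharing a city and state."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # lat sum, long sum, row count
    for city, state, lat, lon in rows:
        acc = sums[(city, state)]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {
        key: (lat_sum / n, lon_sum / n)
        for key, (lat_sum, lon_sum, n) in sums.items()
    }


centers = mean_city_locations(zip_rows)
print(centers[("SAMPLETOWN", "ZZ")])  # (41.0, -81.0)
```

A city that has a single ZIP code keeps that ZIP code's location unchanged, because the mean of one point is the point itself.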
Any other alternate lookup data set for U.S. CITY geocoding must contain the following information. The exact variable names shown here are not required, but the data set must have variables that hold these values:
CITY
Name of the city (does not have to be normalized)
Note: The values can be in mixed case and can contain spaces and characters that are not alphabetic or numeric.
STATECODE
Two-character abbreviation of the state name
Note: Instead of a two-character STATECODE, you can use the complete state name. However, the state values in the alternate lookup data set must match the values that are in your input address data. The state names do not have to be normalized. That is, they can be in mixed case and can contain spaces and characters that are not alphabetic or numeric. When you use a custom lookup data set, the GEOCODE procedure uppercases the state names and removes special characters and spaces.
LAT
Latitude of the city center
LONG
Longitude of the city center

About Alternate World City Lookup Data

The GEOCODE procedure uses MAPSGFK.WORLD_CITIES as the default lookup data set for world CITY geocoding. However, you can use the LOOKUPCITY= option to specify an alternate lookup data set. One such data set is MAPSGFK.WORLD_CITIES_ALL, which is available for download from SAS Maps Online (http://support.sas.com/rnd/datavisualization/mapsonline/html/downloads.html).
Any other alternate lookup data set for worldwide CITY geocoding must contain the following information. The exact variable names shown here are not required, but the data set must have variables that hold these values:
CITY
Name of the city
ISOALPHA2 or ISOALPHA3
Two- or three-character abbreviation of the country name
Note: Instead of a two- or three-character country abbreviation, you can use the complete country name. However, the country values in the alternate lookup data set must match the values that are in your input address data.
LAT
Latitude of the city center
LONG
Longitude of the city center
The alternate world CITY geocoding lookup data set can also include an optional state, province, or region name. This enables the GEOCODE procedure to differentiate between cities that share the same name but are located in different regions. The state, province, or region names in the lookup data do not have to be normalized (that is, uppercased and stripped of spaces and of characters that are not alphabetic or numeric).
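The following Python sketch shows why an optional region value helps. It is not the matching algorithm of the GEOCODE procedure, which the source does not describe. It only illustrates how a city name plus a country can match several lookup rows and how a region narrows the result. All rows, codes, and function names are hypothetical.

```python
def normalize(value: str) -> str:
    """Uppercase a value and drop characters that are not letters or digits."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


# Hypothetical lookup rows: (city, country code, region, latitude, longitude).
lookup_rows = [
    ("Sampleton", "AA", "North Region", 10.0, 20.0),
    ("Sampleton", "AA", "South Region", -10.0, 25.0),
]


def find_city(rows, city, country, region=None):
    """Return (lat, long) for every row matching city, country, and optional region."""
    matches = []
    for row_city, row_country, row_region, lat, lon in rows:
        if normalize(row_city) != normalize(city):
            continue
        if normalize(row_country) != normalize(country):
            continue
        # The region is compared only when the caller supplies one.
        if region is not None and normalize(row_region) != normalize(region):
            continue
        matches.append((lat, lon))
    return matches


print(len(find_city(lookup_rows, "sampleton", "aa")))          # 2 candidates
print(find_city(lookup_rows, "Sampleton", "AA", "north-region"))  # [(10.0, 20.0)]
```

Without a region, the two rows are indistinguishable. With a region, only one row remains, even though the input spelling differs in case and punctuation from the lookup value.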

Tips for City Geocoding

The following table contains suggestions and comments for using the CITY geocoding method.
Category
Suggestions and Comments
Most recent software and lookup data
Installing the most recent SAS release updates the following data sets that can be used by the GEOCODE procedure:
  • MAPSGFK.USCITY_ALL
  • MAPSGFK.WORLD_CITIES
  • SASHELP.ZIPCODE
MAPSGFK.WORLD_CITIES_ALL is an unabridged version of world cities available for download. See SAS Maps Online Web Site for details.
You can also download quarterly updates of SASHELP.ZIPCODE from SAS Maps Online. See SAS Maps Online Web Site for details.
Correct data
Here are some common reasons why CITY matches are not found:
  • The city name is misspelled.
  • The city name in the lookup data uses an alternate spelling (for example, Hillsboro versus Hillsborough).
  • The letters of the two-character state abbreviation are transposed (for example, YN instead of NY).
  • The non-US city name in the lookup data uses a local spelling.
  • There are multiple cities with that name in the country. Use the ADDRESSSTATEVAR= option to specify a state, province, or region variable to differentiate the cities.
  • The town is too small to be included in the MAPSGFK.WORLD_CITIES data set. In that case, download the more complete MAPSGFK.WORLD_CITIES_ALL data set. See SAS Maps Online Web Site for details.