
The GEOCODE Procedure

Concepts


Output Data Sets

By default, the GEOCODE procedure produces an output data set that contains all of the variables from the input address data set and the X, Y, and _MATCHED_ variables. You can also choose to add variables from the lookup data set to the output data set by using the ATTRIBUTEVAR= option. For example, if you are using SASHELP.ZIPCODE as the lookup data set, then you could assign the county name (COUNTYNM) to each matched observation in the output data set.
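As a minimal sketch of the county-name example, the following steps geocode a small input data set by ZIP code and copy COUNTYNM from the default lookup data set. The data set names WORK.CUSTOMERS and WORK.GEOCODED, the NAME variable, and the sample ZIP codes are assumptions made for illustration. The sketch also assumes a SAS release that supports the METHOD= option.

```sas
/* Hypothetical input data: one numeric ZIP code per customer */
data work.customers;
   input name $ zip;
   datalines;
Alice 27513
Bob 10001
;
run;

/* Geocode by ZIP code; SASHELP.ZIPCODE is the default lookup data set */
proc geocode
   method=zip
   data=work.customers
   out=work.geocoded
   attributevar=(countynm);
run;

/* The output holds NAME and ZIP plus X, Y, _MATCHED_, and COUNTYNM */
proc print data=work.geocoded;
run;
```

Specifying OUT= also avoids the default DATAn naming for the output data set.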

The default name for the output data set is DATAn, where n is the smallest integer that makes the name unique. For example, if the DATA1 data set already exists, then the default name for the output data set is DATA2.

The label of the output data set contains the text "geocoded date", where date is the date when the output was created. This text is appended to the label from the input data set, if one exists.

For the STREET geocoding method, additional variables are included in the output data set. See Output Variables for Street Geocoding.


The SASHELP.ZIPCODE Data Set

The default lookup data set for ZIP code geocoding and CITY geocoding is SASHELP.ZIPCODE. This data set is provided with Base SAS and is updated for each SAS release.

You can download updated versions of the SASHELP.ZIPCODE data set from the SAS Maps Online Web site: www.sas.com/mapsonline.

SASHELP.ZIPCODE contains the following variables:

Name        Label
ZIP         The 5-digit ZIP code
Y           Latitude (decimal degrees) of the center of the ZIP code. 0.0 for APO/FPO.
X           Longitude (decimal degrees) of the center of the ZIP code. 0.0 for APO/FPO.
ZIP_CLASS   ZIP code classification: M=APO/FPO; P=Post office box; U=Unique ZIP code used for large organizations, businesses, or buildings; Blank=Standard/non-unique
CITY        Name of the city or organization
STATE       Two-digit number (FIPS code) for the state or territory
STATECODE   Two-character postal code for the state or territory name
STATENAME   Full name of the state or territory
COUNTY      FIPS county code. Blank for APO/FPO addresses.
COUNTYNM    Name of county or parish. Blank for APO/FPO addresses.
MSA         Metropolitan Statistical Area (MSA) code by common population; no MSA for rural areas
AREACODE    Area code for the ZIP code. Blank for APO/FPO addresses.
AREACODES   Multiple area codes for the ZIP code. Blank for APO/FPO addresses.
ALIAS_CITY  Alternate names for the city. Each name is separated by "||".
TIMEZONE    Time zone for the ZIP code. Blank for APO/FPO addresses.
GMTOFFSET   Difference (hours) between GMT and time zone for the ZIP code.
DST         Whether the ZIP code observes daylight saving time: Y=Yes, N=No
PONAME      USPS Post Office name
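To check these variables against the copy of SASHELP.ZIPCODE that is installed at your site, you can inspect the data set directly. The following sketch uses only standard Base SAS procedures; the choice of five observations is arbitrary.

```sas
/* List the variables, their types and labels, and any indexes */
proc contents data=sashelp.zipcode;
run;

/* Print the first five observations */
proc print data=sashelp.zipcode(obs=5);
run;
```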


Alternate ZIP Code and ZIP+4 Lookup Data Sets

While the SASHELP.ZIPCODE data set is the default lookup data set for the ZIP and CITY geocoding methods, data from other sources can be used as long as it is read into a SAS data set.

For ZIP code geocoding, any lookup data set must contain the following variables:

Default Name   Description
ZIP            Five-digit ZIP code
X              Longitude of the center coordinate
Y              Latitude of the center coordinate

For CITY geocoding, these additional variables are required:

CITY           Name of the city
STATECODE      Two-character postal code for the state or province name

Note: If you use an alternative ZIP code lookup data set, then the variable data types should match those of the SASHELP.ZIPCODE data set.
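One simple way to obtain an alternate lookup data set whose variable types are guaranteed to match is to subset SASHELP.ZIPCODE itself. The following sketch does that for one state and passes the result to the procedure through the LOOKUP= option. The data set names, the NAME variable, and the sample ZIP codes are hypothetical.

```sas
/* Hypothetical alternate lookup data set: North Carolina ZIP codes only. */
/* Subsetting SASHELP.ZIPCODE keeps the variable types identical.         */
data work.nc_zips;
   set sashelp.zipcode(keep=zip x y city statecode);
   where statecode='NC';
run;

/* Hypothetical input data with a numeric ZIP variable */
data work.customers;
   input name $ zip;
   datalines;
Alice 27513
Carol 27601
;
run;

/* Geocode against the alternate lookup data set */
proc geocode
   method=zip
   data=work.customers
   lookup=work.nc_zips
   out=work.geocoded_nc;
run;
```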

When you use ZIP+4 geocoding, you must specify an alternative lookup data set because the SASHELP.ZIPCODE data set does not contain any ZIP+4 values. This data set must contain the following variables:

Default Name   Description
ZIP            Five-digit ZIP code
PLUS4          Four-digit ZIP+4 extension
X              Longitude of the central coordinate
Y              Latitude of the central coordinate

You can specify different names for the variables by using options in the PROC GEOCODE statement. For example, the LOOKUPPLUS4VAR= option specifies the name of the ZIP+4 extension variable in the lookup data set.
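The following sketch shows the idea with a lookup data set whose ZIP and extension variables are named ZIP5 and EXT4 instead of the defaults. It assumes that the relevant options are spelled LOOKUPZIPVAR= and LOOKUPPLUS4VAR=. All data set names, variable names, and values are hypothetical, and the coordinates are placeholders rather than real locations.

```sas
/* Hypothetical ZIP+4 lookup data; coordinates are placeholders, not real locations */
data work.zip4lookup;
   input zip5 ext4 x y;
   datalines;
19901 1234 -75.50 39.15
19901 5678 -75.52 39.16
;
run;

/* Hypothetical input addresses that use the default names ZIP and PLUS4 */
data work.addresses;
   input id zip plus4;
   datalines;
1 19901 1234
2 19901 5678
;
run;

/* Tell the procedure which lookup variables hold the ZIP code and extension */
proc geocode
   method=plus4
   data=work.addresses
   lookup=work.zip4lookup
   out=work.geocoded_plus4
   lookupzipvar=zip5
   lookupplus4var=ext4;
run;
```

Both data sets store the ZIP code and the extension as numeric values, so the types match between the input and lookup data sets.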

The ZIP and PLUS4 variables can contain either character data or numeric data. The data type must match the type of the corresponding variable in your input data set.

Note: Character values in your input and lookup data sets do not need to match in case. Character value matching in the GEOCODE procedure is not case sensitive.

Additional attribute variables can also be in the alternate lookup data set even if they are not used to find matches. You can add these variables to the output data set by using the ATTRIBUTEVAR= option in the PROC GEOCODE statement.

You can obtain a lookup data set for ZIP+4 geocoding from the SAS Maps Online Web site at www.sas.com/mapsonline. On the Downloads page, select Geocoding to access the downloads that are related to geocoding.

An alternative source for ZIP+4 lookup data is the Geo*Data product from Melissa Data. You can use the %GCDMEL9 autocall macro to convert Geo*Data files to SAS data sets. For more information, see %GCDMEL9 Autocall Macro.


U.S. Military ZIP Codes

ZIP codes for U.S. military post offices are provided in the ZIPMIL data set in the SASHELP library. You can combine this data set with the ZIPCODE data set to support military ZIP codes.
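A minimal sketch of combining the two data sets is to concatenate them in a DATA step and use the result as the lookup data set. The sketch assumes that SASHELP.ZIPMIL is installed and that it uses the same variable names and types as SASHELP.ZIPCODE. The names WORK.ZIPCODE_ALL, WORK.CUSTOMERS, and WORK.GEOCODED_ALL and the sample ZIP code are hypothetical.

```sas
/* Concatenate the standard and military ZIP code data sets */
data work.zipcode_all;
   set sashelp.zipcode sashelp.zipmil;
run;

/* Hypothetical input data with a numeric ZIP variable */
data work.customers;
   input name $ zip;
   datalines;
Alice 27513
;
run;

/* Use the combined data set as the lookup data set */
proc geocode
   method=zip
   data=work.customers
   lookup=work.zipcode_all
   out=work.geocoded_all;
run;
```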


Data Sets for Range Geocoding

Note: Range geocoding is for SAS 9.2 Phase 2 and later.

For Range geocoding, a lookup data set and a range data set are required. The range data set identifies ranges of IP addresses. The lookup data set contains geographic coordinates. Both the range data set and the lookup data set must contain a key variable that identifies locations for each IP range.

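The lookup data set must contain the key variable and the longitude and latitude coordinate variables.

The range data set must contain the key variable and the variables for the beginning and ending IP addresses of each range.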

You can obtain lookup and range data from third-party vendors. One vendor is MaxMind, Inc. at www.maxmind.com. You can use the %MAXMIND autocall macro to convert comma-separated value (CSV) files from MaxMind into SAS data sets. For more information, see %MAXMIND Autocall Macro.


%GCDMEL9 Autocall Macro


Overview of the %GCDMEL9 Autocall Macro

The %GCDMEL9 autocall macro enables you to directly import Geo*Data files from Melissa Data as SAS data sets. Geo*Data files contain third-party ZIP+4 lookup data for use with PLUS4 geocoding.

Geo*Data files are available for each state. The files are provided as text files within compressed (ZIP) archives. Melissa Data also provides the PKUNZIP utility to extract the text files.

The %GCDMEL9 macro uses the following macro variables:

DATASETNAME

specifies the name of the output data set.

DATASETPATH

specifies the location where the output data set is created.

DATASETLABEL (Optional)

specifies a label for the output data set.

LIBNAME

specifies the name for a new library that is assigned for the location that you specified in the DATASETPATH macro variable.

UNZIPPEDPATH

specifies the location of the extracted Geo*Data files that you want to import. The %GCDMEL9 macro attempts to read all of the text (.txt) files in this directory.

WORKPATH (Optional)

specifies the path where temporary files are written. The default path is the path for the WORK library.


Usage Example for the %GCDMEL9 Autocall Macro

In this example, a Geo*Data file for the state of Delaware (DE.txt) is extracted to C:\Mydata. The lookup data set is created in the directory C:\Geocode and assigned the libref ZIP4. The resulting data set is named ZIP4.DELAWARE.

The following code imports the data:

/* Define macro variables */
%let UNZIPPEDPATH=C:\Mydata;
%let DATASETPATH=C:\Geocode;
%let DATASETNAME=Delaware;
%let LIBNAME=ZIP4;
%let DATASETLABEL=ZIP+4 lookup data for Delaware;

/* Submit autocall macro */
%GCDMEL9;


%MAXMIND Autocall Macro


Overview of the %MAXMIND Autocall Macro

The %MAXMIND autocall macro enables you to convert IP geocoding data from MaxMind, Inc. into SAS data sets. The %MAXMIND autocall macro supports MaxMind's IP data in comma-separated value (CSV) format.

Note: This feature is for SAS 9.2 Phase 2 and later.

The %MAXMIND macro uses the following macro variables:

CSVPATH

specifies the path where the MaxMind CSV files are located. You must extract the files from the ZIP archive before using the %MAXMIND autocall macro.

IPDATAPATH

specifies the path where the output SAS data sets are created. You must have write permissions for this path.

CSVBLOCKSFILE

specifies the filename for the CSV file that contains IP address range values. The file that you specify must contain the startIpNum and endIpNum variables.

CSVLOCATIONFILE

specifies the filename for the CSV file that contains longitude and latitude values.

CSVCOUNTRYFILE (Optional)

specifies the name of the optional MaxMind CSV file that contains country names.

WORKPATH (Optional)

specifies the path where temporary files are written. The default path is the path for the WORK library.

The %MAXMIND macro creates the CITYBLOCKS and CITYLOCATION data sets in the path that you specified for the IPDATAPATH variable. The libref IPDATA is created automatically for this path.


Usage Example for the %MAXMIND Autocall Macro

In this example, data from MaxMind is located in C:\Mydata. The output SAS data sets are created in the directory C:\Geocode.

The following code imports the data:

%let CSVPATH=C:\Mydata;
%let IPDATAPATH=C:\Geocode;
%let CSVBLOCKSFILE=GeoLiteCity-Blocks.csv;
%let CSVLOCATIONFILE=GeoLiteCity-Location.csv;
%let CSVCOUNTRYFILE=GeoIPCountryWhois.csv;
%maxmind;

The imported data sets are IPDATA.CITYBLOCKS and IPDATA.CITYLOCATION.


Optimizing Performance


Overview of Enhancing Performance

Geocoding often requires very large lookup data sets, which can affect the performance of the GEOCODE procedure. You can optimize your geocoding performance by indexing your lookup data sets and by loading your lookup data sets into memory.


Indexing Your Lookup Data Sets

If you use alternative lookup data sets, then indexing your lookup data sets can improve performance. You should create an index by using the variables that are appropriate for your geocoding method.

Note: The SASHELP.ZIPCODE data set and the ZIP4 data set from SAS Maps Online are optimized for use with the GEOCODE procedure. Additionally, data sets that you convert by using the %GCDMEL9 and %MAXMIND autocall macros are indexed automatically. No modifications are needed for any of these data sets.

Note: The STREET geocoding data sets that are provided by SAS are already indexed for the GEOCODE procedure.

For ZIP+4 geocoding, you should create a simple index on the ZIP variable and a composite index on the ZIP and PLUS4 variables.
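The two indexes for ZIP+4 geocoding can be sketched with the INDEX CREATE statement in PROC DATASETS. The lookup data set WORK.ZIP4LOOKUP, its placeholder coordinates, and the composite (compound) index name ZIP_PLUS4 are hypothetical; the variables use the default names ZIP and PLUS4.

```sas
/* Hypothetical ZIP+4 lookup data; coordinates are placeholders, not real locations */
data work.zip4lookup;
   input zip plus4 x y;
   datalines;
19901 1234 -75.50 39.15
19901 5678 -75.52 39.16
;
run;

proc datasets library=work nolist;
   modify zip4lookup;
   index create zip;                     /* simple index on ZIP */
   index create zip_plus4=(zip plus4);   /* composite index on ZIP and PLUS4 */
quit;
```

A composite index needs its own name, and that name cannot be the same as a variable in the data set.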

For RANGE geocoding, you should sort your lookup data set by the key variable, and then create a simple index on the key variable. You should sort the range data set by the beginning IP address variable, and then create two simple indexes, one on the beginning IP address variable and one on the ending IP address variable.
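The sorting and indexing steps for RANGE geocoding might look like the following sketch. The data sets WORK.IPRANGE and WORK.IPLOOKUP, the key variable name LOCID, and all values are hypothetical placeholders. STARTIPNUM and ENDIPNUM follow the variable names that are mentioned for the MaxMind blocks file. The data sets are sorted before the indexes are created, because sorting a data set in place removes its existing indexes.

```sas
/* Hypothetical range data: one row per block of IP numbers (placeholder values) */
data work.iprange;
   input startipnum endipnum locid;
   datalines;
16777472 16778239 49
16777216 16777471 17
;
run;

/* Hypothetical lookup data: one row per location key (placeholder coordinates) */
data work.iplookup;
   input locid x y;
   datalines;
49 104.0 35.0
17 133.0 -27.0
;
run;

/* Sort first: lookup by the key, range data by the beginning IP address */
proc sort data=work.iplookup;
   by locid;
run;

proc sort data=work.iprange;
   by startipnum;
run;

/* Then create the simple indexes */
proc datasets library=work nolist;
   modify iplookup;
   index create locid;
   modify iprange;
   index create startipnum;
   index create endipnum;
quit;
```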

For more information, see Understanding SAS Indexes in the SAS Language Reference: Concepts.


Loading Data Sets Into Memory

You can load your lookup data sets into memory by using the SASFILE statement. Loading data into memory reduces I/O processing and can improve the speed of your geocoding operation. You should test your geocoding operations with the lookup data sets loaded into memory to determine whether sufficient memory is available and whether performance improves.
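As a minimal sketch, the following steps load a lookup data set into memory, run the geocoding step, and then release the memory. The data sets WORK.ZIPLOOKUP, WORK.CUSTOMERS, and WORK.GEOCODED are hypothetical. A real alternate lookup data set would typically be much larger than this copy.

```sas
/* Hypothetical alternate lookup data set in the WORK library */
data work.ziplookup;
   set sashelp.zipcode(keep=zip x y);
run;

/* Hypothetical input data with a numeric ZIP variable */
data work.customers;
   input name $ zip;
   datalines;
Alice 27513
Bob 10001
;
run;

/* Read the whole lookup data set into memory and keep it open */
sasfile work.ziplookup load;

proc geocode
   method=zip
   data=work.customers
   lookup=work.ziplookup
   out=work.geocoded;
run;

/* Close the data set and free the memory buffers */
sasfile work.ziplookup close;
```

Comparing the real time reported in the SAS log with and without the SASFILE statements is one way to judge whether loading the data helps.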

For more information, see SASFILE statement in the SAS Language Reference: Dictionary.
