GEOCODE Procedure

Optimizing Performance

Overview of Enhancing Performance

Geocoding often requires very large lookup data sets, which can affect the performance of the GEOCODE procedure. You can optimize your geocoding performance by performing the following actions:
  • Index your lookup data sets by using the appropriate variables.
  • Load the lookup data sets into memory by using the SASFILE statement.
  • Minimize access to lookup data sets over a network or on an external disk drive.
  • Avoid using cross-environment data access (CEDA) when accessing lookup data sets.

Indexing Your Lookup Data Sets

If you use alternative lookup data sets, then indexing your lookup data sets can improve performance. You should create an index by using the variables that are appropriate for your geocoding method.
Note: The SASHELP.ZIPCODE data set and the ZIP4 data set from SAS Maps Online are optimized for use with the GEOCODE procedure. In addition, data sets that you convert by using the %GCDMEL9 and %MAXMIND autocall macros are indexed automatically. No modifications are needed for any of these data sets.
Note: The STREET geocoding lookup data sets that are provided by SAS are already indexed for the GEOCODE procedure.
If you use SAS procedures to copy or move the lookup data sets, any associated indexes are preserved. However, if you use an operating system utility and do not also copy or move the index files, any indexes will need to be rebuilt.
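As a minimal sketch of copying with a SAS procedure, the following step uses PROC COPY to move one lookup data set between two libraries. The librefs NETLIB and LOCAL, both paths, and the data set name MYLOOKUP are hypothetical placeholders, not names used by the GEOCODE procedure.

```sas
/* Hypothetical librefs and paths: substitute your own locations. */
libname netlib '/shared/geocode/lookup' access=readonly;
libname local  '/data/geocode/lookup';

/* PROC COPY copies indexes with the data set (INDEX=YES is the default). */
proc copy in=netlib out=local memtype=data;
   select mylookup;   /* hypothetical lookup data set name */
run;

/* The CONTENTS output lists any indexes that exist on the copy. */
proc contents data=local.mylookup;
run;
```

If the CONTENTS output for the copy shows no indexes, they need to be re-created before you geocode.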
For ZIP+4 geocoding, you should create a simple index on the ZIP variable and a compound index on the ZIP and ZIP+4 variables.
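One way to create these two indexes is with the INDEX CREATE statement in PROC DATASETS. This is a sketch only: the libref LOCAL, the data set MYZIP4, and the variable names ZIP and PLUS4 are assumptions, so substitute the names used in your own lookup data set.

```sas
/* Hypothetical libref, data set, and variable names (ZIP, PLUS4). */
proc datasets library=local nolist;
   modify myzip4;
      /* Simple index on the ZIP variable. */
      index create zip;
      /* Compound index on ZIP plus the +4 variable.                  */
      /* A compound index needs its own name, distinct from any       */
      /* variable name in the data set.                               */
      index create zip_plus4 = (zip plus4);
   run;
quit;
```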
For RANGE geocoding, two data sets are involved. Sort the lookup data set by the key variable, and then create a simple index on that variable. Sort the range data set by the beginning IP address variable, and then create two simple indexes: one on the beginning IP address variable and one on the ending IP address variable.
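The sort-then-index sequence for RANGE geocoding can be sketched as follows. All names here are hypothetical: LOCAL is a libref, IPLOOKUP and IPRANGES are the lookup and range data sets, LOC_ID is the key variable, and BEGIN_IP and END_IP are the beginning and ending IP address variables.

```sas
/* Sort each data set first. Sorting an already indexed data set in   */
/* place requires the FORCE option, which removes its indexes, so     */
/* the indexes are created only after both sorts are complete.        */
proc sort data=local.iplookup;
   by loc_id;          /* hypothetical key variable */
run;

proc sort data=local.ipranges;
   by begin_ip;        /* hypothetical beginning IP address variable */
run;

proc datasets library=local nolist;
   modify iplookup;
      index create loc_id;     /* simple index on the key variable */
   run;
   modify ipranges;
      index create begin_ip;   /* simple index on the beginning address */
      index create end_ip;     /* simple index on the ending address */
   run;
quit;
```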
For more information, see Understanding SAS Indexes in SAS Language Reference: Concepts.

Loading Data Sets into Memory

You can load your lookup data sets into memory by using the SASFILE statement. Loading data into memory reduces I/O processing and can improve the speed of your geocoding operation. You should test your geocoding operations with the lookup data sets loaded into memory to determine whether there is sufficient memory and whether performance improves.
For more information, see SASFILE Statement in SAS Statements: Reference.
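The following sketch shows the general pattern: load the lookup data set, geocode, and then release the memory. It uses ZIP geocoding against SASHELP.ZIPCODE for illustration. WORK.CUSTOMERS is a hypothetical input data set that is assumed to contain a ZIP variable, and WORK.GEOCODED is a hypothetical output name.

```sas
/* Write detailed timing and memory statistics to the log so that    */
/* runs with and without SASFILE can be compared.                     */
options fullstimer;

/* Open the lookup data set and hold it in memory. */
sasfile sashelp.zipcode load;

/* WORK.CUSTOMERS is a hypothetical input data set with a ZIP variable. */
proc geocode method=zip
             data=work.customers
             out=work.geocoded
             lookup=sashelp.zipcode;
run;

/* Close the data set and free the buffers when geocoding is finished. */
sasfile sashelp.zipcode close;
```

Running the same PROC GEOCODE step once without the two SASFILE statements gives a baseline for the comparison.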

Minimizing the Use of a Network

When you geocode large numbers of addresses, network issues can affect run times. You can obtain faster run times if the lookup data sets are stored locally, and if they are stored on an internal disk rather than on an external drive.
However, system administrators sometimes install the large lookup data sets in a central location that can be accessed only over the network. In that case, you can obtain faster run times by geocoding large amounts of data during times of less network traffic.

Avoiding the Use of CEDA When Accessing Lookup Data Sets

The cross-environment data access (CEDA) feature enables a SAS session on one operating system to access data sets residing on another. For example, a SAS session running under Windows can process data sets that reside on a UNIX system.
Because CEDA does not support the use of indexes, it can negatively affect geocoding performance. This is not a concern when you geocode a small number of addresses. However, it does impede larger geocoding runs.
For best performance, the lookup data sets should reside on the same operating system that is used for geocoding. You can either install the lookup data sets there initially or use the CPORT and CIMPORT procedures to move them. If you import them, make sure that the appropriate data set indexes are preserved or rebuilt.
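A move between operating systems can be sketched in two halves: PROC CPORT writes a transport file on the source system, and PROC CIMPORT reads it on the target system. The librefs, paths, and the transport file name lookup.cpt below are hypothetical.

```sas
/* --- On the source system --- */
libname lookup '/source/geocode/lookup';   /* hypothetical path */

/* Write the data sets to a transport file. INDEX=YES (the default)  */
/* exports index definitions along with indexed data sets.           */
proc cport library=lookup file='lookup.cpt' memtype=data index=yes;
run;

/* Transfer lookup.cpt to the target system as a binary file. */

/* --- On the target system --- */
libname local '/target/geocode/lookup';    /* hypothetical path */

proc cimport library=local infile='lookup.cpt';
run;

/* Confirm that each imported data set lists the expected indexes. */
proc contents data=local._all_;
run;
```

If the CONTENTS output shows that an index is missing, re-create it with the INDEX CREATE statement in PROC DATASETS before you run larger geocoding jobs.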

See Also

IBUFSIZE= System Option in SAS System Options: Reference