
The GEOCODE Procedure

PROC GEOCODE Statement


Identifies the data set that contains the address data that you want to geocode. You can also specify an output data set, the geocoding method, alternate names for geocoding variables, and additional attribute variables to associate with the matched addresses.


Syntax

PROC GEOCODE <option(s)>;

option(s) can be one or more of the following:

DATA= address-data-set

ADDRESSCITYVAR= character-variable

ADDRESSPLUS4VAR= variable

ADDRESSSTATEVAR= character-variable

ADDRESSVAR= variable

ADDRESSZIPVAR= variable

ATTRIBUTEVAR= variable-list

BEGINRANGEVAR= numeric-variable

ENDRANGEVAR= numeric-variable

FIPS= FIPS-data-set

LOOKUP= lookup-data-set

LOOKUPCITYVAR= character-variable

LOOKUPKEYVAR= variable

LOOKUPPLUS4VAR= variable

LOOKUPSTATEVAR= character-variable

LOOKUPSTREET= street-matching-data-set

LOOKUPVAR= variable

LOOKUPXVAR= numeric-variable

LOOKUPYVAR= numeric-variable

LOOKUPZIPVAR= variable

METHOD= geocoding-method

NOCITY

NOZIP

NOSTIMER

OUT= output-data-set

RANGEDATA= data-set

RANGEDECIMAL

RANGEKEYVAR= variable

TYPE= street-type-data-set


Options

To facilitate converting existing SAS/GIS batch geocoding programs that use the %GCBATCH autocall macro to the GEOCODE procedure, the option name from the %GCBATCH autocall macro is an acceptable alias for most options. For more information, see the SAS/GIS: Spatial Data and Procedure Guide.

DATA= address-data-set

specifies the SAS data set that contains address observations that you want to geocode. If you do not specify this option, then the most recently created SAS data set is used.

Note: The character variables in your input address data set must be left-aligned. That is, the values must not contain leading spaces. You can use the LEFT function in a DATA step to align your data if necessary.
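The alignment step described in the note can be sketched as follows. This is a minimal illustration only: the data set names WORK.CUSTOMERS and WORK.CUSTOMERS_ALIGNED are hypothetical, and the variables are assumed to use the default names CITY and STATE.

```sas
/* WORK.CUSTOMERS and WORK.CUSTOMERS_ALIGNED are hypothetical names. */
data work.customers_aligned;
   set work.customers;
   /* LEFT moves any leading blanks to the end of the value. */
   city  = left(city);
   state = left(state);
run;
```

The aligned data set can then be named in the DATA= option.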

ADDRESSCITYVAR= character-variable

specifies the character variable in the input address data set that contains the city names.

Default: CITY
ADDRESSPLUS4VAR= variable

specifies the variable in the input address data set that contains ZIP+4 extensions. The variable can be either numeric or character, but it must be the same type as the ZIP+4 variable in the lookup data set (LOOKUPPLUS4VAR=).

Default: PLUS4
ADDRESSSTATEVAR= character-variable

specifies the character variable in the input address data set that contains the two-character postal code for the state (for example, NY).

Default: STATE
ADDRESSVAR= variable

for STREET geocoding, specifies the variable in the address data set that contains the street address values (for example, "1229 North Main St.").

For CUSTOM and RANGE geocoding, the ADDRESSVAR= option specifies the variable in the address data set that contains non-address input values. The variable can be character or numeric. This option is used together with the LOOKUPVAR= option to geocode with unconventional values. Examples include internal sales territories, Metropolitan Statistical Areas (MSA), and Internet Protocol (IP) addresses.

Default: For STREET geocoding, the default name is ADDRESS.
ADDRESSZIPVAR= variable

specifies the variable in the input address data set that contains the 5-digit ZIP code values. The variable can be either numeric or character, but it must be the same type as the ZIP code variable in the lookup data set (specified by the LOOKUPZIPVAR= option).

Note: The values for the ZIP code variable must be five digits. You can use the Z5. format to prepend leading zeros to any ZIP code values that have fewer than five digits.

Default: ZIP
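As a sketch of the Z5. technique mentioned in the note, the following DATA step builds a five-character ZIP code from a numeric one. It assumes a hypothetical data set WORK.CUSTOMERS in which ZIP is numeric; ZIP_CHAR is an illustrative variable name.

```sas
/* WORK.CUSTOMERS, WORK.CUSTOMERS_ZIP, and ZIP_CHAR are hypothetical names. */
data work.customers_zip;
   set work.customers;
   length zip_char $5;
   /* Z5. writes the number in five columns, padded with leading zeros. */
   zip_char = put(zip, z5.);
run;
```

A character variable such as ZIP_CHAR is appropriate only when the ZIP code variable in the lookup data set is also character, because the two variables must be the same type.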
ATTRIBUTEVAR= (variable-1, variable-2, ...variable-n)

lists non-geocoding variables in the lookup data set that are to be added to the output data set. Examples include county, census block, and time zone. Variable names can be separated by commas or spaces.

Note: The values for additional attribute variables are not added to observations in the output data set where the match type is "City mean" or "ZIP mean".

Note: If an attribute variable has the same name as a variable in the address data set, then that variable is not added to the output data set.

Note: For the STREET geocoding method, only attribute variables from the street segment lookup data set can be included.

Example: ATTRIBUTEVAR=(STATENAME, COUNTYNM)
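In context, the option might be used as in the following sketch. The address and output data set names are hypothetical, and the example assumes that the lookup data set contains variables named STATENAME and COUNTYNM.

```sas
/* WORK.CUSTOMERS and WORK.CUSTOMERS_GEO are hypothetical names. */
proc geocode
   method=zip
   data=work.customers
   out=work.customers_geo
   lookup=sashelp.zipcode
   /* Copy these lookup variables to each matched observation. */
   attributevar=(statename, countynm);
run;
```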
BEGINRANGEVAR= numeric-variable

specifies the numeric variable in your range data set that contains the beginning IP address for each range of addresses.

ENDRANGEVAR= numeric-variable

specifies the numeric variable in your range data set that contains the ending IP address for each range of addresses.

FIPS= FIPS-data-set

specifies a SAS data set that is used by the STREET geocoding method to convert two-character postal codes and city names into US FIPS codes.

Note: The values of the city and state variables in the FIPS data set must be uppercase.

Default: The SASHELP.PLFIPS data set.
LOOKUP= lookup-data-set

specifies a SAS data set that associates coordinates with addresses. The data set is searched for observations that match the address observations. The variables that are required for your lookup data set depend on your geocoding method. See Alternate ZIP Code and ZIP+4 Lookup Data Sets.

The data set can also include other attribute variables (such as COUNTY, TIME ZONE, AREA CODE) that can be added to the address observation by using the ATTRIBUTEVAR= option.

Note: The character variables in your lookup data set must be left-aligned. That is, the values must not contain leading spaces. You can use the LEFT function in a DATA step to align your data if necessary.

Default: For the ZIP geocoding method, the SASHELP.ZIPCODE data set is the default. For other methods, you must specify the LOOKUP= option.
LOOKUPCITYVAR= character-variable

specifies the character variable in the lookup data set that contains the city names.

Default: CITY
LOOKUPKEYVAR= variable

specifies the key variable for the lookup data set. The values of the key variable correspond to values in the variable that you specify for the RANGEKEYVAR= option. The data type of the key variable must match the variable that you specify for the RANGEKEYVAR= option.

LOOKUPPLUS4VAR= variable

specifies the variable in the lookup data set that contains ZIP+4 extensions. The variable can be either numeric or character, but it must be the same type as the ZIP+4 variable in the input address data set (ADDRESSPLUS4VAR=).

Default: PLUS4
LOOKUPSTATEVAR= character-variable

specifies the character variable in the lookup data set that contains the two-character postal code for the state or province.

Default: STATECODE
LOOKUPSTREET= street-matching-data-set

specifies the street matching data set for associating coordinates with addresses when performing STREET geocoding.

The GEOCODE procedure expects the street matching data set to have a name that ends with M. The library must also contain two corresponding data sets whose names end with S (segment) and P (coordinate). For example, if you specify the street matching data set MYMAPS.STREETM, then the MYMAPS library must also contain the STREETS and STREETP data sets.

For more information about the data sets for STREET geocoding, see Data Sets for Street Geocoding.

Default: The SASHELP.USM data set. You can download the USM, USS, and USP data sets for the entire United States from the SAS Maps Online Web site at www.sas.com/mapsonline.
LOOKUPVAR= variable

specifies the variable in the lookup data set that contains non-address values. The variable can be character or numeric. This option is used together with the ADDRESSVAR= option to geocode with unconventional values. Examples include internal sales territories, Metropolitan Statistical Areas (MSA), and Internet Protocol (IP) addresses.

LOOKUPXVAR= numeric-variable

specifies the numeric variable in the lookup data set that contains the longitude of the geocoding location.

Default: X
LOOKUPYVAR= numeric-variable

specifies the numeric variable in the lookup data set that contains the latitude of the geocoding location.

Default: Y
LOOKUPZIPVAR= variable

specifies the variable in the lookup data set that contains the five-digit ZIP code values. The variable can be either character or numeric, but it must be the same type as the ZIP code variable in the input address data set (ADDRESSZIPVAR=).

Note: The values for a character ZIP code variable must be five digits. You can use the Z5. format to prepend leading zeros to any ZIP code values that have fewer than five digits.

Default: ZIP
METHOD= geocoding-method

specifies the geocoding method. This parameter is optional. Specify one of the following:

CITY

specifies the CITY geocoding method. The GEOCODE procedure attempts to match the city and state from the address data set with the lookup data set. Separate city and state variables are required in the address and lookup data sets. If multiple matches are found, then the coordinates of the matches are averaged.

Note: The city and state matching method is case insensitive.

Requirements: CITY geocoding requires the LOOKUP= option.

CITY geocoding also uses the following options:

ADDRESSCITYVAR=

ADDRESSSTATEVAR=

LOOKUPCITYVAR=

LOOKUPSTATEVAR=

LOOKUPXVAR=

LOOKUPYVAR=

If your data does not use the default variable names for any of these options, then you must specify the options that do not use the default.
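A minimal sketch of CITY geocoding follows. It assumes a hypothetical address data set WORK.OFFICES whose city and state variables are named TOWN and ST rather than the defaults, and a lookup data set that uses the default names CITY, STATECODE, X, and Y.

```sas
/* WORK.OFFICES, WORK.OFFICES_GEO, TOWN, and ST are hypothetical names. */
proc geocode
   method=city
   data=work.offices
   out=work.offices_geo
   lookup=sashelp.zipcode
   /* Only the nondefault variable names need to be specified. */
   addresscityvar=town
   addressstatevar=st;
run;
```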

CUSTOM

specifies the CUSTOM geocoding method. The GEOCODE procedure attempts to match custom variables that you specify by using the LOOKUPVAR= and ADDRESSVAR= options. Examples include internal sales territories and Metropolitan Statistical Areas (MSA).

Requirements: CUSTOM geocoding requires the following options:

ADDRESSVAR=

LOOKUP=

LOOKUPVAR=

If your lookup data set does not use the default variable names for X and Y, then the LOOKUPXVAR= and LOOKUPYVAR= options are required.
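The following sketch illustrates CUSTOM geocoding with sales territories. All data set and variable names are hypothetical; the lookup data set is assumed to store its coordinates in LONG and LAT instead of X and Y.

```sas
/* WORK.SALES, WORK.SALES_GEO, WORK.TERRITORIES, TERRITORY_ID, */
/* LONG, and LAT are hypothetical names.                       */
proc geocode
   method=custom
   data=work.sales
   out=work.sales_geo
   lookup=work.territories
   /* Match the custom value in the address data to the lookup data. */
   addressvar=territory_id
   lookupvar=territory_id
   /* Required here because the coordinates are not named X and Y. */
   lookupxvar=long
   lookupyvar=lat;
run;
```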

PLUS4

specifies the PLUS4 geocoding method. The GEOCODE procedure attempts to match the five-digit ZIP code and ZIP+4 extension from the address data set with the lookup data set. If no match is found, then the ZIP method is used instead. If multiple ZIP matches are found, then the coordinates of the matches are averaged.

Interaction: You can disable the secondary matching by using the NOZIP option.
Requirements: PLUS4 geocoding requires the LOOKUP= option.

PLUS4 geocoding also uses the following options:

ADDRESSPLUS4VAR=

ADDRESSZIPVAR=

LOOKUPPLUS4VAR=

LOOKUPZIPVAR=

LOOKUPXVAR=

LOOKUPYVAR=

If your data does not use the default variable names for any of these options, then you must specify the options that do not use the default.
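A minimal PLUS4 sketch, assuming a hypothetical ZIP+4 lookup data set MYLIB.ZIP4 and address data that both use the default variable names (ZIP, PLUS4, X, and Y):

```sas
/* WORK.CUSTOMERS, WORK.CUSTOMERS_GEO, and MYLIB.ZIP4 are hypothetical names. */
proc geocode
   method=plus4
   data=work.customers
   out=work.customers_geo
   lookup=mylib.zip4
   /* Optional: skip the secondary five-digit ZIP code match. */
   nozip;
run;
```

Omit NOZIP to keep the default behavior of falling back to the ZIP method when no ZIP+4 match is found.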

RANGE

specifies the RANGE geocoding method. The GEOCODE procedure attempts to match an Internet Protocol (IP) address from the address data set to a range of IP addresses from the range data set. If a match is found, then a key variable is used to match the IP address to a set of coordinates in the lookup data set.

Note: This feature is for SAS 9.2 Phase 2 and later.

Requirements: RANGE geocoding requires the following options:

ADDRESSVAR=

BEGINRANGEVAR=

ENDRANGEVAR=

LOOKUP=

LOOKUPKEYVAR=

RANGEKEYVAR=

If your lookup data set does not use the default variable names for X and Y, then the LOOKUPXVAR= and LOOKUPYVAR= options are required.
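RANGE geocoding might be set up as in the following sketch. Every data set and variable name is hypothetical. The example also supplies the RANGEDATA= option to identify the range data set that the BEGINRANGEVAR=, ENDRANGEVAR=, and RANGEKEYVAR= variables are assumed to come from.

```sas
/* All data set and variable names below are hypothetical. */
proc geocode
   method=range
   data=work.weblog
   out=work.weblog_geo
   addressvar=ip_address
   /* Range data set: start and end of each IP range plus a location key. */
   rangedata=mylib.ip_ranges
   beginrangevar=start_ip
   endrangevar=end_ip
   rangekeyvar=loc_id
   /* Lookup data set: the same key plus the coordinates. */
   lookup=mylib.ip_locations
   lookupkeyvar=loc_id
   lookupxvar=longitude
   lookupyvar=latitude;
run;
```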

STREET

specifies the STREET geocoding method. The GEOCODE procedure attempts to match the street name and ZIP code. If no match is found, then the GEOCODE procedure attempts to match the street name, city name, and two-character postal code. If the second match fails, then the ZIP method and the CITY method are used instead.

If a street match is found, X and Y coordinate values are interpolated by using the house number, street type suffix, directional prefix, and directional suffix from the input address.

Note: This feature is for the third maintenance release of SAS 9.2 and later.

For more information about the STREET geocoding method, see Street Geocoding.

Interaction: You can disable the secondary ZIP matching by using the NOZIP option.

You can disable the secondary CITY matching by using the NOCITY option.

Requirements: STREET geocoding uses the following options:

ADDRESSCITYVAR=

ADDRESSSTATEVAR=

ADDRESSZIPVAR=

ADDRESSVAR=

FIPS=

LOOKUPCITYVAR=

LOOKUPSTATEVAR=

LOOKUPSTREET=

LOOKUPZIPVAR=

LOOKUPXVAR=

LOOKUPYVAR=

TYPE=

If your data does not use the default variable names for any of these options, then you must specify the options that do not use the default.

The following options are not required if you specify the NOCITY option:

ADDRESSCITYVAR=

ADDRESSSTATEVAR=

LOOKUPCITYVAR=

LOOKUPSTATEVAR=

The following options are not required if you specify the NOZIP option:

ADDRESSZIPVAR=

LOOKUPZIPVAR=
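A minimal STREET sketch follows. It assumes that the address data set uses the default variable names (ADDRESS, CITY, STATE, and ZIP) and that the MYMAPS library has already been assigned and contains the STREETM, STREETS, and STREETP data sets. The address and output data set names are hypothetical.

```sas
/* WORK.CUSTOMERS and WORK.CUSTOMERS_GEO are hypothetical names. */
proc geocode
   method=street
   data=work.customers
   out=work.customers_geo
   /* Name the street matching (M) data set; the S and P data sets */
   /* are expected in the same library.                            */
   lookupstreet=mymaps.streetm;
run;
```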

ZIP

specifies the ZIP code geocoding method. The GEOCODE procedure attempts to match the five-digit ZIP code from the address data set with the lookup data set. If no match is found, then the CITY method is used instead. If multiple CITY matches are found, then the coordinates of the matches are averaged.

Interaction: You can disable the secondary matching by using the NOCITY option.
Requirements: ZIP geocoding uses the following options:

ADDRESSCITYVAR=

ADDRESSSTATEVAR=

ADDRESSZIPVAR=

LOOKUPCITYVAR=

LOOKUPSTATEVAR=

LOOKUPZIPVAR=

LOOKUPXVAR=

LOOKUPYVAR=

If your data does not use the default variable names for any of these options, then you must specify the options that do not use the default.

The following options are not required if you specify the NOCITY option:

ADDRESSCITYVAR=

ADDRESSSTATEVAR=

LOOKUPCITYVAR=

LOOKUPSTATEVAR=

Default: ZIP
Interaction: If you specify more than one method, then the last method that you specify is used.
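Because ZIP is the default method and SASHELP.ZIPCODE is its default lookup data set, a ZIP geocoding step can be quite short. The following sketch assumes a hypothetical address data set WORK.CUSTOMERS that uses the default variable name ZIP.

```sas
/* WORK.CUSTOMERS and WORK.CUSTOMERS_GEO are hypothetical names. */
proc geocode
   data=work.customers
   out=work.customers_geo
   /* Optional: do not fall back to city and state matching. */
   nocity;
run;
```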
NOCITY

disables the secondary matching attempt by city and state if STREET or ZIP code geocoding does not find a match.

By default, if ZIP code geocoding does not find a match, or if STREET geocoding does not find a match for the street address or ZIP code, then the GEOCODE procedure attempts to match the city and state values and then averages the results.

Interaction: You cannot use the NOCITY option with the CITY geocoding method.
NOSTIMER

disables the informational messages sent to the SAS log that track the progress of the geocoding operation. If the input data set includes 1,000 or more observations, then by default the GEOCODE procedure writes periodic messages to the SAS log showing the percentage completed and estimated time remaining. This option disables those messages.

Note: If you do not specify this option and your input data set has 1,000 or more observations, but you still do not receive periodic status messages, then check the setting of the LOGPARM= system option. Set LOGPARM="WRITE=IMMEDIATE" to cause messages to be written immediately to the SAS log rather than buffered for later output.

NOZIP

disables the secondary matching attempt by ZIP code when PLUS4 or STREET geocoding does not find a match. By default, if PLUS4 or STREET geocoding does not find a match, then the GEOCODE procedure attempts to match the five-digit ZIP code and averages the coordinates of the matching ZIP code observations.

Note: If your data set contains many missing ZIP+4 values, then the NOZIP option might improve performance.

Interaction: You cannot use the NOZIP option with the ZIP geocoding method.
OUT= output-data-set

specifies a data set for the geocoded addresses. All of the variables in the input address data set are copied to the output data set. Also added to the output data set are the following:

  • X and Y variables for the location of the match

  • optional variables specified by the ATTRIBUTEVAR= option

  • a variable named _MATCHED_ indicating how the match was made (by ZIP code, by city and state, by averaging coordinates, or no match)

If the output data set that you specify already exists, then it is replaced without warning. If the output data set is the same as the input data set, then the input data set is updated by the geocoding operation.

If you omit the OUT= option, then the name of the output data set is DATAn, where n is the smallest integer that produces a unique name. For example, if the DATA1 data set exists, then the default name for the output data set is DATA2.
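One way to review how the addresses were matched is to tabulate the _MATCHED_ variable in the output data set. The following sketch assumes a hypothetical output data set named WORK.CUSTOMERS_GEO.

```sas
/* WORK.CUSTOMERS_GEO is a hypothetical output data set name. */
proc freq data=work.customers_geo;
   /* Count observations by match type; MISSING includes missing values. */
   tables _matched_ / missing;
run;
```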

RANGEDATA= data-set

specifies a data set that associates ranges of IP addresses with locations. The data set should contain variables that identify the starting IP number, ending IP number, and location ID for each range of IP addresses.

RANGEDECIMAL

specifies that the values of the ADDRESSVAR= variable are in decimal form. By default, the IP addresses in the ADDRESSVAR= variable are in dotted quad notation. For example, the IP address 192.168.0.1 is represented as 3232235521 in decimal form.
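The relationship between the two notations can be sketched in a DATA step. Each of the four octets is weighted by a power of 256, so 192.168.0.1 becomes 192*256**3 + 168*256**2 + 0*256 + 1 = 3232235521. The data set and variable names here are hypothetical, and the step is shown only to illustrate the arithmetic.

```sas
/* WORK.WEBLOG, WORK.WEBLOG_DECIMAL, IP_ADDRESS, and IP_DECIMAL */
/* are hypothetical names.                                      */
data work.weblog_decimal;
   set work.weblog;
   /* Split the dotted quad into octets and weight each by a power of 256. */
   ip_decimal = input(scan(ip_address, 1, '.'), 3.) * 256**3
              + input(scan(ip_address, 2, '.'), 3.) * 256**2
              + input(scan(ip_address, 3, '.'), 3.) * 256
              + input(scan(ip_address, 4, '.'), 3.);
run;
```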

RANGEKEYVAR= variable

specifies the key variable for the range data set. The values of the key variable correspond to values in the variable that you specify for the LOOKUPKEYVAR= option. The data type of the key variable must match the variable that you specify for the LOOKUPKEYVAR= option.

TYPE= type-data-set

specifies a SAS data set that is used by the STREET geocoding method to standardize variations of common street address elements. For example, the type data set might standardize "parkway", "parkwy", and "pkwy" to a standard form "pkwy" to facilitate matching.

Default: The SASHELP.GCTYPE data set.
