The GEOCODE Procedure
PROC GEOCODE <option(s)>;
option(s) can be one or more of the following:
DATA= address-data-set
ADDRESSCITYVAR= character-variable
ADDRESSPLUS4VAR= variable
ADDRESSSTATEVAR= character-variable
ADDRESSVAR= variable
ADDRESSZIPVAR= variable
ATTRIBUTEVAR= variable-list
BEGINRANGEVAR= numeric-variable
ENDRANGEVAR= numeric-variable
FIPS= FIPS-data-set
LOOKUP= lookup-data-set
LOOKUPCITYVAR= character-variable
LOOKUPKEYVAR= variable
LOOKUPPLUS4VAR= variable
LOOKUPSTATEVAR= character-variable
LOOKUPSTREET= street-matching-data-set
LOOKUPVAR= variable
LOOKUPXVAR= numeric-variable
LOOKUPYVAR= numeric-variable
LOOKUPZIPVAR= variable
METHOD= geocoding-method
NOCITY
NOZIP
NOSTIMER
OUT= output-data-set
RANGEDATA= data-set
RANGEDECIMAL
RANGEKEYVAR= variable
TYPE= street-type-data-set
Options
To facilitate converting existing SAS/GIS batch geocoding programs that use the %GCBATCH autocall macro to the GEOCODE procedure, the option name from the %GCBATCH autocall macro is an acceptable alias for most options. For more information, see the SAS/GIS: Spatial Data and Procedure Guide.
DATA= address-data-set
specifies the SAS data set that contains the address observations that you want to geocode. If you do not specify this option, then the most recently created SAS data set is used.
Note: The character variables in your input address data set must be left-aligned. That is, the values must not contain leading spaces. You can use the LEFT function in a DATA step to align your data if necessary.
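The left-alignment step described in the note might look like the following sketch. The data set names WORK.ADDRESSES and WORK.ADDRESSES_LEFT and the sample values are hypothetical and used only for illustration.

```sas
/* Hypothetical input: CITY values carry leading spaces. */
data work.addresses;
   length city $20 state $2;
   city = '   Cary';  state = 'NC'; output;
   city = ' Raleigh'; state = 'NC'; output;
run;

/* Left-align every character variable with the LEFT function. */
data work.addresses_left;
   set work.addresses;
   array chars{*} _character_;
   do i = 1 to dim(chars);
      chars{i} = left(chars{i});
   end;
   drop i;
run;
```

The cleaned data set can then be named in the DATA= option.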
ADDRESSCITYVAR= character-variable
specifies the character variable in the input address data set that contains the city names.
Default: CITY
ADDRESSPLUS4VAR= variable
specifies the variable in the input address data set that contains ZIP+4 extensions. The variable can be either numeric or character, but it must be the same type as the ZIP+4 variable in the lookup data set (LOOKUPPLUS4VAR=).
Default: PLUS4
ADDRESSSTATEVAR= character-variable
specifies the character variable in the input address data set that contains the two-character postal code for the state (for example, NY).
Default: STATE
ADDRESSVAR= variable
For STREET geocoding, the ADDRESSVAR= option specifies the variable in the address data set that contains the street address values (for example, "1229 North Main St.").
For CUSTOM and RANGE geocoding, the ADDRESSVAR= option specifies the variable in the address data set that contains non-address input values. The variable can be character or numeric. This option is used together with the LOOKUPVAR= option to geocode with unconventional values. Examples include internal sales territories, Metropolitan Statistical Areas (MSA), and Internet Protocol (IP) addresses.
Default: For STREET geocoding, the default name is ADDRESS.
ADDRESSZIPVAR= variable
specifies the variable in the input address data set that contains the five-digit ZIP code values. The variable can be either numeric or character, but it must be the same type as the ZIP code variable in the lookup data set (specified by the LOOKUPZIPVAR= option).
Note: The values for the ZIP code variable must be five digits. You can use the Z5. format to prepend leading zeros to any ZIP code values that have fewer than five digits.
Default: ZIP
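As a sketch of the zero-padding described in the note, the following DATA step converts a numeric ZIP code to a five-character value with the Z5. format. The data set and variable names (WORK.ADDRESSES, ZIPNUM) are hypothetical.

```sas
/* Hypothetical input: numeric ZIP codes, one with fewer than five digits. */
data work.addresses;
   input zipnum;
   datalines;
2134
27513
;
run;

/* Build a five-character ZIP variable with leading zeros. */
data work.addresses_zip;
   set work.addresses;
   length zip $5;
   zip = put(zipnum, z5.);   /* 2134 becomes '02134' */
run;
```

Convert to character only if the ZIP code variable in your lookup data set is also character, because the two variables must have the same type.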
ATTRIBUTEVAR= variable-list
lists non-geocoding variables in the lookup data set that are to be added to the output data set. Examples include county, census block, and time zone. Variable names can be separated by commas or spaces.
Note: The values for additional attribute variables are not added to observations in the output data set where the match type is "City mean" or "ZIP mean".
Note: If an attribute variable has the same name as a variable in the address data set, then that variable is not added to the output data set.
Note: For the STREET geocoding method, only attribute variables from the street segment lookup data set can be included.
Example: ATTRIBUTEVAR=(STATENAME, COUNTYNM)
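A minimal sketch of the ATTRIBUTEVAR= example in context follows. It assumes the default SASHELP.ZIPCODE lookup data set is installed and that its ZIP variable is numeric. The WORK.CUSTOMERS data set and its values are hypothetical.

```sas
/* Hypothetical address data; ZIP is read as numeric. */
data work.customers;
   input city :$20. state :$2. zip;
   datalines;
Cary NC 27513
Boston MA 02134
;
run;

/* ZIP geocoding that also copies two attribute variables to the output. */
proc geocode
   method=zip
   data=work.customers
   out=work.customers_geo
   attributevar=(STATENAME, COUNTYNM);
run;
```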
BEGINRANGEVAR= numeric-variable
specifies the numeric variable in your range data set that contains the beginning IP address for each range of addresses.
ENDRANGEVAR= numeric-variable
specifies the numeric variable in your range data set that contains the ending IP address for each range of addresses.
FIPS= FIPS-data-set
specifies a SAS data set that is used by the STREET geocoding method to convert two-character postal codes and city names into US FIPS codes.
Note: The values of the city and state variables in the FIPS data set must be uppercase.
Default: The SASHELP.PLFIPS data set.
LOOKUP= lookup-data-set
specifies a SAS data set that associates coordinates with addresses. The data set is searched for observations that match the address observations. The variables that are required for your lookup data set depend on your geocoding method. See Alternate ZIP Code and ZIP+4 Lookup Data Sets.
The data set can also include other attribute variables (such as COUNTY, TIME ZONE, and AREA CODE) that can be added to the address observation by using the ATTRIBUTEVAR= option.
Note: The character variables in your lookup data set must be left-aligned. That is, the values must not contain leading spaces. You can use the LEFT function in a DATA step to align your data if necessary.
Default: For the ZIP geocoding method, the SASHELP.ZIPCODE data set is the default. For other methods, you must specify the LOOKUP= option.
LOOKUPCITYVAR= character-variable
specifies the character variable in the lookup data set that contains the city names.
Default: CITY
LOOKUPKEYVAR= variable
specifies the key variable for the lookup data set. The values of the key variable correspond to values in the variable that you specify for the RANGEKEYVAR= option. The data type of the key variable must match the variable that you specify for the RANGEKEYVAR= option.
LOOKUPPLUS4VAR= variable
specifies the variable in the lookup data set that contains ZIP+4 extensions. The variable can be either numeric or character, but it must be the same type as the ZIP+4 variable in the input address data set (ADDRESSPLUS4VAR=).
Default: PLUS4
LOOKUPSTATEVAR= character-variable
specifies the character variable in the lookup data set that contains the two-character postal code for the state or province.
Default: STATECODE
LOOKUPSTREET= street-matching-data-set
specifies the street matching data set for associating coordinates with addresses when performing STREET geocoding.
The GEOCODE procedure expects the street matching data set to have a name that ends with M. The library must also contain two corresponding data sets whose names end with S (segment) and P (coordinate). For example, if you specify the street matching data set MYMAPS.STREETM, then the MYMAPS library must also contain the STREETS and STREETP data sets.
For more information about the data sets for STREET geocoding, see Data Sets for Street Geocoding.
Default: The SASHELP.USM data set. You can download the USM, USS, and USP data sets for the entire United States from the SAS Maps Online website at www.sas.com/mapsonline.
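The following sketch shows how a STREET geocoding call with LOOKUPSTREET= might be written. It assumes that a MYMAPS library has already been assigned with a LIBNAME statement and contains the STREETM, STREETS, and STREETP data sets from the example above. The address record, including the city and ZIP code, is made up for illustration, and the type that the ZIP variable needs depends on your street lookup data.

```sas
/* Hypothetical address data; assumes LIBNAME MYMAPS is already assigned. */
data work.customers;
   infile datalines dlm=',';
   length address $40 city $20 state $2;
   input address $ city $ state $ zip;
   datalines;
1229 North Main St,Anytown,NY,12345
;
run;

/* STREET geocoding against MYMAPS.STREETM (with STREETS and STREETP). */
proc geocode
   method=street
   data=work.customers
   out=work.customers_geo
   lookupstreet=mymaps.streetm
   addressvar=address
   addresscityvar=city
   addressstatevar=state
   addresszipvar=zip;
run;
```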
LOOKUPVAR= variable
specifies the variable in the lookup data set that contains non-address values. The variable can be character or numeric. This option is used together with the ADDRESSVAR= option to geocode with unconventional values. Examples include internal sales territories, Metropolitan Statistical Areas (MSA), and Internet Protocol (IP) addresses.
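As an illustration of pairing LOOKUPVAR= with ADDRESSVAR=, the sketch below geocodes stores by an internal sales territory code. All data set names, variable names, and coordinate values are hypothetical, and whether the lookup data set needs to be sorted or indexed for your release is not verified here.

```sas
/* Hypothetical lookup data: one coordinate pair per sales territory. */
data work.territories;
   input territory :$10. x y;
   datalines;
EAST -77.0 38.9
WEST -122.4 37.8
;
run;

/* Hypothetical input data: stores identified by territory only. */
data work.stores;
   input store_id territory :$10.;
   datalines;
1 EAST
2 WEST
;
run;

/* CUSTOM geocoding: match TERRITORY in both data sets. */
proc geocode
   method=custom
   data=work.stores
   out=work.stores_geo
   addressvar=territory
   lookup=work.territories
   lookupvar=territory
   lookupxvar=x
   lookupyvar=y;
run;
```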
LOOKUPXVAR= numeric-variable
specifies the numeric variable in the lookup data set that contains the longitude of the geocoding location.
Default: X
LOOKUPYVAR= numeric-variable
specifies the numeric variable in the lookup data set that contains the latitude of the geocoding location.
Default: Y
LOOKUPZIPVAR= variable
specifies the variable in the lookup data set that contains the five-digit ZIP code values. The variable can be either character or numeric, but it must be the same type as the ZIP code variable in the input address data set (ADDRESSZIPVAR=).
Note: The values for a character ZIP code variable must be five digits. You can use the Z5. format to prepend leading zeros to any ZIP code values that have fewer than five digits.
Default: ZIP
METHOD= geocoding-method
specifies the geocoding method. This parameter is optional. Specify one of the following:
CITY
specifies the CITY geocoding method. The GEOCODE procedure attempts to match the city and state from the address data set with the lookup data set. Separate city and state variables are required in the address and lookup data sets. If multiple matches are found, then the coordinates of the matches are averaged.
Note: The city and state matching method is case insensitive.
CUSTOM
specifies the CUSTOM geocoding method. The GEOCODE procedure attempts to match custom variables that you specify by using the LOOKUPVAR= and ADDRESSVAR= options. Examples include internal sales territories and Metropolitan Statistical Areas (MSA).
PLUS4
specifies the PLUS4 geocoding method. The GEOCODE procedure attempts to match the five-digit ZIP code and ZIP+4 extension from the address data set with the lookup data set. If no match is found, then the ZIP method is used instead. If multiple ZIP matches are found, then the coordinates of the matches are averaged.
RANGE
specifies the RANGE geocoding method. The GEOCODE procedure attempts to match an Internet Protocol (IP) address from the address data set to a range of IP addresses from the range data set. If a match is found, then a key variable is used to match the IP address to a set of coordinates in the lookup data set.
Note: This feature is for SAS 9.2 Phase 2 and later.
STREET
specifies the STREET geocoding method. The GEOCODE procedure attempts to match the street name and ZIP code. If no match is found, then the GEOCODE procedure attempts to match the street name, city name, and two-character postal code. If the second match fails, then the ZIP method and the CITY method are used instead. If a street match is found, X and Y coordinate values are interpolated by using the house number, street type suffix, directional prefix, and directional suffix from the input address.
Note: This feature is for the third maintenance release of SAS 9.2 and later. For more information about the STREET geocoding method, see Street Geocoding.
ZIP
specifies the ZIP code geocoding method. The GEOCODE procedure attempts to match the five-digit ZIP code from the address data set with the lookup data set. If no match is found, then the CITY method is used instead. If multiple CITY matches are found, then the coordinates of the matches are averaged.
Default: ZIP
Interaction: If you specify more than one method, then the last method that you specify is used.
NOCITY
disables the secondary matching attempt by city and state if STREET or ZIP code geocoding does not find a match.
By default, if ZIP code geocoding does not find a match, or if STREET geocoding does not find a match for the street address or ZIP code, then the GEOCODE procedure attempts to match the city and state values and then averages the results.
Interaction: You cannot use the NOCITY option with the CITY geocoding method.
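A sketch of ZIP geocoding with the city and state fallback disabled follows. It assumes the default SASHELP.ZIPCODE lookup data set with a numeric ZIP variable. The WORK.ORDERS data set and its values are hypothetical, and the second ZIP code is chosen only as a value that is unlikely to match.

```sas
/* Hypothetical address data; the second ZIP code is unlikely to match. */
data work.orders;
   input city :$20. state :$2. zip;
   datalines;
Cary NC 27513
Cary NC 00001
;
run;

/* NOCITY: do not fall back to city and state when the ZIP match fails. */
proc geocode
   method=zip
   data=work.orders
   out=work.orders_zip_only
   nocity;
run;
```

Without NOCITY, an observation whose ZIP code does not match would instead be tried against the city and state values.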
NOSTIMER
disables the informational messages sent to the SAS log that track the progress of the geocoding operation. If the input data set includes 1,000 or more observations, then the GEOCODE procedure writes periodic messages to the SAS log showing the percentage completed and the estimated time remaining. This option disables those messages.
Note: If you do not specify this option (because you want the status messages), your input data set has 1,000 or more observations, and you are still not receiving periodic status messages, then check the setting of the LOGPARM= system option. Set LOGPARM="WRITE=IMMEDIATE" to cause messages to be written immediately to the SAS log rather than buffered for later output.
NOZIP
disables the secondary matching attempt by ZIP code when PLUS4 or STREET geocoding does not find a match. By default, if PLUS4 or STREET geocoding does not find a match, then the GEOCODE procedure attempts to match the five-digit ZIP code and averages the coordinates of the matching ZIP code observations.
Note: If your data set contains many missing ZIP+4 values, then the NOZIP option might improve performance.
Interaction: You cannot use the NOZIP option with the ZIP geocoding method.
OUT= output-data-set
specifies a data set for the geocoded addresses. All of the variables in the input address data set are copied to the output data set. Also added to the output data set are the following:
X and Y variables for the location of the match
optional variables specified by the ATTRIBUTEVAR= option
a variable named _MATCHED_ indicating how the match was made (by ZIP code, by city and state, by averaging coordinates, or no match)
If the output data set that you specify already exists, then it is replaced without warning. If the output data set is the same as the input data set, then the input data set is updated by the geocoding operation.
If you omit the OUT= option, then the name of the output data set is DATAn, where n is the smallest integer that produces a unique name. For example, if the DATA1 data set exists, then the default name for the output data set is DATA2.
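One way to review how observations were matched is to tabulate the _MATCHED_ variable in the output data set. The sketch below assumes the default SASHELP.ZIPCODE lookup data set. The WORK.CONTACTS data set and its values are hypothetical.

```sas
/* Hypothetical address data. */
data work.contacts;
   input city :$20. state :$2. zip;
   datalines;
Cary NC 27513
Boston MA 02134
;
run;

/* Geocode by ZIP code and write the results to a named output data set. */
proc geocode method=zip data=work.contacts out=work.contacts_geo;
run;

/* Count observations by match type. */
proc freq data=work.contacts_geo;
   tables _matched_;
run;
```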
RANGEDATA= data-set
specifies a data set that associates ranges of IP addresses with locations. The data set should contain variables that identify the starting IP number, ending IP number, and location ID for each range of IP addresses.
RANGEDECIMAL
specifies that the values of the ADDRESSVAR= variable are in decimal form. By default, the IP addresses in the ADDRESSVAR= variable are in dotted quad notation. For example, the IP address 192.168.0.1 is represented as 3232235521 in decimal form.
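The decimal form is the four octets read as one base-256 number: a.b.c.d becomes a*256**3 + b*256**2 + c*256 + d. The following DATA step is a sketch of that arithmetic for the address in the example. The variable names are hypothetical.

```sas
/* Convert a dotted quad IP address to its decimal form. */
data _null_;
   ip = '192.168.0.1';
   ipnum = input(scan(ip, 1, '.'), 3.) * 16777216   /* 256**3 */
         + input(scan(ip, 2, '.'), 3.) * 65536      /* 256**2 */
         + input(scan(ip, 3, '.'), 3.) * 256
         + input(scan(ip, 4, '.'), 3.);
   put ipnum=;   /* writes ipnum=3232235521 to the SAS log */
run;
```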
RANGEKEYVAR= variable
specifies the key variable for the range data set (RANGEDATA=). The values of the key variable correspond to values in the variable that you specify for the LOOKUPKEYVAR= option. The data type of the key variable must match the variable that you specify for the LOOKUPKEYVAR= option.
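The sketch below shows how the RANGE options might fit together: the range data set maps IP ranges to a location key, and the lookup data set maps that key to coordinates. All data set names, variable names, and values are hypothetical, and any sorting or indexing requirements for your release are not verified here.

```sas
/* Hypothetical range data: one block of IP numbers mapped to key 101. */
data work.ipranges;
   input startipnum endipnum locid;
   datalines;
3232235520 3232235775 101
;
run;

/* Hypothetical lookup data: coordinates for each location key. */
data work.iplocations;
   input locid x y;
   datalines;
101 -78.78 35.79
;
run;

/* Hypothetical input data: IP addresses in dotted quad notation. */
data work.visitors;
   input ipaddr :$15.;
   datalines;
192.168.0.1
;
run;

/* RANGE geocoding: IP address, then range, then key, then coordinates. */
proc geocode
   method=range
   data=work.visitors
   out=work.visitors_geo
   addressvar=ipaddr
   rangedata=work.ipranges
   beginrangevar=startipnum
   endrangevar=endipnum
   rangekeyvar=locid
   lookup=work.iplocations
   lookupkeyvar=locid
   lookupxvar=x
   lookupyvar=y;
run;
```

Because the input addresses are in dotted quad notation, RANGEDECIMAL is omitted.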
TYPE= street-type-data-set
specifies a SAS data set that is used by the STREET geocoding method to standardize variations of common street address elements. For example, the type data set might standardize "parkway", "parkwy", and "pkwy" to a standard form "pkwy" to facilitate matching.
Default: The SASHELP.GCTYPE data set.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.