The GMAP Procedure

Concepts

Map data sets and response data sets are used in the GMAP procedure. These data sets must contain the required variables or the procedure stops and you get an error message. The GMAP procedure can take as input a map data set and a response data set, provided that both data sets contain the same ID variable. Alternatively, you can use a single data set as input if it contains either the map data or a variable that references a map data set.

About Map Data Sets

There are two types of data sets that are provided with SAS/GRAPH for mapping: traditional map data sets and feature tables. Much of the map data that is delivered with SAS/GRAPH is available in both the traditional map data set and feature table formats.

SAS/GRAPH software includes a number of predefined map data sets. These data sets are described in The METAMAPS Data Set.

About Traditional Map Data Sets

A traditional map data set is a SAS data set that contains coordinates that define the boundaries of map areas, such as states or counties.

Required Variables

A traditional map data set must contain at least these variables:

a numeric variable named X that contains the horizontal coordinates of the boundary points. The value of this variable could be either projected or unprojected. If unprojected, X represents longitude.
a numeric variable named Y that contains the vertical coordinates of the boundary points. The value of this variable could be either projected or unprojected. If unprojected, Y represents latitude.
one or more variables that uniquely identify the areas in the map. Map area identification variables can be either character or numeric and are indicated in the ID statement.

The X and Y variable values in the traditional map data set do not have to be in any specific units. They are rescaled by the GMAP procedure based on the minimum and maximum values in the data set. The minimum X and Y values are in the lower-left corner of the map, and the maximum X and Y values are in the upper-right corner.

Map data sets in which the X and Y variables contain longitude and latitude should be projected before you use them with PROC GMAP. See The GPROJECT Procedure for details.

Segment Variable

The traditional map data set can also contain an optional variable named SEGMENT to identify map areas that comprise noncontiguous polygons. Each unique value of the SEGMENT variable within a single map area defines a distinct polygon. If the SEGMENT variable is not present, each map area is drawn as a separate closed polygon that indicates a single segment.

The observations for each segment of a map area in the map data set must occur in the order in which the points are to be joined. The GMAP procedure forms map area outlines by connecting the boundary points of each segment in the order in which they appear in the data set, eventually joining the last point to the first point to complete the polygon. All the segments for each ID value must be contiguous within the map data set.

LONG and LAT Variables

In addition to the variables described in Required Variables, some of the SAS/GRAPH map data sets also contain the following variables:

a numeric variable named LONG containing the unprojected longitude (in radians or degrees) of the boundary points
a numeric variable named LAT containing the unprojected latitude (in radians or degrees) of the boundary points

The GMAP procedure uses the values of the X and Y variables to draw the map. To produce a custom map from the unprojected values, do the following tasks:

Rename the LONG and LAT variables to X and Y.
Project the coordinates by using the GPROJECT procedure.
Use the output data set from GPROJECT as your map data set.
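The tasks above might be sketched as follows. This is only an illustration: it assumes a map data set such as MAPS.AFRICA that contains X, Y, LONG, and LAT and identifies map areas with a variable named ID, and the WORK data set names are made up. Check the variable labels with PROC CONTENTS first, because the units and direction of LONG and LAT determine whether GPROJECT needs the DEGREES or EASTLONG option.

```sas
/* Step 1: drop the projected X and Y, then rename LONG and LAT to X and Y. */
/* DROP= is applied before RENAME=, so the two options do not conflict.     */
data work.africa_unproj;
   set maps.africa(drop=x y rename=(long=x lat=y));
run;

/* Step 2: project the coordinates. GPROJECT expects radians by default;    */
/* add DEGREES and/or EASTLONG if the variable labels call for them.        */
proc gproject data=work.africa_unproj out=work.africa_proj;
   id id;
run;

/* Step 3: use the GPROJECT output as the map data set. The map also serves */
/* as its own response data set here, only to get something drawn.          */
proc gmap map=work.africa_proj data=work.africa_proj;
   id id;
   choro id / discrete nolegend;
run;
quit;
```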

Traditional Map Data Sets Containing X, Y, LONG, and LAT

Most of the traditional map data sets that are provided with SAS/GRAPH software contain four coordinate variables (X, Y, LONG, and LAT). In that case, X and Y are always projected values that are used by the SAS/GRAPH procedures (by default). If you need to use the unprojected values that are contained in the LONG and LAT variables, then do the following tasks:

Drop the existing X and Y variables.
Rename the LONG and LAT variables to X and Y.

The MAP= value in the GMAP procedure automatically uses X and Y. See Input Map Data Sets that Contain Both Projected and Unprojected Values for more details.

Traditional Map Data Sets Containing Only X and Y

The traditional map data sets that contain X and Y variables (and no LONG and LAT variables) are usually projected maps. However, a few traditional map data sets for the US and Canada contain X and Y values that are unprojected longitude and latitude. For those data sets, you need to use the GPROJECT procedure to project the map (see The GPROJECT Procedure).

Note: You can determine whether a SAS traditional map data set is projected or unprojected by looking at the description of each variable that is displayed when you use the CONTENTS procedure or by browsing the MAPS.METAMAPS data set.

About Feature Tables

An alternative to using the traditional map data set is the feature table. While the traditional map data set stores the spatial information across multiple observations, the feature table uses a data arrangement to store a reference to the spatial information in a single variable value. The feature table's data arrangement uses the $GEOREF SAS/GRAPH format.

$GEOREF format

The $GEOREF format stores spatial information in binary data streams, making it possible to store as a single variable value all the information needed to draw a map area. Thus, the feature tables use only a single observation for each map area, and they treat a field of spatial information just like any other information that can be added to a data set. Each $GEOREF value contains the name of the map data set and the ID variable for that map. The traditional map data set associated with the feature table must be located in the same SAS library as the feature table for the GMAP procedure to draw the map correctly.

The names of the feature tables that are supplied by SAS usually end with the number 2. For example, the feature table for MAPS.AFRICA is MAPS.AFRICA2. You can also determine the feature table for your map data set by referring to the MAPS.METAMAPS data set.

To locate the variable that contains the spatial information, run PROC CONTENTS on a feature table. In the Output window, the variable containing the spatial information has $GEOREF as the value in the column labeled Format.
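For instance, a minimal check on the MAPS.AFRICA2 feature table mentioned above could look like this (assuming the MAPS library is assigned in your session):

```sas
/* List the variables of the feature table. The variable whose Format */
/* column shows $GEOREF holds the spatial information.                */
proc contents data=maps.africa2;
run;
```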

Note: Some feature tables, like MAPS.CANCENS, have more than one $GEOREF format variable.

Merging Feature Tables with Response Data Sets

To display response data with a feature table, the feature table must be merged with a response data set. The merged data set is then specified by the DATA= option in the PROC GMAP statement.

First, use PROC SORT to sort both the response data set and the feature table by a variable that is present in both data sets. You can then merge the sorted data sets with a DATA step MERGE statement, using the sort variable as the BY variable, or with an SQL join on that variable. In the merged data set, the $GEOREF formatted variable from the feature table becomes the identification variable to be used in the GMAP procedure. See Creating a Map Using the Feature Table for more details.
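The sort-and-merge sequence could be sketched as follows. The names are assumptions for illustration: WORK.SITES is a response data set with the variables STATE and REGION, MAPS.US2 is assumed to contain a STATE variable that can serve as the common variable, and _MAP_GEOMETRY_ is taken to be the $GEOREF formatted variable for the US map (confirm with PROC CONTENTS or MAPS.METAMAPS).

```sas
/* Sort copies of both data sets by the common variable. */
proc sort data=maps.us2 out=work.us2_sorted;
   by state;
run;

proc sort data=work.sites out=work.sites_sorted;
   by state;
run;

/* Merge the feature table with the response data. */
data work.sites_map;
   merge work.us2_sorted work.sites_sorted;
   by state;
run;

/* No MAP= option: the $GEOREF formatted variable is the ID variable. */
proc gmap data=work.sites_map;
   id _map_geometry_;
   choro region / discrete;
run;
quit;
```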

The METAMAPS Data Set

In the MAPS library, there is a data set named METAMAPS, which contains metadata about all of the data sets that are delivered in the library. Among the metadata in MAPS.METAMAPS are the following four variables, which you can use to determine which feature table corresponds to a particular geometry table:

Variable	Description
MEMNAME	Identifies the names of all of the data sets that are delivered in the MAPS library.
MEMCODE	Indicates whether a data set represents a feature table (F) or a geometry table (G).
F_TABLE	Indicates the corresponding feature table for a geometry table. This variable is blank for rows that contain metadata about a feature table.
F_GEOCOL	Indicates the variable, in the feature table, whose values encapsulate the geometry object.

For example, consider the data sets MAPS.ASIA, MAPS.STATES, and MAPS.US. Each of these represents a geometry table, and to locate the corresponding feature tables, you would look in MAPS.METAMAPS to find the MEMNAME values ASIA, STATES, and US. Here are the relevant values on those rows:

Values from the METAMAPS Data Set
MEMNAME	MEMCODE	F_TABLE	F_GEOCOL
ASIA	G	ASIA2	CONT95_GEO
STATES	G	US2	GEO_STATE
US	G	US2	_MAP_GEOMETRY_

From these values, you can see that the data sets that are named ASIA, STATES, and US all represent geometry tables because their MEMCODE values are G. The feature table corresponding to the ASIA data set is the data set ASIA2, which stores the spatial information in the variable CONT95_GEO. The feature tables corresponding to STATES and US are both in the data set US2. The spatial information corresponding to STATES is stored in the variable GEO_STATE, and the spatial information corresponding to US is stored in the variable _MAP_GEOMETRY_.
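One way to pull these rows yourself is a simple PROC PRINT with a WHERE statement. This is a sketch that assumes the four variables are named exactly as in the table; UPCASE is used so that the match does not depend on how MEMNAME is capitalized.

```sas
/* Show the feature table and geometry variable for selected maps. */
proc print data=maps.metamaps noobs;
   where upcase(memname) in ('ASIA', 'STATES', 'US');
   var memname memcode f_table f_geocol;
run;
```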

Special Data Sets for Annotating Maps

There are several data sets in the MAPS library that enable you to easily label maps. These data sets contain coordinates for map features such as cities, but cannot be used as map data sets.

MAPS.USCENTER: contains the coordinates of the visual center of each state in the U.S. and Washington, D.C., as well as coordinates in the ocean for states that are too small to contain a label. There are two pairs of variables for locating labels using Annotate data sets. The X and Y variables are projected and can be used with the MAPS.US and MAPS.USCOUNTY data sets. The LONG and LAT variables are unprojected longitude and latitude in radians and can be used with the MAPS.STATES, MAPS.COUNTIES, and MAPS.COUNTY data sets.
MAPS.USCITY: contains the locations of selected cities in the U.S. Many city names occur in more than one state, so you might have to subset by state to avoid duplication. There are two pairs of variables for locating labels using Annotate data sets. The X and Y variables contain projected coordinates and can be used with the MAPS.US and MAPS.USCOUNTY data sets. The LONG and LAT variables contain the unprojected longitude and latitude in radians. These can be used to place labels on the MAPS.STATES, MAPS.COUNTIES, or MAPS.COUNTY data sets.

For details on each of these data sets, see the MAPS.METAMAPS data set.
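As an illustration of how MAPS.USCENTER can feed an Annotate data set, the following sketch labels each state on the projected MAPS.US map with its postal code. It assumes that MAPS.USCENTER identifies states by a numeric FIPS code in STATE and flags the alternate ocean positions with OCEAN='Y'; the data set name WORK.STATELABELS is made up. Verify the variables with PROC CONTENTS before relying on it.

```sas
/* Build an Annotate data set with one two-letter label per state. */
data work.statelabels;
   length function $8 text $2;
   retain xsys ysys '2'      /* use the map's data coordinate system      */
          hsys '3'           /* SIZE is a percentage of the graphics area */
          when 'a'           /* draw the labels after the map             */
          function 'label' position '5' size 2.5;
   /* Keep the projected X and Y; skip the alternate ocean positions. */
   set maps.uscenter(drop=long lat where=(ocean ne 'Y'));
   text = fipstate(state);   /* FIPS code to postal abbreviation */
run;

/* Draw the map and overlay the labels. */
proc gmap map=maps.us data=maps.us annotate=work.statelabels;
   id state;
   choro state / discrete nolegend;
run;
quit;
```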

About Response Data Sets

A response data set is a SAS data set that contains the following variables:

one or more response variables that contain data values that are associated with map areas. Each value of the response variable is associated with a map area in the map data set.
identification variables that identify the map area to which a response value belongs. These variables must be the same as those that are contained in the map data set.

The response data set can contain other variables in addition to these required variables.

Using the Response Data Set with the Map Data Sets

The traditional map data set and the response data set are specified separately in the PROC GMAP statement: the response data set by the DATA= option and the traditional map data set by the MAP= option. The values of the map area ID variables in the response data set determine the map areas to be included on the map. Unless the ALL option is used in the PROC GMAP statement, only the map areas with response values are shown on the map. As a result, you do not need to subset your map data set if you are mapping only a small section of the map. However, if you map the same small section frequently, then create a subset of the map data set for efficiency.

If you have a response data set named WORK.SITES, then the syntax for using GMAP might resemble the following:

proc gmap map=maps.us data=work.sites;
   id state;
   choro region/discrete;
run;
quit;

To use data from both a feature table and response data set, merge the two data sets by using a variable that is contained in both data sets. The new combined data set becomes the DATA= value in the PROC GMAP statement. When the response data set and the feature table are merged into one, do not use MAP=map-data-set in the PROC GMAP statement. The $GEOREF formatted variable is the ID variable for the combined data set. See Creating a Map Using the Feature Table for more details.

Note: Response data that does not correspond to a map feature is included in the legend.

About Response Variables

The GMAP procedure can produce block, choropleth, and prism maps for both numeric and character response variables. Surface maps require numeric response variables. Numeric variables fall into two categories: discrete and continuous.

Discrete variables contain a finite number of specific numeric values that are to be represented on the map. For example, a variable that contains only the values 1989 or 1990 is a discrete variable.
Continuous variables contain a range of numeric values that are to be represented on the map. For example, a variable that contains any real value between 0 and 100 is a continuous variable.

Numeric response variables are treated as continuous variables unless the DISCRETE option is used in the action statement.

About Response Levels

Response levels are the values that identify categories of data on the graph. The categories that are shown on the graph are based on the values of the response variable. Based on the type of the response variable, a response level can be determined by any of the following:

a character value
the MIDPOINTS= option
a range of numeric values
a specific numeric value

When response levels are determined by a character value, the GMAP procedure treats each unique value as a response level. For example, if the response variable contains the names of ten regions, each region is a response level, resulting in ten response levels.

When character response levels are determined by the MIDPOINTS= option, any response variable values that do not match one of the specified response level values are ignored.

When response levels are determined by a range of numeric values, each response level has a similar number of observations. These options are exceptions to this:

The LEVELS= option specifies the number of response levels to be used on the map.
The DISCRETE option causes the numeric variable to be treated as a discrete variable.
The MIDPOINTS= option chooses specific response level values as medians of the value ranges.

If the response variable values are continuous, then the GMAP procedure assigns response level intervals automatically unless you specify otherwise. The response levels represent a range of values rather than a single value.

When response levels are determined by specific numeric values, and the DISCRETE option is specified, one level is created for each value. If the response variable has an associated format, then each formatted value is represented by a different response level.
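The options discussed above can be compared side by side. The sketch below assumes a hypothetical response data set WORK.SALES with one observation per state and the variables STATE, REVENUE (continuous), and YEAR (discrete); each CHORO statement produces its own map.

```sas
proc gmap map=maps.us data=work.sales;
   id state;
   choro revenue;                          /* continuous: GMAP picks the levels  */
   choro revenue / levels=4;               /* continuous: four response levels   */
   choro revenue / midpoints=100 200 300;  /* levels centered on these values    */
   choro year / discrete;                  /* one level per distinct value       */
run;
quit;
```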

The AREA, BLOCK, CHORO, and PRISM statements assign patterns to response levels. In CHORO and PRISM maps, response levels are shown as map areas. However, in BLOCK maps, response levels are shown as blocks. If you specify the AREA statement on a BLOCK map, then the response levels for the AREA variable are shown as map areas. The default fill pattern for the response level is solid.

PATTERN statements can define the fill patterns and colors for both blocks and map areas. PATTERN definitions that define valid block patterns are applied to the blocks (response levels), and PATTERN definitions that define valid map patterns are applied to map areas.

See PATTERN Statement for more information on fill pattern values and default pattern rotation.
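As a small illustration, the following sketch defines three solid map patterns and applies them to a choropleth map. It assumes that WORK.SITES contains STATE and a REGION variable with three distinct values; the colors are arbitrary RGB codes chosen for the example.

```sas
/* Map patterns: MSOLID fills a map area with a solid color. */
pattern1 value=msolid color=cxBDD7E7;
pattern2 value=msolid color=cx6BAED6;
pattern3 value=msolid color=cx2171B5;

proc gmap map=maps.us data=work.sites;
   id state;
   choro region / discrete;   /* one pattern per response level */
run;
quit;
```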

About Identification Variables

For traditional map data sets and response data sets, id-variables identify the map areas (for example, counties, states, or provinces) that make up the map. A unit area or map area is a group of observations with the same ID value. The GMAP procedure matches the value of the response variables for each map area in the response data set to the corresponding map area in the traditional map data set in order to create the output graphs.

With feature tables, the geo-variable, or $GEOREF formatted variable containing the spatial information, is the identification variable. Each observation in a feature table has a unique $GEOREF formatted variable value. When merging the feature table with the response data set using an SQL or DATA step statement, the identification variable can be any variable that is contained within both data sets. Once the merged data set has been created, the geo-variable is used in the PROC GMAP ID statement for the merged feature table and response data set. See Creating a Map Using the Feature Table for more details.

Displaying Map Areas and Response Data

Whether the GMAP procedure draws a map area and whether it displays patterns for response values depends on the contents of the response data set and on the ALL and MISSING options. The following table describes the conditions under which the procedure does or does not display map areas and response data.

Displaying Map Areas and Response Data
If the response data set . . .	And if . . .	Then the procedure . . .
includes the map area	the map area has a response value	draws the map area and displays the response data
includes the map area	the response value for the map area is a missing value	draws the map area but leaves it empty
includes the map area	the response value for the map area is a missing value and the MISSING option is used in the map statement	draws the map area and displays a response level for the missing value
does not include the map area	the ALL option is used in the PROC GMAP statement	draws the map area but leaves it empty
does not include the map area	the ALL option is not used	does not draw the map area
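The table can be exercised with a variation of the earlier WORK.SITES example. In this sketch, ALL draws the map areas that have no observation in the response data set, and MISSING gives missing REGION values their own response level; WORK.SITES with STATE and REGION is assumed as before.

```sas
proc gmap map=maps.us data=work.sites all;   /* ALL: draw areas absent from DATA= */
   id state;
   choro region / discrete missing;          /* MISSING: level for missing values */
run;
quit;
```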

Summary of Use

To use the GMAP procedure, you must do the following:

If using a traditional map data set, determine what processing needs to be done to the map data set before it is displayed. Use the GPROJECT, GREDUCE, and GREMOVE procedures or a DATA step to perform the necessary processing.
Issue a LIBNAME statement for the SAS library that contains the response data set, or use a DATA step to create a response data set.
If using a traditional map data set, use the PROC GMAP statement to identify the map data set as the MAP= value and response data set as the DATA= value.
If using a feature table, use PROC SORT to individually sort the feature table and response data set by a variable common to both data sets. Next, use SQL or the DATA step MERGE to merge the feature table with the response data set by using a variable common to both data sets. Use the combined data set as the DATA= value in the PROC GMAP statement (do not include MAP= in the PROC GMAP statement).
Use the ID statement to specify the id-variables or, if you are using a feature table, specify the geo-variable.
Use a BLOCK, CHORO, PRISM, or SURFACE statement to identify the response variable and generate the map.
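For the traditional map data set case, the steps might come together as in this sketch. The response values are invented for illustration, and the example assumes that MAPS.US identifies states by numeric FIPS code in the variable STATE. Because MAPS.US is already projected, no preprocessing step is shown.

```sas
/* Create a small response data set (made-up values). */
data work.sites;
   input state region $;
   datalines;
6  West
37 East
48 South
51 East
;
run;

/* Identify the map and response data sets, the ID variable, */
/* and the response variable.                                */
proc gmap map=maps.us data=work.sites;
   id state;
   choro region / discrete;
run;
quit;
```

Only the four states in WORK.SITES are drawn unless the ALL option is added to the PROC GMAP statement.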

Accessing SAS Maps Online

Visit SAS Maps Online to download data updates, sample SAS/GRAPH programs that use the map data sets delivered with SAS/GRAPH, and GIF images of maps. SAS Maps Online is located at the following URL:

http://www.sas.com/mapsonline

After downloading and unzipping map data sets, you must take them out of transport format by running the CIMPORT procedure using your current version of SAS. For more information, see Transporting and Converting Graphics Output.
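A conversion step of this kind might look like the following sketch. The library path and the transport file name are placeholders, not actual SAS Maps Online file names.

```sas
/* Hypothetical target library for the restored map data sets. */
libname mymaps 'C:\mymaps';

/* Restore the contents of the transport file into the library. */
proc cimport library=mymaps infile='C:\downloads\newmap.cpt';
run;
```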

Importing Maps from ESRI Shapefiles

You can import ESRI shapefiles as traditional map data sets by using the MAPIMPORT procedure. Depending on the type of coordinates that are in your shapefile, you might want to perform additional processing. For example, you might want to project the map with the GPROJECT procedure, or use the GREDUCE procedure to create a DENSITY variable for reducing your data.

For more information, see The MAPIMPORT Procedure.
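A possible import-and-process sequence is sketched below. The shapefile path, the output data set names, and the ID variable REGION_ID are hypothetical; the ID variable depends on the attributes stored with your shapefile. The DEGREES and EASTLONG options assume that the shapefile holds unprojected coordinates in degrees with east longitude positive, so skip the GPROJECT step if your shapefile is already projected.

```sas
/* Import the shapefile as a traditional map data set. */
proc mapimport datafile='C:\shapes\regions.shp' out=work.regions;
run;

/* Project the unprojected coordinates. */
proc gproject data=work.regions out=work.regions_proj degrees eastlong;
   id region_id;
run;

/* Add a DENSITY variable that can be used to thin the boundary points. */
proc greduce data=work.regions_proj out=work.regions_red;
   id region_id;
run;
```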
