The GREMOVE Procedure

Concepts

The GREMOVE procedure processes the input map data set to remove internal boundaries and creates a new map data set. The PROC GREMOVE statement identifies the input and output map data sets. The ID statement identifies the variable or variables in the input map data set that define the current unit areas. The BY statement identifies the variable or variables in the input map data set that define the new unit areas.
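The roles of the three statements can be sketched with a toy map. The data set SQUARES, its variables REGION and AREA, and the output name MERGED are made up for illustration. The input holds two adjacent unit squares that belong to the same region, so the edge they share is an internal boundary.

```sas
/* Hypothetical input map: two adjacent unit squares, A and B,  */
/* that share the edge from (1,0) to (1,1).                      */
data squares;
   input region $ area $ x y;
   datalines;
East A 0 0
East A 0 1
East A 1 1
East A 1 0
East B 1 0
East B 1 1
East B 2 1
East B 2 0
;
run;

/* ID names the current unit areas (AREA);                       */
/* BY names the new unit areas (REGION).                         */
proc gremove data=squares out=merged;
   by region;
   id area;
run;

/* Inspect the combined outline. */
proc print data=merged;
run;
```

In this sketch the output data set should describe a single unit area for the region East, outlined without the shared edge.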

About the Input Map Data Set

The input map data set must be in traditional map data set format (see About Map Data Sets) and it must contain these variables:

a numeric variable named X that contains the horizontal coordinates of the map boundary points.
a numeric variable named Y that contains the vertical coordinates of the map boundary points.
one or more variables that uniquely identify the current unit areas in the map. These variables are listed in the ID statement. Each group of observations with a different ID variable value is evaluated as a separate unit area.
one or more variables that identify the new unit areas to be created in the output map data set. These variables are listed in the BY statement.

It might also contain the variable SEGMENT, which is used to distinguish non-conterminous segments of the same unit area. Other variables might exist in the input map data set, but they do not affect the GREMOVE procedure and they are not carried into the output map data set.
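Because the new unit areas are defined with a BY statement, the input observations are normally expected to be grouped by the BY variable(s), as with other BY-group processing in SAS. A minimal sketch of that preparation step follows. The data set COUNTIES and the variable REGION are hypothetical names.

```sas
/* Assumed: WORK.COUNTIES is a traditional map data set that     */
/* contains X, Y, an ID variable, and the BY variable REGION.    */
/* All of these names are hypothetical.                          */
proc sort data=counties out=counties_sorted;
   by region;
run;
```

By default PROC SORT keeps the relative order of observations within each BY group, so the order of the boundary points inside each unit area is preserved.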

About the Output Map Data Set

The output map data set contains the newly defined unit areas. These new unit areas are created by removing all interior line segments from the original unit areas. All variables in the input map data set except X, Y, SEGMENT, and the variables listed in the BY statement are omitted from the output map data set.

The output map data set might contain missing X, Y coordinates to construct any polygons that have enclosed boundaries (like lakes or combined regions that have one or more hollow interior regions).
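As an illustration of how missing coordinates mark an enclosed boundary in the traditional map data set format, the hand-built data set below describes a square with a square hole in it. The data set name DONUT and its values are invented for this sketch. It is not output produced by PROC GREMOVE.

```sas
/* Hypothetical unit area R: a 3 x 3 outer square with a 1 x 1   */
/* hole. The observation with missing X and Y separates the      */
/* outer boundary from the enclosed boundary in the same segment. */
data donut;
   input area $ segment x y;
   datalines;
R 1 0 0
R 1 0 3
R 1 3 3
R 1 3 0
R 1 . .
R 1 1 1
R 1 1 2
R 1 2 2
R 1 2 1
;
run;
```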

The SEGMENT variable in the output map data set is ordered according to the size of the bounding box around the polygon that it describes. A SEGMENT value of 1 describes the polygon whose bounding box is the largest in the unit area, and each additional SEGMENT value describes a smaller polygon. This information is useful for removing small polygons that clutter a map.
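One way to use this ordering is to keep only the low SEGMENT values. The following is a minimal sketch, assuming an output map data set named MERGED (a hypothetical name) that was created by PROC GREMOVE.

```sas
/* Assumed: WORK.MERGED is an output map data set from PROC GREMOVE. */
data largest_only;
   set merged;
   /* Keep only the polygon with the largest bounding box */
   /* in each unit area.                                  */
   where segment = 1;
run;
```

A looser cutoff such as `where segment <= 3;` would retain a few of the larger secondary polygons. The right threshold depends on the map.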

All current unit areas with common BY-variable value(s) are combined into a single unit area in the output map data set. The new unit area contains:

all boundaries that are not shared, such as islands and lakes
all boundaries that are shared by two different BY groups.

About Unmatched Area Boundaries

If you are using map data sets in which area boundaries do not match precisely (for example, if the boundaries were digitized with a different set of points), PROC GREMOVE will not be able to identify common boundaries properly, resulting in abnormalities in your output data set.

If the points in the area boundaries match except for precision differences, round each X and Y value in your map data set to a common precision with the DATA step function ROUND before you use PROC GREMOVE. See SAS Language Reference: Dictionary for information on the ROUND function.

For example, if you have a map data set named APPROX in which the horizontal and vertical coordinate values for interior boundaries of unit areas agree only to three decimal places, this DATA step creates a new map data set, EXACT, that is better suited for use with the GREMOVE procedure:

data exact;
   set approx;
   /* Round nonmissing coordinates to three decimal places so   */
   /* that shared boundary points compare as exactly equal.     */
   if x ne . then x=round(x,.001);
   if y ne . then y=round(y,.001);
run;

You can also use the FUZZ= option in the PROC GREMOVE statement to specify a level of tolerance so that the boundaries do not need to match precisely.
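A sketch of the tolerance approach follows. The value 0.001, the output name COMBINED, and the variables REGION and AREA are assumptions chosen for illustration. Check the description of the option in the PROC GREMOVE statement reference for how the tolerance value is interpreted in your coordinate system.

```sas
/* Assumed: WORK.APPROX is grouped by REGION; REGION and AREA   */
/* are hypothetical BY and ID variables, and 0.001 is an        */
/* illustrative tolerance in the units of the map coordinates.  */
proc gremove data=approx out=combined fuzz=0.001;
   by region;
   id area;
run;
```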
