Working with Spatial Data

SAS Data Sets

A SAS data set is a collection of data values and their associated descriptive information that is arranged and presented in a form that can be recognized and processed by SAS. SAS data sets can be data files or views. A SAS data file contains the following elements:

data values that are organized into a rectangular structure of columns and rows
descriptor information that identifies attributes of both the data set and the data values

A SAS view contains the following elements:

instructions to build a table
descriptor information that identifies attributes of both the data set and the data values

SAS data sets can be indexed by one or more variables, known as key variables. A SAS index contains the data values of the key variables, each paired with a location identifier for the observation that contains the value. The value and identifier pairs are ordered in a B-tree structure that enables the engine to search by value. SAS indexes are classified as simple (one key variable) or composite (two or more key variables).
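The pairing of key values with observation locations can be sketched in a few lines of Python. This is only an illustration of the idea, not how the SAS engine is implemented: a sorted list with binary search stands in for the B-tree, and the names build_index and lookup, as well as the sample rows, are made up for this example.

```python
from bisect import bisect_left

def build_index(rows, key_vars):
    """Pair each observation's key values with its row number, sorted by value.

    One key variable gives a simple index; several give a composite index.
    """
    pairs = [(tuple(row[k] for k in key_vars), n)
             for n, row in enumerate(rows, start=1)]
    pairs.sort()
    return pairs

def lookup(index, key):
    """Return the row numbers whose key values equal `key` (binary search)."""
    key = tuple(key)
    pos = bisect_left(index, (key,))  # (key,) sorts just before (key, n)
    hits = []
    while pos < len(index) and index[pos][0] == key:
        hits.append(index[pos][1])
        pos += 1
    return hits

# Made-up sample observations.
rows = [
    {"state": "OR", "county": "Lane"},
    {"state": "NC", "county": "Wake"},
    {"state": "OR", "county": "Linn"},
]
simple = build_index(rows, ["state"])               # simple index
composite = build_index(rows, ["state", "county"])  # composite index
```

Here lookup(simple, ["OR"]) finds both Oregon rows without scanning the whole table, and lookup(composite, ["OR", "Linn"]) narrows the search to a single observation.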

For more information about SAS data sets, SAS files, SAS views, and SAS indexes, refer to SAS Language Reference: Concepts.

SAS/GIS Data Sets

As a component of SAS, SAS/GIS stores all of its data in SAS data sets. The SAS/GIS spatial database works as one logical entity, but is physically separated into six different categories of data sets:

chains
nodes
details
polygonal index
label
attribute

A given SAS/GIS map can reference only one each of the chains, nodes, details, and label data sets, but it can reference multiple polygonal index and attribute data sets. Multiple SAS/GIS maps can share a single set of chains, nodes, and details data sets.

Chains Data Set

The chains data set contains coordinates for the polylines that are used to form line and polygon features. A polyline consists of a series of connected line segments that are chains. A chain is a sequence of two or more points in the coordinate space. The end points, the first and last points of the chain, must be nodes. Each chain has a direction, from the first point toward the last point. The first point in the chain is the from-node, and the last point is the to-node. Relative to its direction, a chain has a left side and a right side. Points between the from-node and the to-node are detail points, which serve to trace the curvature of the feature that is represented by the chain. Detail points are not nodes.

The chains data set also lists the from-node and to-node row numbers in the nodes data set, as well as the number of detail points and the corresponding details data set row number. The left and right side attribute values (for example, ZIP codes and FIPS codes) are also stored in the chains data set.
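As a rough mental model, one row of the chains data set can be pictured as the record below. The field names are hypothetical and do not match the actual SAS/GIS variable names. The sketch also shows why direction matters: reversing a chain swaps its from-node and to-node, and therefore its left and right sides.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Chain:
    from_node: int   # row number of the from-node in the nodes data set
    to_node: int     # row number of the to-node in the nodes data set
    n_details: int   # number of detail points between the two nodes
    detail_row: int  # row number of the chain's details (hypothetical layout)
    left: dict       # left-side attribute values, for example a ZIP code
    right: dict      # right-side attribute values

def reversed_chain(chain):
    """Return the same chain traversed in the opposite direction.

    The end nodes trade places, and so do the left and right sides.
    The detail points would also have to be read in reverse order.
    """
    return replace(chain,
                   from_node=chain.to_node, to_node=chain.from_node,
                   left=chain.right, right=chain.left)

# Made-up example values.
c = Chain(from_node=1, to_node=2, n_details=3, detail_row=10,
          left={"ZIP": "27511"}, right={"ZIP": "27513"})
r = reversed_chain(c)
```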

Nodes Data Set

The nodes data set contains the coordinates of the end points for the chains in the chains data set and the linkage information that is necessary to attach chains to the correct nodes. A node is a point in the spatial data with connections to one or more chains. Nodes can be discrete points or the end points of chains. A node definition can span multiple records in the nodes data set, so only the starting record number of a node serves as its node feature ID.

Details Data Set

The details data set stores the curvature points of a chain between the two end nodes, which are also called the from-node and the to-node. That is, the details data set contains all the coordinates between the intersection points of the chain. The node coordinates are not duplicated in the details data set. The details data set also contains the chains data set row number of the associated chain.
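Because node coordinates are not repeated in the details data set, the full outline of a chain has to be assembled from both data sets. The following sketch shows that assembly under simplifying assumptions: each data set is modeled as a Python dict keyed by row number, a chain's detail points are assumed to occupy consecutive rows, and all names and coordinates are hypothetical.

```python
def chain_coordinates(chain, nodes, details):
    """Return every point of a chain in order: from-node, details, to-node."""
    start = nodes[chain["from_node"]]
    end = nodes[chain["to_node"]]
    first = chain["detail_row"]
    # Assumption: the chain's detail points sit in consecutive rows.
    middle = [details[first + i] for i in range(chain["n_details"])]
    return [start, *middle, end]

# Made-up coordinates, keyed by row number.
nodes = {1: (0.0, 0.0), 2: (10.0, 0.0)}
details = {5: (3.0, 1.0), 6: (7.0, 1.5)}
chain = {"from_node": 1, "to_node": 2, "n_details": 2, "detail_row": 5}

points = chain_coordinates(chain, nodes, details)
```

A chain with no detail points (n_details of 0) reduces to a straight segment between its two nodes.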

Polygonal Index Data Sets

The polygonal index data set contains one observation for each polygon that was successfully closed during the index creation process. It is called a polygonal index because each observation is an index to a polygon in the chains data set. That is, it points to the starting chain in the chains data set for each of the polygons.

If polygon areas, perimeter distances, and centroid locations were computed, then that information is also stored in the polygonal index data set.
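For readers who want to see what such values look like, here is a minimal sketch that computes area, perimeter, and centroid for a closed polygon with the standard shoelace formulas. It assumes planar (projected) coordinates and a simple, non-self-intersecting ring. It is not a description of how SAS/GIS computes these values, and polygon_metrics is a name chosen for this example.

```python
from math import hypot

def polygon_metrics(ring):
    """Return (area, perimeter, centroid) for a closed ring of (x, y) points.

    `ring` lists each vertex once; the closing edge back to the first
    vertex is implied.
    """
    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += hypot(x1 - x0, y1 - y0)
    if twice_area == 0:
        raise ValueError("degenerate polygon: zero area")
    signed_area = twice_area / 2.0
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return abs(signed_area), perimeter, centroid

# A 4-by-3 rectangle as a made-up example.
area, perimeter, centroid = polygon_metrics([(0, 0), (4, 0), (4, 3), (0, 3)])
```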

Label Data Set

The label data set defines the attributes of labels to be displayed on the map. The attributes include all of the information that is applicable for each label, such as location, color, size, source of the text for a text label, as well as other behavioral and graphical attributes.

Attribute Data Sets

Attribute data sets contain values related to the map features. The observations in attribute data sets must be associated with observations in the chains data set. Attribute data is used to display themes on the map and for spatially oriented reports, graphs, map actions, and so forth.
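The association between attribute observations and chains can be pictured as a keyed lookup. In the sketch below, the key name COUNTY, its left and right variants COUNTYL and COUNTYR, the POP variable, and all values are hypothetical. The code only illustrates matching a chain's left-side and right-side key values to attribute rows, not the SAS/GIS linking mechanism itself.

```python
def theme_values(chains, attributes, key, value):
    """For each chain, return the (left, right) attribute values.

    A side whose key value has no matching attribute row yields None.
    """
    by_key = {row[key]: row[value] for row in attributes}
    return [(by_key.get(chain[key + "L"]), by_key.get(chain[key + "R"]))
            for chain in chains]

# Made-up sample data.
chains = [
    {"COUNTYL": "183", "COUNTYR": "063"},
    {"COUNTYL": "063", "COUNTYR": "999"},  # "999" has no attribute row
]
attributes = [
    {"COUNTY": "183", "POP": 100},
    {"COUNTY": "063", "POP": 50},
]

values = theme_values(chains, attributes, key="COUNTY", value="POP")
```

A None in the result marks a side of a chain for which no attribute observation exists, which is the kind of mismatch worth checking before building a theme.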

Managing Data Set Sizes

By their nature, spatial databases tend to be rather large. Users of spatial data want as much detail in the maps as they can get, which increases the demands on storage and processing capacity. Spatial data that is not carefully managed can become too large for easy use.

Here are five actions that you can take to manage the size of your spatial data sets. You need to perform most of these actions before importing your data into SAS/GIS.

Reduce the spatial extent of the data.

Do not store a larger area than you need. If you need a map containing one state, do not store a map containing all the states for a region. For example, if you need to work with a map of Oregon, do not store a map containing all of the Pacific Northwest.

Store only the features that you need.

If you do not need features such as rivers and lakes, do not store these features in your spatial data.

Limit the amount of detail to what is necessary for your application.

If you are using a map for which you do not require highly detailed boundaries, reduce the detail level and save storage space. If you are using SAS/GRAPH data sets, you can use the GREDUCE procedure in SAS/GRAPH software to reduce the detail level. If you are using a data set from another source, you must reduce the level of detail before importing the data set into SAS/GIS.

Reduce the number of attributes that are stored with the spatial data.

If you do not need an attribute, and do not expect to need it again, delete it from your spatial data.

Reduce the size of variables that are stored in the spatial data.

Examine how your variables are stored and determine whether you can safely reduce the length that is used to store them.

For example, if a numeric variable contains a code of at most two digits, consider storing it in a two-byte character variable rather than in an eight-byte numeric variable. Change the variables' defined types or lengths in a DATA step after you complete the import.
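To judge whether shortening a variable is worth the effort, a back-of-the-envelope estimate is usually enough. The sketch below assumes an uncompressed data set in which each observation takes the sum of its variable lengths in bytes. The function name and the observation count are made up for illustration.

```python
def bytes_saved(n_obs, old_length, new_length):
    """Estimate the storage saved by shortening one variable.

    Assumes an uncompressed data set, so each observation shrinks by
    exactly (old_length - new_length) bytes.
    """
    return n_obs * (old_length - new_length)

# One million observations (a made-up figure), with an eight-byte numeric
# code replaced by a two-byte character variable.
saved = bytes_saved(1_000_000, old_length=8, new_length=2)
saved_mb = saved / 1_000_000
```

For a data set with a million observations, that single change saves about 6 MB, and the saving repeats for every variable that can be shortened the same way.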

Of the five actions, reducing the number of attributes is the easiest to perform. Use the Import window, which you access by selecting Modify Composites from the GIS Spatial Data Importing window, to remove and drop unneeded composite variables from your data set as it is imported.

Import Type Specific Variables

The following tables describe the composites and variables that are created for each of the import types. All of the variables are located in the chains data set except for the X and Y variables, which are in the nodes data set.

Partial Listing of Composites and Variables Specific to the ArcInfo Interchange Import Type
Composite	Variable 1	Variable 2	Type (table note 1)	Description
ARCID	ARCIDL	ARCIDR	A or C	ARCID from the ArcInfo coverage. Maps made from line and point coverages do not have left and right variables.
ARCNUM			C	ARCNUM from the coverage.
'COVERAGE'	'COVERAGE'_L	'COVERAGE'_R	A or C	This variable is derived from the input filename. It is the last word preceding the file extension. For example, /local/gisdata/montana.e00 would have a 'COVERAGE' (table note 2) name of montana. The left variable would be montanal, the right variable would be montanar, and the composite type would be Area. Line and point coverages do not have left- and right-side variables, and the composite type would be Classification.
AREA	AREAL	AREAR	A	AREA from the coverage.
PERIMETER	PERIML	PERIMR	A	PERIMETER from the coverage.
'ATTRIB'	'ATTRIB'L	'ATTRIB'R		All variables in the polygon, line, or point attribute tables are saved as composite variables. In the case of the polygon coverages, an L or an R is added to the end of the first five characters of the actual variable name.
_COVER_	_COVEL	_COVER	A or C	This variable contains the name stored in the 'COVERAGE' variable.
_SRC_	_SRCL	_SRCR	C	Contains the string 'ARC'.
X	X		X	X coordinate.
Y	Y		Y	Y coordinate.
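The naming rules described in the table can be mimicked with a short sketch. It follows the worked example in the 'COVERAGE' row (montana, montanal, montanar) and the five-character rule in the 'ATTRIB' row. The function names and the POPULATION variable are hypothetical, and any further rules that the importer applies, such as name-length limits, are not modeled.

```python
from pathlib import PurePosixPath

def coverage_names(path):
    """Derive the coverage name and its left/right variable names.

    The coverage name is the last word preceding the file extension.
    """
    name = PurePosixPath(path).stem
    return name, name + "l", name + "r"

def attrib_names(variable):
    """Left/right names for a polygon attribute variable.

    An L or an R is added to the first five characters of the name.
    """
    stem = variable[:5]
    return stem + "L", stem + "R"

cover = coverage_names("/local/gisdata/montana.e00")
attrib = attrib_names("POPULATION")  # hypothetical attribute variable
```

With these inputs, cover holds montana, montanal, and montanar, and attrib holds POPULL and POPULR.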