SAS/GIS Data Sets

SAS Data Sets

A SAS data set is a collection of data values and their associated descriptive information. This collection is arranged and presented in a form that can be recognized and processed by SAS. SAS data sets can be data files or views. A SAS data file contains the following elements:
  • data values that are organized into a rectangular structure of columns and rows
  • descriptor information that identifies attributes of both the data set and the data values
A SAS view contains the following elements:
  • instructions to build a table
  • descriptor information that identifies attributes of both the data set and the data values
SAS data sets can be indexed by one or more variables, known as key variables. A SAS index contains the data values of the key variables that are paired with location identifiers for the observations that contain the variables. The value and identifier pairs are ordered in a B-tree structure that enables the engine to search by value. SAS indexes are classified as simple or composite, according to the number of key variables that they contain.
For more information about SAS data sets, SAS files, SAS views, and SAS indexes, refer to SAS Language Reference: Concepts.
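The pairing of key values with observation locations can be illustrated with a small sketch. This is not how SAS implements indexes: it assumes a sorted list searched with binary search in place of a B-tree, and the function names and sample rows are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_simple_index(rows, key):
    """Return (value, row_number) pairs sorted by value; row numbers are 1-based."""
    return sorted((row[key], n) for n, row in enumerate(rows, start=1))

def lookup(index, value):
    """Return the row numbers of the observations whose key variable equals value."""
    values = [v for v, _ in index]
    lo = bisect_left(values, value)    # first pair with this value
    hi = bisect_right(values, value)   # one past the last pair with this value
    return [n for _, n in index[lo:hi]]

# Hypothetical observations; STATE is the single key variable (a simple index).
rows = [
    {"STATE": "NC", "ZIP": "27513"},
    {"STATE": "OR", "ZIP": "97201"},
    {"STATE": "NC", "ZIP": "27601"},
]
state_index = build_simple_index(rows, "STATE")
nc_rows = lookup(state_index, "NC")
```

A composite index could be sketched the same way by sorting on a tuple of several key variables instead of a single value.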
As a component of SAS, SAS/GIS stores all of its data in SAS data sets. The SAS/GIS spatial database works as one logical entity, but is physically separated into six different categories of data sets:
  • chains
  • nodes
  • details
  • polygonal index
  • label
  • attribute
A given SAS/GIS map can reference only one each of the chains, nodes, details, and label data sets, but it can reference multiple polygonal index and attribute data sets. Multiple SAS/GIS maps can use a single set of chains, nodes, and details data sets.

Chains Data Set

The chains data set contains coordinates for the polylines that are used to form line and polygon features. A polyline consists of a series of connected line segments that are chains. A chain is a sequence of two or more points in the coordinate space. The end points, the first and last points of the chain, must be nodes. Each chain has a direction, from the first point toward the last point. The first point in the chain is the from-node, and the last point is the to-node. Relative to its direction, a chain has a left side and a right side. Points between the from-node and the to-node are detail points, which serve to trace the curvature of the feature that is represented by the chain. Detail points are not nodes.
The chains data set also lists the from-node and to-node row numbers in the nodes data set. It also lists the number of detail points and the corresponding details data set row number. The left and right side attribute values (for example, ZIP codes and FIPS codes) are also stored in the chains data set.
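The information carried by one chain can be pictured as a small record. The sketch below is only an illustration: the field names are hypothetical and do not match the actual variable names in a chains data set. It also shows why direction matters, because reversing a chain exchanges its end nodes and its left and right sides.

```python
from dataclasses import dataclass, field

@dataclass
class Chain:
    from_node: int       # row number of the from-node in the nodes data set
    to_node: int         # row number of the to-node in the nodes data set
    n_details: int       # number of detail points between the two nodes
    details_row: int     # row number of the first detail point in the details data set
    left: dict = field(default_factory=dict)    # left-side attribute values
    right: dict = field(default_factory=dict)   # right-side attribute values

def reverse(chain):
    """Return the chain with opposite direction: nodes and sides are swapped.
    The detail points would also have to be read in reverse order."""
    return Chain(chain.to_node, chain.from_node, chain.n_details,
                 chain.details_row, dict(chain.right), dict(chain.left))

def separates(chain, key):
    """True when the chain has different values of an attribute on its two sides."""
    return chain.left.get(key) != chain.right.get(key)

# Hypothetical chain that runs between two ZIP code areas.
chain = Chain(from_node=1, to_node=2, n_details=0, details_row=0,
              left={"ZIP": "27513"}, right={"ZIP": "27511"})
flipped = reverse(chain)
```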

Nodes Data Set

The nodes data set contains the coordinates of the end points for the chains in the chains data set. It also contains the linkage information that is necessary to attach chains to the correct nodes. A node is a point in the spatial data with connections to one or more chains. Nodes can be discrete points or the end points of chains. A node definition can span multiple records in the nodes data set, so only the starting record number for a node is a node feature ID.

Details Data Set

The details data set stores the curvature points of a chain between the two end nodes, which are also called the from-node and the to-node. That is, the details data set contains all the coordinates between the intersection points of the chain. The node coordinates are not duplicated in the details data set. The details data set also contains the chains data set row number of the associated chain.
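Because node coordinates are not duplicated in the details data set, the full point sequence of a chain has to be assembled from both data sets. The following sketch shows one way to picture that, under simplifying assumptions: each node occupies a single row (real node definitions can span several records), row numbers are 1-based, and the dictionary keys are hypothetical names.

```python
def chain_points(chain, nodes, details):
    """Return every point of a chain: from-node, detail points, to-node.

    chain   -- dict with from_node, to_node, n_details, details_row (1-based rows)
    nodes   -- list of (x, y) node coordinates, one per row
    details -- list of (x, y) detail-point coordinates, one per row
    """
    start = chain["details_row"] - 1
    middle = details[start:start + chain["n_details"]]   # empty when n_details is 0
    return [nodes[chain["from_node"] - 1], *middle, nodes[chain["to_node"] - 1]]

# Hypothetical data: one chain with two detail points between its end nodes.
nodes = [(0.0, 0.0), (3.0, 0.0)]
details = [(1.0, 0.5), (2.0, 0.5)]
chain = {"from_node": 1, "to_node": 2, "n_details": 2, "details_row": 1}
points = chain_points(chain, nodes, details)
```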

Polygonal Index Data Sets

The polygonal index data set contains one observation for each polygon that was successfully closed during the index creation process. It is called a polygonal index because each observation is an index to a polygon in the chains data set. That is, it points to the starting chain in the chains data set for each of the polygons.
If polygon areas, perimeter distances, and centroid locations were computed, then that information is also stored in the polygonal index data set.
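As an illustration of what area, perimeter, and centroid mean for a closed polygon, here is a minimal sketch that uses the standard planar (shoelace) formulas. It is an assumption that planar formulas are adequate for the example; the source does not say which method SAS/GIS uses, and the function name is hypothetical.

```python
from math import hypot

def polygon_measures(ring):
    """Return (area, perimeter, centroid) of a closed polygon.

    ring -- list of (x, y) vertices in order, first vertex not repeated at the end
    """
    n = len(ring)
    twice_area = cx = cy = perimeter = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]          # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += hypot(x1 - x0, y1 - y0)
    if twice_area == 0:
        raise ValueError("degenerate polygon: area is zero")
    area = twice_area / 2.0                  # signed: positive when counterclockwise
    centroid = (cx / (6.0 * area), cy / (6.0 * area))
    return abs(area), perimeter, centroid

# A 4-by-2 rectangle in arbitrary planar units.
area, perimeter, centroid = polygon_measures([(0, 0), (4, 0), (4, 2), (0, 2)])
```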

Label Data Set

The label data set defines the attributes of labels to be displayed on the map. The attributes include all of the information that is applicable for each label. Information includes location, color, size, source of the text for a text label, as well as other behavioral and graphical attributes.

Attribute Data Sets

Attribute data sets contain values related to the map features. The observations in attribute data sets must be associated with observations in the chains data set. Attribute data is used to display themes on the map and for spatially oriented reports, graphs, map actions, and so on.
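The association between attribute observations and chains can be pictured as a lookup on a shared key. In the sketch below, the left and right variables of each chain (here COUNTYL and COUNTYR) are matched against a key variable in the attribute data. The attribute name POP, its values, and the function name are hypothetical.

```python
def attach_theme(chains, attributes, key, value_name):
    """For each chain, return the attribute value found on its left and right sides.

    The chain variables are assumed to be named key + "L" and key + "R".
    A side with no matching attribute observation yields None.
    """
    lookup = {row[key]: row[value_name] for row in attributes}
    return [(lookup.get(c[key + "L"]), lookup.get(c[key + "R"])) for c in chains]

# Hypothetical data: two chains and a county-level attribute table.
chains = [
    {"COUNTYL": "183", "COUNTYR": "063"},
    {"COUNTYL": "063", "COUNTYR": "135"},
]
attributes = [
    {"COUNTY": "183", "POP": 1000},
    {"COUNTY": "063", "POP": 500},
]
theme_values = attach_theme(chains, attributes, "COUNTY", "POP")
```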

Managing Data Set Sizes

By their nature, spatial databases tend to be rather large. Users of spatial data want as much detail in the maps as they can get, which increases the demands on storage and processing capacity. Spatial data that is not carefully managed can become too large for easy use.
Here are five actions that you can take to manage the size of your spatial data sets. You need to perform most of these actions before importing your data into SAS/GIS.
  • Reduce the spatial extent of the data.
    Do not store a larger area than you need. If you need a map containing one state, do not store a map containing all the states for a region. For example, if you need to work with a map of Oregon, do not store a map containing all of the Pacific Northwest.
  • Store only the features that you need.
    If you do not need features such as rivers and lakes, do not store these features in your spatial data.
  • Limit the amount of detail to what is necessary for your application.
    If you are using a map for which you do not require highly detailed boundaries, reduce the detail level and save storage space. If you are using SAS/GRAPH data sets, you can use the GREDUCE procedure in SAS/GRAPH software to reduce the detail level. If you are using a data set from another source, you must reduce the level of detail before importing the data set into SAS/GIS.
  • Reduce the number of attributes that are stored with the spatial data.
    If you do not need an attribute, and do not think you will ever need it again, delete it from your spatial data.
  • Reduce the size of variables that are stored in the spatial data.
    Examine the method that you use for storing your variables and determine whether you can safely reduce the variable size that you use to store them.
    For example, suppose that a numeric variable contains a code of at most two digits. In this case, it might be better to store the code in a two-byte character variable rather than in an eight-byte numeric variable. Change the variables' defined types or lengths in a DATA step after you complete the import.
Of the five actions, reducing the number of attributes is the easiest to perform. Use the Modify Composites window. Access this window by selecting Modify Composites from the GIS Spatial Data Importing window. You can remove and drop unneeded composite variables from your data set as it is imported.
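The effect of shortening a variable can be estimated with simple arithmetic. The sketch below compares an eight-byte numeric variable with a two-byte character variable, as in the example above. The row count is hypothetical, and the estimate ignores compression and any per-page storage overhead.

```python
def fits_in_width(values, width):
    """True when every code can be written in at most `width` characters."""
    return all(len(str(v)) <= width for v in values)

def bytes_saved(n_rows, old_len=8, new_len=2):
    """Raw bytes saved by storing one variable in new_len instead of old_len bytes."""
    return n_rows * (old_len - new_len)

codes = [7, 42, 99]                  # hypothetical two-digit codes
safe = fits_in_width(codes, 2)       # check before shortening the variable
saved = bytes_saved(1_000_000)       # hypothetical chains data set with one million rows
```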

Import Type Specific Variables

The following tables describe the composites and variables that are created for each of the import types. All of the variables are located in the chains data set except for the X and Y variables, which are in the nodes data set.
In the following table, the values in the Type column represent the following data types:
A Area
C Classification
X X coordinate
Y Y coordinate
Composites and Variables for SAS/GIS Spatial Data Imported from ArcInfo Interchange Data
Composite | Variable 1 | Variable 2 | Type | Description
ARCID | ARCIDL | ARCIDR | A or C | ARCID from the ArcInfo coverage. Maps made from line and point coverages do not contain left and right variables.
ARCNUM |  |  | C | ARCNUM from the coverage.
'COVERAGE'¹ | 'COVERAGE'L | 'COVERAGE'R | A or C | This variable is derived from the input filename. It is the last word preceding the file extension. For example, /local/gisdata/montana.e00 would have a 'COVERAGE' name of montana. The left variable would be montanal, the right variable would be montanar, and the composite type would be Area. Line and point coverages do not have left- and right-side variables, and the composite type would be Classification.
AREA | AREAL | AREAR | A | AREA from the coverage.
PERIMETER | PERIML | PERIMR | A | PERIMETER from the coverage.
'ATTRIB' | 'ATTRIB'L | 'ATTRIB'R |  | All variables in the polygon, line, or point attribute tables are saved as composite variables. In the case of the polygon coverages, an L or an R is added to the end of the first five characters of the actual variable name.
_COVER_ | _COVEL | _COVER | A or C | This variable contains the name stored in the 'COVERAGE' variable.
_SRC_ | _SRCL | _SRCR | C | Contains the string 'ARC'.
X | X |  | X | X coordinate.
Y | Y |  | Y | Y coordinate.
¹ Names in single quotation marks, such as 'COVERAGE' and 'ATTRIB,' are GIS composite names.
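The naming rule for the 'COVERAGE' composite can be sketched as follows. This is an illustration only: the function name is hypothetical, and taking the filename stem is an assumption about what "the last word preceding the file extension" means. Any further adjustment that the import might apply to the name is not modeled.

```python
from pathlib import PurePosixPath

def coverage_names(path):
    """Return (composite, left_variable, right_variable) derived from a filename."""
    base = PurePosixPath(path).stem      # filename without directory or extension
    return base, base + "l", base + "r"

names = coverage_names("/local/gisdata/montana.e00")
```

For a line or point coverage, only the first element would apply, because those coverages have no left- and right-side variables.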
In the following table, the values in the Type column represent the following data types:
A Area
C Classification
X X coordinate
Y Y coordinate
Composites and Variables for SAS/GIS Spatial Data Imported from Digital Line Graph (DLG) Data
Composite | Variable 1 | Variable 2 | Type | Description
LMAJOR(n) | LMAJOR(n) |  | C | Major line attribute code.
LMINOR(n) | LMINOR(n) |  | C | Minor line attribute code.
NMAJOR(n) | NMAJOR(n) |  | C | Major node attribute code.
NMINOR(n) | NMINOR(n) |  | C | Minor node attribute code.
MAJOR(n) | AMAJORR(n) | AMAJORL(n) | A | Major area attribute code.
MINOR(n) | AMINORL(n) | AMINORR(n) | A | Minor area attribute code.
X | X |  | X | X coordinate.
Y | Y |  | Y | Y coordinate.
In the following table, the values in the Type column represent the following data types:
A Area
C Classification
Composites and Variables for SAS/GIS Spatial Data Imported from Drawing Interchange File (DXF) Data
Composite | Variable 1 | Variable 2 | Type | Description
'ATTRIB' | 'ATTRIB'L | 'ATTRIB'R | A or C | All polygon, line, or point attributes are saved as composite variables. In the case of polygon maps, an L or R is added to the end of the first seven characters of the actual variable name.
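The rule for forming left and right variable names from an attribute name can be sketched like this. The function name and the sample attribute name are hypothetical. The default of seven characters follows the description above; the ArcInfo table states five characters for polygon coverages, which the keep parameter can express.

```python
def side_variable_names(name, keep=7):
    """Return (left, right) variable names: the first `keep` characters plus L or R."""
    stem = name[:keep]
    return stem + "L", stem + "R"

left_name, right_name = side_variable_names("POPULATION")
```

With the default of seven characters, each resulting name is at most eight characters long.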
In the following table, the values in the Type column represent the following data types:
C Classification
X X coordinate
Y Y coordinate
Partial Listing of Composites and Variables Specific to the Genline Import Type
Composite | Variable 1 | Variable 2 | Type | Description
ID | ID |  | C | The ID variable from the data set.
'ATTRIB' | 'ATTRIB' | 'ATTRIB' | C | Any other variable in the data set is saved as a classification composite.
X | X |  | X | X coordinate.
Y | Y |  | Y | Y coordinate.
In the following table, the values in the Type column represent the following data types:
C Classification
X X coordinate
Y Y coordinate
Partial Listing of Composites and Variables Specific to the Genpoint Import Type
Composite | Variable 1 | Variable 2 | Type | Description
ID | ID |  | C | The ID variable from the data set.
'ATTRIB' | 'ATTRIB' | 'ATTRIB' | C | Any other variable in the data set is saved as a classification composite.
X | X |  | X | X coordinate.
Y | Y |  | Y | Y coordinate.
In the following table, the values in the Type column represent the following data types:
A Area
C Classification
Partial Listing of Composites and Variables Specific to the MapInfo Import Type
Composite | Variable 1 | Variable 2 | Type | Description
'ATTRIB' | 'ATTRIB'L | 'ATTRIB'R | A or C | All polygon, line, or point attributes are saved as composite variables. In the case of polygon maps, an L or R is added to the end of the first seven characters of the actual variable name.
LINELYR |  |  | C | This variable is derived from the input filename. It is the last word preceding the file extension. For example, /local/gisdata/montana.mif would have a LINELYR name of montana.
PTLYR |  |  | C | This variable is derived from the input filename. It is the last word preceding the file extension. For example, /local/gisdata/montana.mif would have a PTLYR name of montana.
POLYLYR |  |  | A | This variable is derived from the input filename. It is the last word preceding the file extension. For example, /local/gisdata/montana.mif would have a POLYLYR name of montana.
'MAP' | 'MAP'L | 'MAP'R | A or C | This variable is derived from the input filename. It is the last word preceding the file extension. For example, /local/gisdata/usa.mif would have a 'MAP' name of usa. The left variable would be usal, the right variable would be usar, and, in this case, the composite type would be Area. Line and point maps do not have left- and right-side variables, and the composite type would be Classification.
In the following table, the value in the Type column represents the following data type:
A Area
Partial Listing of Composites and Variables Specific to the SAS/GRAPH and Genpoly Import Types
Composite | Variable 1 | Variable 2 | Type | Description
'IDVAR'n | 'IDVAR'L | 'IDVAR'R | A | An area composite variable is created for each ID variable (IDVAR) selected by the user in the ID vars list box. In the case of polygon maps, an L or R is added to the end of the first seven characters of the actual variable name.
In the following table, the values in the Type column represent the following data types:
A Area
ADDR Address
ADDRP Address Prefix
ADDRS Address Suffix
C Classification
X Longitude
Y Latitude
Composites and Variables Specific to the TIGER and DYNAMAP Import Types
Composite | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Type | Description
ADDR | FRADDL | FRADDR | TOADDL | TOADDR | ADDR | Address range.
BLOCK | BLOCKL | BLOCKR |  |  | A | Block number.
CFCC | CFCC |  |  |  | C | Feature classification code.
COUNTY | COUNTYL | COUNTYR |  |  | A | County FIPS code.
DIRPRE | DIRPRE |  |  |  | ADDRP | Feature direction prefix.
DIRSUF | DIRSUF |  |  |  | ADDRS | Feature direction suffix.
FEANAME | FEANAME |  |  |  | C | Feature name.
MCD | MCDL | MCDR |  |  | A | Minor civil division.
PLACE | PLACEL | PLACER |  |  | A | Incorporated place code.
RECTYPE | RECTYPE |  |  |  | C | Record type.
STATE | STATEL | STATER |  |  | A | State FIPS code.
TRACT | TRACTL | TRACTR |  |  | A | Census tract.
ZIP | ZIPL | ZIPR |  |  | A | ZIP code.
BG | BGL | BGR |  |  | A | Block group.
LONGITUDE | X |  |  |  | X | Longitude.
LATITUDE | Y |  |  |  | Y | Latitude.