Working with Spatial Data
A SAS data set is a collection of data values and their associated descriptive information that is arranged and presented in a form that can be recognized and processed by SAS. SAS data sets can be data files or views. A SAS data file contains the following elements:
data values that are organized into a rectangular structure of columns and rows
descriptor information that identifies attributes of both the data set and the data values
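A SAS view contains the following elements:
descriptor information that identifies attributes of both the data set and the data values
instructions for retrieving data values that are stored elsewhere; the view itself does not store data values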
SAS data sets can be indexed by one or more variables, known as key variables. A SAS index contains the data values of the key variables that are paired with location identifiers for the observations that contain the variables. The value and identifier pairs are ordered in a B-tree structure that enables the engine to search by value. SAS indexes are classified as simple or composite, according to the number of key variables they contain.
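The following Python sketch illustrates how key values can be paired with observation locations and searched by value. It is only an illustration under stated assumptions. It keeps the pairs in a sorted list searched with `bisect` instead of a real B-tree, it uses 1-based row numbers as the location identifiers, and the class name `SortedIndex` and the sample variables are hypothetical. An index built on one key variable corresponds to a simple index. An index built on several variables, whose keys are tuples, corresponds to a composite index.

```python
from bisect import bisect_left, bisect_right


class SortedIndex:
    """Hypothetical index: (key value, row number) pairs kept in key order.

    A sorted list searched by binary search stands in for the B-tree
    that a real SAS index uses.
    """

    def __init__(self, rows, key_vars):
        # Pair each observation's key value with its 1-based row number.
        pairs = sorted(
            (tuple(row[v] for v in key_vars), rownum)
            for rownum, row in enumerate(rows, start=1)
        )
        self.keys = [key for key, _ in pairs]
        self.rownums = [rownum for _, rownum in pairs]
        # One key variable: simple index; several: composite index.
        self.kind = "composite" if len(key_vars) > 1 else "simple"

    def find(self, *key):
        """Return the row numbers of all observations whose key equals `key`."""
        lo = bisect_left(self.keys, key)
        hi = bisect_right(self.keys, key)
        return self.rownums[lo:hi]


rows = [
    {"STATE": "OR", "COUNTY": "041"},
    {"STATE": "WA", "COUNTY": "011"},
    {"STATE": "OR", "COUNTY": "051"},
]
by_state = SortedIndex(rows, ["STATE"])
by_state_county = SortedIndex(rows, ["STATE", "COUNTY"])
print(by_state.find("OR"))                # [1, 3]
print(by_state_county.find("OR", "051"))  # [3]
```

Because a lookup is a binary search over ordered keys, it avoids scanning every observation. This is the same benefit that the B-tree ordering provides.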
For more information about SAS data sets, SAS files, SAS views, and SAS indexes, refer to SAS Language Reference: Concepts.
SAS/GIS Data Sets
As a component of SAS, SAS/GIS stores all of its data in SAS data sets. The SAS/GIS spatial database functions as one logical entity but is physically divided into six categories of data sets:
chains
nodes
details
polygonal index
label
attribute
A given SAS/GIS map can reference only one chains data set, one nodes data set, one details data set, and one label data set, but it can reference multiple polygonal index and attribute data sets. Multiple SAS/GIS maps can use a single set of chains, nodes, and details data sets.
The chains data set contains coordinates for the polylines that are used to form line and polygon features. A polyline consists of a series of connected line segments that are chains. A chain is a sequence of two or more points in the coordinate space. The end points, the first and last points of the chain, must be nodes. Each chain has a direction, from the first point toward the last point. The first point in the chain is the from-node, and the last point is the to-node. Relative to its direction, a chain has a left side and a right side. Points between the from-node and the to-node are detail points, which serve to trace the curvature of the feature that is represented by the chain. Detail points are not nodes.
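The small Python model below makes the chain terminology concrete. It is an illustrative sketch and does not represent the SAS/GIS storage format. The `Chain` class, its field names, and the use of plain (x, y) tuples are assumptions. The model shows that a chain's point sequence runs from the from-node through the detail points to the to-node, and that reversing a chain's direction also exchanges its left and right sides. The hypothetical `side_of` helper classifies a point against a single directed segment by the sign of a 2-D cross product, assuming that y increases upward.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Chain:
    """Hypothetical in-memory chain: two end nodes plus interior detail points."""
    from_node: Point
    to_node: Point
    details: List[Point] = field(default_factory=list)
    left: str = ""   # attribute value on the left side, e.g. a ZIP code
    right: str = ""  # attribute value on the right side

    def points(self) -> List[Point]:
        # The chain runs from-node -> detail points -> to-node.
        return [self.from_node, *self.details, self.to_node]

    def reversed(self) -> "Chain":
        # Flipping the direction swaps the end nodes, reverses the detail
        # points, and exchanges the left and right sides.
        return Chain(self.to_node, self.from_node, self.details[::-1],
                     left=self.right, right=self.left)


def side_of(a: Point, b: Point, p: Point) -> str:
    """Which side of the directed segment a->b the point p lies on."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return "left" if cross > 0 else "right" if cross < 0 else "on"


c = Chain((0.0, 0.0), (2.0, 0.0), [(1.0, 1.0)], left="97201", right="97202")
r = c.reversed()
print(r.points())       # [(2.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
print(r.left, r.right)  # 97202 97201
```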
The chains data set also lists the from-node and to-node row numbers in the nodes data set, as well as the number of detail points and the corresponding details data set row number. The left and right side attribute values (for example, ZIP codes and FIPS codes) are also stored in the chains data set.
The nodes data set contains the coordinates of the end points for the chains in the chains data set, along with the linkage information that is needed to attach chains to the correct nodes. A node is a point in the spatial data with connections to one or more chains. Nodes can be discrete points or the end points of chains. Because a node definition can span multiple records in the nodes data set, only the starting record number of a node serves as its node feature ID.
The details data set stores the curvature points of a chain between the two end nodes, which are also called the from-node and the to-node. That is, the details data set contains all the coordinates between the intersection points of the chain. The node coordinates are not duplicated in the details data set. The details data set also contains the chains data set row number of the associated chain.
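The row-number linkage among the chains, nodes, and details data sets can be sketched as follows. This Python function is illustrative only. The column names (`from_row`, `to_row`, `n_details`, `detail_row`) are hypothetical stand-ins for the SAS/GIS variables. The sketch also assumes that the details row number points at the chain's first detail point, that a chain's detail points occupy consecutive rows, and that each node occupies a single row, although, as noted above, a real node definition can span several records.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def chain_coordinates(chain: Dict[str, int], nodes: List[Point],
                      details: List[Point]) -> List[Point]:
    """Rebuild one chain's full coordinate sequence from three tables.

    Row numbers are 1-based, as in SAS data sets; Python lists are 0-based.
    """
    start = nodes[chain["from_row"] - 1]  # from-node coordinates
    end = nodes[chain["to_row"] - 1]      # to-node coordinates
    count = chain["n_details"]
    if count == 0:
        middle = []
    else:
        first = chain["detail_row"] - 1
        # Node coordinates are not repeated in the details table.
        middle = details[first:first + count]
    return [start, *middle, end]


nodes = [(0.0, 0.0), (4.0, 0.0)]
details = [(9.0, 9.0), (1.0, 1.0), (2.0, 1.5), (3.0, 1.0)]  # row 1 belongs to another chain
chain = {"from_row": 1, "to_row": 2, "n_details": 3, "detail_row": 2}
print(chain_coordinates(chain, nodes, details))
# [(0.0, 0.0), (1.0, 1.0), (2.0, 1.5), (3.0, 1.0), (4.0, 0.0)]
```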
The polygonal index data set contains one observation for each polygon that was successfully closed during the index creation process. It is called a polygonal index because each observation is an index to a polygon in the chains data set. That is, it points to the starting chain in the chains data set for each of the polygons.
If polygon areas, perimeter distances, and centroid locations were computed, then that information is also stored in the polygonal index data set.
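For reference, here is one standard way to compute those three quantities for a closed polygon ring, using the shoelace formula. The text does not say which method SAS/GIS uses, so treat this Python sketch as an illustration of the geometry only. It assumes planar (projected) coordinates and a simple, non-self-intersecting ring whose first vertex is not repeated at the end. The name `polygon_metrics` is hypothetical.

```python
import math


def polygon_metrics(ring):
    """Return (area, perimeter, (cx, cy)) for a simple polygon.

    `ring` lists (x, y) vertices in order, without repeating the first vertex.
    Works for clockwise or counterclockwise rings.
    """
    n = len(ring)
    twice_area = 0.0  # signed shoelace sum
    sx = sy = 0.0     # centroid numerators
    perimeter = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sx += (x0 + x1) * cross
        sy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = twice_area / 2.0
    if area == 0:
        raise ValueError("degenerate polygon: zero area")
    # Dividing by the signed area keeps the centroid correct for either orientation.
    return abs(area), perimeter, (sx / (6.0 * area), sy / (6.0 * area))


print(polygon_metrics([(0, 0), (2, 0), (2, 2), (0, 2)]))  # (4.0, 8.0, (1.0, 1.0))
```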
The label data set defines the attributes of labels to be displayed on the map. The attributes include all of the information that applies to each label, such as its location, color, and size, the source of the text for a text label, and other behavioral and graphical attributes.
Attribute data sets contain values related to the map features. The observations in attribute data sets must be associated with observations in the chains data set. Attribute data is used to display themes on the map and to support spatially oriented reports, graphs, map actions, and so forth.
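To illustrate how attribute observations can be tied to chains, the Python sketch below looks up an attribute for both sides of each chain by matching the chain's left and right values of a composite such as ZIP code. The matching rule, the dictionary layout, the POP variable, and the function name `side_attribute` are assumptions made for this example. The L and R suffixes follow the ZIPL and ZIPR naming that appears in the import tables in this section.

```python
def side_attribute(chains, attributes, composite, variable):
    """Return (left, right) values of `variable` for each chain.

    chains:     list of dicts holding the composite's left/right variables,
                e.g. {"ZIPL": "97201", "ZIPR": "97202"}
    attributes: dict mapping a composite value to that observation's attributes
    A side with no matching attribute observation yields None.
    """
    result = []
    for chain in chains:
        left_key = chain.get(composite + "L")
        right_key = chain.get(composite + "R")
        left = attributes.get(left_key, {}).get(variable)
        right = attributes.get(right_key, {}).get(variable)
        result.append((left, right))
    return result


chains = [{"ZIPL": "97201", "ZIPR": "97202"}, {"ZIPL": "97202", "ZIPR": "97299"}]
attributes = {"97201": {"POP": 100}, "97202": {"POP": 250}}
print(side_attribute(chains, attributes, "ZIP", "POP"))  # [(100, 250), (250, None)]
```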
Managing Data Set Sizes
By their nature, spatial databases tend to be rather large. Users of spatial data want as much detail in the maps as they can get, which increases the demands on storage and processing capacity. Spatial data that is not carefully managed can become too large for easy use.
Here are five actions that you can take to manage the size of your spatial data sets. You need to perform most of these actions before importing your data into SAS/GIS.
Reduce the spatial extent of the data.
Do not store a larger area than you need. If you need a map containing one state, do not store a map containing all the states for a region. For example, if you need to work with a map of Oregon, do not store a map containing all of the Pacific Northwest.
Store only the features that you need.
If you do not need features such as rivers and lakes, do not store these features in your spatial data.
Limit the amount of detail to what is necessary for your application.
If you are using a map for which you do not require highly detailed boundaries, reduce the detail level and save storage space. If you are using SAS/GRAPH data sets, you can use the GREDUCE procedure in SAS/GRAPH software to reduce the detail level. If you are using a data set from another source, you'll have to reduce the level of detail before importing the data set into SAS/GIS.
Reduce the number of attributes that are stored with the spatial data.
If you do not need an attribute and do not expect to need it in the future, delete it from your spatial data.
Reduce the size of variables that are stored in the spatial data.
Examine how you store your variables and determine whether you can safely reduce the length that is used to store them.
For example, if a numeric variable contains a code that is at most two digits long, it might be better to store the code in a character variable of length 2 (two bytes) rather than in an eight-byte numeric variable. Change the variables' types or lengths in a DATA step after you complete the import.
Of the five actions, reducing the number of attributes is the easiest to perform. Use the Import window, which you open by selecting Modify Composites in the GIS Spatial Data Importing window, to remove unneeded composites and drop their variables from your data set as it is imported.
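The detail-reduction action in the list above can be illustrated with the Ramer-Douglas-Peucker line simplification algorithm. This is a swapped-in, well-known technique. The text does not describe the algorithm that the GREDUCE procedure uses, and this Python sketch is not a substitute for that procedure. The sketch drops detail points that lie within `tolerance` (in map units) of the simplified line and always keeps a chain's first and last points, because those points are its nodes. The function names are hypothetical.

```python
import math


def _line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)  # a and b coincide
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker simplification of one chain's point list.

    The first and last points (the chain's nodes) are always kept; interior
    detail points closer than `tolerance` to the simplified line are dropped.
    """
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    # Find the interior point farthest from the line through the end points.
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _line_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [a, b]
    # Keep the farthest point and simplify each half independently.
    first = simplify(points[: index + 1], tolerance)
    second = simplify(points[index:], tolerance)
    return first[:-1] + second


chain = [(0, 0), (1, 0.05), (2, 0), (3, 1), (4, 0)]
print(simplify(chain, 0.1))  # [(0, 0), (2, 0), (3, 1), (4, 0)]
```

A larger tolerance removes more detail points. For example, a tolerance of 10 reduces this chain to its two nodes.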
Import Type Specific Variables
The following tables describe the composites and variables that are created for each of the import types. All of the variables are located in the chains data set except for the X and Y variables, which are in the nodes data set.
Composite | Variable 1 | Variable 2 | Type (table note 1) | Description |
---|---|---|---|---|
ARCID | ARCIDL | ARCIDR | A or C | ARCID from the ArcInfo coverage. Maps made from line and point coverages do not have left and right variables. |
ARCNUM | | | C | ARCNUM from the coverage. |
'COVERAGE' | 'COVERAGE'_L | 'COVERAGE'_R | A or C | This variable is derived from the input filename. It is the last word preceding the file extension. For example, /local/gisdata/montana.e00 would have a 'COVERAGE' (table note 2) name of montana. The left variable would be montanal, the right variable would be montanar, and the composite type would be Area. Line and point coverages do not have left- and right-side variables, and the composite type would be Classification. |
AREA | AREAL | AREAR | A | AREA from the coverage. |
PERIMETER | PERIML | PERIMR | A | PERIMETER from the coverage. |
'ATTRIB' | 'ATTRIB'L | 'ATTRIB'R | | All variables in the polygon, line, or point attribute tables are saved as composite variables. In the case of the polygon coverages, an L or an R is added to the end of the first five characters of the actual variable name. |
_COVER_ | _COVEL | _COVER | A or C | This variable contains the name stored in the 'COVERAGE' variable. |
_SRC_ | _SRCL | _SRCR | C | Contains the string 'ARC'. |
X | X | | X | X coordinate. |
Y | Y | | Y | Y coordinate. |
TABLE NOTE 1: Values for Type are as follows:
A: Area
C: Classification
X: X coordinate
Y: Y coordinate
TABLE NOTE 2: Names in single quotation marks, such as 'COVERAGE' and 'ATTRIB', are GIS composite names that stand for names taken from the input data.
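The naming rules in the preceding table can be expressed as two small Python helpers. They are a sketch of the documented examples only. The hypothetical `coverage_name` function takes the file name without its directory and extension, and it assumes that the name has a single extension, as in montana.e00. The hypothetical `side_variables` function appends L and R to a name, optionally truncated to its first `width` characters. A width of five reproduces the table's AREAL, PERIML, and _COVEL entries. SAS variable names are not case-sensitive, so montanaL and montanal refer to the same variable.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def coverage_name(path: str) -> str:
    """Name derived from the input file: the word before the extension."""
    return PurePosixPath(path).stem


def side_variables(name: str, width: Optional[int] = None) -> Tuple[str, str]:
    """Left and right variable names: the name, optionally cut to `width`
    characters, followed by L or R."""
    base = name if width is None else name[:width]
    return base + "L", base + "R"


print(coverage_name("/local/gisdata/montana.e00"))  # montana
print(side_variables("PERIMETER", 5))               # ('PERIML', 'PERIMR')
print(side_variables("_COVER_", 5))                 # ('_COVEL', '_COVER')
```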
Composite | Variable 1 | Variable 2 | Type (table note 1) | Description |
---|---|---|---|---|
LMAJOR(n) | LMAJOR(n) | | C | Major line attribute code. |
LMINOR(n) | LMINOR(n) | | C | Minor line attribute code. |
NMAJOR(n) | NMAJOR(n) | | C | Major node attribute code. |
NMINOR(n) | NMINOR(n) | | C | Minor node attribute code. |
MAJOR(n) | AMAJORL(n) | AMAJORR(n) | A | Major area attribute code. |
MINOR(n) | AMINORL(n) | AMINORR(n) | A | Minor area attribute code. |
X | X | | X | X coordinate. |
Y | Y | | Y | Y coordinate. |
TABLE NOTE 1: Values for Type are as follows:
A: Area
C: Classification
X: X coordinate
Y: Y coordinate
Composite | Variable 1 | Variable 2 | Type (table note 1) | Description |
---|---|---|---|---|
'ATTRIB' | 'ATTRIB'L | 'ATTRIB'R | A or C | All polygon, line, or point attributes are saved as composite variables. In the case of polygon maps, an L or R is added to the end of the first seven characters of the actual variable name. |
TABLE NOTE 1: Values for Type are as follows:
A: Area
C: Classification
Composite | Variable 1 | Variable 2 | Type (table note 1) | Description |
---|---|---|---|---|
ID | ID | | C | The ID variable from the data set. |
'ATTRIB' | 'ATTRIB' | 'ATTRIB' | C | Any other variable in the data set is saved as a classification composite. |
X | X | | X | X coordinate. |
Y | Y | | Y | Y coordinate. |
TABLE NOTE 1: Values for Type are as follows:
C: Classification
X: X coordinate
Y: Y coordinate
Composite | Variable 1 | Variable 2 | Type (table note 1) | Description |
---|---|---|---|---|
ID | ID | | C | The ID variable from the data set. |
'ATTRIB' | 'ATTRIB' | 'ATTRIB' | C | Any other variable in the data set is saved as a classification composite. |
X | X | | X | X coordinate. |
Y | Y | | Y | Y coordinate. |
TABLE NOTE 1: Values for Type are as follows:
C: Classification
X: X coordinate
Y: Y coordinate
Composite | Variable 1 | Variable 2 | Type (table note 1) | Description |
---|---|---|---|---|
'ATTRIB' | 'ATTRIB'L | 'ATTRIB'R | A or C | All polygon, line, or point attributes are saved as composite variables. In the case of polygon maps, an L or R is added to the end of the first seven characters of the actual variable name. |
LINELYR | | | C | This variable is derived from the input filename. It is the last word preceding the file extension. For example, /local/gisdata/montana.mif would have a LINELYR name of montana. |
PTLYR | | | C | This variable is derived from the input filename. It is the last word preceding the file extension. For example, /local/gisdata/montana.mif would have a PTLYR name of montana. |
POLYLYR | | | A | This variable is derived from the input filename. It is the last word preceding the file extension. For example, /local/gisdata/montana.mif would have a POLYLYR name of montana. |
'MAP' | 'MAP'L | 'MAP'R | A or C | This variable is derived from the input filename. It is the last word preceding the file extension. For example, /local/gisdata/usa.mif would have a 'MAP' name of usa. The left variable would be usal, the right variable would be usar, and in this case the composite type would be Area. Line and point maps do not have left- and right-side variables, and the composite type would be Classification. |
TABLE NOTE 1: Values for Type are as follows:
A: Area
C: Classification
Composite | Variable 1 | Variable 2 | Type (table note 1) | Description |
---|---|---|---|---|
'IDVAR'n | 'IDVAR'L | 'IDVAR'R | A | An area composite variable is created for each ID variable (IDVAR) selected by the user in the ID vars list box. In the case of polygon maps, an L or R is added to the end of the first seven characters of the actual variable name. |
TABLE NOTE 1: Values for Type are as follows:
A: Area
Composite | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Type (table note 1) | Description |
---|---|---|---|---|---|---|
ADDR | FRADDL | FRADDR | TOADDL | TOADDR | ADDR | Address range. |
BLOCK | BLOCKL | BLOCKR | | | A | Block number. |
CFCC | CFCC | | | | C | Feature classification code. |
COUNTY | COUNTYL | COUNTYR | | | A | County FIPS code. |
DIRPRE | DIRPRE | | | | ADDRP | Feature direction prefix. |
DIRSUF | DIRSUF | | | | ADDRS | Feature direction suffix. |
FEANAME | FEANAME | | | | C | Feature name. |
MCD | MCDL | MCDR | | | A | Minor civil division. |
PLACE | PLACEL | PLACER | | | A | Incorporated place code. |
RECTYPE | RECTYPE | | | | C | Record type. |
STATE | STATEL | STATER | | | A | State FIPS code. |
TRACT | TRACTL | TRACTR | | | A | Census tract. |
ZIP | ZIPL | ZIPR | | | A | ZIP code. |
BG | BGL | BGR | | | A | Block group. |
LONGITUDE | X | | | | X | Longitude. |
LATITUDE | Y | | | | Y | Latitude. |
TABLE NOTE 1: Values for Type are as follows:
A: Area
ADDR: Address
ADDRP: Address Prefix
ADDRS: Address Suffix
C: Classification
X: Longitude
Y: Latitude
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.