The index
file is a SAS file that has the same name as its associated data file,
and that has a member type of INDEX. There is only one index file
per data file. That is, all indexes for a data file are stored in
a single file.
The index file might
be a separate file, or be part of the data file, depending on the
operating environment. In any case, the index file is stored in the
same SAS library as its data file.
The index file consists
of entries that are organized hierarchically and connected by pointers,
all of which are maintained by SAS. The lowest level in the index
file hierarchy consists of entries that represent each distinct value
for an indexed variable, in ascending value order. Each entry contains
this information:
- a distinct value of the indexed variable
- one or more unique record identifiers
(each referred to as a RID) that identify each observation containing
the value. (Think of the RID as an internal observation number.)
That is, in an index
file, each value is followed by one or more RIDs, which identify the
observations in the data file that contain the value. (Multiple RIDs
result from multiple occurrences of the same value.) For example,
the following represents index file entries for the variable LASTNAME:
When an index is used
to process a request, such as a WHERE expression, SAS performs a binary
search on the index file and positions the index to the first entry
that contains a qualified value. SAS then uses the value's RID to
read the observation that contains the value. If a value has more
than one RID (as the value Brown does in the previous example),
SAS reads the observation that is pointed to by the next RID in the
list. The result is that SAS can quickly locate the observations that
are associated with a value or range of values.
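The lookup described above can be modeled with a short Python sketch. This is only a conceptual illustration, not SAS's actual implementation: the data file contents, the RIDs, and the helper names `build_index` and `lookup` are all hypothetical, and the index is reduced to its lowest level (a sorted list of distinct values, each paired with a list of RIDs).

```python
from bisect import bisect_left

# Hypothetical data file: RID -> observation (names and RIDs are invented).
DATA_FILE = {
    1: {"lastname": "Davis", "age": 52},
    2: {"lastname": "Brown", "age": 31},
    4: {"lastname": "Adams", "age": 45},
    5: {"lastname": "Clark", "age": 28},
    7: {"lastname": "Brown", "age": 19},
    8: {"lastname": "Davis", "age": 36},
    9: {"lastname": "Brown", "age": 60},
}


def build_index(data_file, variable):
    """Return (values, rid_lists): distinct values in ascending order, each
    paired with the RIDs of the observations that contain that value."""
    rids_by_value = {}
    for rid in sorted(data_file):
        rids_by_value.setdefault(data_file[rid][variable], []).append(rid)
    values = sorted(rids_by_value)
    return values, [rids_by_value[v] for v in values]


def lookup(values, rid_lists, data_file, wanted):
    """Binary-search the index for `wanted`, then read one observation per RID."""
    pos = bisect_left(values, wanted)
    if pos == len(values) or values[pos] != wanted:
        return []  # no index entry for this value
    return [data_file[rid] for rid in rid_lists[pos]]


values, rid_lists = build_index(DATA_FILE, "lastname")
brown_rows = lookup(values, rid_lists, DATA_FILE, "Brown")
print([row["age"] for row in brown_rows])  # prints [31, 19, 60]
```

In this sketch the value Brown has three RIDs, so three observations are read, one per RID, without scanning the other observations in the data file.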
For example, using an
index to process the following WHERE expression, SAS positions the index to
the index entry for the first value greater than 20 and uses the value's
RID or RIDs to read the observation or observations:
   where age > 20 and age < 35;
SAS then moves sequentially
through the index entries reading observations until it reaches the
index entry for the value that is equal to or greater than 35.
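The range scan for age > 20 and age < 35 can be sketched in Python as follows. This is a simplified model under stated assumptions, not SAS internals: the AGE values, the RIDs, and the function name `rids_in_open_range` are invented for illustration, and the index is represented as two parallel lists.

```python
from bisect import bisect_right

# Hypothetical index on AGE: distinct values in ascending order, each with
# the RIDs of the observations that contain it (all values are invented).
AGE_VALUES = [18, 20, 21, 27, 34, 35, 40]
AGE_RIDS = [[3], [8], [1, 6], [4], [2, 9], [5], [7]]


def rids_in_open_range(values, rid_lists, low, high):
    """Collect the RIDs for entries with low < value < high, in index order."""
    # Position on the first entry whose value is greater than `low`.
    pos = bisect_right(values, low)
    rids = []
    # Move sequentially until an entry's value is equal to or greater than `high`.
    while pos < len(values) and values[pos] < high:
        rids.extend(rid_lists[pos])
        pos += 1
    return rids


matching = rids_in_open_range(AGE_VALUES, AGE_RIDS, 20, 35)
print(matching)  # prints [1, 6, 4, 2, 9]
```

Only the entries for 21, 27, and 34 are visited here. The scan stops at the entry for 35 and never touches the entries below 21 or above 35.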
SAS automatically keeps
the index file balanced as updates are made, which means that the cost
to access any index entry is uniform and that all space occupied
by deleted values is recovered and reused.