A
BY key is similar to a BY group in SAS, which groups observations based on one or more fields.
Many SAS procedures process records in BY groups. Also, some updates in the DATA step
are performed by matching specified BY variables in different data sets. A similar
matching process occurs with BY-key items in the
SAS/ACCESS interface to SYSTEM 2000. Use the BY-key capability to eliminate redundancy and to
help the
interface view engine find an existing
path for inserting the new records.
Each time the interface view engine is called to insert an
observation, it inspects the changes that you made from observation to observation, in order
to determine how many data records to insert into the
database.
If none of the data changed, or if the changes were only at the lowest
level of the
view, the
engine needs to insert only a single new
data record at the lowest level. Because the engine inserts at least one record for any addition,
and only one record is called for here, there is no question about how many records
to insert, that is, the insert is not ambiguous.
However, if any data values changed in records above the lowest record in the path,
an ambiguous situation occurs. A specific number of new records seem to be required
by your changes, but some of the new data might already exist in the database records.
That is, the actual number of new records to be added to the database might
be different.
In insert mode, the engine can determine whether some of the new data already exists
in a record. If the data
exists, the engine needs to insert records only for the data that does not exist in
the database. If the data does not exist, the engine needs to insert a record at every
level.
In optimized
load mode, the engine ignores the ambiguity; it inserts all of the new data that is at or below
the highest-level record that changed. Therefore, when you specify optimized load
mode, make sure that
your incoming data is always sorted by major-to-minor sort keys at every level (from
level 0 down to the lowest level in the view). If the data is not sorted correctly,
redundancy occurs.
If you specify a BY key, it should contain one or more database items at each level
above the lowest level in your
view descriptor.
BY keys cause extra processing time because the engine issues one or more where-clauses
to look for already-existing records.