When you submit a DATA step for execution, SAS checks the syntax of the
SAS statements and compiles them; that is, it automatically translates
the statements into machine code. In this phase, SAS identifies the type
and length of each new variable and determines whether a variable type
conversion is necessary for each subsequent reference to a variable.
During the compilation phase, SAS creates the following three items:
input buffer
is a logical area in
memory into which SAS reads each record of raw data when SAS executes
an INPUT statement. Note that this buffer is created only when the
DATA step reads raw data. (When the DATA step reads a SAS data set,
SAS reads the data directly into the program data vector.)
program data vector (PDV)
is a logical area in
memory where SAS builds a data set, one observation at a time. When
a program executes, SAS reads data values from the input buffer or
creates them by executing SAS language statements. The data values
are assigned to the appropriate variables in the program data vector.
From here, SAS writes the values to a SAS data set as a single observation.
Along with data set
variables and computed variables, the PDV contains two automatic variables,
_N_ and _ERROR_. The _N_ variable counts the number of times the DATA
step begins to iterate. The _ERROR_ variable signals the occurrence
of an error caused by the data during execution. The value of _ERROR_
is either 0 (indicating that no errors exist) or 1 (indicating that one
or more errors have occurred). SAS does not write these variables
to the output data set.
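
The following is a minimal sketch of how the input buffer and the program
data vector can be inspected in the SAS log. The data set name WORK.SCORES,
the variable names, and the sample records are hypothetical and chosen only
for illustration. PUT _INFILE_ writes the current contents of the input
buffer, and PUT _ALL_ writes every variable in the PDV, including _N_ and
_ERROR_.

```sas
/* Hypothetical data set, variables, and records, for illustration only. */
data work.scores;
   input name $ score;     /* record goes to the input buffer, values to the PDV */
   doubled = score * 2;    /* computed variable created by an assignment statement */
   put _infile_;           /* writes the current input buffer to the log */
   put _all_;              /* writes all PDV variables, including _N_ and _ERROR_ */
   datalines;
Ann 10
Bob x
Cy 30
;
run;
```

The second record contains a nonnumeric value for SCORE, so the log should
show _ERROR_=1 for that iteration, while _N_ increases by 1 with each
iteration. The output data set contains only NAME, SCORE, and DOUBLED.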
descriptor information
is information that
SAS creates and maintains about each SAS data set, including data
set attributes and variable attributes. For example, it contains the
name of the data set and its member type, the date and time that the
data set was created, and the number, names, and data types (character
or numeric) of the variables.
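
One way to view this stored information is the CONTENTS procedure. The
sketch below assumes a hypothetical data set named WORK.SCORES, which is
created first only so that there is something to describe.

```sas
/* Hypothetical data set, created only so there is something to describe. */
data work.scores;
   length name $ 8;
   input name $ score;
   datalines;
Ann 10
Bob 20
;
run;

/* Prints the data set name, member type, creation date and time, */
/* and the name, type, and length of each variable.               */
proc contents data=work.scores;
run;
```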