The computer time and
memory required for an analysis depend on the number of cases, the
number of variables, the complexity of the model, and the training
algorithm. For many modeling methods, there is a trade-off between
time and memory.
For all modeling nodes,
memory is required for the operating system, the SAS supervisor, and
the Enterprise Miner diagram and programs, resulting in an overhead
of about 20 to 30 megabytes.
The following notation
will be used:
N: the number of cases.
V: the number of input variables.
I: the number of input terms or units, including dummy variables,
intercepts, interactions, and polynomials.
W: the number of weights in a neural network.
O: the number of output units.
D: the average depth of a tree.
R: the number of times the training data are read in logistic
regression or neural nets, which depends on the training technique,
the termination criteria, the model, and the data. R is typically
much larger for neural nets than for logistic regression. With regard
to training techniques, R is usually smallest for Newton-Raphson or
Levenberg-Marquardt, larger for quasi-Newton, and still larger for
conjugate gradients.
S: the number of steps in a stepwise regression, or 1 if stepwise
regression is not used.
For the Decision Tree
node, the minimum additional memory required for an analysis is about
8N bytes. Training will be considerably faster if there is enough
RAM to hold the entire data set, which is about 8N(V+1) bytes. If
the data will not fit in memory, they must be stored in a utility
file. Memory is also required to hold summary statistics for a node,
such as means or a contingency table, but this amount is usually much
smaller than the amount required for the data.
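As a quick illustration, the minimal sketch below plugs hypothetical values of N and V into these two estimates. The example numbers are assumptions; only the 8N and 8N(V+1) formulas come from the text above.

```python
# A minimal sketch, assuming example values for N and V; only the
# formulas 8N and 8N(V+1) bytes come from the documentation above.
N = 100_000   # number of cases (hypothetical)
V = 50        # number of input variables (hypothetical)

min_memory_bytes = 8 * N          # minimum additional memory for the analysis
in_core_bytes = 8 * N * (V + 1)   # memory to hold the entire data set in RAM

print(f"minimum additional memory: {min_memory_bytes / 2**20:.1f} MB")
print(f"data set held in RAM:      {in_core_bytes / 2**20:.1f} MB")
```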
For the Regression node,
the memory required depends on the type of model and on the training
technique. For linear regression, memory usage is dominated by the
SSCP matrix, which requires 8I² bytes.
For logistic regression, memory usage depends on the training technique
as documented in the
SAS/OR Technical Report: The NLP Procedure, ranging from about 40I bytes for the conjugate gradient technique
to about 8I² bytes for the Newton-Raphson
technique.
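The sketch below compares the two estimates for an assumed value of I; the value is hypothetical, and only the 40I and 8I² figures come from the text above.

```python
# A minimal sketch, assuming an example value for I; the 40I and 8I^2
# estimates are the ones quoted above for logistic regression.
I = 200   # number of input terms (hypothetical)

cg_bytes = 40 * I       # conjugate gradient technique
nr_bytes = 8 * I ** 2   # Newton-Raphson technique

print(f"conjugate gradient: {cg_bytes:,} bytes")
print(f"Newton-Raphson:     {nr_bytes:,} bytes")
```

Because the Newton-Raphson estimate grows with the square of I, its memory requirement quickly dominates as the number of input terms increases.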
For the Neural Network
node, memory usage depends on the training technique as documented
in the
SAS/OR Technical Report: The NLP Procedure. About 40W bytes are needed for the conjugate gradient technique,
while 4W² bytes are needed for the quasi-Newton
and Levenberg-Marquardt techniques. For a network with biases and
H hidden units in one layer, W = (I+1)H + (H+1)O.
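The following minimal sketch computes W from that formula and applies the 40W and 4W² memory estimates; the values of I, H, and O are hypothetical examples.

```python
# A minimal sketch of the weight count W = (I+1)H + (H+1)O for a network
# with biases and one hidden layer, plus the 40W and 4W^2 memory
# estimates quoted above. I, H, and O are hypothetical example values.
I, H, O = 100, 25, 1   # input units, hidden units, output units

W = (I + 1) * H + (H + 1) * O   # number of weights, including biases

cg_bytes = 40 * W         # conjugate gradient technique
qn_lm_bytes = 4 * W ** 2  # quasi-Newton and Levenberg-Marquardt techniques

print(f"W = {W} weights")
print(f"conjugate gradient:               {cg_bytes:,} bytes")
print(f"quasi-Newton/Levenberg-Marquardt: {qn_lm_bytes:,} bytes")
```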
For both logistic regression
and neural networks, the conjugate gradient technique, which requires
the least memory, must usually read the training data many more times
than the Newton-Raphson and Levenberg-Marquardt techniques.
Assuming that the number
of training cases is greater than the number of inputs or weights,
the time required for training is approximately proportional to:
NIRS for logistic regression
using conjugate gradients.
NI²RS for logistic regression
using quasi-Newton or Newton-Raphson. Note that R is usually considerably
less for these techniques than for conjugate gradients.
NVD for decision tree-based
models.
NWR for neural networks
using conjugate gradients.
NW²R for neural networks
using quasi-Newton or Levenberg-Marquardt. Note that R is usually
considerably less for these techniques than for conjugate gradients.
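Taking these proportionalities at face value, the minimal sketch below compares relative training times for one set of assumed values. Every input is hypothetical, the results are unitless, and only the ratios between them are meaningful.

```python
# A minimal sketch comparing the proportional training times above.
# All values are hypothetical examples; results are unitless and only
# meaningful relative to one another.
N, V, I, W, D, S = 100_000, 50, 200, 2_551, 6, 1
R_cg, R_second_order = 200, 20   # conjugate gradients typically reads the data far more often

relative_time = {
    "logistic regression, conjugate gradients": N * I * R_cg * S,
    "logistic regression, (quasi-)Newton":      N * I**2 * R_second_order * S,
    "decision tree":                            N * V * D,
    "neural network, conjugate gradients":      N * W * R_cg,
    "neural network, quasi-Newton/LM":          N * W**2 * R_second_order,
}
for model, t in relative_time.items():
    print(f"{model:44s} {t:.2e}")
```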