Computer Resources :: SAS(R) Enterprise Miner(TM) 12.1 Extension Nodes: Developer’s Guide

The computer time and memory required for an analysis depend on the number of cases, the number of variables, the complexity of the model, and the training algorithm. For many modeling methods, there is a trade-off between time and memory.

For all modeling nodes, memory is required for the operating system, the SAS supervisor, and the SAS Enterprise Miner diagram and programs, resulting in an overhead of about 20 to 30 megabytes.

The following notation will be used:

N

the number of cases.

V

the number of input variables.

I

the number of input terms or units, including dummy variables, intercepts, interactions, and polynomials.

W

the number of weights in a neural network.

O

the number of output units.

D

the average depth of a tree.

R

the number of times the training data are read in logistic regression or neural nets, which depends on the training technique, the termination criteria, the model, and the data. R is typically much larger for neural nets than for logistic regression. Among training techniques, R is usually smallest for Newton-Raphson or Levenberg-Marquardt, larger for quasi-Newton, and still larger for conjugate gradients.

S

the number of steps in a stepwise regression, or 1 if stepwise regression is not used.

For the Decision Tree node, the minimum additional memory required for an analysis is about 8N bytes. Training will be considerably faster if there is enough RAM to hold the entire data set, which is about 8N(V+1) bytes. If the data will not fit in memory, they must be stored in a utility file. Memory is also required to hold summary statistics for a node, such as means or a contingency table, but this amount is usually much smaller than the amount required for the data.
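As a rough illustration of the two figures above, the following Python sketch evaluates 8N and 8N(V+1) for a data set. The function name, the dictionary keys, and the example sizes are hypothetical and chosen only for illustration. The sketch ignores the 20 to 30 megabytes of general overhead and the per-node summary statistics.

```python
def tree_memory_bytes(n_cases: int, n_inputs: int) -> dict:
    """Approximate additional memory for the Decision Tree node, in bytes.

    n_cases is N and n_inputs is V in the notation of this section.
    """
    minimum = 8 * n_cases                       # about 8N bytes
    data_in_ram = 8 * n_cases * (n_inputs + 1)  # about 8N(V+1) bytes to hold all data
    return {"minimum": minimum, "data_in_ram": data_in_ram}


# Hypothetical example: one million cases and 50 input variables.
est = tree_memory_bytes(n_cases=1_000_000, n_inputs=50)
print(f"minimum:     {est['minimum'] / 1e6:.0f} MB")
print(f"data in RAM: {est['data_in_ram'] / 1e6:.0f} MB")
```

Comparing the second figure with the available RAM gives a quick indication of whether the data are likely to spill to a utility file.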

For the Regression node, the memory required depends on the type of model and on the training technique. For linear regression, memory usage is dominated by the SSCP matrix, which requires 8I² bytes. For logistic regression, memory usage depends on the training technique as documented in the SAS/OR Technical Report: The NLP Procedure, ranging from about 40I bytes for the conjugate gradient technique to about 8I² bytes for the Newton-Raphson technique.
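The two ends of this range can be compared with a small Python sketch. The function and the technique labels are hypothetical names used only for illustration. Techniques that fall between the two extremes are not quantified in this section, so the sketch rejects them rather than guessing.

```python
def regression_memory_bytes(n_terms: int, technique: str) -> int:
    """Approximate dominant memory use for the Regression node, in bytes.

    n_terms is I in the notation of this section. The technique labels
    below are illustrative, not SAS option names.
    """
    if technique in ("linear", "logistic-newton-raphson"):
        return 8 * n_terms ** 2   # about 8I^2 bytes
    if technique == "logistic-conjugate-gradient":
        return 40 * n_terms       # about 40I bytes
    raise ValueError(f"no estimate given in this section for {technique!r}")


# Hypothetical example: 2,000 input terms.
for name in ("linear", "logistic-newton-raphson", "logistic-conjugate-gradient"):
    print(f"{name}: {regression_memory_bytes(2_000, name):,} bytes")
```

The quadratic term grows quickly: in this example the Newton-Raphson estimate is 400 times the conjugate gradient estimate.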

For the Neural Network node, memory usage depends on the training technique as documented in the SAS/OR Technical Report: The NLP Procedure. About 40W bytes are needed for the conjugate gradient technique, but 4W² bytes are needed for the quasi-Newton and Levenberg-Marquardt techniques. For a network with biases and H hidden units in one layer, W = (I+1)H + (H+1)O.
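The weight count and the two memory estimates can be worked through in a short Python sketch. The function names and the example network size are hypothetical. The sketch assumes a single hidden layer with biases, as in the formula above.

```python
def count_weights(n_inputs: int, n_hidden: int, n_outputs: int) -> int:
    """W = (I+1)H + (H+1)O for one hidden layer with biases."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


def network_memory_bytes(n_weights: int, technique: str) -> int:
    """Approximate memory for training, in bytes (technique labels are illustrative)."""
    if technique == "conjugate-gradient":
        return 40 * n_weights          # about 40W bytes
    if technique in ("quasi-newton", "levenberg-marquardt"):
        return 4 * n_weights ** 2      # about 4W^2 bytes
    raise ValueError(f"unknown technique: {technique!r}")


# Hypothetical example: 100 input units, 20 hidden units, 1 output unit.
w = count_weights(n_inputs=100, n_hidden=20, n_outputs=1)
print("W =", w)
print("conjugate gradient:", network_memory_bytes(w, "conjugate-gradient"), "bytes")
print("quasi-Newton:      ", network_memory_bytes(w, "quasi-newton"), "bytes")
```

Even for this small network, the quasi-Newton estimate is roughly 200 times the conjugate gradient estimate, because it scales with the square of W.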

For both logistic regression and neural networks, the conjugate gradient technique, which requires the least memory, must usually read the training data many more times than the Newton-Raphson and Levenberg-Marquardt techniques.

Assuming that the number of training cases is greater than the number of inputs or weights, the time required for training is approximately proportional to:

NI²

for linear regression.

SRNI

for logistic regression using conjugate gradients.

SRNI²

for logistic regression using quasi-Newton or Newton-Raphson. Note that R is usually considerably smaller for these techniques than for conjugate gradients.

DNI

for decision tree-based models.

RNW

for neural networks using conjugate gradients.

RNW²

for neural networks using quasi-Newton or Levenberg-Marquardt. Note that R is usually considerably smaller for these techniques than for conjugate gradients.
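The proportionality terms listed above can be collected in a small Python sketch for comparing methods. The function name, the method labels, and the values of N, I, and R in the example are all hypothetical. The results are relative cost units, not seconds, and are meaningful only as ratios between methods on the same data.

```python
def relative_training_cost(method: str, *, N: int, I: int = 0, W: int = 0,
                           D: float = 1, R: int = 1, S: int = 1) -> float:
    """Return the term that training time is approximately proportional to.

    Symbols follow the notation of this section; method labels are illustrative.
    """
    costs = {
        "linear_regression": N * I ** 2,        # NI^2
        "logistic_cg": S * R * N * I,           # SRNI
        "logistic_newton": S * R * N * I ** 2,  # SRNI^2 (quasi-Newton or Newton-Raphson)
        "decision_tree": D * N * I,             # DNI
        "neural_cg": R * N * W,                 # RNW
        "neural_newton": R * N * W ** 2,        # RNW^2 (quasi-Newton or Levenberg-Marquardt)
    }
    return costs[method]


# Hypothetical example: 100,000 cases, 50 input terms, no stepwise selection.
# The R values are assumed: many passes for conjugate gradients, few for Newton-Raphson.
cg = relative_training_cost("logistic_cg", N=100_000, I=50, R=200)
nr = relative_training_cost("logistic_newton", N=100_000, I=50, R=10)
print(f"conjugate gradients: {cg:,}")
print(f"Newton-Raphson:      {nr:,}")
print(f"ratio NR / CG:       {nr / cg:.1f}")
```

With these assumed values, Newton-Raphson costs about 2.5 times as much as conjugate gradients despite reading the data 20 times less often. In general, the two terms are equal when the conjugate gradient R is I times the Newton-Raphson R.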