The FASTCLUS Procedure

Computational Resources

Let

$\begin{array}{rcl} n & = & \mbox{number of observations} \\ v & = & \mbox{number of variables} \\ c & = & \mbox{number of clusters} \\ p & = & \mbox{number of passes over the data set} \end{array}$

Memory

The memory required is approximately $4(19v + 12cv + 10c + 2 \max (c+ 1, v))$ bytes.

If you request the DISTANCE option, an additional $4c(c + 1)$ bytes of space is needed.
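
As a quick way to apply these two formulas, the following Python sketch evaluates them for given values of $v$ and $c$. The function name `fastclus_memory_bytes` and its `distance` parameter are hypothetical helpers introduced for illustration only; they are not part of SAS, and the result is only as accurate as the approximation above.

```python
def fastclus_memory_bytes(v: int, c: int, distance: bool = False) -> int:
    """Approximate memory in bytes for v variables and c clusters.

    Hypothetical helper: it only evaluates the approximation
    4(19v + 12cv + 10c + 2 max(c + 1, v)) and, when distance=True,
    adds the 4c(c + 1) bytes quoted for the DISTANCE option.
    """
    base = 4 * (19 * v + 12 * c * v + 10 * c + 2 * max(c + 1, v))
    extra = 4 * c * (c + 1) if distance else 0
    return base + extra


if __name__ == "__main__":
    # 10 variables and 5 clusters, without and with the DISTANCE option
    print(fastclus_memory_bytes(v=10, c=5))                 # 3440
    print(fastclus_memory_bytes(v=10, c=5, distance=True))  # 3560
```

For 10 variables and 5 clusters the estimate is 3,440 bytes, or 3,560 bytes with the DISTANCE option, which illustrates that the $12cv$ term dominates as $c$ and $v$ grow.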

Time

The overall time required by PROC FASTCLUS is roughly proportional to $nvcp$ if $c$ is small relative to $n$.

Initial seed selection requires one pass over the data set. If the observations are in random order, the time required is roughly proportional to

$nvc + vc^2$

unless you specify REPLACE=NONE. In that case, a complete pass might not be necessary, and the time is roughly proportional to $mvc$, where $c \leq m \leq n$.

The DRIFT option, each iteration, and the final assignment of cluster seeds each require one pass, with time for each pass roughly proportional to $nvc$.
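
To compare the cost of different settings, the per-phase terms above can be added up as in the following Python sketch. It assumes, purely for illustration, that every phase shares the same constant of proportionality, so the result is a unitless figure useful only for relative comparisons. The function `fastclus_relative_time` and its parameters are hypothetical; pass `m` to model REPLACE=NONE.

```python
from typing import Optional


def fastclus_relative_time(n: int, v: int, c: int, iterations: int,
                           drift: bool = False,
                           m: Optional[int] = None) -> int:
    """Unitless relative cost built from the proportionality terms above.

    Hypothetical helper. Assumes one shared proportionality constant
    for all phases, which the documentation does not state.
    """
    if m is None:
        # Seed selection over observations in random order: nvc + vc^2
        seed_cost = n * v * c + v * c ** 2
    else:
        # REPLACE=NONE: only m observations are read, with c <= m <= n
        if not c <= m <= n:
            raise ValueError("m must satisfy c <= m <= n")
        seed_cost = m * v * c
    # One pass per iteration, one for the final assignment,
    # and one more if the DRIFT option is requested.
    passes = iterations + 1 + (1 if drift else 0)
    return seed_cost + passes * n * v * c


if __name__ == "__main__":
    print(fastclus_relative_time(n=1000, v=10, c=5, iterations=3))              # 250250
    print(fastclus_relative_time(n=1000, v=10, c=5, iterations=3, drift=True))  # 300250
```

In this example the $vc^2$ term contributes only 250 of the 250,250 units, consistent with the statement that the total is roughly proportional to $nvcp$ when $c$ is small relative to $n$.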

For greatest efficiency, you should list the variables in the VAR statement in order of decreasing variance.
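
One way to obtain that ordering is to compute each variable's variance and sort before writing the VAR statement. The Python sketch below does this for a small in-memory table; the function name `var_statement_by_variance` and the sample column names `x1`, `x2`, and `x3` are hypothetical, and in practice the variances could equally be obtained within SAS.

```python
from statistics import variance


def var_statement_by_variance(columns: dict) -> str:
    """Build a VAR statement listing variables by decreasing sample variance.

    Hypothetical helper: `columns` maps each variable name to a list
    of at least two numeric values.
    """
    ordered = sorted(columns,
                     key=lambda name: variance(columns[name]),
                     reverse=True)
    return "var " + " ".join(ordered) + ";"


if __name__ == "__main__":
    sample = {
        "x1": [1, 2, 3, 4],
        "x2": [10, 20, 30, 40],
        "x3": [5, 5, 6, 6],
    }
    print(var_statement_by_variance(sample))  # var x2 x1 x3;
```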