The FASTCLUS Procedure

Computational Resources


Let

\begin{eqnarray*} n & = & \mbox{number of observations} \\ v & = & \mbox{number of variables} \\ c & = & \mbox{number of clusters} \\ p & = & \mbox{number of passes over the data set} \end{eqnarray*}

Memory

The memory required is approximately $4(19v + 12cv + 10c + 2 \max (c+ 1, v))$ bytes.

If you request the DISTANCE option, an additional $4c(c + 1)$ bytes are needed.
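To make the memory estimate concrete, the following sketch simply evaluates the two formulas above for given values of $v$ and $c$. The function name `fastclus_memory_bytes` is hypothetical and not part of SAS. The result is only as approximate as the formula itself; note that $n$ does not appear, so the estimate does not grow with the number of observations.

```python
def fastclus_memory_bytes(v: int, c: int, distance: bool = False) -> int:
    """Approximate memory in bytes, evaluated from the documented formula.

    v: number of variables; c: number of clusters.
    distance: True if the DISTANCE option is requested.
    (Hypothetical helper for illustration; not a SAS API.)
    """
    # Base requirement: 4(19v + 12cv + 10c + 2 max(c + 1, v)) bytes.
    base = 4 * (19 * v + 12 * c * v + 10 * c + 2 * max(c + 1, v))
    # The DISTANCE option adds another 4c(c + 1) bytes.
    extra = 4 * c * (c + 1) if distance else 0
    return base + extra


print(fastclus_memory_bytes(10, 5))                 # 3440
print(fastclus_memory_bytes(10, 5, distance=True))  # 3560
```

For 10 variables and 5 clusters, the base term is $4(190 + 600 + 50 + 20) = 3440$ bytes, and DISTANCE adds $4 \cdot 5 \cdot 6 = 120$ bytes.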

Time

The overall time required by PROC FASTCLUS is roughly proportional to $nvcp$ if $c$ is small relative to $n$.

Initial seed selection requires one pass over the data set. If the observations are in random order, the time required is roughly proportional to

\[ nvc + vc^2 \]

unless you specify REPLACE=NONE. In that case, a complete pass might not be necessary, and the time is roughly proportional to $mvc$, where $m$ is the number of observations read during seed selection and $c \leq m \leq n$.
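The sketch below compares the two seed-selection cost expressions in relative work units (constants of proportionality are ignored, so the numbers are not seconds). The function name is hypothetical, and because the number of observations $m$ read under REPLACE=NONE depends on the data, it is treated here as an input supplied by the caller.

```python
from typing import Optional


def seed_selection_cost(n: int, v: int, c: int,
                        replace_none: bool = False,
                        m: Optional[int] = None) -> int:
    """Relative cost of initial seed selection (hypothetical helper).

    Default seed replacement: about n*v*c + v*c**2 work units.
    REPLACE=NONE: about m*v*c, where m (c <= m <= n) is the number
    of observations actually read; the caller must supply m.
    """
    if replace_none:
        if m is None or not (c <= m <= n):
            raise ValueError("with REPLACE=NONE, supply m with c <= m <= n")
        return m * v * c
    return n * v * c + v * c ** 2


print(seed_selection_cost(10_000, 10, 5))                          # 500250
print(seed_selection_cost(10_000, 10, 5, replace_none=True, m=200))  # 10000
```

With $c$ much smaller than $n$, the $vc^2$ term is negligible next to $nvc$, which is why a full seed-selection pass costs about the same as any other pass.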

The DRIFT option, each iteration, and the final assignment of cluster seeds each require one pass, with time for each pass roughly proportional to $nvc$.
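Putting the pass counts together gives a rough estimate of $p$ and of the overall $nvcp$ cost. This is a minimal sketch under two stated assumptions: the initial seed selection counts as one full pass (that is, REPLACE=NONE is not in effect), and its extra $vc^2$ term is ignored because $c$ is small relative to $n$. The function names are hypothetical, and the number of iterations is passed in rather than derived from any SAS option.

```python
def estimated_passes(iterations: int, drift: bool = False) -> int:
    """Approximate number of passes p over the data (hypothetical helper)."""
    seed_pass = 1                    # assumption: full seed-selection pass
    drift_pass = 1 if drift else 0   # DRIFT option requires one pass
    final_assignment_pass = 1        # final assignment requires one pass
    return seed_pass + drift_pass + iterations + final_assignment_pass


def estimated_total_cost(n: int, v: int, c: int,
                         iterations: int, drift: bool = False) -> int:
    """Relative work units (not seconds): each pass is about n*v*c."""
    return n * v * c * estimated_passes(iterations, drift)


print(estimated_passes(10))                       # 12
print(estimated_passes(10, drift=True))           # 13
print(estimated_total_cost(10_000, 10, 5, 10))    # 6000000
```

Because every term scales with $nvc$, doubling the number of observations, variables, clusters, or passes roughly doubles the run time.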

For greatest efficiency, you should list the variables in the VAR statement in order of decreasing variance.
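The text does not say why variable order matters. One plausible explanation, offered here as an assumption rather than as documented behavior, is a partial-distance shortcut: while computing the squared Euclidean distance from an observation to a seed, the computation can stop as soon as the running sum reaches the best distance found so far. Listing high-variance variables first makes that running sum grow quickly, so poor seeds are rejected after fewer terms. The function `nearest_seed_partial` below is a hypothetical illustration of that technique, not FASTCLUS internals.

```python
from typing import List, Sequence, Tuple


def nearest_seed_partial(x: Sequence[float],
                         seeds: List[Sequence[float]],
                         order: Sequence[int]) -> Tuple[int, int]:
    """Find the nearest seed by squared Euclidean distance, abandoning a
    seed once its partial sum cannot beat the best so far.

    Returns (index of nearest seed, number of squared-difference terms computed).
    """
    best_idx, best_d = -1, float("inf")
    terms = 0
    for k, s in enumerate(seeds):
        d = 0.0
        for j in order:              # variables visited in the given order
            d += (x[j] - s[j]) ** 2
            terms += 1
            if d >= best_d:
                break                # this seed cannot win; skip remaining variables
        else:
            best_idx, best_d = k, d  # loop finished, so d < best_d
    return best_idx, terms


x = [10.0, 0.5]                      # variable 0 has a wide spread, variable 1 a narrow one
seeds = [[10.0, 0.0], [0.0, 0.5], [-10.0, 0.5]]
print(nearest_seed_partial(x, seeds, [0, 1]))  # (0, 4): high-variance variable first
print(nearest_seed_partial(x, seeds, [1, 0]))  # (0, 6): low-variance variable first
```

Both orders find the same nearest seed; only the amount of arithmetic differs.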