The FASTCLUS Procedure

Computational Resources

Let

\[
\begin{array}{rcl}
n & = & \mbox{number of observations} \\
v & = & \mbox{number of variables} \\
c & = & \mbox{number of clusters} \\
p & = & \mbox{number of passes over the data set}
\end{array}
\]

Memory

The memory required is approximately $4(19v + 12cv + 10c + 2 \max (c+ 1, v))$ bytes.

If you request the DISTANCE option, an additional $4c(c + 1)$ bytes of space is needed.
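As a quick check of the two memory formulas above, the following Python sketch evaluates them for given values of $v$ and $c$. The function name `fastclus_memory_bytes` and its `distance` flag are illustrative and not part of SAS. The result is only the approximate figure given by the formulas, not a measured value.

```python
def fastclus_memory_bytes(v: int, c: int, distance: bool = False) -> int:
    """Approximate memory in bytes, per the formulas in this section.

    v        -- number of variables
    c        -- number of clusters
    distance -- True if the DISTANCE option is requested
    """
    # Base requirement: 4(19v + 12cv + 10c + 2 max(c + 1, v))
    base = 4 * (19 * v + 12 * c * v + 10 * c + 2 * max(c + 1, v))
    # DISTANCE option adds 4c(c + 1) bytes
    extra = 4 * c * (c + 1) if distance else 0
    return base + extra


# Example: 10 variables, 5 clusters
print(fastclus_memory_bytes(v=10, c=5))                 # 3440
print(fastclus_memory_bytes(v=10, c=5, distance=True))  # 3560
```

Note that $n$ does not appear in either formula, so the estimate does not grow with the number of observations.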

Time

The overall time required by PROC FASTCLUS is roughly proportional to $nvcp$ if $c$ is small with respect to $n$.

Initial seed selection requires one pass over the data set. If the observations are in random order, the time required is roughly proportional to

\[  nvc + vc^2  \]

unless you specify REPLACE=NONE. In that case, a complete pass might not be necessary, and the time is roughly proportional to $mvc$, where $c \leq m \leq n$.

The DRIFT option, each iteration, and the final assignment of cluster seeds each require one pass, with time for each pass roughly proportional to $nvc$.
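The per-pass terms above can be added up to compare the relative cost of different settings. The Python sketch below does this under stated assumptions: observations are in random order, REPLACE=NONE is not specified, and every term has the same constant of proportionality. The function name `fastclus_relative_cost` and its parameters are illustrative, and the result is a unitless work count, not a time in seconds.

```python
def fastclus_relative_cost(n: int, v: int, c: int,
                           iterations: int, drift: bool = False) -> int:
    """Unitless work estimate built from the proportionality terms above.

    n          -- number of observations
    v          -- number of variables
    c          -- number of clusters
    iterations -- number of iterations (one pass each)
    drift      -- True if the DRIFT option is used (one extra pass)
    """
    # Initial seed selection: one pass, roughly nvc + vc^2
    seed_selection = n * v * c + v * c ** 2
    # DRIFT, each iteration, and the final assignment: one pass each at nvc
    passes = iterations + 1 + (1 if drift else 0)
    return seed_selection + passes * n * v * c


# Example: 1000 observations, 4 variables, 3 clusters, 2 iterations
print(fastclus_relative_cost(1000, 4, 3, iterations=2))              # 48036
print(fastclus_relative_cost(1000, 4, 3, iterations=2, drift=True))  # 60036
```

In the first call there are four passes in total (seed selection, two iterations, final assignment), so $nvcp = 48000$. The remaining 36 units come from the $vc^2$ term, which is negligible when $c$ is small with respect to $n$.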

For greatest efficiency, you should list the variables in the VAR statement in order of decreasing variance.
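One way to obtain that ordering is to compute each variable's sample variance and sort the names before writing the VAR statement. The Python sketch below shows the idea on made-up data. The variable names, the values, and the helper `order_by_decreasing_variance` are hypothetical, and in practice the variances could come from any summary procedure.

```python
from statistics import variance


def order_by_decreasing_variance(data: dict[str, list[float]]) -> list[str]:
    """Return variable names sorted from largest to smallest sample variance."""
    return sorted(data, key=lambda name: variance(data[name]), reverse=True)


# Hypothetical data: three variables, three observations each
data = {
    "a": [1.0, 2.0, 3.0],     # sample variance 1
    "b": [10.0, 20.0, 30.0],  # sample variance 100
    "c": [5.0, 5.0, 6.0],     # sample variance about 0.33
}

ordered = order_by_decreasing_variance(data)
print("var " + " ".join(ordered) + ";")  # var b a c;
```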