The VARCLUS Procedure

Computational Resources

Let

\[  \begin{array}{rcl} n & = & \mbox{number of observations} \\ v & = & \mbox{number of variables} \\ c & = & \mbox{number of clusters} \end{array}  \]

The estimates in this section assume that, at each stage of clustering, all clusters contain the same number of variables.

Time

The time required for PROC VARCLUS to analyze a given data set varies greatly depending on the number of clusters requested, the number of iterations in both the alternating least squares phase and the search phase, and whether centroid components or principal components are used.

The time required to compute the correlation matrix is roughly proportional to $nv^2$.

Default cluster initialization requires time roughly proportional to $v^3$. Any other method of initialization requires time roughly proportional to $cv^2$.
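As a rough illustration of how these setup costs compare, the ratios below divide the stated proportionalities. The proportionality constants are not given here, so the ratios indicate only how the costs scale relative to one another, not actual timings.

\[  \frac{v^3}{nv^2} = \frac{v}{n}, \qquad \frac{v^3}{cv^2} = \frac{v}{c}  \]

Under that caveat, default initialization grows relative to the correlation computation as $v$ approaches or exceeds $n$, and it grows relative to other initialization methods as the number of variables per cluster, $v/c$, increases.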

In the alternating least squares phase, each iteration requires time roughly proportional to $cv^2$ if centroid components are used or

\[  \left(c+5\frac{v}{c^2}\right)v^2  \]

if principal components are used.
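The two per-iteration expressions can be compared numerically. The following is a minimal sketch, not part of PROC VARCLUS: the function name `als_iteration_cost` is hypothetical, and the returned values are unitless relative costs because the proportionality constants are not stated.

```python
def als_iteration_cost(v: int, c: int, principal: bool) -> float:
    """Relative cost of one alternating least squares iteration.

    v: number of variables, c: number of clusters.
    The result is unitless; only ratios between results are meaningful.
    """
    if v <= 0 or c <= 0:
        raise ValueError("v and c must be positive")
    if principal:
        # (c + 5 v / c^2) v^2 for principal components
        return (c + 5 * v / c**2) * v**2
    # c v^2 for centroid components
    return float(c * v**2)


if __name__ == "__main__":
    v, c = 100, 10  # illustrative values only
    centroid = als_iteration_cost(v, c, principal=False)
    principal = als_iteration_cost(v, c, principal=True)
    print(f"centroid:  {centroid:.0f}")
    print(f"principal: {principal:.0f}")
    print(f"ratio:     {principal / centroid:.2f}")
```

With $v = 100$ and $c = 10$, the principal component expression evaluates to 1.5 times the centroid expression. The extra term $5v/c^2$ shrinks as $c$ grows, so the two expressions converge when there are many clusters.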

In the search phase, each iteration requires time roughly proportional to $v^3/c$ if centroid components are used or $v^4/c^2$ if principal components are used. The HIERARCHY option speeds up each iteration after the first split by a factor of as much as $c/2$.
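The search phase expressions can be evaluated the same way. This is a minimal sketch with hypothetical function names; the values are unitless relative costs, and the HIERARCHY figure is the stated upper bound on the speedup, not a measured one.

```python
def search_iteration_cost(v: int, c: int, principal: bool) -> float:
    """Relative cost of one search phase iteration (unitless)."""
    if v <= 0 or c <= 0:
        raise ValueError("v and c must be positive")
    if principal:
        return v**4 / c**2  # principal components
    return v**3 / c         # centroid components


def hierarchy_speedup_bound(c: int) -> float:
    """Upper bound c/2 on the per-iteration speedup from HIERARCHY.

    Applies only after the first split, so c is expected to be at least 2.
    """
    if c < 2:
        raise ValueError("the bound applies after the first split (c >= 2)")
    return c / 2


if __name__ == "__main__":
    v, c = 100, 10  # illustrative values only
    centroid = search_iteration_cost(v, c, principal=False)
    principal = search_iteration_cost(v, c, principal=True)
    print(f"centroid:  {centroid:.0f}")
    print(f"principal: {principal:.0f}")
    print(f"ratio:     {principal / centroid:.1f}")
    print(f"HIERARCHY speedup, at most: {hierarchy_speedup_bound(c):.1f}")
```

The ratio of the two expressions is $v/c$, the number of variables per cluster, so for $v = 100$ and $c = 10$ the principal component expression is 10 times the centroid expression.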

Memory

The amount of memory, in bytes, needed by PROC VARCLUS is approximately

\[  v^2+2vc+20v+15c  \]
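For planning purposes, the memory expression can be evaluated directly. The following is a minimal sketch; the function name `varclus_memory_bytes` is hypothetical, and the result is only the approximation given above.

```python
def varclus_memory_bytes(v: int, c: int) -> int:
    """Approximate memory in bytes: v^2 + 2vc + 20v + 15c.

    v: number of variables, c: number of clusters.
    """
    if v <= 0 or c <= 0:
        raise ValueError("v and c must be positive")
    return v**2 + 2 * v * c + 20 * v + 15 * c


if __name__ == "__main__":
    # Illustrative (v, c) pairs only
    for v, c in [(100, 10), (1000, 50)]:
        print(f"v={v}, c={c}: about {varclus_memory_bytes(v, c)} bytes")
```

For $v = 100$ and $c = 10$ the expression gives 14,150; for $v = 1000$ and $c = 50$ it gives 1,120,750. The $v^2$ term dominates once the number of variables is large.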