The FASTCLUS Procedure

Missing Values

Observations with all missing values are excluded from the analysis. If you specify the NOMISS option, observations with any missing values are excluded. Observations with missing values cannot be cluster seeds.
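The exclusion rules above can be sketched in a few lines of Python. This is an illustration only, not the procedure's implementation: the function names `usable_observations` and `seed_candidates` are hypothetical, and `None` stands in for a SAS missing value.

```python
def usable_observations(rows, nomiss=False):
    """Return the rows kept for analysis; None marks a missing value (assumption)."""
    kept = []
    for row in rows:
        n_missing = sum(1 for v in row if v is None)
        if n_missing == len(row):
            continue  # every value is missing: always excluded
        if nomiss and n_missing > 0:
            continue  # NOMISS behavior: any missing value excludes the row
        kept.append(row)
    return kept


def seed_candidates(rows):
    """Return the rows that are eligible to be cluster seeds (no missing values)."""
    return [row for row in rows if all(v is not None for v in row)]


rows = [(1.0, 2.0), (None, 3.0), (None, None)]
print(usable_observations(rows))               # default rule
print(usable_observations(rows, nomiss=True))  # NOMISS rule
print(seed_candidates(rows))
```

In this sketch a partially missing row is still analyzed by default, but it never appears among the seed candidates.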

The distance between an observation with missing values and a cluster seed is obtained by computing the squared distance based on the nonmissing values, multiplying by the ratio of the number of variables, n, to the number of variables having nonmissing values, m, and taking the square root:

\[ \sqrt{ \left( \frac{n}{m} \right) \sum ( x_i - s_i )^2 } \]


where

\[ \begin{array}{rcl}
n & = & \mbox{number of variables} \\
m & = & \mbox{number of variables with nonmissing values} \\
x_i & = & \mbox{value of the $i$th variable for the observation} \\
s_i & = & \mbox{value of the $i$th variable for the seed}
\end{array} \]
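As a worked illustration with made-up numbers, suppose there are $n = 4$ variables, the observation is $(1, \cdot, 3, 5)$ with its second value missing, and the seed is $(2, 7, 1, 1)$. Then $m = 3$, the sum runs over the first, third, and fourth variables, and the adjusted distance is

\[ \sqrt{ \left( \frac{4}{3} \right) \left[ (1-2)^2 + (3-1)^2 + (5-1)^2 \right] } = \sqrt{ \frac{4}{3} \cdot 21 } = \sqrt{28} \approx 5.29 \]

Without the factor $n/m$ the result would be $\sqrt{21} \approx 4.58$, so the scaling compensates for the variable that contributes nothing to the sum.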

If you specify the LEAST=p option with a power p other than 2 (the default), the distance is computed using

\[ \left( \left( \frac{n}{m} \right) \sum | x_i - s_i |^p \right)^{\frac{1}{p}} \]

The summation is taken over variables with nonmissing values.
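The adjusted distance can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure's code: the function name `adjusted_distance` is hypothetical, `None` stands in for a missing value, the seed is assumed to be complete, and absolute differences are used so that powers other than 2 give a nonnegative result.

```python
def adjusted_distance(x, seed, p=2.0):
    """Distance from observation x to a seed, rescaled for missing values.

    x    : sequence of floats, with None marking a missing value (assumption)
    seed : sequence of floats of the same length, assumed to have no missing values
    p    : power of the distance; 2 gives the Euclidean case
    """
    n = len(x)  # total number of variables
    # Keep only the variables for which the observation has a value.
    pairs = [(xi, si) for xi, si in zip(x, seed) if xi is not None]
    m = len(pairs)  # number of variables with nonmissing values
    if m == 0:
        raise ValueError("observation has no nonmissing values")
    total = sum(abs(xi - si) ** p for xi, si in pairs)
    # Scale the partial sum by n/m, then take the p-th root.
    return ((n / m) * total) ** (1.0 / p)


obs = [1.0, None, 3.0, 5.0]
seed = [2.0, 7.0, 1.0, 1.0]
print(adjusted_distance(obs, seed))         # Euclidean case, p = 2
print(adjusted_distance(obs, seed, p=1.0))  # p = 1
```

When the observation has no missing values, m equals n and the function reduces to the ordinary distance of order p.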

The IMPUTE option fills in missing values in the OUT= output data set.