There are two ways to assign new observations to clusters that were created earlier. One uses PROC FASTCLUS, the other uses PROC DISCRIM.
The FASTCLUS method is suitable if the original clusters were obtained by FASTCLUS or by CLUSTER with METHOD=WARD. With other methods, there is no formal justification for using FASTCLUS, although results will probably be reasonable with METHOD=AVERAGE or CENTROID, and maybe with MEDIAN and FLEXIBLE. For METHOD=COMPLETE, you would want to use FASTCLUS with LEAST=MAX.
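Conceptually, the FASTCLUS approach places each new observation in the cluster whose seed (cluster mean) is nearest. The Python sketch below illustrates that rule; it is not SAS code, and the names `seed_distance` and `assign_to_seeds` are hypothetical. `least=2` stands in for the default least-squares (Euclidean) criterion and `least="max"` for the maximum-absolute-difference criterion that we take LEAST=MAX to mean. The seeds are made-up values.

```python
from typing import List, Sequence, Union

Least = Union[float, str]  # a positive p for an L_p distance, or "max"


def seed_distance(x: Sequence[float], seed: Sequence[float], least: Least = 2) -> float:
    """L_p distance from an observation to a cluster seed.

    least=2 is ordinary Euclidean (least-squares) distance;
    least="max" is the maximum absolute difference (Chebyshev / L-infinity).
    """
    diffs = [abs(a - b) for a, b in zip(x, seed)]
    if least == "max":
        return max(diffs)
    return sum(d ** least for d in diffs) ** (1.0 / least)


def assign_to_seeds(observations: Sequence[Sequence[float]],
                    seeds: Sequence[Sequence[float]],
                    least: Least = 2) -> List[int]:
    """Return the 0-based index of the nearest seed for each observation.

    Ties go to the lowest-numbered seed because list.index finds the first minimum.
    """
    labels = []
    for x in observations:
        dists = [seed_distance(x, s, least) for s in seeds]
        labels.append(dists.index(min(dists)))
    return labels


# Hypothetical seeds and one new observation, for illustration only.
seeds = [[3.0, 3.0], [0.0, 3.5]]
new_obs = [[0.0, 0.0]]
print(assign_to_seeds(new_obs, seeds))               # [1]  (Euclidean)
print(assign_to_seeds(new_obs, seeds, least="max"))  # [0]  (maximum difference)
```

The same point lands in different clusters under the two criteria, which suggests why the distance used for scoring should match the one that shaped the original clusters.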
DISCRIM with METHOD=NPAR is suitable if the original clusters were obtained by MODECLUS or by CLUSTER with METHOD=DENSITY or TWOSTAGE. Be sure to specify the same density-estimation method in DISCRIM as was used for clustering. DISCRIM with METHOD=NPAR and K=1 (nearest neighbor) would be the natural choice for clusters from METHOD=SINGLE. DISCRIM with METHOD=NORMAL is problematic because its distributional assumptions do not correspond to any of the clustering methods.
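Nearest-neighbor classification with K=1 simply gives a new observation the cluster of its closest already-clustered observation. A minimal Python sketch of that rule follows, assuming Euclidean distance; `nearest_neighbor_assign` and the chain-shaped training data are hypothetical, and this is not how DISCRIM is implemented internally.

```python
import math
from typing import Hashable, List, Sequence


def nearest_neighbor_assign(train: Sequence[Sequence[float]],
                            train_labels: Sequence[Hashable],
                            new_obs: Sequence[Sequence[float]]) -> List[Hashable]:
    """Give each new observation the cluster label of its single nearest
    training observation (k = 1, Euclidean distance).

    Ties go to the earliest training row because only a strictly smaller
    distance replaces the current best.
    """
    labels = []
    for x in new_obs:
        best_label, best_dist = None, math.inf
        for row, label in zip(train, train_labels):
            d = math.dist(x, row)
            if d < best_dist:
                best_label, best_dist = label, d
        labels.append(best_label)
    return labels


# Hypothetical training data: cluster 1 is a long chain of the kind single
# linkage produces; cluster 2 is a small compact group.
train = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0),
         (7, 3), (7, 4), (8, 3)]
train_labels = [1] * 7 + [2] * 3
print(nearest_neighbor_assign(train, train_labels, [(6.5, 0.5)]))  # [1]
```

The point (6.5, 0.5) is actually closer to the mean of cluster 2, about (7.33, 3.33), than to the mean of cluster 1, (3, 0), so a centroid rule would put it in cluster 2; the nearest-neighbor rule follows the chain instead, which is the behavior single-linkage clusters call for.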
With the default settings, FASTCLUS gives a quick solution even for very large data sets. The default convergence criterion of 0.02 yields a reasonable solution in a reasonable amount of time. Setting CONVERGE=0 may require a large number of iterations and substantially increase computation time.
By default, FASTCLUS gives no information about whether a run converged, because it presumes you want the fast clustering. However, if you specify a MAXITER= value greater than one, the procedure presumes that you are interested in reaching convergence, so the SAS log shows the following message if the procedure did not converge:

WARNING: Iteration limit reached without convergence

Trial and error may be the only way to determine how large MAXITER= must be for your data and your MAXC= setting.
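To make the interplay of CONVERGE= and MAXITER= concrete, here is a simplified Lloyd-style k-means loop in Python. It is an illustration under assumptions, not the FASTCLUS algorithm: the initial seeds are simply the first k observations, and the stopping rule (largest seed movement divided by the minimum distance between the initial seeds, compared with `converge`) is our reading of the FASTCLUS default criterion. Like the behavior described above, it warns only when more than one iteration was requested and the limit was hit. `kmeans` and `find_maxiter` are hypothetical names; `find_maxiter` automates the trial and error by doubling the limit.

```python
import math
import warnings
from typing import List, Optional, Sequence, Tuple


def _mean(points: Sequence[Sequence[float]]) -> List[float]:
    return [sum(coord) / len(points) for coord in zip(*points)]


def kmeans(data: Sequence[Sequence[float]], k: int, converge: float = 0.02,
           maxiter: int = 1) -> Tuple[List[List[float]], List[int], int, bool]:
    """Simplified Lloyd-style k-means (hypothetical; not the FASTCLUS code).

    Initial seeds are the first k observations. The run stops when the
    largest seed movement, divided by the minimum distance between the
    initial seeds, is <= converge, or after maxiter iterations.
    Returns (seeds, labels, iterations_run, converged).
    """
    if k < 2 or len(data) < k:
        raise ValueError("need 2 <= k <= number of observations")
    seeds = [list(map(float, p)) for p in data[:k]]
    scale = min(math.dist(a, b) for i, a in enumerate(seeds) for b in seeds[i + 1:])
    if scale == 0:
        raise ValueError("initial seeds must be distinct")

    def assign(current):
        # 0-based index of the nearest seed for every observation.
        return [min(range(k), key=lambda j: math.dist(x, current[j])) for x in data]

    converged, iterations = False, 0
    for iterations in range(1, maxiter + 1):
        labels = assign(seeds)
        new_seeds = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            new_seeds.append(_mean(members) if members else seeds[j])
        change = max(math.dist(a, b) for a, b in zip(seeds, new_seeds)) / scale
        seeds = new_seeds
        if change <= converge:
            converged = True
            break

    # Only complain when more than one iteration was requested.
    if maxiter > 1 and not converged:
        warnings.warn("Iteration limit reached without convergence")
    return seeds, assign(seeds), iterations, converged


def find_maxiter(data, k, converge: float = 0.0, limit: int = 1024) -> Optional[int]:
    """Trial and error: double maxiter until the run converges (None if never)."""
    maxiter = 1
    while maxiter <= limit:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            converged = kmeans(data, k, converge, maxiter)[3]
        if converged:
            return maxiter
        maxiter *= 2
    return None


data = [[0.0], [1.0], [9.0], [10.0]]
seeds, labels, iters, ok = kmeans(data, 2, converge=0.0, maxiter=10)
print(seeds, labels, iters, ok)  # [[0.5], [9.5]] [0, 0, 1, 1] 3 True
print(find_maxiter(data, 2))     # 4
```

With these four points the loop needs three iterations to stop moving, so limits of 1 and 2 fall short and the doubling search settles on 4.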
If you want to create clusters using FASTCLUS that you can reproduce exactly later, then on the initial FASTCLUS run you need to specify CONVERGE=0 and increase MAXITER= until the run converges. Then running FASTCLUS as shown at the beginning of this article will reproduce the same clusters.
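One way to see why CONVERGE=0 matters: a fully converged run ends at a fixed point, where assigning the data to the final seeds and re-averaging gives back the same seeds, so a later run that starts from those seeds lands on the same clusters. A run that stops early, as a positive CONVERGE= value allows, may not. The Python sketch below checks this in a simplified k-means model; it is not SAS code, and `assign`, `cluster_means`, `is_fixed_point`, and the data are hypothetical. In the example, a run that started from seeds 0 and 1 and stopped after one iteration had split 0, 1, 9, 10 into {0} and {1, 9, 10}, so its seeds are 0 and 20/3; reassigning the data to those seeds moves 1 into the first cluster.

```python
import math
from typing import List, Sequence


def assign(data: Sequence[Sequence[float]], seeds: Sequence[Sequence[float]]) -> List[int]:
    """0-based index of the nearest seed (Euclidean) for each observation."""
    return [min(range(len(seeds)), key=lambda j: math.dist(x, seeds[j])) for x in data]


def cluster_means(data, labels, k, fallback) -> List[List[float]]:
    """Mean of each cluster; an empty cluster keeps its fallback seed."""
    means = []
    for j in range(k):
        members = [x for x, lab in zip(data, labels) if lab == j]
        if members:
            means.append([sum(c) / len(members) for c in zip(*members)])
        else:
            means.append(list(fallback[j]))
    return means


def is_fixed_point(data, seeds) -> bool:
    """True if one more assign-then-average step leaves the seeds unchanged,
    i.e. the simplified run has fully converged."""
    labels = assign(data, seeds)
    return cluster_means(data, labels, len(seeds), seeds) == [list(map(float, s)) for s in seeds]


data = [[0.0], [1.0], [9.0], [10.0]]
# Hypothetical seeds from a run stopped after one iteration, whose stored
# memberships were [0, 1, 1, 1].
early = [[0.0], [20 / 3]]
print(assign(data, early), is_fixed_point(data, early))  # [0, 0, 1, 1] False
final = [[0.5], [9.5]]
print(assign(data, final), is_fixed_point(data, final))  # [0, 0, 1, 1] True
```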
Alternatively, to create clusters using FASTCLUS that you can reproduce exactly later, create an OUTSTAT= data set on the initial FASTCLUS run. (You do not have to specify CONVERGE=0.) Then, to create exactly the same clusters on the second FASTCLUS run, specify the OUTSTAT= data set from the first run as the INSTAT= data set. With this method you do not have to specify MAXC=, REPLACE=, or MAXITER= on the second run.
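The OUTSTAT=/INSTAT= approach sidesteps the convergence question: as we understand it, the second run reads the stored statistics and only assigns observations without iterating, so nothing can drift. The Python analogue below serializes the seeds to JSON as a stand-in for OUTSTAT= and reads them back as a stand-in for INSTAT=. `write_stats`, `score_from_stats`, and the seed values are hypothetical, and a real OUTSTAT= data set holds more than just the seeds.

```python
import json
import math
from typing import List, Sequence


def assign(data: Sequence[Sequence[float]], seeds: Sequence[Sequence[float]]) -> List[int]:
    """0-based index of the nearest seed (Euclidean) for each observation."""
    return [min(range(len(seeds)), key=lambda j: math.dist(x, seeds[j])) for x in data]


def write_stats(seeds: Sequence[Sequence[float]]) -> str:
    """First run: serialize the final seeds (a stand-in for OUTSTAT=)."""
    return json.dumps({"seeds": [list(s) for s in seeds]})


def score_from_stats(stats_json: str, data: Sequence[Sequence[float]]) -> List[int]:
    """Second run: read the stored seeds (a stand-in for INSTAT=) and only
    assign observations; no iterations are performed."""
    seeds = json.loads(stats_json)["seeds"]
    return assign(data, seeds)


# Hypothetical seeds from a first run that was not required to fully converge.
first_run_seeds = [[0.1, 2.0 / 3.0], [9.4, 1.0 / 3.0]]
data = [[0.0, 0.0], [1.0, 1.0], [9.0, 0.0], [10.0, 1.0]]
stats = write_stats(first_run_seeds)
print(score_from_stats(stats, data) == assign(data, first_run_seeds))  # True
```

Because Python's JSON encoder writes floats in a form that reads back to the identical value, the second run sees exactly the seeds the first run produced, whether or not that run had converged.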
| Product Family | Product | System | SAS Release Reported | SAS Release Fixed* |
|---|---|---|---|---|
| SAS System | SAS/STAT | All | n/a | |

| Type: | Usage Note |
|---|---|
| Priority: | low |
| Topic: | SAS Reference ==> Procedures ==> MODECLUS; SAS Reference ==> Procedures ==> DISCRIM; Analytics ==> Discriminant Analysis; Analytics ==> Nonparametric Analysis; Analytics ==> Cluster Analysis; Analytics ==> Data Mining; Analytics ==> Multivariate Analysis; SAS Reference ==> Procedures ==> FASTCLUS; SAS Reference ==> Procedures ==> CLUSTER |
| Date Modified: | 2010-08-31 12:11:34 |
| Date Created: | 2002-12-16 10:56:37 |



