This example uses computer-generated data to mimic a Hertzsprung-Russell plot (Struve and Zebergs 1962, p. 259) of the temperature and luminosity of stars. The data are plotted in Output 78.4.1. There appear to be two main groups of stars and a collection of isolated stars. The long, straggling group of points that runs diagonally across the figure represents the main group of stars; the more compact group in the top-right corner contains giant stars. PROC MODECLUS is run with the JOIN= option at a 0.05 significance level and with four values of the smoothing parameter (R=1, 1.5, 2, and 2.5). The CK=5 option is specified in order to prevent the numerous outliers from forming separate clusters. The results from PROC MODECLUS are displayed in Output 78.4.2. The cluster memberships are then plotted by PROC SGPLOT, as displayed in Output 78.4.3 through Output 78.4.5.
Note that PROC SGPLOT produces no plot for _R_=2.5 (Output 78.4.3 through Output 78.4.5 cover only _R_=1, 1.5, and 2). Only one cluster remains after joining at the 5% significance level for that radius, so those results are not written to the OUT= data set. See the description of the JOIN= option for more information.
The following statements produce Output 78.4.1 through Output 78.4.5:
title 'Hertzsprung-Russell Plot of Visible Stars';
title2 'Computer-Generated Simulated Data';

data hr;
   input x y @@;
   label x='-Temperature' y='-Luminosity';
   datalines;
1.0 12.8 0.9 13.7 0.9 12.9 1.0 12.3 1.0 12.2
2.6 10.9 2.4 10.9 2.5 11.2 2.3 11.5 2.6 12.0
2.4 12.1 2.3 10.9 2.6 11.5 2.5 11.9 2.4 11.0
3.4 11.1 3.3 11.2 3.4 11.1 3.4 9.9 3.2 10.4
3.5 10.8 3.4 11.0 3.3 11.2 3.3 10.8 3.5 10.0
3.5 10.2 3.4 10.2 3.6 10.6 3.7 10.4 3.7 10.1
3.4 10.7 3.4 10.8 3.3 11.0 3.6 10.8 3.5 10.1
4.5 10.3 4.6 9.4 4.3 10.3 4.6 9.4 4.4 9.9
4.5 10.4 4.4 9.9 4.6 9.4 4.4 10.7 4.4 9.3
4.4 9.5 4.1 10.6 4.4 10.6 4.5 10.3 4.4 10.0
4.2 9.8 4.5 9.5 4.2 13.4 4.6 10.4 4.5 9.8
5.8 8.8 5.6 8.4 5.6 13.9 5.7 9.5 5.6 14.5
5.6 9.2 5.7 8.7 5.7 9.4 5.7 9.3 5.6 9.4
5.8 9.8 5.5 8.8 5.8 8.9 5.7 9.4 5.6 12.1
5.4 10.1 5.8 9.3 5.9 9.0 5.7 10.0 5.6 9.3
6.6 8.6 6.7 8.5 6.7 12.5

   ... more lines ...

26.4 14.1 26.6 14.2 27.5 13.7 27.6 14.4 27.8 14.0
27.4 14.7 25.8 13.5 25.6 13.6 26.8 14.4 26.4 19.0
26.0 13.4 27.3 14.0 27.5 14.3 27.4 14.5 26.3 13.8
26.9 13.7 26.3 13.7 27.7 14.3 27.3 14.1 28.3 14.2
17.4 15.5 13.8 15.2 12.0 11.6 14.1 12.8 17.1 10.2
16.9 15.4 18.5 12.6 14.2 16.1 23.2 6.6 11.4 12.4
20.4 11.7 20.9 8.1 18.9 13.7 16.9 9.7 15.5 9.9
18.3 14.2 19.3 13.7 17.0 12.9 10.1 11.6 17.9 13.5
14.3 1.4 13.1 -0.8 8.1 -0.9 20.0 7.0 21.0 8.5
15.6 13.2
;
proc sgplot data=hr;
   scatter y=y x=x;
run;

proc modeclus data=hr m=1 r=1 1.5 2 2.5 ck=5 join=.05 short out=out;
run;

title2 'MODECLUS Analysis';
proc sgplot data=out;
   scatter y=y x=x/group=cluster;
   by _R_;
run;
Output 78.4.1: Scatter Plot of Data
Output 78.4.2: Results from PROC MODECLUS
Output 78.4.3: Scatter Plots of Cluster Memberships by _R_=1
Output 78.4.4: Scatter Plots of Cluster Memberships by _R_=1.5
Output 78.4.5: Scatter Plots of Cluster Memberships by _R_=2
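The absence of a plot for _R_=2.5 can be checked directly against the OUT= data set. The following step is a minimal sketch rather than part of the original example. It assumes that the data set OUT created by the PROC MODECLUS step above is still available, and it uses only the _R_ and CLUSTER variables that the PROC SGPLOT step already references. The TITLE3 text is illustrative.

```sas
/* Sketch: count observations per cluster for each radius     */
/* present in the OUT= data set created by PROC MODECLUS.     */
title3 'Cluster Sizes by Radius';   /* illustrative title only */
proc freq data=out;
   /* one row per _R_ value, one column per cluster number */
   tables _R_*cluster / nopercent norow nocol;
run;
```

If the _R_=2.5 solution was omitted as described, the table should contain no row for that radius.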