This example uses computer-generated data to mimic a Hertzsprung-Russell plot (Struve and Zebergs, 1962, p. 259) of the temperature and luminosity of stars. The data are plotted and displayed in Output 60.4.1. There appear to be two main groups of stars and a collection of isolated stars. The long straggling group of points appearing diagonally across the figure represents the main group of stars; the more compact group in the top-right corner contains giant stars. PROC MODECLUS is run with the JOIN= option at a 0.05 significance level for several values of the smoothing radius (R=1, 1.5, 2, and 2.5). The CK=5 option is specified in order to prevent the numerous outliers from forming separate clusters. The results from PROC MODECLUS are displayed in Output 60.4.2. The cluster memberships are then plotted by PROC SGPLOT, as displayed in Output 60.4.3 through Output 60.4.5.
Note that the graphic output from PROC SGPLOT in Output 60.4.3 is not available when _R_ = 2.5 because only one cluster remains after joining at a 5% significance level, and the results are not written to the OUT= data set. See the description of the JOIN= option for more information.
The following statements produce Output 60.4.1 through Output 60.4.5:
title 'Hertzsprung-Russell Plot of Visible Stars';
title2 'Computer-Generated Simulated Data';

data hr;
   input x y @@;
   label x='-Temperature' y='-Luminosity';
   datalines;
 1.0 12.8  0.9 13.7  0.9 12.9  1.0 12.3  1.0 12.2
 2.6 10.9  2.4 10.9  2.5 11.2  2.3 11.5  2.6 12.0
 2.4 12.1  2.3 10.9  2.6 11.5  2.5 11.9  2.4 11.0
 3.4 11.1  3.3 11.2  3.4 11.1  3.4 9.9  3.2 10.4
 3.5 10.8  3.4 11.0  3.3 11.2  3.3 10.8  3.5 10.0
 3.5 10.2  3.4 10.2  3.6 10.6  3.7 10.4  3.7 10.1
 3.4 10.7  3.4 10.8  3.3 11.0  3.6 10.8  3.5 10.1
 4.5 10.3  4.6 9.4  4.3 10.3  4.6 9.4  4.4 9.9
 4.5 10.4  4.4 9.9  4.6 9.4  4.4 10.7  4.4 9.3
 4.4 9.5  4.1 10.6  4.4 10.6  4.5 10.3  4.4 10.0
 4.2 9.8  4.5 9.5  4.2 13.4  4.6 10.4  4.5 9.8
 5.8 8.8  5.6 8.4  5.6 13.9  5.7 9.5  5.6 14.5
 5.6 9.2  5.7 8.7  5.7 9.4  5.7 9.3  5.6 9.4
 5.8 9.8  5.5 8.8  5.8 8.9  5.7 9.4  5.6 12.1
 5.4 10.1  5.8 9.3  5.9 9.0  5.7 10.0  5.6 9.3
 6.6 8.6  6.7 8.5  6.7 12.5

   ... more lines ...

 26.4 14.1  26.6 14.2  27.5 13.7  27.6 14.4  27.8 14.0
 27.4 14.7  25.8 13.5  25.6 13.6  26.8 14.4  26.4 19.0
 26.0 13.4  27.3 14.0  27.5 14.3  27.4 14.5  26.3 13.8
 26.9 13.7  26.3 13.7  27.7 14.3  27.3 14.1  28.3 14.2
 17.4 15.5  13.8 15.2  12.0 11.6  14.1 12.8  17.1 10.2
 16.9 15.4  18.5 12.6  14.2 16.1  23.2 6.6  11.4 12.4
 20.4 11.7  20.9 8.1  18.9 13.7  16.9 9.7  15.5 9.9
 18.3 14.2  19.3 13.7  17.0 12.9  10.1 11.6  17.9 13.5
 14.3 1.4  13.1 -0.8  8.1 -0.9  20.0 7.0  21.0 8.5
 15.6 13.2
;

proc sgplot data=hr;
   scatter y=y x=x;
run;

proc modeclus data=hr m=1 r=1 1.5 2 2.5 ck=5 join=.05 short out=out;
run;

title2 'MODECLUS Analysis';
proc sgplot data=out;
   scatter y=y x=x / group=cluster;
   by _R_;
run;
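
To check the note about _R_ = 2.5 directly, you can tabulate the cluster memberships in the OUT= data set by radius. The following is a minimal sketch, not part of the original example. It assumes that the PROC MODECLUS step above has already run and has created the data set out, containing the variables _R_ and CLUSTER. The TITLE2 text is illustrative.

```sas
/* Illustrative check, not part of the original example:      */
/* count the observations in each cluster for each radius.    */
title2 'Cluster Sizes by Radius';
proc freq data=out;
   /* One row per value of _R_, one column per cluster number; */
   /* suppress percentages so that only the counts are shown.  */
   tables _R_*cluster / norow nocol nopercent;
run;
```

If joining behaves as the note describes, the table should contain rows only for the radii at which more than one cluster remains, which are the same BY groups that PROC SGPLOT plots.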
Output 60.4.2: Results from PROC MODECLUS
Hertzsprung-Russell Plot of Visible Stars

Computer-Generated Simulated Data

Cluster Summary

R | CK | Number of Clusters Joined | Maximum P-value | Number of Clusters | Frequency of Unclassified Objects |
---|---|---|---|---|---|
1 | 5 | 14 | 0.0001 | 2 | 0 |
1.5 | 5 | 6 | 0.0000 | 3 | 0 |
2 | 5 | 4 | 0.0000 | 2 | 0 |
2.5 | 5 | 2 | 0.0000 | 1 | 0 |