The FACTOR Procedure
This example analyzes socioeconomic data provided by Harman (1976). The five variables represent total population, median school years, total employment, miscellaneous professional services, and median house value. Each observation represents one of twelve census tracts in the Los Angeles Standard Metropolitan Statistical Area.
The first analysis is a principal component analysis. Simple descriptive statistics and correlations are also displayed. The following statements produce Output 33.1.1:
    data SocioEconomics;
       title 'Five Socioeconomic Variables';
       title2 'See Page 14 of Harman: Modern Factor Analysis, 3rd Ed';
       input Population School Employment Services HouseValue;
       datalines;
    5700     12.8      2500      270       25000
    1000     10.9      600       10        10000
    3400     8.8       1000      10        9000
    3800     13.6      1700      140       25000
    4000     12.8      1600      140       25000
    8200     8.3       2600      60        12000
    1200     11.4      400       10        16000
    9100     11.5      3300      60        14000
    9900     12.5      3400      180       18000
    9600     13.7      3600      390       25000
    9600     9.6       3300      80        12000
    9400     11.4      4000      100       13000
    ;

    /* SIMPLE requests means and standard deviations; CORR requests the correlation matrix */
    title3 'Principal Component Analysis';
    proc factor data=SocioEconomics simple corr;
    run;
There are two large eigenvalues, 2.8733 and 1.7967, which together account for 93.4% of the standardized variance. Thus, the first two principal components provide an adequate summary of the data for most purposes. Three components, explaining 97.7% of the variation, should be sufficient for almost any application. PROC FACTOR retains two components on the basis of the eigenvalues-greater-than-one rule since the third eigenvalue is only 0.2148.
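The percentages quoted above follow directly from the eigenvalues. The following Python sketch is not part of the SAS example; it only re-derives the Proportion and Cumulative columns and applies the eigenvalues-greater-than-one rule to the five eigenvalues reported in the output. The variable names are illustrative.

```python
# Eigenvalues copied from the "Eigenvalues of the Correlation Matrix" output.
eigenvalues = [2.87331359, 1.79666009, 0.21483689, 0.09993405, 0.01525537]

# For a correlation matrix the eigenvalues sum to the number of variables (5).
total = sum(eigenvalues)

# Proportion of the standardized variance explained by each component.
proportions = [ev / total for ev in eigenvalues]

# Running total of the proportions.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# Eigenvalues-greater-than-one rule: keep components whose eigenvalue exceeds 1.
retained = [ev for ev in eigenvalues if ev > 1.0]

for k, (ev, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"{k}  {ev:.8f}  {p:.4f}  {c:.4f}")
print("components retained:", len(retained))
```

The first two cumulative values correspond to the 93.4% figure, and the third to the 97.7% figure.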
The first component has large positive loadings for all five variables. The correlation with Services is especially high. The second component is a contrast of Population (0.8064) and Employment against School and HouseValue, with a very small loading on Services.
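To see where such loadings come from, the sketch below extracts the two largest eigenpairs of the published correlation matrix and scales each unit eigenvector by the square root of its eigenvalue. It is an illustration in Python, not the algorithm PROC FACTOR uses: power iteration with deflation is swapped in here because it needs no libraries, and the helper names (`dominant_eigenpair`, `deflate`, `loadings`) are hypothetical. Because the matrix is entered with five decimal places, results agree with the SAS output only to about that precision.

```python
import math

names = ["Population", "School", "Employment", "Services", "HouseValue"]

# Correlation matrix as printed in the Correlations output.
R = [
    [1.00000, 0.00975, 0.97245, 0.43887, 0.02241],
    [0.00975, 1.00000, 0.15428, 0.69141, 0.86307],
    [0.97245, 0.15428, 1.00000, 0.51472, 0.12193],
    [0.43887, 0.69141, 0.51472, 1.00000, 0.77765],
    [0.02241, 0.86307, 0.12193, 0.77765, 1.00000],
]


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def dominant_eigenpair(matrix, start, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    v = list(start)
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector gives the eigenvalue.
    eigenvalue = sum(x * y for x, y in zip(v, matvec(matrix, v)))
    return eigenvalue, v


def deflate(matrix, eigenvalue, v):
    """Subtract an extracted component so the next one becomes dominant."""
    n = len(matrix)
    return [[matrix[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
            for i in range(n)]


def loadings(eigenvalue, v):
    """Loadings are the unit eigenvector scaled by sqrt(eigenvalue)."""
    return [math.sqrt(eigenvalue) * x for x in v]


lambda1, v1 = dominant_eigenpair(R, [1.0] * 5)
lambda2, v2 = dominant_eigenpair(deflate(R, lambda1, v1),
                                 [1.0, 0.0, 0.0, 0.0, 0.0])

# An eigenvector's sign is arbitrary; orient the second one so that the
# Population loading is positive, as in the text.
if v2[0] < 0:
    v2 = [-x for x in v2]

component1 = dict(zip(names, loadings(lambda1, v1)))
component2 = dict(zip(names, loadings(lambda2, v2)))

for name in names:
    print(f"{name:<12} {component1[name]:8.4f} {component2[name]:8.4f}")
```

The printed columns can be compared with the description above: the first is positive throughout, and the second separates Population and Employment from School and HouseValue.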
Means and Standard Deviations from 12 Observations

Variable | Mean | Std Dev |
---|---|---|
Population | 6241.667 | 3439.9943 |
School | 11.442 | 1.7865 |
Employment | 2333.333 | 1241.2115 |
Services | 120.833 | 114.9275 |
HouseValue | 17000.000 | 6367.5313 |
Correlations

Variable | Population | School | Employment | Services | HouseValue |
---|---|---|---|---|---|
Population | 1.00000 | 0.00975 | 0.97245 | 0.43887 | 0.02241 |
School | 0.00975 | 1.00000 | 0.15428 | 0.69141 | 0.86307 |
Employment | 0.97245 | 0.15428 | 1.00000 | 0.51472 | 0.12193 |
Services | 0.43887 | 0.69141 | 0.51472 | 1.00000 | 0.77765 |
HouseValue | 0.02241 | 0.86307 | 0.12193 | 0.77765 | 1.00000 |
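The descriptive statistics and correlations can be cross-checked against the twelve data lines. The Python sketch below is an independent check, not SAS output. It assumes the standard deviation uses the n - 1 divisor, which reproduces the Std Dev column, and uses the usual Pearson formula. The `pearson` helper and the dictionary names are illustrative.

```python
import math
import statistics

names = ["Population", "School", "Employment", "Services", "HouseValue"]

# The twelve census tracts from the DATALINES block.
rows = [
    (5700, 12.8, 2500, 270, 25000),
    (1000, 10.9, 600, 10, 10000),
    (3400, 8.8, 1000, 10, 9000),
    (3800, 13.6, 1700, 140, 25000),
    (4000, 12.8, 1600, 140, 25000),
    (8200, 8.3, 2600, 60, 12000),
    (1200, 11.4, 400, 10, 16000),
    (9100, 11.5, 3300, 60, 14000),
    (9900, 12.5, 3400, 180, 18000),
    (9600, 13.7, 3600, 390, 25000),
    (9600, 9.6, 3300, 80, 12000),
    (9400, 11.4, 4000, 100, 13000),
]

# One list of values per variable.
columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


means = {n: statistics.mean(columns[n]) for n in names}
std_devs = {n: statistics.stdev(columns[n]) for n in names}  # n - 1 divisor
correlations = {(a, b): pearson(columns[a], columns[b])
                for a in names for b in names}

for n in names:
    print(f"{n:<12} {means[n]:12.3f} {std_devs[n]:12.4f}")
print(f"{correlations[('Population', 'Employment')]:.5f}")
```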
Five Socioeconomic Variables
See Page 14 of Harman: Modern Factor Analysis, 3rd Ed
Principal Component Analysis

Eigenvalues of the Correlation Matrix: Total = 5 Average = 1

Component | Eigenvalue | Difference | Proportion | Cumulative |
---|---|---|---|---|
1 | 2.87331359 | 1.07665350 | 0.5747 | 0.5747 |
2 | 1.79666009 | 1.58182321 | 0.3593 | 0.9340 |
3 | 0.21483689 | 0.11490283 | 0.0430 | 0.9770 |
4 | 0.09993405 | 0.08467868 | 0.0200 | 0.9969 |
5 | 0.01525537 | | 0.0031 | 1.0000 |
The final communality estimates show that all the variables are well accounted for by two components. The estimates range from the lowest, for Services, to the highest, for Population.
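As a consistency check, the relation between communalities and eigenvalues can be written out. The notation is an assumption introduced here rather than taken from the output: h_i^2 is the final communality of variable i, a_ij its loading on retained component j, and lambda_j the j-th eigenvalue. For a principal component solution each communality is the sum of the squared loadings on the retained components, so the communalities sum to the retained eigenvalues:

```latex
h_i^2 = a_{i1}^2 + a_{i2}^2,
\qquad
\sum_{i=1}^{5} h_i^2 = \lambda_1 + \lambda_2
  = 2.87331359 + 1.79666009 = 4.66997368
```

Dividing by the five variables gives an average communality of about 0.934, which is the same 93.4% of standardized variance attributed to the first two components.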
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.