
The FACTOR Procedure

Example 33.1 Principal Component Analysis

This example analyzes socioeconomic data provided by Harman (1976). The five variables represent total population, median school years, total employment, miscellaneous professional services, and median house value. Each observation represents one of twelve census tracts in the Los Angeles Standard Metropolitan Statistical Area.

The first analysis is a principal component analysis. Simple descriptive statistics and correlations are also displayed. The following statements produce Output 33.1.1:

   data SocioEconomics;
      title 'Five Socioeconomic Variables';
      title2 'See Page 14 of Harman: Modern Factor Analysis, 3rd Ed';
      input Population School Employment Services HouseValue;
      datalines;
   5700     12.8      2500      270       25000
   1000     10.9      600       10        10000
   3400     8.8       1000      10        9000
   3800     13.6      1700      140       25000
   4000     12.8      1600      140       25000
   8200     8.3       2600      60        12000
   1200     11.4      400       10        16000
   9100     11.5      3300      60        14000
   9900     12.5      3400      180       18000
   9600     13.7      3600      390       25000
   9600     9.6       3300      80        12000
   9400     11.4      4000      100       13000
   ;
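   title3 'Principal Component Analysis';
   /* With no METHOD= or PRIORS= option, PROC FACTOR extracts principal */
   /* components with prior communalities of 1 (see the output below).  */
   /* SIMPLE prints means and standard deviations; CORR prints the      */
   /* correlation matrix.                                               */
   proc factor data=SocioEconomics simple corr;
   run;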

There are two large eigenvalues, 2.8733 and 1.7967, which together account for 93.4% of the standardized variance. Thus, the first two principal components provide an adequate summary of the data for most purposes. Three components, explaining 97.7% of the variation, should be sufficient for almost any application. PROC FACTOR retains two components by the eigenvalue-greater-than-one rule (the MINEIGEN criterion), because the third eigenvalue is only 0.2148.
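The percentages above follow from the fact that, when the correlation matrix is analyzed, the eigenvalues sum to its trace, which equals the number of variables (here 5). The worked check below uses the rounded eigenvalues from Output 33.1.1; the symbols lambda_k (k-th eigenvalue) and R (the correlation matrix) are notation introduced only for this sketch.

```latex
\begin{aligned}
\sum_{k=1}^{5} \lambda_k &= \operatorname{tr}(\mathbf{R}) = 5 \\
\frac{\lambda_1}{5} &= \frac{2.8733}{5} \approx 0.5747 \\
\frac{\lambda_1 + \lambda_2}{5} &= \frac{2.8733 + 1.7967}{5} = \frac{4.6700}{5} = 0.9340 \\
\frac{\lambda_1 + \lambda_2 + \lambda_3}{5} &= \frac{4.6700 + 0.2148}{5} = \frac{4.8848}{5} \approx 0.9770 \\
\lambda_3 &= 0.2148 < 1 \quad \text{(so the third component is not retained)}
\end{aligned}
```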

The first component has large positive loadings for all five variables. Its correlation with Services (0.9324) is especially high. The second component is a contrast of Population (0.8064) and Employment (0.7261) against School (-0.5448) and HouseValue (-0.5582), with a very small loading on Services (-0.1043).
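For a principal component analysis of a correlation matrix, each column of the factor pattern is a unit-length eigenvector scaled by the square root of its eigenvalue, so the squared loadings in a column sum to that component's eigenvalue (the "Variance Explained by Each Factor" in Output 33.1.1). In the sketch below, a_jk (loading of variable j on component k) and v_jk (element of the k-th eigenvector) are notation introduced here, and the sums are checked against the printed loadings.

```latex
\begin{aligned}
a_{jk} &= v_{jk}\sqrt{\lambda_k} \\
\sum_{j=1}^{5} a_{j1}^2 &= 0.58096^2 + 0.76704^2 + 0.67243^2 + 0.93239^2 + 0.79116^2 \approx 2.8733 = \lambda_1 \\
\sum_{j=1}^{5} a_{j2}^2 &= 0.80642^2 + (-0.54476)^2 + 0.72605^2 + (-0.10431)^2 + (-0.55818)^2 \approx 1.7967 = \lambda_2
\end{aligned}
```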

Output 33.1.1 Principal Component Analysis
Five Socioeconomic Variables
See Page 14 of Harman: Modern Factor Analysis, 3rd Ed
Principal Component Analysis

The FACTOR Procedure

Means and Standard Deviations from 12 Observations
Variable         Mean      Std Dev
Population   6241.667    3439.9943
School         11.442       1.7865
Employment   2333.333    1241.2115
Services      120.833     114.9275
HouseValue  17000.000    6367.5313

Correlations
            Population      School  Employment    Services  HouseValue
Population     1.00000     0.00975     0.97245     0.43887     0.02241
School         0.00975     1.00000     0.15428     0.69141     0.86307
Employment     0.97245     0.15428     1.00000     0.51472     0.12193
Services       0.43887     0.69141     0.51472     1.00000     0.77765
HouseValue     0.02241     0.86307     0.12193     0.77765     1.00000

Initial Factor Method: Principal Components

Prior Communality Estimates: ONE

Eigenvalues of the Correlation Matrix: Total = 5  Average = 1
      Eigenvalue    Difference    Proportion    Cumulative
1     2.87331359    1.07665350        0.5747        0.5747
2     1.79666009    1.58182321        0.3593        0.9340
3     0.21483689    0.11490283        0.0430        0.9770
4     0.09993405    0.08467868        0.0200        0.9969
5     0.01525537                      0.0031        1.0000

2 factors will be retained by the MINEIGEN criterion.

Factor Pattern
                Factor1     Factor2
Population      0.58096     0.80642
School          0.76704    -0.54476
Employment      0.67243     0.72605
Services        0.93239    -0.10431
HouseValue      0.79116    -0.55818

Variance Explained by Each Factor
     Factor1      Factor2
   2.8733136    1.7966601

Final Communality Estimates: Total = 4.669974
  Population      School  Employment    Services  HouseValue
  0.98782629  0.88510555  0.97930583  0.88023562  0.93750041

The final communality estimates show that all the variables are well accounted for by two components, with final communality estimates ranging from 0.880 for Services to 0.988 for Population.
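With two components retained, each variable's final communality is the sum of its squared loadings on the two components in the factor pattern, and the total communality equals the sum of the two retained eigenvalues. The worked check below covers the two extremes quoted above; h_j^2 (communality of variable j) and a_jk (loading of variable j on component k) are notation introduced for this sketch.

```latex
\begin{aligned}
h_j^2 &= a_{j1}^2 + a_{j2}^2 \\
h_{\text{Population}}^2 &= 0.58096^2 + 0.80642^2 \approx 0.9878 \\
h_{\text{Services}}^2 &= 0.93239^2 + (-0.10431)^2 \approx 0.8802 \\
\sum_{j=1}^{5} h_j^2 &= \lambda_1 + \lambda_2 = 2.8733136 + 1.7966601 = 4.6699737 \approx 4.669974
\end{aligned}
```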
