
The FACTOR Procedure

Example 33.2 Principal Factor Analysis

This example uses the data presented in Example 33.1, and performs a principal factor analysis with squared multiple correlations for the prior communality estimates (PRIORS=SMC). Unlike Example 33.1, which analyzes the principal components, the current analysis is based on a common factor model.

To help determine whether the common factor model is appropriate, Kaiser’s measure of sampling adequacy (MSA) is requested, and the residual correlations and partial correlations are computed (RESIDUAL).

The ROTATE= and REORDER options are specified to enhance factor interpretability. The ROTATE=PROMAX option produces an orthogonal varimax prerotation (default) followed by an oblique Procrustes rotation, and the REORDER option reorders the variables according to their largest factor loadings. An OUTSTAT= data set is created by PROC FACTOR and displayed in Output 33.2.12.

PROC FACTOR can produce high-quality graphs that are useful for interpreting the factor solutions. To request these graphs, you must first enable ODS Graphics by specifying the ODS GRAPHICS ON statement, as shown in the following statements. All ODS graphs in PROC FACTOR are requested with the PLOTS= option. In this example, a scree plot is requested to help you determine the number of factors. Loading plots for the initial unrotated solution, the prerotated (varimax) solution, and the promax-rotated solution are also requested to help you visualize the patterns of factor loadings at each stage.

   
   ods graphics on;
   
   title3 'Principal Factor Analysis with Promax Rotation';
   proc factor data=SocioEconomics 
      priors=smc msa residual 
      rotate=promax reorder
      outstat=fact_all 
      plots=(scree initloadings preloadings loadings);
   run;
   
   ods graphics off;
   

Output 33.2.1 displays the results of the principal factor extraction.

Output 33.2.1 Principal Factor Analysis
Five Socioeconomic Variables
See Page 14 of Harman: Modern Factor Analysis, 3rd Ed
Principal Factor Analysis with Promax Rotation

The FACTOR Procedure
Initial Factor Method: Principal Factors

Partial Correlations Controlling all other Variables
  Population School Employment Services HouseValue
Population 1.00000 -0.54465 0.97083 0.09612 0.15871
School -0.54465 1.00000 0.54373 0.04996 0.64717
Employment 0.97083 0.54373 1.00000 0.06689 -0.25572
Services 0.09612 0.04996 0.06689 1.00000 0.59415
HouseValue 0.15871 0.64717 -0.25572 0.59415 1.00000

Kaiser's Measure of Sampling Adequacy: Overall MSA = 0.57536759
Population School Employment Services HouseValue
0.47207897 0.55158839 0.48851137 0.80664365 0.61281377

Prior Communality Estimates: SMC
Population School Employment Services HouseValue
0.96859160 0.82228514 0.96918082 0.78572440 0.84701921

Eigenvalues of the Reduced Correlation Matrix: Total = 4.39280116 Average = 0.87856023
  Eigenvalue Difference Proportion Cumulative
1 2.73430084 1.01823217 0.6225 0.6225
2 1.71606867 1.67650586 0.3907 1.0131
3 0.03956281 0.06408626 0.0090 1.0221
4 -.02452345 0.04808427 -0.0056 1.0165
5 -.07260772   -0.0165 1.0000

If the data are appropriate for the common factor model, the partial correlations controlling the other variables should be small compared to the original correlations. The partial correlation between the variables School and HouseValue, for example, is 0.65, slightly less than the original correlation of 0.86. The partial correlation between Population and School is -0.54, which is much larger in absolute value than the original correlation (about 0.01); this is an indication of trouble. Kaiser’s MSA is a summary, for each variable and for all variables together, of how much smaller the partial correlations are than the original correlations. Values of 0.8 or 0.9 are considered good, while MSAs below 0.5 are unacceptable. The variables Population, School, and Employment have very poor MSAs. Only the Services variable has a good MSA. The overall MSA of 0.58 is sufficiently poor that additional variables should be included in the analysis to better define the common factors. A commonly used rule is that there should be at least three variables per factor. In the following analysis, there seem to be two common factors in these data, so more variables are needed for a reliable analysis.
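
As a rough check on how the MSA summarizes these comparisons, the following Python sketch (an illustration, not part of the SAS example) applies the usual Kaiser formula: the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations, taken over the off-diagonal elements. The original correlations are copied as printed in the OUTSTAT= listing (Output 33.2.12), so they are rounded, and the results should agree with Output 33.2.1 only to two or three decimals. The names R, Q, and msa are chosen here for illustration.

```python
# Variable order: Population, School, Employment, Services, HouseValue.
names = ["Population", "School", "Employment", "Services", "HouseValue"]

# Original correlations, as printed (rounded) in Output 33.2.12.
R = [
    [1.0,    0.0098, 0.97,   0.439,  0.02],
    [0.0098, 1.0,    0.1543, 0.6914, 0.8631],
    [0.97,   0.1543, 1.0,    0.515,  0.12],
    [0.439,  0.6914, 0.515,  1.0,    0.778],
    [0.02,   0.8631, 0.12,   0.778,  1.0],
]

# Partial correlations controlling all other variables (Output 33.2.1).
Q = [
    [ 1.0,     -0.54465,  0.97083, 0.09612,  0.15871],
    [-0.54465,  1.0,      0.54373, 0.04996,  0.64717],
    [ 0.97083,  0.54373,  1.0,     0.06689, -0.25572],
    [ 0.09612,  0.04996,  0.06689, 1.0,      0.59415],
    [ 0.15871,  0.64717, -0.25572, 0.59415,  1.0],
]

def msa(rows):
    """Kaiser's MSA over the off-diagonal elements of the given rows."""
    n = len(R)
    r2 = sum(R[i][j] ** 2 for i in rows for j in range(n) if i != j)
    q2 = sum(Q[i][j] ** 2 for i in rows for j in range(n) if i != j)
    return r2 / (r2 + q2)

overall_msa = msa(range(len(R)))                      # all variables together
variable_msa = {name: msa([i]) for i, name in enumerate(names)}

print(f"Overall MSA: {overall_msa:.3f}")
for name, value in variable_msa.items():
    print(f"{name:<11}{value:.3f}")
```

Because the partial correlations for Population and Employment are about as large as their original correlations, the squared partials make up roughly half of the denominator for those variables, which is what pulls their MSAs below 0.5.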

The SMCs are all fairly large; hence, the factor loadings do not differ greatly from those in the principal component analysis.

The eigenvalues in Output 33.2.1 show clearly that two common factors are present. The two largest positive eigenvalues account for 101.31% of the common variance. A proportion above 100% is possible because the reduced correlation matrix is, in general, not necessarily positive definite, so some of its eigenvalues can be negative. These cumulative proportions of common variance explained by factors are plotted in the right panel of Output 33.2.2, which shows that the curve flattens out essentially after the second factor. The left panel of Output 33.2.2 shows the scree plot, which displays a sharp bend at the third eigenvalue, reinforcing the conclusion that two common factors are present.
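
The arithmetic behind the Proportion and Cumulative columns can be reproduced directly from the eigenvalues printed in Output 33.2.1. The following Python sketch is only an illustration (it is not part of the SAS example), and the variable names are arbitrary. It divides each eigenvalue by the total, which is also the sum of the prior communality estimates, and accumulates the result.

```python
# Eigenvalues of the reduced correlation matrix (Output 33.2.1).
eigenvalues = [2.73430084, 1.71606867, 0.03956281, -0.02452345, -0.07260772]

# Prior communality estimates (SMCs) from the same output.
priors = [0.96859160, 0.82228514, 0.96918082, 0.78572440, 0.84701921]

total = sum(eigenvalues)          # trace of the reduced correlation matrix
proportions = [ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

print(f"Total = {total:.8f}, sum of priors = {sum(priors):.8f}")
for k, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(f"{k}  {p:8.4f}  {c:8.4f}")
```

The cumulative proportion rises above 1 after the second factor and returns to exactly 1 only after the negative eigenvalues are added in.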

Output 33.2.2 Scree and Variance Explained Plots

As displayed in Output 33.2.3, the principal factor pattern is similar to the principal component pattern seen in Example 33.1. For example, the variable Services has the largest loading on the first factor, and the Population variable has the smallest. The variables Population and Employment have large positive loadings on the second factor, and the HouseValue and School variables have large negative loadings.

Output 33.2.3 Initial Factor Pattern Matrix and Communalities
Factor Pattern
  Factor1 Factor2
Services 0.87899 -0.15847
HouseValue 0.74215 -0.57806
Employment 0.71447 0.67936
School 0.71370 -0.55515
Population 0.62533 0.76621

Variance Explained by Each Factor
Factor1 Factor2
2.7343008 1.7160687

Final Communality Estimates: Total = 4.450370
Population School Employment Services HouseValue
0.97811334 0.81756387 0.97199928 0.79774304 0.88494998

The final communality estimates are all fairly close to the priors. Only the communality for the variable HouseValue increased appreciably, from 0.847 to 0.885. Nearly 100% of the common variance is accounted for. The residual correlations (off-diagonal elements) are low, the largest being 0.03 (Output 33.2.4). The partial correlations are not quite as impressive, since the uniqueness values are also rather small. These results indicate that the SMCs are good but not quite optimal communality estimates.
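
The final communality of a variable in an orthogonal solution is the sum of its squared loadings, and the uniqueness is one minus the communality. The following Python sketch (an illustration only, with arbitrary names) recomputes these quantities from the rounded loadings in Output 33.2.3, so the results should match the printed values to about four decimals.

```python
# Initial factor pattern (Output 33.2.3): variable -> (Factor1, Factor2).
pattern = {
    "Services":   (0.87899, -0.15847),
    "HouseValue": (0.74215, -0.57806),
    "Employment": (0.71447,  0.67936),
    "School":     (0.71370, -0.55515),
    "Population": (0.62533,  0.76621),
}

# Communality: row sum of squared loadings.
communality = {v: sum(x * x for x in loads) for v, loads in pattern.items()}

# Uniqueness: the part of each variable's variance not shared with the factors.
uniqueness = {v: 1.0 - h2 for v, h2 in communality.items()}

# Variance explained by each factor: column sum of squared loadings.
variance_explained = [
    sum(loads[k] ** 2 for loads in pattern.values()) for k in range(2)
]

for v in pattern:
    print(f"{v:<11}{communality[v]:.5f}  {uniqueness[v]:.5f}")
print("Variance explained:", [round(x, 4) for x in variance_explained])
```

The uniqueness values computed this way are the diagonal entries of the residual correlation matrix in Output 33.2.4, and the column sums equal the first two eigenvalues.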

Output 33.2.4 Residual and Partial Correlations
Residual Correlations With Uniqueness on the Diagonal
  Population School Employment Services HouseValue
Population 0.02189 -0.01118 0.00514 0.01063 0.00124
School -0.01118 0.18244 0.02151 -0.02390 0.01248
Employment 0.00514 0.02151 0.02800 -0.00565 -0.01561
Services 0.01063 -0.02390 -0.00565 0.20226 0.03370
HouseValue 0.00124 0.01248 -0.01561 0.03370 0.11505

Root Mean Square Off-Diagonal Residuals: Overall = 0.01693282
Population School Employment Services HouseValue
0.00815307 0.01813027 0.01382764 0.02151737 0.01960158

Partial Correlations Controlling Factors
  Population School Employment Services HouseValue
Population 1.00000 -0.17693 0.20752 0.15975 0.02471
School -0.17693 1.00000 0.30097 -0.12443 0.08614
Employment 0.20752 0.30097 1.00000 -0.07504 -0.27509
Services 0.15975 -0.12443 -0.07504 1.00000 0.22093
HouseValue 0.02471 0.08614 -0.27509 0.22093 1.00000

Root Mean Square Off-Diagonal Partials: Overall = 0.18550132
Population School Employment Services HouseValue
0.15850824 0.19025867 0.23181838 0.15447043 0.18201538
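
The relationship between the two matrices in Output 33.2.4 explains why small residuals can still produce noticeable partial correlations. Each partial correlation controlling the factors is consistent with the residual correlation divided by the square root of the product of the two uniquenesses. The following Python sketch, which is an illustration and not part of the SAS example, checks this against the printed (rounded) values.

```python
import math

# Residual correlations with uniqueness on the diagonal (Output 33.2.4).
# Variable order: Population, School, Employment, Services, HouseValue.
residual = [
    [ 0.02189, -0.01118,  0.00514,  0.01063,  0.00124],
    [-0.01118,  0.18244,  0.02151, -0.02390,  0.01248],
    [ 0.00514,  0.02151,  0.02800, -0.00565, -0.01561],
    [ 0.01063, -0.02390, -0.00565,  0.20226,  0.03370],
    [ 0.00124,  0.01248, -0.01561,  0.03370,  0.11505],
]
n = len(residual)

# Scale each residual by the uniquenesses of the two variables involved.
partial = [
    [
        1.0 if i == j
        else residual[i][j] / math.sqrt(residual[i][i] * residual[j][j])
        for j in range(n)
    ]
    for i in range(n)
]

# Overall root mean square of the off-diagonal residuals.
off_diagonal = [residual[i][j] for i in range(n) for j in range(n) if i != j]
rms_residual = math.sqrt(sum(x * x for x in off_diagonal) / len(off_diagonal))

print(f"School-Employment partial: {partial[1][2]:.4f}")
print(f"RMS off-diagonal residual: {rms_residual:.5f}")
```

For School and Employment, a residual of only 0.02151 becomes a partial correlation of about 0.30 because the uniqueness of Employment (0.028) is so small.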

As displayed in Output 33.2.5, the unrotated factor pattern reveals two tight clusters of variables, with the variables HouseValue and School at the negative end of the Factor2 axis and the variables Employment and Population at the positive end. The Services variable is in between but closer to the HouseValue and School variables. A good rotation would put the reference axes through the two clusters.

Output 33.2.5 Unrotated Factor Pattern Plot

Output 33.2.6 and Output 33.2.7 display the results of the varimax rotation. This rotation puts one axis through the variables HouseValue and School but misses the Population and Employment variables slightly.

Output 33.2.6 Varimax Rotation: Transform Matrix and Rotated Pattern
Five Socioeconomic Variables
See Page 14 of Harman: Modern Factor Analysis, 3rd Ed
Principal Factor Analysis with Promax Rotation

The FACTOR Procedure
Prerotation Method: Varimax

Orthogonal Transformation Matrix
  1 2
1 0.78895 0.61446
2 -0.61446 0.78895

Rotated Factor Pattern
  Factor1 Factor2
HouseValue 0.94072 -0.00004
School 0.90419 0.00055
Services 0.79085 0.41509
Population 0.02255 0.98874
Employment 0.14625 0.97499

Variance Explained by Each Factor
Factor1 Factor2
2.3498567 2.1005128

Final Communality Estimates: Total = 4.450370
Population School Employment Services HouseValue
0.97811334 0.81756387 0.97199928 0.79774304 0.88494998
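
The varimax-rotated pattern in Output 33.2.6 is the unrotated pattern from Output 33.2.3 postmultiplied by the orthogonal transformation matrix. The following Python sketch (an illustration with arbitrary names, using the rounded printed values) carries out that multiplication and confirms that the rotation leaves the communalities unchanged.

```python
# Unrotated factor pattern (Output 33.2.3): variable -> (Factor1, Factor2).
unrotated = {
    "HouseValue": (0.74215, -0.57806),
    "School":     (0.71370, -0.55515),
    "Services":   (0.87899, -0.15847),
    "Population": (0.62533,  0.76621),
    "Employment": (0.71447,  0.67936),
}

# Orthogonal transformation matrix from the varimax prerotation (Output 33.2.6).
T = [
    [ 0.78895, 0.61446],
    [-0.61446, 0.78895],
]

def rotate(loads):
    """Multiply one row of loadings by the transformation matrix T."""
    return tuple(sum(loads[k] * T[k][j] for k in range(2)) for j in range(2))

rotated = {v: rotate(loads) for v, loads in unrotated.items()}

# Row sums of squares (communalities) before and after rotation.
before = {v: sum(x * x for x in loads) for v, loads in unrotated.items()}
after = {v: sum(x * x for x in loads) for v, loads in rotated.items()}

for v, (f1, f2) in rotated.items():
    print(f"{v:<11}{f1:9.5f}{f2:9.5f}")
```

An orthogonal rotation only redistributes the explained variance between the factors (2.35 and 2.10 after rotation, compared with 2.73 and 1.72 before), while each variable's communality stays the same.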

Output 33.2.7 Varimax-Rotated Factor Loadings

An alternative to the scatter plot of factor loadings as shown in Output 33.2.7 is the so-called vector plot of loadings. The vector plot is requested with the suboption VECTOR in the PLOTS= option. For example:

   plots=preloadings(vector)

This will generate the vector plot of loadings as shown in Output 33.2.8.

Output 33.2.8 Varimax-Rotated Factor Loadings: Vector Plot

The results of oblique promax rotation are shown in Output 33.2.9 and Output 33.2.10. The corresponding plot of factor loadings is shown in Output 33.2.11.

Output 33.2.9 Promax Rotation: Procrustean Target and Transformation
Five Socioeconomic Variables
See Page 14 of Harman: Modern Factor Analysis, 3rd Ed
Principal Factor Analysis with Promax Rotation

The FACTOR Procedure
Rotation Method: Promax (power = 3)

Target Matrix for Procrustean Transformation
  Factor1 Factor2
HouseValue 1.00000 -0.00000
School 1.00000 0.00000
Services 0.69421 0.10045
Population 0.00001 1.00000
Employment 0.00326 0.96793

Procrustean Transformation Matrix
  1 2
1 1.04116598 -0.0986534
2 -0.1057226 0.96303019

Normalized Oblique Transformation Matrix
  1 2
1 0.73803 0.54202
2 -0.70555 0.86528

Output 33.2.10 Promax Rotation: Promax-Rotated Factor Solution
Inter-Factor Correlations
  Factor1 Factor2
Factor1 1.00000 0.20188
Factor2 0.20188 1.00000

Rotated Factor Pattern (Standardized Regression Coefficients)
  Factor1 Factor2
HouseValue 0.95558485 -0.0979201
School 0.91842142 -0.0935214
Services 0.76053238 0.33931804
Population -0.0790832 1.00192402
Employment 0.04799 0.97509085

Reference Axis Correlations
  Factor1 Factor2
Factor1 1.00000 -0.20188
Factor2 -0.20188 1.00000

Reference Structure (Semipartial Correlations)
  Factor1 Factor2
HouseValue 0.93591 -0.09590
School 0.89951 -0.09160
Services 0.74487 0.33233
Population -0.07745 0.98129
Employment 0.04700 0.95501

Variance Explained by Each Factor Eliminating Other Factors
Factor1 Factor2
2.2480892 2.0030200

Factor Structure (Correlations)
  Factor1 Factor2
HouseValue 0.93582 0.09500
School 0.89954 0.09189
Services 0.82903 0.49286
Population 0.12319 0.98596
Employment 0.24484 0.98478

Variance Explained by Each Factor Ignoring Other Factors
Factor1 Factor2
2.4473495 2.2022803

Final Communality Estimates: Total = 4.450370
Population School Employment Services HouseValue
0.97811334 0.81756387 0.97199928 0.79774304 0.88494998
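
In an oblique solution the pattern, structure, and reference structure matrices differ, and the tables in Output 33.2.10 are linked through the inter-factor correlation. The following Python sketch is an illustration only (not part of the SAS example). It assumes the standard relations for a two-factor solution: the structure is the pattern times the inter-factor correlation matrix, and each reference-structure loading is the pattern loading scaled by the square root of one minus the squared inter-factor correlation. The printed values are consistent with both relations.

```python
import math

# Promax-rotated factor pattern (Output 33.2.10): variable -> (Factor1, Factor2).
pattern = {
    "HouseValue": (0.95558485, -0.0979201),
    "School":     (0.91842142, -0.0935214),
    "Services":   (0.76053238,  0.33931804),
    "Population": (-0.0790832,  1.00192402),
    "Employment": (0.04799,     0.97509085),
}
phi = 0.20188  # inter-factor correlation between Factor1 and Factor2

# Factor structure (correlations): pattern times [[1, phi], [phi, 1]].
structure = {
    v: (p1 + phi * p2, phi * p1 + p2) for v, (p1, p2) in pattern.items()
}

# Reference structure (semipartial correlations), two-factor case.
scale = math.sqrt(1.0 - phi ** 2)
reference = {v: (p1 * scale, p2 * scale) for v, (p1, p2) in pattern.items()}

for v in pattern:
    s1, s2 = structure[v]
    r1, r2 = reference[v]
    print(f"{v:<11}{s1:9.5f}{s2:9.5f}{r1:9.5f}{r2:9.5f}")
```

Because the factors are positively correlated, a variable such as Services has a higher correlation with Factor2 (about 0.49) than its pattern loading (about 0.34) alone would suggest.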

Output 33.2.11 Promax-Rotated Factor Loadings

As shown in Output 33.2.11, the promax solution places an axis through the variables Population and Employment but misses the HouseValue and School variables. Since an independent-cluster solution would be possible if it were not for the variable Services, a Harris-Kaiser rotation weighted by the Cureton-Mulaik technique could be used. Rather than reanalyze the entire problem with the Harris-Kaiser rotation, you can simply use the preceding results stored in the OUTSTAT= data set.

First, the OUTSTAT= data set is printed using this code:

   
   title3 'Factor Output Data Set';
   proc print data=fact_all;
   run;
   

The output data set is displayed in Output 33.2.12.

Output 33.2.12 Output Data Set
Five Socioeconomic Variables
See Page 14 of Harman: Modern Factor Analysis, 3rd Ed
Factor Output Data Set

Obs _TYPE_ _NAME_ Population School Employment Services HouseValue
1 MEAN   6241.67 11.4417 2333.33 120.833 17000.00
2 STD   3439.99 1.7865 1241.21 114.928 6367.53
3 N   12.00 12.0000 12.00 12.000 12.00
4 CORR Population 1.00 0.0098 0.97 0.439 0.02
5 CORR School 0.01 1.0000 0.15 0.691 0.86
6 CORR Employment 0.97 0.1543 1.00 0.515 0.12
7 CORR Services 0.44 0.6914 0.51 1.000 0.78
8 CORR HouseValue 0.02 0.8631 0.12 0.778 1.00
9 COMMUNAL   0.98 0.8176 0.97 0.798 0.88
10 PRIORS   0.97 0.8223 0.97 0.786 0.85
11 EIGENVAL   2.73 1.7161 0.04 -0.025 -0.07
12 UNROTATE Factor1 0.63 0.7137 0.71 0.879 0.74
13 UNROTATE Factor2 0.77 -0.5552 0.68 -0.158 -0.58
14 RESIDUAL Population 0.02 -0.0112 0.01 0.011 0.00
15 RESIDUAL School -0.01 0.1824 0.02 -0.024 0.01
16 RESIDUAL Employment 0.01 0.0215 0.03 -0.006 -0.02
17 RESIDUAL Services 0.01 -0.0239 -0.01 0.202 0.03
18 RESIDUAL HouseValue 0.00 0.0125 -0.02 0.034 0.12
19 PRETRANS Factor1 0.79 -0.6145 . . .
20 PRETRANS Factor2 0.61 0.7889 . . .
21 PREROTAT Factor1 0.02 0.9042 0.15 0.791 0.94
22 PREROTAT Factor2 0.99 0.0006 0.97 0.415 -0.00
23 TRANSFOR Factor1 0.74 -0.7055 . . .
24 TRANSFOR Factor2 0.54 0.8653 . . .
25 FCORR Factor1 1.00 0.2019 . . .
26 FCORR Factor2 0.20 1.0000 . . .
27 PATTERN Factor1 -0.08 0.9184 0.05 0.761 0.96
28 PATTERN Factor2 1.00 -0.0935 0.98 0.339 -0.10
29 RCORR Factor1 1.00 -0.2019 . . .
30 RCORR Factor2 -0.20 1.0000 . . .
31 REFERENC Factor1 -0.08 0.8995 0.05 0.745 0.94
32 REFERENC Factor2 0.98 -0.0916 0.96 0.332 -0.10
33 STRUCTUR Factor1 0.12 0.8995 0.24 0.829 0.94
34 STRUCTUR Factor2 0.99 0.0919 0.98 0.493 0.09

This output data set can be used for Harris-Kaiser rotation by deleting observations with _TYPE_='PATTERN' and _TYPE_='FCORR', which are for the promax-rotated factors, and changing _TYPE_='UNROTATE' to _TYPE_='PATTERN'. In this way, the initial orthogonal factor pattern matrix is saved in the observations with _TYPE_='PATTERN'. The following factor analysis then reads the factor pattern in the fact2 data set as an initial factor solution and rotates it by the Harris-Kaiser method with Cureton-Mulaik weights.


The following statements produce Output 33.2.13:

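   data fact2(type=factor);
      set fact_all;
      /* Drop the promax-rotated pattern and inter-factor correlations */
      if _TYPE_ in('PATTERN' 'FCORR') then delete;
      /* Relabel the unrotated loadings as the pattern to be rotated */
      if _TYPE_='UNROTATE' then _TYPE_='PATTERN';
   run;

   ods graphics on;
   
   title3 'Harris-Kaiser Rotation with Cureton-Mulaik Weights';
   proc factor data=fact2 rotate=hk norm=weight reorder
        plots=loadings;
   run;
   
   ods graphics off;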

The results of the Harris-Kaiser rotation are displayed in Output 33.2.13.

Output 33.2.13 Harris-Kaiser Rotation
Five Socioeconomic Variables
See Page 14 of Harman: Modern Factor Analysis, 3rd Ed
Harris-Kaiser Rotation with Cureton-Mulaik Weights

The FACTOR Procedure
Rotation Method: Harris-Kaiser (hkpower = 0)

Variable Weights for Rotation
Population School Employment Services HouseValue
0.95982747 0.93945424 0.99746396 0.12194766 0.94007263

Oblique Transformation Matrix
  1 2
1 0.73537 0.61899
2 -0.68283 0.78987

Inter-Factor Correlations
  Factor1 Factor2
Factor1 1.00000 0.08358
Factor2 0.08358 1.00000

Rotated Factor Pattern (Standardized Regression Coefficients)
  Factor1 Factor2
HouseValue 0.94048 0.00279
School 0.90391 0.00327
Services 0.75459 0.41892
Population -0.06335 0.99227
Employment 0.06152 0.97885

Reference Axis Correlations
  Factor1 Factor2
Factor1 1.00000 -0.08358
Factor2 -0.08358 1.00000

Reference Structure (Semipartial Correlations)
  Factor1 Factor2
HouseValue 0.93719 0.00278
School 0.90075 0.00326
Services 0.75195 0.41745
Population -0.06312 0.98880
Employment 0.06130 0.97543

Variance Explained by Each Factor Eliminating Other Factors
Factor1 Factor2
2.2628537 2.1034731

Factor Structure (Correlations)
  Factor1 Factor2
HouseValue 0.94071 0.08139
School 0.90419 0.07882
Services 0.78960 0.48198
Population 0.01958 0.98698
Employment 0.14332 0.98399

Variance Explained by Each Factor Ignoring Other Factors
Factor1 Factor2
2.3468965 2.1875158

Final Communality Estimates: Total = 4.450370
Population School Employment Services HouseValue
0.97811334 0.81756387 0.97199928 0.79774304 0.88494998
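
The two "Variance Explained" tables in Output 33.2.13 are consistent with column sums of squares: the reference structure gives the variance explained by each factor eliminating the other factor, and the factor structure gives the variance explained ignoring the other factor. The following Python sketch (an illustration with arbitrary names, based on the rounded printed loadings) reproduces both rows.

```python
# Harris-Kaiser reference structure (Output 33.2.13): (Factor1, Factor2).
reference = {
    "HouseValue": (0.93719, 0.00278),
    "School":     (0.90075, 0.00326),
    "Services":   (0.75195, 0.41745),
    "Population": (-0.06312, 0.98880),
    "Employment": (0.06130, 0.97543),
}

# Harris-Kaiser factor structure (Output 33.2.13): (Factor1, Factor2).
structure = {
    "HouseValue": (0.94071, 0.08139),
    "School":     (0.90419, 0.07882),
    "Services":   (0.78960, 0.48198),
    "Population": (0.01958, 0.98698),
    "Employment": (0.14332, 0.98399),
}

def column_sums_of_squares(loadings):
    """Sum the squared loadings down each factor column."""
    return [sum(row[k] ** 2 for row in loadings.values()) for k in range(2)]

eliminating = column_sums_of_squares(reference)  # unique contribution
ignoring = column_sums_of_squares(structure)     # includes shared variance

print("Eliminating other factors:", [round(x, 4) for x in eliminating])
print("Ignoring other factors:   ", [round(x, 4) for x in ignoring])
```

The gap between the two rows is small here because the Harris-Kaiser factors are only weakly correlated (0.08358).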

A plot of the rotated loadings is shown in Output 33.2.14.

Output 33.2.14 Factor Pattern with Harris-Kaiser Rotation

In the results of the Harris-Kaiser rotation, the variable Services receives a small weight, and the axes are placed as desired.

