The PRINCOMP Procedure
This example uses the PRINCOMP procedure to analyze job performance. Police officers were rated by their supervisors in 14 categories as part of standard police departmental administrative procedure.
The following statements create the Jobratings data set:
options validvarname=any;
data Jobratings;
   input ('Communication Skills'n
          'Problem Solving'n
          'Learning Ability'n
          'Judgment Under Pressure'n
          'Observational Skills'n
          'Willingness to Confront Problems'n
          'Interest in People'n
          'Interpersonal Sensitivity'n
          'Desire for Self-Improvement'n
          'Appearance'n
          'Dependability'n
          'Physical Ability'n
          'Integrity'n
          'Overall Rating'n) (1.);
   datalines;
26838853879867
74758876857667
56757863775875
67869777988997
99997798878888
89897899888799
89999889899798
87794798468886
35652335143113
89888879576867
76557899446397
97889998898989
76766677598888

   ... more lines ...

99899899899899
76656399567486
;
The data set Jobratings contains 14 variables. Each variable contains the job ratings, using a scale measurement from 1 to 10 (1=fail to comply, 10=exceptional). The last variable Overall Rating contains a score as an overall index on how each officer performs.
The following statements request a principal component analysis on the Jobratings data set, output the scores to the Scores data set (OUT=Scores), and produce default plots. Note that the variable Overall Rating is excluded from the analysis.
ods graphics on;

proc princomp data=Jobratings(drop='Overall Rating'n) out=Scores;
run;
Figure 69.3.1 and Figure 69.3.2 display the PROC PRINCOMP output, beginning with simple statistics followed by the correlation matrix. By default, the PROC PRINCOMP statement requests principal components computed from the correlation matrix, so the total variance is equal to the number of variables, 13. In this example, it would also be reasonable to use the COV option, which would cause variables with a high variance (such as Dependability) to have more influence on the results than variables with a low variance (such as Learning Ability). If you used the COV option, scores would be computed from centered rather than standardized variables.
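As a minimal sketch of the covariance-based alternative mentioned above, the following statements add the COV option to the PROC PRINCOMP statement. The output of this run is not shown in this example. Its eigenvalues would sum to the total of the variable variances rather than to 13, so they would not match the values reported for the correlation-based analysis.

```sas
/* Sketch: the same analysis computed from the covariance matrix.   */
/* With COV, variables are centered but not standardized, so        */
/* high-variance ratings such as Dependability carry more weight.   */
proc princomp data=Jobratings(drop='Overall Rating'n) cov;
run;
```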
Observations | 103 |
---|---|
Variables | 13 |
Simple Statistics | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Communication Skills | Problem Solving | Learning Ability | Judgment Under Pressure | Observational Skills | Willingness to Confront Problems | Interest in People | Interpersonal Sensitivity | Desire for Self-Improvement | Appearance | Dependability | Physical Ability | Integrity |
Mean | 6.650485437 | 6.631067961 | 6.990291262 | 6.737864078 | 6.932038835 | 7.291262136 | 6.708737864 | 6.621359223 | 6.572815534 | 7.000000000 | 6.825242718 | 7.203883495 | 7.213592233 |
StD | 1.764068036 | 1.590352602 | 1.339411238 | 1.731830976 | 1.761584269 | 1.525155524 | 1.892353385 | 1.760773587 | 1.729796212 | 1.798692335 | 1.917040123 | 1.555251845 | 1.845240223 |
Correlation Matrix | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Communication Skills | Problem Solving | Learning Ability | Judgment Under Pressure | Observational Skills | Willingness to Confront Problems | Interest in People | Interpersonal Sensitivity | Desire for Self-Improvement | Appearance | Dependability | Physical Ability | Integrity |
Communication Skills | 1.0000 | 0.6280 | 0.5546 | 0.5538 | 0.5381 | 0.5265 | 0.4391 | 0.5030 | 0.5642 | 0.4913 | 0.5471 | 0.2192 | 0.5081 |
Problem Solving | 0.6280 | 1.0000 | 0.5690 | 0.6195 | 0.4284 | 0.5015 | 0.3972 | 0.4398 | 0.4090 | 0.3873 | 0.4546 | 0.3201 | 0.3846 |
Learning Ability | 0.5546 | 0.5690 | 1.0000 | 0.4892 | 0.6230 | 0.5245 | 0.2735 | 0.1855 | 0.5737 | 0.3988 | 0.5110 | 0.2269 | 0.3142 |
Judgment Under Pressure | 0.5538 | 0.6195 | 0.4892 | 1.0000 | 0.3733 | 0.4004 | 0.6226 | 0.6134 | 0.4826 | 0.2266 | 0.5471 | 0.3476 | 0.5883 |
Observational Skills | 0.5381 | 0.4284 | 0.6230 | 0.3733 | 1.0000 | 0.7300 | 0.2616 | 0.1655 | 0.5985 | 0.4177 | 0.5626 | 0.4274 | 0.3906 |
Willingness to Confront Problems | 0.5265 | 0.5015 | 0.5245 | 0.4004 | 0.7300 | 1.0000 | 0.2233 | 0.1291 | 0.5307 | 0.4825 | 0.4870 | 0.4872 | 0.3260 |
Interest in People | 0.4391 | 0.3972 | 0.2735 | 0.6226 | 0.2616 | 0.2233 | 1.0000 | 0.8051 | 0.4857 | 0.2679 | 0.6074 | 0.3768 | 0.7452 |
Interpersonal Sensitivity | 0.5030 | 0.4398 | 0.1855 | 0.6134 | 0.1655 | 0.1291 | 0.8051 | 1.0000 | 0.3713 | 0.2600 | 0.5408 | 0.2182 | 0.6920 |
Desire for Self-Improvement | 0.5642 | 0.4090 | 0.5737 | 0.4826 | 0.5985 | 0.5307 | 0.4857 | 0.3713 | 1.0000 | 0.4474 | 0.5981 | 0.3752 | 0.5664 |
Appearance | 0.4913 | 0.3873 | 0.3988 | 0.2266 | 0.4177 | 0.4825 | 0.2679 | 0.2600 | 0.4474 | 1.0000 | 0.5089 | 0.3820 | 0.4135 |
Dependability | 0.5471 | 0.4546 | 0.5110 | 0.5471 | 0.5626 | 0.4870 | 0.6074 | 0.5408 | 0.5981 | 0.5089 | 1.0000 | 0.4461 | 0.6536 |
Physical Ability | 0.2192 | 0.3201 | 0.2269 | 0.3476 | 0.4274 | 0.4872 | 0.3768 | 0.2182 | 0.3752 | 0.3820 | 0.4461 | 1.0000 | 0.3810 |
Integrity | 0.5081 | 0.3846 | 0.3142 | 0.5883 | 0.3906 | 0.3260 | 0.7452 | 0.6920 | 0.5664 | 0.4135 | 0.6536 | 0.3810 | 1.0000 |
Figure 69.3.2 displays the eigenvalues. The first principal component explains about 50% of the total variance, the second principal component explains about 13.6%, and the third principal component explains about 7.7%. Note that the eigenvalues sum to the total variance. The eigenvalues indicate that three to five components provide a good summary of the data, with three components accounting for about 71.7% of the total variance and five components explaining about 82.7%. Subsequent components contribute less than 5% each.
Eigenvalues of the Correlation Matrix | ||||
---|---|---|---|---|
| | Eigenvalue | Difference | Proportion | Cumulative |
1 | 6.54740242 | 4.77468744 | 0.5036 | 0.5036 |
2 | 1.77271499 | 0.76747933 | 0.1364 | 0.6400 |
3 | 1.00523565 | 0.26209665 | 0.0773 | 0.7173 |
4 | 0.74313901 | 0.06479499 | 0.0572 | 0.7745 |
5 | 0.67834402 | 0.22696368 | 0.0522 | 0.8267 |
6 | 0.45138034 | 0.06922167 | 0.0347 | 0.8614 |
7 | 0.38215866 | 0.08432613 | 0.0294 | 0.8908 |
8 | 0.29783254 | 0.02340663 | 0.0229 | 0.9137 |
9 | 0.27442591 | 0.01208809 | 0.0211 | 0.9348 |
10 | 0.26233782 | 0.01778332 | 0.0202 | 0.9550 |
11 | 0.24455450 | 0.04677622 | 0.0188 | 0.9738 |
12 | 0.19777828 | 0.05508241 | 0.0152 | 0.9890 |
13 | 0.14269586 | | 0.0110 | 1.0000 |
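The Proportion and Cumulative columns follow directly from the eigenvalues: each proportion is the eigenvalue divided by the total variance (13 here, because the analysis uses the correlation matrix), and the cumulative value is the running sum of the proportions. The following DATA step is a small sketch of that arithmetic, using the eigenvalues copied from the table above. The data set name EigenCheck and its variable names are illustrative only and are not produced by PROC PRINCOMP.

```sas
/* Sketch: recompute Proportion and Cumulative from the eigenvalues. */
/* EigenCheck and its variable names are illustrative assumptions.   */
data EigenCheck;
   input Eigenvalue @@;
   Number = _n_;
   Proportion = Eigenvalue / 13;   /* total variance = 13 variables */
   Cumulative + Proportion;        /* sum statement: running total  */
   format Proportion Cumulative 6.4;
   datalines;
6.54740242 1.77271499 1.00523565 0.74313901 0.67834402
0.45138034 0.38215866 0.29783254 0.27442591 0.26233782
0.24455450 0.19777828 0.14269586
;

proc print data=EigenCheck noobs;
   var Number Eigenvalue Proportion Cumulative;
run;
```

Up to rounding, the printed values should agree with the Proportion and Cumulative columns of the table.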
Eigenvectors | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Prin1 | Prin2 | Prin3 | Prin4 | Prin5 | Prin6 | Prin7 | Prin8 | Prin9 | Prin10 | Prin11 | Prin12 | Prin13 |
Communication Skills | 0.303548 | 0.052039 | -.329181 | -.227039 | 0.181087 | -.416563 | 0.143543 | 0.333846 | -.430955 | 0.375983 | 0.028370 | -.252778 | -.122809 |
Problem Solving | 0.278034 | 0.057046 | -.400112 | 0.300476 | 0.453604 | 0.096750 | 0.048904 | 0.199259 | 0.256098 | -.372914 | -.434417 | 0.069863 | -.116642 |
Learning Ability | 0.266521 | 0.288152 | -.354591 | -.020735 | -.219329 | 0.578388 | -.114808 | 0.064088 | 0.224706 | 0.287031 | 0.210540 | -.284355 | 0.248555 |
Judgment Under Pressure | 0.294376 | -.199458 | -.255164 | 0.397306 | -.030188 | 0.102087 | 0.068204 | -.591822 | -.358618 | 0.178270 | 0.118318 | 0.306490 | -.126636 |
Observational Skills | 0.276641 | 0.366979 | 0.065959 | 0.035711 | -.325257 | -.301254 | -.297894 | 0.163484 | 0.258377 | 0.223793 | -.079692 | 0.565290 | -.168555 |
Willingness to Confront Problems | 0.267580 | 0.392989 | 0.098723 | 0.184409 | 0.038278 | -.458585 | -.044796 | -.365684 | 0.129976 | -.330710 | 0.275249 | -.386151 | 0.177688 |
Interest in People | 0.278060 | -.432916 | 0.118113 | 0.046047 | -.111279 | 0.030870 | -.011105 | 0.154829 | 0.321200 | -.081470 | 0.393841 | -.210915 | -.610215 |
Interpersonal Sensitivity | 0.253814 | -.495662 | -.064547 | -.060000 | 0.107807 | -.170305 | -.088194 | 0.192725 | 0.137468 | -.074821 | 0.285447 | 0.276824 | 0.643410 |
Desire for Self-Improvement | 0.299833 | 0.099077 | 0.061097 | -.211279 | -.427477 | 0.105369 | 0.689011 | 0.087453 | -.121474 | -.363854 | -.052085 | 0.151436 | 0.053834 |
Appearance | 0.237358 | 0.190065 | 0.248353 | -.544587 | 0.568044 | 0.221643 | 0.049267 | -.257497 | 0.087395 | 0.061890 | 0.168369 | 0.236655 | -.113705 |
Dependability | 0.319480 | -.049742 | 0.169476 | -.156070 | -.130575 | 0.202301 | -.594850 | 0.081242 | -.495598 | -.377561 | -.164909 | -.090904 | -.018094 |
Physical Ability | 0.213868 | 0.097499 | 0.614959 | 0.514519 | 0.203995 | 0.173168 | 0.169247 | 0.302536 | -.149625 | 0.258321 | -.006202 | -.055828 | 0.133430 |
Integrity | 0.298246 | -.301812 | 0.190222 | -.169062 | -.130757 | -.100039 | 0.029456 | -.317545 | 0.271060 | 0.297010 | -.612497 | -.276273 | 0.114965 |
When the ODS GRAPHICS ON statement is in effect, PROC PRINCOMP produces the scree plot shown in Figure 69.3.3 by default, which helps you visualize the eigenvalues and choose the number of components. You can obtain more plots by specifying the PLOTS= option in the PROC PRINCOMP statement.
The "Scree Plot" on the left shows that the eigenvalue of the first component is approximately 6.5 and that the eigenvalue of the second component drops sharply to under 2.0. The "Variance Explained" plot on the right shows that the first four principal components explain nearly 80% of the total variance.
The first component reflects overall performance since the first eigenvector shows approximately equal loadings on all variables. The second eigenvector has high positive loadings on the variables Observational Skills and Willingness to Confront Problems but even higher negative loadings on the variables Interest in People and Interpersonal Sensitivity. This component seems to reflect the ability to take action, but it also reflects a lack of interpersonal skills. The third eigenvector has a very high positive loading on the variable Physical Ability and high negative loadings on the variables Problem Solving and Learning Ability. This component seems to reflect physical strength, but also shows poor learning and problem-solving skills.
In short, the three components represent the following:
first component: overall performance
second component: smart, tough, and introverted
third component: superior strength and average intellect
PROC PRINCOMP also produces other plots besides the scree plot, which are helpful while interpreting the results. The following statements request plots from the PRINCOMP procedure:
proc princomp data=Jobratings(drop='Overall Rating'n)
              plots(ncomp=3)=all n=5;
run;

ods graphics off;
PLOTS(NCOMP=3)=ALL in the PROC PRINCOMP statement requests all plots but limits the number of components plotted in the component pattern plots and the component score plots to three. The N=5 option sets the number of principal components to be computed to five. In addition to a scree plot similar to the one shown earlier, the procedure produces the plots that are displayed and discussed below.
Output 69.3.4 shows a matrix plot of component scores between the first five principal components. The histogram of each component is displayed in the diagonal element of the matrix. The histograms indicate that the first principal component is skewed to the left and the second principal component is slightly skewed to the right.
The pairwise component pattern plots are shown in Output 69.3.5 to Output 69.3.7. The pattern plots show the following:
All variables positively and evenly correlate with the first principal component (Output 69.3.5 and Output 69.3.6).
The variables Observational Skills and Willingness to Confront Problems correlate highly with the second component, and the variables Interest in People and Interpersonal Sensitivity correlate highly but negatively with the second component (Output 69.3.5).
The variable Physical Ability correlates highly with the third component, and the variables Problem Solving and Learning Ability correlate highly but negatively with the third component (Output 69.3.6).
The variables Observational Skills, Willingness to Confront Problems, Interest in People, and Interpersonal Sensitivity correlate highly (either positively or negatively) with the second component, but all have very low correlations with the third component; the variables Physical Ability and Problem Solving correlate highly (either positively or negatively) with the third component, but both have very low correlations with the second component (Output 69.3.7).
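The pattern plots display correlations between the variables and the components, whereas the Eigenvectors table lists unit-length eigenvector coefficients. For an analysis based on the correlation matrix, the two are related by a simple scaling: the correlation equals the coefficient multiplied by the square root of that component's eigenvalue. The following sketch applies this to a few coefficients copied from the tables above. The data set name PatternCheck and its variable names are illustrative only.

```sas
/* Sketch: convert eigenvector coefficients to variable-component  */
/* correlations for a correlation-matrix analysis.                 */
/* PatternCheck and its variable names are illustrative.           */
data PatternCheck;
   length Variable $ 32;
   infile datalines dsd;          /* comma-delimited input lines   */
   input Variable $ Component Coefficient Eigenvalue;
   Correlation = Coefficient * sqrt(Eigenvalue);
   format Correlation 7.4;
   datalines;
Physical Ability,3,0.614959,1.00523565
Problem Solving,3,-0.400112,1.00523565
Interest in People,2,-0.432916,1.77271499
Interpersonal Sensitivity,2,-0.495662,1.77271499
;

proc print data=PatternCheck noobs;
run;
```

Because the third eigenvalue is close to 1, the correlations with the third component are nearly identical to the eigenvector coefficients. The larger second eigenvalue makes the correlations with the second component noticeably larger in magnitude than the coefficients.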
Output 69.3.8 shows a component pattern profile. As the pattern plots also show, the nearly horizontal profile for the first component indicates that the first component correlates about equally with all variables.
Output 69.3.9 through Output 69.3.11 display the pairwise component score plots. Observation numbers are used as the plotting symbol.
Output 69.3.9 shows a scatter plot of the first and second components. Observations 82, 9, and 84 seem like outliers on the first component; Observations 16 and 59 can be potential outliers on the second component.
Output 69.3.10 shows a scatter plot of the first and third components. Observations 82, 9, and 84 seem like outliers on the first component.
Output 69.3.11 shows a scatter plot of the second and third components. Observations 95, 15, 16, and 59 can be potential outliers on the second component.
Output 69.3.12 shows a scatter plot of the second and third components, displaying density with color. Color interpolation is based on the first component; in the statistical style, for example, the colors run from blue (minimum density) through tan (median density) to red (maximum density).
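The potential outliers noted in the score plots can also be examined numerically. The following statements are a sketch of one way to do so: they write the component scores to a data set, attach the observation number, and list the observations with the lowest scores on the first component. The data set names ScoresObs and ByPrin1 and the variable Obs are illustrative only, and the choice to list three observations is arbitrary.

```sas
/* Sketch: list the most extreme Prin1 scores by observation number. */
/* ScoresObs, ByPrin1, and Obs are illustrative names.               */
proc princomp data=Jobratings(drop='Overall Rating'n)
              out=Scores n=3 noprint;
run;

data ScoresObs;
   set Scores;
   Obs = _n_;          /* observation number, as used in the plots */
run;

proc sort data=ScoresObs out=ByPrin1;
   by Prin1;           /* ascending: lowest Prin1 scores first     */
run;

proc print data=ByPrin1(obs=3) noobs;
   var Obs Prin1 Prin2 Prin3;
run;
```

To inspect the other tail, or another component, change the BY statement (for example, BY DESCENDING Prin1 or BY Prin2).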
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.