This example analyzes mean daily temperatures of selected US cities in January and July. The following statements create the Temperature data set:
data Temperature;
   length Cityid $ 2;
   title 'Mean Temperature in January and July for Selected Cities ';
   input City $1-15 January July;
   Cityid = substr(City,1,2);
   datalines;
Mobile          51.2 81.6
Phoenix         51.2 91.2
Little Rock     39.5 81.4
Sacramento      45.1 75.2
Denver          29.9 73.0

   ... more lines ...

Cheyenne        26.6 69.1
;
The following statements invoke the HPPRINCOMP procedure. The procedure performs a principal component analysis of the Temperature data set and writes the component scores to the Scores data set (OUT=Scores). The Cityid variable in the ID statement is also included in the output data set.
title 'Mean Temperature in January and July for Selected Cities';
proc hpprincomp data=Temperature cov out=Scores;
   var July January;
   id Cityid;
run;
Output 13.1.1 displays the PROC HPPRINCOMP output. The standard deviation of January (11.712) is higher than the standard deviation of July (5.128). The COV option in the PROC HPPRINCOMP statement requests that the principal components be computed from the covariance matrix rather than from the correlation matrix. The total variance, which is the sum of the variances of January and July, is 163.474. The first principal component accounts for about 94% of the total variance, and the second principal component accounts for only about 6%. The eigenvalues sum to the total variance.
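As a quick arithmetic check on these figures, the total variance in a covariance-based analysis is the trace of the covariance matrix. It can therefore be recovered from the two reported standard deviations. The notation here is introduced for illustration only: $\mathbf{S}$ is the 2×2 covariance matrix of July and January, $s_{\text{Jan}}$ and $s_{\text{Jul}}$ are the standard deviations, and $\lambda_1 \ge \lambda_2$ are the eigenvalues. The small gap in the third decimal place comes from rounding the standard deviations to three decimals.

$$
\operatorname{tr}(\mathbf{S}) = s_{\text{Jan}}^2 + s_{\text{Jul}}^2 \approx 11.712^2 + 5.128^2 \approx 137.171 + 26.296 = 163.467 \approx 163.474 = \lambda_1 + \lambda_2
$$

Because the trace equals the sum of the eigenvalues, the proportions of variance quoted above are $\lambda_1 / (\lambda_1 + \lambda_2) \approx 0.94$ and $\lambda_2 / (\lambda_1 + \lambda_2) \approx 0.06$.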
Note that January receives a higher loading on Prin1 because it has a higher standard deviation than July. When the analysis is based on the covariance matrix, variables with larger variances dominate the leading components. Also note that the HPPRINCOMP procedure calculates the scores by using the centered variables rather than the standardized variables.
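The score calculation can be written out as a formula. The notation is introduced here and is not taken from the procedure documentation:

- $x_{ij}$ is the value of variable $j$ (July or January) for city $i$.
- $\bar{x}_j$ and $s_j$ are the mean and standard deviation of variable $j$.
- $\mathbf{v}_k$ is the unit-length $k$th eigenvector of the covariance matrix $\mathbf{S}$, with elements $v_{jk}$ and eigenvalue $\lambda_k$.

The formula also assumes the scores are not rescaled to unit variance.

$$
\text{Prin}_k(i) = \sum_{j \in \{\text{July},\,\text{January}\}} v_{jk}\,\bigl(x_{ij} - \bar{x}_j\bigr), \qquad k = 1, 2
$$

Each score variable in this form has sample variance $\mathbf{v}_k^{\top}\mathbf{S}\,\mathbf{v}_k = \lambda_k$. A standardized version would instead replace $x_{ij} - \bar{x}_j$ with $(x_{ij} - \bar{x}_j)/s_j$. That is the scale that applies when the components are computed from the correlation matrix.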
Output 13.1.1: Results of Principal Component Analysis