The PRINCOMP Procedure

Example 91.1 Analyzing Mean Temperatures of US Cities

This example analyzes mean daily temperatures of selected US cities in January and July. Both the raw data and the principal components are plotted to illustrate that principal components are orthogonal rotations of the original variables.

The following statements create the Temperature data set:

data Temperature;
   length CityId $ 2;
   title 'Mean Temperature in January and July for Selected Cities';
   input City $ 1-15 January July;
   CityId = substr(City,1,2);
   datalines;
Mobile          51.2 81.6
Phoenix         51.2 91.2
Little Rock     39.5 81.4
Sacramento      45.1 75.2
Denver          29.9 73.0

   ... more lines ...   

Cheyenne        26.6 69.1
;

The following statements plot the Temperature data set. The variable CityId, rather than City, is used as the data label in the scatter plot to avoid label collisions.

title 'Mean Temperature in January and July for Selected Cities';
proc sgplot data=Temperature;
   scatter x=July y=January / datalabel=CityId;
run;

The results are displayed in Output 91.1.1, which shows a scatter plot of the 64 pairs of data points in which January temperatures (vertical axis) are plotted against July temperatures (horizontal axis).

Output 91.1.1: Plot of Raw Data



The following step requests a principal component analysis of the Temperature data set:

ods graphics on;

title 'Mean Temperature in January and July for Selected Cities';
proc princomp data=Temperature cov plots=score(ellipse);
   var July January;
   id CityId;
run;

Output 91.1.2 displays the PROC PRINCOMP output. The COV option in the PROC PRINCOMP statement requests that the principal components be computed from the covariance matrix rather than from the correlation matrix. The standard deviation of January (11.712) is higher than the standard deviation of July (5.128), and the total variance is 163.474. The first principal component accounts for about 94% of the total variance, and the second principal component accounts for only about 6%. The eigenvalues (154.311 and 9.163) sum to the total variance.

Note that January receives a higher loading on Prin1 because it has a higher standard deviation than July. Also note that the PRINCOMP procedure calculates the scores by using the centered variables rather than the standardized variables.
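The point about centered variables can be checked with a short sketch such as the following. It assumes that the Temperature data set created earlier is available. The data set names Scores and ScoresCheck and the variables JulyC, JanuaryC, Hand1, Hand2, Diff1, and Diff2 are illustrative names that do not appear in the original example. The means and eigenvector elements are copied from Output 91.1.2, so small differences due to rounding are to be expected.

```sas
/* Save the component scores (Prin1, Prin2) computed from the covariance matrix */
proc princomp data=Temperature cov out=Scores noprint;
   var July January;
run;

/* Hand computation: center each variable (no division by the standard */
/* deviation), then apply the eigenvectors reported in Output 91.1.2   */
data ScoresCheck;
   set Scores;
   JulyC    = July    - 75.60781250;
   JanuaryC = January - 32.09531250;
   Hand1 = 0.343532*JulyC + 0.939141*JanuaryC;
   Hand2 = 0.939141*JulyC - 0.343532*JanuaryC;
   Diff1 = Prin1 - Hand1;   /* should be close to zero */
   Diff2 = Prin2 - Hand2;   /* should be close to zero */
run;

/* Summarize the discrepancies between the two sets of scores */
proc means data=ScoresCheck min max;
   var Diff1 Diff2;
run;
```

If the minimum and maximum of Diff1 and Diff2 are near zero, the scores are linear combinations of the centered, unstandardized variables.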

Output 91.1.2: Results of Principal Component Analysis

Mean Temperature in January and July for Selected Cities

The PRINCOMP Procedure

Observations    64
Variables        2

Simple Statistics
               July       January
Mean    75.60781250   32.09531250
StD      5.12761910   11.71243309

Covariance Matrix
                 July       January
July       26.2924777    46.8282912
January    46.8282912   137.1810888

Total Variance    163.47356647

Eigenvalues of the Covariance Matrix
     Eigenvalue    Difference    Proportion    Cumulative
1    154.310607    145.147647        0.9439        0.9439
2      9.162960                      0.0561        1.0000

Eigenvectors
              Prin1        Prin2
July       0.343532     0.939141
January    0.939141     -.343532
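
The eigenvalues, proportions, and eigenvectors in this output can be reproduced directly from the covariance matrix. The following sketch assumes that SAS/IML is licensed; the matrix names S, Lambda, V, Total, and Proportion are illustrative. The covariance values are copied from Output 91.1.2, so the results should agree with the output only up to rounding, and the signs of the eigenvector columns might be reversed.

```sas
proc iml;
   /* Covariance matrix from Output 91.1.2 (row and column order: July, January) */
   S = {26.2924777  46.8282912,
        46.8282912 137.1810888};

   /* Eigenvalues (returned in descending order) and eigenvectors of S */
   call eigen(Lambda, V, S);

   /* Total variance is the trace of S; each proportion is eigenvalue / total */
   Total      = trace(S);
   Proportion = Lambda / Total;

   print Total, Lambda Proportion, V;
quit;
```

The first column of V corresponds to Prin1 and the second column corresponds to Prin2.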



The PLOTS=SCORE option in the PROC PRINCOMP statement requests a plot of the second principal component against the first principal component, as shown in Output 91.1.3. The plot shows that the principal components are orthogonal rotations of the original variables and that the first principal component has a larger variance than the second principal component. In fact, the first component has a larger variance (154.311) than either of the original variables, July (26.292) and January (137.181). The ellipse requested by the ELLIPSE suboption indicates that Miami, Phoenix, and Portland are possible outliers.

Output 91.1.3: Plot of Component 2 by Component 1

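
The variance comparison described for Output 91.1.3 can also be checked numerically. The following sketch assumes that the Temperature data set created earlier is available; the output data set name PCScores is illustrative. Under the default variance divisor (n - 1) in both procedures, the variances of Prin1 and Prin2 should match the eigenvalues in Output 91.1.2, and the correlation between the two components should be essentially zero.

```sas
/* Save the component scores without printing the PROC PRINCOMP tables */
proc princomp data=Temperature cov out=PCScores noprint;
   var July January;
run;

/* Compare the variances of the original variables and the components */
proc means data=PCScores n var;
   var July January Prin1 Prin2;
run;

/* The components are uncorrelated, so the correlation should be near zero */
proc corr data=PCScores nosimple;
   var Prin1 Prin2;
run;
```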