The HPPRINCOMP Procedure

Example 58.1 Analyzing Mean Temperatures of US Cities

This example analyzes mean daily temperatures of selected US cities in January and July. The following statements create the Temperature data set:

data Temperature;
   length Cityid $ 2;
   title 'Mean Temperature in January and July for Selected Cities';
   input City $1-15 January July;
   Cityid = substr(City,1,2);
   datalines;
Mobile          51.2 81.6
Phoenix         51.2 91.2
Little Rock     39.5 81.4
Sacramento      45.1 75.2
Denver          29.9 73.0

   ... more lines ...   

Cheyenne        26.6 69.1
;

The following statements invoke the HPPRINCOMP procedure to perform a principal component analysis of the Temperature data set and write the scores to the Scores data set (OUT=Scores). The Cityid variable, which is specified in the ID statement, is also included in the output data set.

title 'Mean Temperature in January and July for Selected Cities';
proc hpprincomp data=Temperature cov out=Scores;
   var July January;
   id Cityid;
run;
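To take a quick look at the scores, a step like the following could be used. It is a minimal sketch that assumes the score variables have the names Prin1 and Prin2 (the names shown in the eigenvector table); the OBS=5 limit is an arbitrary choice for illustration.

```sas
/* Print the first few observations of the Scores data set.   */
/* Assumes the score variables are named Prin1 and Prin2.     */
proc print data=Scores(obs=5);
   var Cityid Prin1 Prin2;
run;
```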

Output 58.1.1 displays the PROC HPPRINCOMP output. The COV option in the PROC HPPRINCOMP statement requests that the principal components be computed from the covariance matrix rather than the correlation matrix. The standard deviation of January (11.712) is higher than the standard deviation of July (5.128). The total variance, which is the sum of the two variances, is 163.474. The first principal component accounts for about 94% of the total variance, and the second principal component accounts for only about 6%. The eigenvalues sum to the total variance.
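The arithmetic behind these figures can be checked by hand. The following DATA step is only a sketch: it takes the three covariance entries as printed in Output 58.1.1 and applies the closed-form eigenvalue formula for a symmetric 2 x 2 matrix. The variable names (a, b, c, lambda1, and so on) are hypothetical, and the results should agree with the tables only up to the rounding of the printed inputs.

```sas
/* Check the eigenvalue arithmetic by using the covariance   */
/* entries as printed in the Covariance Matrix table.        */
data _null_;
   a = 26.29248;    /* Var(July)          */
   c = 137.18109;   /* Var(January)       */
   b = 46.82829;    /* Cov(July, January) */

   total   = a + c;                        /* total variance = trace */
   root    = sqrt(((a - c)/2)**2 + b**2);  /* closed form for 2 x 2  */
   lambda1 = total/2 + root;               /* larger eigenvalue      */
   lambda2 = total/2 - root;               /* smaller eigenvalue     */
   prop1   = lambda1 / total;              /* proportion for Prin1   */
   prop2   = lambda2 / total;              /* proportion for Prin2   */

   /* Write the results to the SAS log */
   put total=10.5 lambda1=10.5 lambda2=10.5 prop1=6.4 prop2=6.4;
run;
```

Because lambda1 and lambda2 are defined as total/2 plus and minus the same quantity, their sum equals the total variance by construction, which is the property stated above.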

Note that January receives a higher loading on Prin1 (0.939) than July (0.344) because it has a higher standard deviation. Also note that, because the COV option is specified, the HPPRINCOMP procedure calculates the scores by using the centered variables rather than the standardized variables.
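To make the role of centering concrete, the scores can be approximated outside the procedure. The following DATA step is a sketch under stated assumptions: the means and eigenvector coefficients are copied from Output 58.1.1 at their printed precision, and the names ScoreCheck, July_c, January_c, Prin1_check, and Prin2_check are hypothetical. The results should match the Prin1 and Prin2 values in the Scores data set only to about four significant digits.

```sas
/* Recompute the scores from centered (not standardized) variables. */
data ScoreCheck;
   set Temperature;

   /* Center each variable at its sample mean; there is no      */
   /* division by the standard deviation.                       */
   July_c    = July    - 75.60781;
   January_c = January - 32.09531;

   /* Apply the eigenvector coefficients from the output */
   Prin1_check = 0.34353*July_c + 0.93914*January_c;
   Prin2_check = 0.93914*July_c - 0.34353*January_c;

   keep Cityid City Prin1_check Prin2_check;
run;
```

Because there is no division by the standard deviation, a one-degree change in January moves Prin1 by 0.939, compared with 0.344 for a one-degree change in July.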

Output 58.1.1: Results of Principal Component Analysis

Mean Temperature in January and July for Selected Cities

The HPPRINCOMP Procedure

Performance Information
Execution Mode       Single-Machine
Number of Threads    4

Data Access Information
Data                Engine    Role      Path
WORK.TEMPERATURE    V9        Input     On Client
WORK.SCORES         V9        Output    On Client

Model Information
Data Source WORK.TEMPERATURE
Component Extraction Method Eigenvalue Decomposition

Number of Observations Read 64
Number of Observations Used 64

Number of Variables 2
Number of Principal Components 2

Simple Statistics
Variable        Mean    Standard Deviation
July        75.60781               5.12762
January     32.09531              11.71243

Covariance Matrix
Variable        July      January
July        26.29248     46.82829
January     46.82829    137.18109

Total Variance 163.47356647

Eigenvalues of the Covariance Matrix
     Eigenvalue    Difference    Proportion    Cumulative
1    154.310607    145.147647        0.9439        0.9439
2      9.162960                      0.0561        1.0000

Eigenvectors
Variable      Prin1       Prin2
July        0.34353     0.93914
January     0.93914    -0.34353