The HPPRINCOMP Procedure

Example 12.1 Analyzing Mean Temperatures of US Cities

This example analyzes mean daily temperatures of selected US cities in January and July. The following statements create the `Temperature` data set:

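```
data Temperature;
   length Cityid $ 2;
   title 'Mean Temperature in January and July for Selected Cities';
   /* City occupies columns 1-15; the two temperatures follow in list input */
   input City $1-15 January July;
   /* Use the first two letters of the city name as a short ID */
   Cityid = substr(City,1,2);
   datalines;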
Mobile          51.2 81.6
Phoenix         51.2 91.2
Little Rock     39.5 81.4
Sacramento      45.1 75.2
Denver          29.9 73.0

... more lines ...

Cheyenne        26.6 69.1
;
```

The following statements invoke PROC HPPRINCOMP to perform a principal component analysis of the `Temperature` data set and to write the component scores to the `Scores` data set (OUT=`Scores`). Because `Cityid` is named in the ID statement, it is also included in the output data set.

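```
title 'Mean Temperature in January and July for Selected Cities';
/* COV: analyze the covariance matrix; OUT=: save the component scores */
proc hpprincomp data=Temperature cov out=Scores;
   var July January;
   id Cityid;   /* copy Cityid to the Scores data set */
run;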
```

Output 12.1.1 displays the PROC HPPRINCOMP output. The standard deviation of `January` (11.712) is higher than that of `July` (5.128). The COV option in the PROC HPPRINCOMP statement requests that the principal components be computed from the covariance matrix. Therefore the total variance, 163.474, is the sum of the variances of the two variables (26.29248 + 137.18109). The first principal component accounts for about 94% of the total variance, and the second principal component accounts for only about 6%. The eigenvalues sum to the total variance (154.311 + 9.163 = 163.474).
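These totals can be checked by hand from the covariance matrix $\mathbf{S}$ in Output 12.1.1. For a $2 \times 2$ matrix, the eigenvalues are the roots of $\lambda^2 - \operatorname{tr}(\mathbf{S})\,\lambda + \det(\mathbf{S}) = 0$. The following worked calculation is only a sketch. It uses the rounded values printed in the output, so the last digit may differ slightly from the procedure's internal results:

$$
\begin{aligned}
\operatorname{tr}(\mathbf{S}) &= 26.29248 + 137.18109 = 163.47357 \approx 163.474 \\
\det(\mathbf{S}) &= 26.29248 \times 137.18109 - 46.82829^2 \approx 1413.942 \\
\lambda_{1,2} &= \frac{\operatorname{tr}(\mathbf{S}) \pm \sqrt{\operatorname{tr}(\mathbf{S})^2 - 4\det(\mathbf{S})}}{2}
  \approx \frac{163.47357 \pm 145.14765}{2} \\
\lambda_1 &\approx 154.31061, \qquad \lambda_2 \approx 9.16296 \\
\lambda_1 / \operatorname{tr}(\mathbf{S}) &\approx 0.9439, \qquad \lambda_2 / \operatorname{tr}(\mathbf{S}) \approx 0.0561
\end{aligned}
$$

For a $2 \times 2$ matrix, the square-root term equals $\lambda_1 - \lambda_2$. This is why it matches the Difference column of the eigenvalue table (145.147647).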

Note that `January` receives a higher loading on `Prin1` (0.93914, compared with 0.34353 for `July`) because it has the higher standard deviation. Also note that PROC HPPRINCOMP calculates the scores by using the centered variables rather than the standardized variables.
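Because the scores are computed from centered variables, each score is a weighted sum of deviations from the variable means. The weights are the entries of the corresponding eigenvector. Using the rounded means and eigenvector entries from Output 12.1.1, the scores for city $i$ can be sketched as follows:

$$
\begin{aligned}
\text{Prin1}_i &= 0.34353\,(\text{July}_i - 75.60781) + 0.93914\,(\text{January}_i - 32.09531) \\
\text{Prin2}_i &= 0.93914\,(\text{July}_i - 75.60781) - 0.34353\,(\text{January}_i - 32.09531)
\end{aligned}
$$

For example, Mobile (January 51.2, July 81.6) has deviations of 5.99219 for `July` and 19.10469 for `January`. With these rounded coefficients, this gives $\text{Prin1} \approx 20.000$ and $\text{Prin2} \approx -0.936$. The large January deviation dominates `Prin1`, which is consistent with the heavier `January` loading.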

Output 12.1.1: Results of Principal Component Analysis

 Mean Temperature in January and July for Selected Cities

The HPPRINCOMP Procedure

Performance Information

| Execution Mode | Single-Machine |
|---|---|

Data Access Information

| Data | Engine | Role | Path |
|---|---|---|---|
| WORK.TEMPERATURE | V9 | Input | On Client |
| WORK.SCORES | V9 | Output | On Client |

| Number of Observations Read | 64 |
|---|---|
| Number of Observations Used | 64 |

| Number of Variables | 2 |
|---|---|

Simple Statistics

| | Mean | Standard Deviation |
|---|---|---|
| July | 75.60781 | 5.12762 |
| January | 32.09531 | 11.71243 |

Covariance Matrix

| | July | January |
|---|---|---|
| July | 26.29248 | 46.82829 |
| January | 46.82829 | 137.18109 |

| Total Variance | 163.474 |
|---|---|

Eigenvalues of the Covariance Matrix

| | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|
| 1 | 154.310607 | 145.147647 | 0.9439 | 0.9439 |
| 2 | 9.162960 | | 0.0561 | 1.0000 |

Eigenvectors

| | Prin1 | Prin2 |
|---|---|---|
| July | 0.34353 | 0.93914 |
| January | 0.93914 | -0.34353 |
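The eigenvectors in Output 12.1.1 can be checked the same way. Multiplying the covariance matrix by the first eigenvector should return that vector scaled by $\lambda_1$, and the two eigenvectors should be orthogonal. The following worked check uses the rounded printed values, so agreement is only approximate:

$$
\begin{aligned}
\mathbf{S}\,\mathbf{v}_1 &=
\begin{pmatrix} 26.29248 & 46.82829 \\ 46.82829 & 137.18109 \end{pmatrix}
\begin{pmatrix} 0.34353 \\ 0.93914 \end{pmatrix}
\approx \begin{pmatrix} 53.011 \\ 144.919 \end{pmatrix}
\approx 154.31061 \begin{pmatrix} 0.34353 \\ 0.93914 \end{pmatrix} = \lambda_1 \mathbf{v}_1 \\
\mathbf{v}_1^{\top}\mathbf{v}_2 &= 0.34353 \times 0.93914 + 0.93914 \times (-0.34353) = 0
\end{aligned}
$$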