This example analyzes mean daily temperatures of selected US cities in January and July. Both the raw data and the principal components are plotted to illustrate that principal components are orthogonal rotations of the original variables.

The following statements create the `Temperature` data set:

    data Temperature;
       length CityId $ 2;
       title 'Mean Temperature in January and July for Selected Cities';
       input City $ 1-15 January July;
       CityId = substr(City,1,2);
       datalines;
    Mobile          51.2 81.6
    Phoenix         51.2 91.2
    Little Rock     39.5 81.4
    Sacramento      45.1 75.2
    Denver          29.9 73.0

       ... more lines ...

    Cheyenne        26.6 69.1
    ;

The following statements plot the `Temperature` data set. The variable `CityId` instead of `City` is used as a data label in the scatter plot to avoid label collisions.

    title 'Mean Temperature in January and July for Selected Cities';
    proc sgplot data=Temperature;
       scatter x=July y=January / datalabel=CityId;
    run;

The results are displayed in Output 77.1.1, which shows a scatter plot of the 64 cities with July temperature on the horizontal axis and January temperature on the vertical axis.

The following step requests a principal component analysis of the `Temperature` data set:

    ods graphics on;

    title 'Mean Temperature in January and July for Selected Cities';
    proc princomp data=Temperature cov plots=score(ellipse);
       var July January;
       id CityId;
    run;

Output 77.1.2 displays the PROC PRINCOMP output. The standard deviation of `January` (11.712) is higher than the standard deviation of `July` (5.128). The COV option in the PROC PRINCOMP statement requests that the principal components be computed from the covariance matrix. The total variance is 163.474. The first principal component accounts for about 94% of the total variance, and the second principal component accounts for only about 6%. The eigenvalues sum to the total variance.
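The eigenvalue summary can be cross-checked from the covariance matrix printed in Output 77.1.2. The following is a minimal sketch, assuming SAS/IML is available; the matrix name `S` and the result names are illustrative and are not part of the original example. The signs of the eigenvector columns returned by the EIGEN call might differ from those that PROC PRINCOMP reports, because an eigenvector is determined only up to sign.

```sas
proc iml;
   /* Covariance matrix copied from Output 77.1.2 (order: July, January) */
   S = {26.2924777  46.8282912,
        46.8282912 137.1810888};

   /* For a symmetric matrix, EIGEN returns eigenvalues in descending order */
   call eigen(Eigenvalue, Eigenvector, S);

   TotalVariance = trace(S);                 /* sum of the two variances   */
   Proportion    = Eigenvalue / TotalVariance;
   Cumulative    = cusum(Proportion);

   print TotalVariance, Eigenvalue Proportion Cumulative, Eigenvector;
quit;
```

The trace of `S` is the total variance, and each proportion is an eigenvalue divided by that total, which is how the roughly 94% and 6% shares arise.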

Note that `January` receives a higher loading on `Prin1` because it has a higher standard deviation than `July`. Also note that the PRINCOMP procedure calculates the scores by using the centered variables rather than the standardized variables.
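To make the centering concrete, the scores can be sketched by hand in a DATA step. This is an illustration only: the data set name `ScoresByHand` and the variables `JulyC` and `JanuaryC` are hypothetical, and the means and eigenvector coefficients are copied from Output 77.1.2, so the step is meaningful only when `Temperature` contains all 64 observations.

```sas
data ScoresByHand;
   set Temperature;

   /* Center each variable at its sample mean (Simple Statistics table) */
   JulyC    = July    - 75.60781250;
   JanuaryC = January - 32.09531250;

   /* Apply the eigenvector coefficients. With the COV option there is  */
   /* no division by the standard deviations.                           */
   Prin1 = 0.343532*JulyC + 0.939141*JanuaryC;
   Prin2 = 0.939141*JulyC - 0.343532*JanuaryC;
run;
```

Because the coefficients are rounded to six decimal places, scores computed this way should agree with the procedure's scores only approximately.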

Output 77.1.2: Results of Principal Component Analysis

Mean Temperature in January and July for Selected Cities

The PRINCOMP Procedure

| Observations | 64 |
|---|---|
| Variables | 2 |

Simple Statistics

| | July | January |
|---|---|---|
| Mean | 75.60781250 | 32.09531250 |
| StD | 5.12761910 | 11.71243309 |

Covariance Matrix

| | July | January |
|---|---|---|
| July | 26.2924777 | 46.8282912 |
| January | 46.8282912 | 137.1810888 |

| Total Variance | 163.47356647 |
|---|---|

Eigenvalues of the Covariance Matrix

| | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|
| 1 | 154.310607 | 145.147647 | 0.9439 | 0.9439 |
| 2 | 9.162960 | | 0.0561 | 1.0000 |

Eigenvectors

| | Prin1 | Prin2 |
|---|---|---|
| July | 0.343532 | 0.939141 |
| January | 0.939141 | -.343532 |

The PLOTS=SCORE option in the PROC PRINCOMP statement requests a plot of the second principal component against the first principal component, as shown in Output 77.1.3. It is clear from this plot that the principal components are orthogonal rotations of the original variables and that the first principal component has a larger variance than the second principal component. In fact, the first component has a larger variance than either of the original variables, `July` and `January`. The ellipse indicates that Miami, Phoenix, and Portland are possible outliers.
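The orthogonality and variance claims can also be checked numerically. The following sketch assumes SAS/IML is available and uses the rounded eigenvectors and covariance matrix from Output 77.1.2; the names `V`, `S`, `Orthonormal`, and `ScoreCov` are illustrative.

```sas
proc iml;
   /* Eigenvectors from Output 77.1.2 (columns: Prin1, Prin2) */
   V = {0.343532  0.939141,
        0.939141 -0.343532};

   /* Covariance matrix of July and January */
   S = {26.2924777  46.8282912,
        46.8282912 137.1810888};

   Orthonormal = t(V) * V;       /* approximately the identity matrix    */
   ScoreCov    = t(V) * S * V;   /* approximately diagonal; the diagonal */
                                 /* holds the two eigenvalues            */
   print Orthonormal, ScoreCov;
quit;
```

If the eigenvectors are orthonormal, `ScoreCov` is the covariance matrix of the component scores, so its first diagonal element (about 154.31) can be compared directly with the variances of `July` (26.29) and `January` (137.18).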