This example demonstrates the NIPALS method in PROC HPPRINCOMP, which extracts principal components successively. The data that this example uses are from the Getting Started section; they provide crime rates per 100,000 people in seven categories for each of the 50 US states in 1977. The following DATA step generates the data:
    data Crime;
       title 'Crime Rates per 100,000 Population by State';
       input State $1-15 Murder Rape Robbery Assault
             Burglary Larceny Auto_Theft;
       datalines;
    Alabama        14.2 25.2 96.8 278.3 1135.5 1881.9 280.7
    Alaska         10.8 51.6 96.8 284.0 1331.7 3369.8 753.3
    Arizona        9.5 34.2 138.2 312.3 2346.1 4467.4 439.5
    Arkansas       8.8 27.6 83.2 203.4 972.6 1862.1 183.4
    California     11.5 49.4 287.0 358.0 2139.4 3499.8 663.5

       ... more lines ...

    Wisconsin      2.8 12.9 52.2 63.7 846.9 2614.2 220.7
    Wyoming        . 21.9 39.7 173.9 811.6 2772.2 282.0
    ;
The following statements use PROC HPPRINCOMP to extract principal components by using the NIPALS method:
    proc hpprincomp data=Crime method=nipals;
    run;
Output 13.3.1 displays the PROC HPPRINCOMP output. The "Model Information" table shows that the NIPALS method is used to extract principal components. The "Explained Variation of Variables" table lists, for each variable, the cumulative fraction of its variation that is accounted for by the principal components up to and including the one named in the column heading. Because there are only seven variables, seven principal components account for all the variation in each variable, so every entry in the Prin7 column is 1. The eigenvalues indicate that two or three components provide a good summary of the data: two components account for 76% of the total variance, and three components account for 87%. Each subsequent component accounts for less than 5%.
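The percentages quoted above can be recomputed from the eigenvalues alone. The following DATA step is a minimal sketch, not part of the original example: the data set name EigCheck and its variable names are hypothetical, and the eigenvalues are typed in by hand from the "Eigenvalues of the Data Matrix" table in Output 13.3.1.

```sas
/* Illustrative check: recompute Proportion and Cumulative from the     */
/* eigenvalues reported in Output 13.3.1. EigCheck is a made-up name.   */
data EigCheck;
   array eig[7] _temporary_
      (4.045824, 1.264030, 0.747500, 0.326325, 0.265207, 0.228364, 0.122750);
   Total = sum(of eig[*]);              /* total variance (sum of eigenvalues) */
   do Component = 1 to 7;
      Eigenvalue = eig[Component];
      Proportion = Eigenvalue / Total;  /* share of the total variance         */
      Cumulative + Proportion;          /* sum statement keeps a running total */
      output;
   end;
   format Proportion Cumulative 6.4;
run;

proc print data=EigCheck noobs;
   var Component Eigenvalue Proportion Cumulative;
run;
```

The tabulated eigenvalues sum to 7, the number of variables, so each proportion is simply the eigenvalue divided by 7. Small differences in the last displayed digit relative to the published table are possible because the inputs here are already rounded.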
Note that in the Getting Started section, the principal components are extracted from the same data by using the eigenvalue decomposition method; the "Eigenvalues" table generated there matches the one generated by the NIPALS method. Also, the eigenvectors in the "Eigenvectors" table match the loading factors in the "Loadings" table.
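To see the agreement between the two methods side by side, the same data can be analyzed once with each method. This is a sketch under the assumption that METHOD=EIG selects the eigenvalue decomposition used in the Getting Started section; the TITLE2 text is illustrative only.

```sas
/* Run 1: eigenvalue decomposition (assumed to correspond to METHOD=EIG) */
title2 'Eigenvalue decomposition';
proc hpprincomp data=Crime method=eig;
run;

/* Run 2: NIPALS, extracting the components one after another */
title2 'NIPALS';
proc hpprincomp data=Crime method=nipals;
run;

title2;   /* clear the second title line */
```

Compare the "Eigenvalues" tables from the two runs, and compare the "Eigenvectors" table from the first run with the "Loadings" table from the second.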
Output 13.3.1: Results of Principal Component Analysis Using NIPALS
Explained Variation of Variables

Variable | Prin1 | Prin2 | Prin3 | Prin4 | Prin5 | Prin6 | Prin7 |
---|---|---|---|---|---|---|---|
Murder | 0.37117 | 0.85539 | 0.87790 | 0.89562 | 0.97555 | 0.99143 | 1.00000 |
Rape | 0.76242 | 0.79917 | 0.84059 | 0.84199 | 0.85065 | 0.99041 | 1.00000 |
Robbery | 0.63783 | 0.64064 | 0.82164 | 0.92942 | 0.99788 | 0.99992 | 1.00000 |
Assault | 0.63517 | 0.79127 | 0.79341 | 0.91781 | 0.98822 | 0.99513 | 1.00000 |
Burglary | 0.78913 | 0.84414 | 0.88183 | 0.88207 | 0.88544 | 0.94800 | 1.00000 |
Larceny | 0.51373 | 0.72178 | 0.93718 | 0.95479 | 0.95492 | 0.95530 | 1.00000 |
Auto_Theft | 0.33638 | 0.65746 | 0.90481 | 0.96197 | 0.99623 | 0.99706 | 1.00000 |
Eigenvalues of the Data Matrix

| | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|
| 1 | 4.045824 | 2.781795 | 0.5780 | 0.5780 |
| 2 | 1.264030 | 0.516529 | 0.1806 | 0.7586 |
| 3 | 0.747500 | 0.421175 | 0.1068 | 0.8653 |
| 4 | 0.326325 | 0.061119 | 0.0466 | 0.9120 |
| 5 | 0.265207 | 0.036843 | 0.0379 | 0.9498 |
| 6 | 0.228364 | 0.105613 | 0.0326 | 0.9825 |
| 7 | 0.122750 | | 0.0175 | 1.0000 |
Loadings

Variable | Prin1 | Prin2 | Prin3 | Prin4 | Prin5 | Prin6 | Prin7 |
---|---|---|---|---|---|---|---|
Murder | 0.30289 | -0.61893 | 0.17353 | -0.23308 | 0.54896 | -0.26371 | -0.26428 |
Rape | 0.43410 | -0.17053 | -0.23539 | 0.06540 | 0.18075 | 0.78232 | 0.27946 |
Robbery | 0.39705 | 0.04713 | 0.49208 | -0.57470 | -0.50808 | 0.09452 | 0.02497 |
Assault | 0.39622 | -0.35142 | -0.05343 | 0.61744 | -0.51525 | -0.17395 | -0.19921 |
Burglary | 0.44164 | 0.20861 | -0.22454 | -0.02750 | 0.11273 | -0.52340 | 0.65085 |
Larceny | 0.35634 | 0.40570 | -0.53681 | -0.23231 | 0.02172 | -0.04085 | -0.60346 |
Auto_Theft | 0.28834 | 0.50400 | 0.57524 | 0.41853 | 0.35939 | 0.06024 | -0.15487 |
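
The three tables in Output 13.3.1 are numerically tied together: each entry of the "Explained Variation of Variables" table appears to equal the running sum, over the components extracted so far, of the eigenvalue times the squared loading for that variable. For Murder and Prin1, for instance, 4.045824 × 0.30289² ≈ 0.3712. The following DATA step is a hedged sketch of that check for the Murder row only; the relationship is inferred from the displayed numbers rather than stated by the procedure, and the data set name MurderCheck and its variable names are hypothetical.

```sas
/* Illustrative check: rebuild the Murder row of the explained-variation */
/* table from the eigenvalues and the Murder loadings in Output 13.3.1.  */
data MurderCheck;
   array eig[7] _temporary_
      (4.045824, 1.264030, 0.747500, 0.326325, 0.265207, 0.228364, 0.122750);
   array loading[7] _temporary_
      (0.30289, -0.61893, 0.17353, -0.23308, 0.54896, -0.26371, -0.26428);
   do Component = 1 to 7;
      /* add this component's contribution: eigenvalue * loading squared */
      Explained + eig[Component] * loading[Component]**2;
      output;
   end;
   format Explained 7.5;
run;

proc print data=MurderCheck noobs;
   var Component Explained;
run;
```

The printed running total can be compared with the Murder row of the "Explained Variation of Variables" table. Agreement is limited to the precision of the rounded inputs, so the final value might differ from 1 in the last displayed digit.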