Multivariate Analyses

Principal Component Analysis

Principal component analysis originated with Pearson (1901) and was later developed by Hotelling (1933). It is a multivariate technique for examining relationships among several quantitative variables. Principal component analysis can be used to summarize data and detect linear relationships. It can also be used for exploring polynomial relationships and for multivariate outlier detection (Gnanadesikan 1997).

Principal component analysis reduces the dimensionality of a set of data while preserving as much of its structure as possible. Given a data set with n_y Y variables, n_y eigenvalues and their associated eigenvectors can be computed from its covariance or correlation matrix. The eigenvectors are standardized to unit length.
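The computation described above can be sketched in Python with NumPy. This is an illustration only, not SAS/INSIGHT code: the function name `eigen_decomposition`, the `use_correlation` flag, and the small data set are assumptions made for the example.

```python
import numpy as np

def eigen_decomposition(data, use_correlation=True):
    """Return eigenvalues (descending) and unit-length eigenvectors (as columns)."""
    data = np.asarray(data, dtype=float)
    # rowvar=False: rows are observations, columns are the Y variables.
    if use_correlation:
        matrix = np.corrcoef(data, rowvar=False)
    else:
        matrix = np.cov(data, rowvar=False)
    # eigh is intended for symmetric matrices. It returns eigenvalues in
    # ascending order and eigenvectors already scaled to unit length.
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

# Hypothetical data: 6 observations on 3 Y variables.
Y = np.array([
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 0.5],
    [4.0, 7.9, 1.5],
    [5.0, 10.1, 0.7],
    [6.0, 12.0, 1.2],
    [7.0, 13.8, 0.9],
])
eigenvalues, eigenvectors = eigen_decomposition(Y)
print(eigenvalues)                            # one eigenvalue per Y variable
print(np.linalg.norm(eigenvectors, axis=0))   # length of each eigenvector
```

With three Y variables the sketch yields three eigenvalues and three eigenvectors, each of length one.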

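The principal components are linear combinations of the Y variables. The coefficients of the linear combinations are the eigenvectors of the covariance or correlation matrix. Principal components are formed as follows: the first principal component is the linear combination of the Y variables that accounts for the greatest possible variance, and each subsequent principal component is the linear combination of the Y variables that has the greatest possible variance while being uncorrelated with the previously defined components.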
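A minimal sketch of forming component scores as linear combinations of the Y variables follows. It assumes that the variables are centered for a covariance-based analysis and standardized for a correlation-based one. The function name `principal_component_scores` and the simulated data are hypothetical.

```python
import numpy as np

def principal_component_scores(data, use_correlation=True):
    """Return eigenvalues, eigenvectors, and principal component scores."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    if use_correlation:
        # Assumption: correlation-based components use standardized variables,
        # whose covariance matrix is the correlation matrix of the data.
        centered = centered / data.std(axis=0, ddof=1)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Column j of `scores` is the linear combination of the variables whose
    # coefficients are column j of `eigenvectors`.
    scores = centered @ eigenvectors
    return eigenvalues, eigenvectors, scores

# Simulated data: 40 observations on 3 correlated variables.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.8, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
Y = rng.normal(size=(40, 3)) @ mixing

eigenvalues, eigenvectors, scores = principal_component_scores(Y)
print(eigenvalues)
print(scores.var(axis=0, ddof=1))  # agrees with the eigenvalues up to rounding
```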

For a covariance or correlation matrix, the sum of its eigenvalues equals the trace of the matrix, that is, the sum of the variances of the n_y variables for a covariance matrix, and n_y for a correlation matrix. The principal components are sorted in descending order of their variances, which are equal to the associated eigenvalues.
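The trace identity can be checked numerically. The following sketch uses simulated data; the variable names are illustrative only.

```python
import numpy as np

# Simulated data: 50 observations on 4 variables with different scales.
rng = np.random.default_rng(1)
Y = rng.normal(size=(50, 4)) * np.array([1.0, 2.0, 0.5, 3.0])

cov = np.cov(Y, rowvar=False)
corr = np.corrcoef(Y, rowvar=False)

# eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse them.
cov_eigenvalues = np.linalg.eigvalsh(cov)[::-1]
corr_eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Covariance matrix: eigenvalue sum = trace = sum of the variable variances.
print(cov_eigenvalues.sum(), np.trace(cov), Y.var(axis=0, ddof=1).sum())
# Correlation matrix: eigenvalue sum = number of variables.
print(corr_eigenvalues.sum(), Y.shape[1])
```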

Principal components can be used to reduce the number of variables in statistical analyses. Different methods for selecting the number of principal components to retain have been suggested. One simple criterion is to retain components with associated eigenvalues greater than the average eigenvalue (Kaiser 1958). SAS/INSIGHT software offers this criterion as an option for selecting the number of eigenvalues, eigenvectors, and principal components in the analysis.
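The average-eigenvalue criterion amounts to a one-line comparison. In this sketch the function name `retain_above_average` and the eigenvalues are made up for illustration.

```python
import numpy as np

def retain_above_average(eigenvalues):
    """Count the components whose eigenvalue exceeds the average eigenvalue."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return int(np.sum(eigenvalues > eigenvalues.mean()))

# Hypothetical eigenvalues from a correlation matrix of 4 variables.
# Their average is 1.0, so the first two components are retained.
n_retained = retain_above_average([2.5, 1.2, 0.2, 0.1])
print(n_retained)  # 2
```

For a correlation matrix the average eigenvalue is always one, so the criterion reduces to keeping components with eigenvalues greater than one.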

Principal components have a variety of useful properties (Rao 1964; Kshirsagar 1972):
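Two standard properties of any principal component solution, orthogonal unit-length eigenvectors and mutually uncorrelated scores, can be verified numerically. The sketch below uses simulated data and a covariance-based analysis; all names are illustrative.

```python
import numpy as np

# Simulated data: 60 observations on 3 variables, two of them correlated.
rng = np.random.default_rng(2)
Y = rng.normal(size=(60, 3))
Y[:, 1] += 0.7 * Y[:, 0]

centered = Y - Y.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
scores = centered @ eigenvectors

# Orthogonal, unit-length eigenvectors: E'E is the identity matrix.
gram = eigenvectors.T @ eigenvectors
# Uncorrelated scores: the covariance matrix of the scores is diagonal.
score_cov = np.cov(scores, rowvar=False)
off_diagonal = score_cov - np.diag(np.diag(score_cov))

print(np.allclose(gram, np.eye(3)))
print(np.allclose(off_diagonal, 0.0))
```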

SAS/INSIGHT software computes principal components from either the correlation or the covariance matrix. The covariance matrix can be used when the variables are measured on comparable scales. Otherwise, the correlation matrix should be used. The new variables that hold the principal component scores have variances equal to either the corresponding eigenvalues (Variance=Eigenvalues) or one (Variance=1). You specify the computation method and type of output components in the method options dialog, as shown in Figure 40.3. By default, SAS/INSIGHT software uses the correlation matrix with new variable variances equal to the corresponding eigenvalues.
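The two output scalings can be sketched as follows. This is an assumption about how the options behave, not SAS/INSIGHT code: the function `component_scores` and its `use_correlation` and `unit_variance` parameters are hypothetical stand-ins for the dialog choices, with defaults chosen to mirror the defaults described above.

```python
import numpy as np

def component_scores(data, use_correlation=True, unit_variance=False):
    """Return eigenvalues and scores under the chosen matrix and scaling."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    if use_correlation:
        # Standardizing first makes the covariance matrix below equal to
        # the correlation matrix of the original variables.
        centered = centered / data.std(axis=0, ddof=1)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    scores = centered @ eigenvectors      # variances equal the eigenvalues
    if unit_variance:
        # Rescale each component to variance one (requires eigenvalues > 0).
        scores = scores / np.sqrt(eigenvalues)
    return eigenvalues, scores

# Simulated data on very different scales, where the correlation matrix
# is the safer choice.
rng = np.random.default_rng(3)
Y = rng.normal(size=(80, 3)) * np.array([1.0, 10.0, 100.0])

eig_default, scores_default = component_scores(Y)
_, scores_unit = component_scores(Y, unit_variance=True)
print(scores_default.var(axis=0, ddof=1))  # equals eig_default
print(scores_unit.var(axis=0, ddof=1))     # all ones
```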


Copyright © 2007 by SAS Institute Inc., Cary, NC, USA. All rights reserved.