Multivariate Analysis: Principal Component Analysis

Overview of Principal Component Analysis

Principal component analysis is a technique for reducing the complexity of high-dimensional data. You can use principal component analysis to approximate high-dimensional data with fewer dimensions. Each dimension is called a principal component and represents a linear combination of the original variables. The first principal component accounts for as much variation in the data as possible. Each subsequent principal component accounts for as much of the remaining variation as possible and is orthogonal to all of the previous principal components.
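The properties described above can be checked on a small two-variable example. The following Python sketch is illustrative only and is not how the PRINCOMP procedure is implemented: the function names (pca_2d, project, sample_variance) and the data values are assumptions made for this example. It works on the covariance matrix, whereas the PRINCOMP procedure analyzes the correlation matrix unless the COV option is specified.

```python
import math


def pca_2d(points):
    """Principal components of two-variable data from the 2x2 covariance matrix."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Sample covariance matrix entries (divisor n - 1).
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = (half_trace + radius, half_trace - radius)

    # Direction of greatest variance, and the direction perpendicular to it.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    first = (math.cos(theta), math.sin(theta))
    second = (-math.sin(theta), math.cos(theta))
    return eigenvalues, (first, second)


def project(points, direction):
    """Scores: centered points projected onto a unit-length direction."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return [(x - mean_x) * direction[0] + (y - mean_y) * direction[1]
            for x, y in points]


def sample_variance(values):
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Illustrative values only; they do not come from any SAS sample data set.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

eigenvalues, (pc1, pc2) = pca_2d(data)
print("variance along PC1:", sample_variance(project(data, pc1)))
print("variance along PC2:", sample_variance(project(data, pc2)))
print("PC1 . PC2:", pc1[0] * pc2[0] + pc1[1] * pc2[1])
```

In this sketch, the variance of the scores along each component equals the corresponding eigenvalue, the first component carries the larger share, and the two directions are perpendicular.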

You can examine principal components to understand the sources of variation in your data. You can also use them in forming predictive models. If most of the variation in your data lies in a low-dimensional subspace, you might be able to model your response variable in terms of the first few principal components. In this way, principal components can reduce the number of variables in regression, clustering, and other statistical techniques.
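One common way to judge whether most of the variation is captured by a few components is to look at the cumulative proportion of variance explained by the eigenvalues. The following Python sketch shows the idea under stated assumptions: the function name components_needed, the 90% threshold, and the eigenvalues are all hypothetical and chosen only for illustration.

```python
def components_needed(eigenvalues, threshold=0.9):
    """Smallest number of components whose cumulative share of the
    total variance reaches the threshold."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    # Fallback in case rounding keeps the ratio just below the threshold.
    return len(ordered)


# Hypothetical eigenvalues for four standardized variables.
example_eigenvalues = [2.9, 0.9, 0.15, 0.05]
print(components_needed(example_eigenvalues))
```

With eigenvalues like these, a model built on the leading components would use fewer variables than the original four. The appropriate threshold is a judgment call that depends on the application.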

To run the Principal Component analysis, select Analysis → Multivariate Analysis → Principal Component Analysis from the main menu. The analysis is implemented by calling the PRINCOMP procedure in SAS/STAT software. For additional details, see the PRINCOMP procedure documentation in the SAS/STAT User's Guide.