Principal component analysis was originated by Pearson (1901) and later developed by Hotelling (1933). The application of principal components is discussed by Rao (1964), Cooley and Lohnes (1971), Gnanadesikan (1977), and Jackson (1991). Excellent statistical treatments of principal components are found in Kshirsagar (1972), Morrison (1976), and Mardia, Kent, and Bibby (1979).
Principal component modeling focuses on the number of components used. The analysis begins with an eigenvalue decomposition of the sample covariance matrix, $S$, computed from the $n \times p$ matrix $X$ of centered observed variables,

$$ S = \frac{1}{n-1} X'X $$

as

$$ S = P \Lambda P' $$

where $\Lambda$ is a diagonal matrix and $P$ is an orthogonal matrix (Jackson, 1991; Mardia, Kent, and Bibby, 1979). The columns of $P$ are the eigenvectors, and the diagonal elements of $\Lambda$ are the eigenvalues. The eigenvectors are customarily scaled so that they have unit length.
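The decomposition can be checked numerically. The following is a minimal NumPy sketch, not the procedure's own implementation. It assumes synthetic data from a seeded random generator, and `covariance_eigendecomposition` is a hypothetical helper name. It centers the data, forms $S$, and sorts the eigenpairs in descending order. `numpy.linalg.eigh` returns ascending eigenvalues for a symmetric matrix, which is why they are reversed.

```python
import numpy as np

def covariance_eigendecomposition(data):
    """Hypothetical helper: center the columns of `data` (n x p), form
    S = X'X / (n - 1), and return (X, S, eigenvalues, eigenvectors) with
    eigenvalues in descending order and unit-length eigenvectors as columns."""
    X = data - data.mean(axis=0)              # centered data matrix
    n = X.shape[0]
    S = X.T @ X / (n - 1)                     # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)      # ascending order for symmetric S
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    return X, S, eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))  # synthetic correlated data
X, S, lam, P = covariance_eigendecomposition(data)

# S = P Lambda P', with P orthogonal (unit-length, mutually orthogonal columns)
print(np.allclose(S, P @ np.diag(lam) @ P.T))   # True
print(np.allclose(P.T @ P, np.eye(4)))          # True
```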
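A principal component, $t$, is a linear combination of the original variables. The coefficients of the linear combination are the elements of an eigenvector of the covariance matrix. The principal component scores for the ith observation are computed as

$$ t_i = P' x_i $$

where $x_i$ is the ith row of $X$, written as a column vector.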
The principal components are sorted in descending order of their eigenvalues, which equal the variances of the components.
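The score formula and the variance claim can be seen together in a short sketch. It assumes NumPy and synthetic data. Applying $t_i = P' x_i$ to every row at once gives the score matrix $T = XP$. The sample covariance of the scores is then $P'SP = \Lambda$, so each component's variance is its eigenvalue and distinct components are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                            [1.0, 1.0, 0.0],
                                            [0.5, 0.2, 0.3]])
X = data - data.mean(axis=0)                     # centered observations x_i (rows)
S = X.T @ X / (X.shape[0] - 1)                   # sample covariance matrix
lam, P = np.linalg.eigh(S)
lam, P = lam[::-1], P[:, ::-1]                   # descending eigenvalue order

# Scores: t_i = P' x_i for each observation, i.e. T = X P for all rows at once
T = X @ P

# The sample variance of each component equals its eigenvalue
score_var = T.var(axis=0, ddof=1)
print(np.allclose(score_var, lam))                              # True
# Distinct components are uncorrelated: cov(T) is diagonal
print(np.allclose(np.cov(T, rowvar=False), np.diag(lam)))       # True
```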
The eigenvectors are the principal component loadings. The eigenvectors are orthogonal, so the principal components represent mutually perpendicular directions through the space of the original variables. The scores on the first j principal components have the highest possible generalized variance of any set of j unit-length linear combinations of the original variables.
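The generalized-variance property can be illustrated empirically. The sketch below assumes NumPy and synthetic data, and `generalized_variance` is a hypothetical helper. It takes the generalized variance of j linear combinations to be the determinant of their covariance matrix. For the first j principal components this equals the product of the j largest eigenvalues. Random sets of j unit-length coefficient vectors never exceed it. This is a numerical check under these assumptions, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(300, 5)) * np.array([4.0, 3.0, 2.0, 1.0, 0.5])
X = data - data.mean(axis=0)
S = X.T @ X / (X.shape[0] - 1)
lam, P = np.linalg.eigh(S)
lam, P = lam[::-1], P[:, ::-1]                   # descending eigenvalue order

j = 2

def generalized_variance(A):
    """Hypothetical helper: determinant of the covariance matrix of the scores
    X A, where the columns of A (p x j) are the coefficient vectors."""
    return np.linalg.det(A.T @ S @ A)

best = generalized_variance(P[:, :j])            # first j principal components
print(np.isclose(best, np.prod(lam[:j])))        # True

# Compare with random sets of j unit-length coefficient vectors
random_best = 0.0
for _ in range(1000):
    A = rng.normal(size=(5, j))
    A /= np.linalg.norm(A, axis=0)               # scale each column to unit length
    random_best = max(random_best, generalized_variance(A))
print(random_best <= best)                       # True
```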
The first j principal components provide a least squares solution to the model

$$ X = T P' + E $$

where $X$ is an $n \times p$ matrix of the centered observed variables, $T$ is the $n \times j$ matrix of scores on the first j principal components, $P$ is the $p \times j$ matrix of eigenvectors (the first j columns of the eigenvector matrix of $S$), and $E$ is an $n \times p$ matrix of residuals. The first j principal components are the vectors (rows of $P'$) that minimize $\operatorname{trace}(E'E)$, the sum of all the squared elements in $E$.
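The least squares property can be checked in a sketch that assumes NumPy and synthetic data. With $P$ holding the first j eigenvectors and $T = XP$, the residual sum of squares $\operatorname{trace}(E'E)$ equals $(n-1)$ times the sum of the discarded eigenvalues. An arbitrary orthonormal j-dimensional basis, drawn here at random with `numpy.linalg.qr`, leaves a residual at least as large.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
X = data - data.mean(axis=0)                     # n x p centered data
n, p = X.shape
S = X.T @ X / (n - 1)
lam, P = np.linalg.eigh(S)
lam, P = lam[::-1], P[:, ::-1]                   # descending eigenvalue order

j = 2
P_j = P[:, :j]                                   # p x j: first j eigenvectors
T_j = X @ P_j                                    # n x j: scores on first j components
E = X - T_j @ P_j.T                              # n x p residual matrix

# trace(E'E) is the sum of all squared residuals ...
sse = np.trace(E.T @ E)
print(np.isclose(sse, np.sum(E**2)))             # True
# ... and equals (n - 1) times the sum of the discarded eigenvalues
print(np.isclose(sse, (n - 1) * lam[j:].sum()))  # True

# Any other j-dimensional orthonormal basis leaves a residual at least as large
Q, _ = np.linalg.qr(rng.normal(size=(p, j)))
E_other = X - X @ Q @ Q.T
print(np.sum(E_other**2) >= sse)                 # True
```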
The first j principal components are the best linear predictors of the process variables among all possible sets of j variables, although any nonsingular linear transformation of the first j principal components provides equally good prediction. The same result is obtained by minimizing the determinant or the Euclidean norm of $E'E$ rather than the trace.
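The reason a nonsingular transformation predicts equally well is that it leaves the column space of the scores unchanged. The sketch below assumes NumPy and synthetic data. The fixed matrix `M` is a hypothetical example with determinant 5, and `residuals` is a hypothetical helper. The sketch regresses every column of $X$ on the first j scores and then on the transformed scores $TM$. It then confirms that the residual matrix $E$ is the same in both cases.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=(80, 5)) @ rng.normal(size=(5, 5))
X = data - data.mean(axis=0)
S = X.T @ X / (X.shape[0] - 1)
_, P = np.linalg.eigh(S)
P = P[:, ::-1]                                   # descending eigenvalue order

j = 3
T_j = X @ P[:, :j]                               # scores on first j components
M = np.array([[2.0, 1.0, 0.0],                   # hypothetical nonsingular matrix
              [0.0, 1.0, -1.0],                  # (determinant 5)
              [1.0, 0.0, 3.0]])
Z = T_j @ M                                      # transformed predictors

def residuals(predictors):
    """Hypothetical helper: least squares residuals from regressing every
    column of X on `predictors`."""
    coef, *_ = np.linalg.lstsq(predictors, X, rcond=None)
    return X - predictors @ coef

# Same column space, so the residual matrix E is identical
print(np.allclose(residuals(T_j), residuals(Z)))  # True
```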