
COV

computes the principal components from the covariance matrix. By default, the correlation matrix is analyzed. The COV option
causes variables with large variances to be more strongly associated with components that have large eigenvalues, and it causes
variables with small variances to be more strongly associated with components that have small eigenvalues. You should not
specify the COV option unless the units in which the variables are measured are comparable or the variables are standardized
in some way.
Note: Specifying the COV option has the same effect as specifying the NOSCALE option.
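The effect of the COV option can be seen on a small example. The following sketch is plain Python, not SAS code, and its data and helper names (`covariance_matrix`, `correlation_matrix`, `leading_eigvec_2x2`) are hypothetical. It builds two correlated variables whose units differ by roughly a factor of 100, then compares the first principal component of the covariance matrix with that of the correlation matrix.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def correlation_matrix(cov):
    """Rescale a covariance matrix so that every variable has unit variance."""
    p = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


def leading_eigvec_2x2(m):
    """Largest eigenvalue and its unit eigenvector for a symmetric 2 x 2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    lam = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)
    if b != 0:
        v = (b, lam - a)          # solves (M - lam*I) v = 0
    else:
        v = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return lam, (v[0] / norm, v[1] / norm)


# Hypothetical data: x2 is correlated with x1 (r = 0.8) but recorded in much larger units.
x1 = [1, 2, 3, 4, 5]
x2 = [200, 100, 400, 300, 500]
rows = list(zip(x1, x2))

cov = covariance_matrix(rows)
corr = correlation_matrix(cov)
_, v_cov = leading_eigvec_2x2(cov)
_, v_corr = leading_eigvec_2x2(corr)
print("covariance loadings: ", [round(c, 3) for c in v_cov])
print("correlation loadings:", [round(c, 3) for c in v_corr])
```

With the covariance matrix, the first component is almost entirely x2, the variable with the large variance. With the correlation matrix, both variables receive equal weight.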

CV=ONE
CV=BLOCK <(cv-block-options)>
CV=SPLIT <(cv-split-options)>
CV=RANDOM <(cv-random-options)>

specifies that cross validation be performed to determine the number of principal components and specifies the method to be
used. If you do not specify the CV= option, no cross validation is performed.
In cross validation, the input data are repeatedly divided into a training set, which is used to compute a model, and a test set, which is used to test the model fit. The cross validation that is performed here is along both observations and variables,
as described in Eastment and Krzanowski (1982), which is a more detailed version of the “alternative scheme” of Wold (1978). The observations and variables are separately divided into groups. Each test set is the intersection of one observation
group and one variable group, so the number of test sets that are used is the product of the number of observation groups
and the number of variable groups. See the section Cross Validation for more information.
Note: The CV= option is experimental in this release.
CV=ONE requests one-at-a-time cross validation, in which each observation group contains one observation and each variable group contains one variable.
This approach is very computationally intensive because it computes n × p separate principal component models for each potential number of principal components, where n is the number of observations in the input data set and p is the number of process variables.
CV=BLOCK requests blocked cross validation, in which observation groups consist of blocks of nobs consecutive observations and variable groups consist of blocks of nvar consecutive variables. You can specify the following cv-block-options in parentheses after the CV=BLOCK option:

NOBS=nobs

specifies that observation groups consist of blocks of nobs consecutive observations from the input data. For example, if you specify NOBS=8, the first group contains observations 1 through 8, the second group contains observations 9 through 16, and so on. The default
is 7.

NVAR=nvar

specifies that variable groups consist of blocks of nvar consecutive variables from the input data. For example, if you specify NVAR=3, the first group contains variables 1 through 3, the second group contains variables 4 through 6, and so on. The default
is 7.
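The grouping rule for CV=BLOCK can be sketched in a few lines of Python. This is an illustration, not the procedure's code. The helper name `block_groups` is hypothetical. The sketch also assumes that when the count is not a multiple of the block size, the last group is simply shorter.

```python
def block_groups(count, size=7):
    """CV=BLOCK-style groups: consecutive runs of `size` items numbered 1..count.

    Assumption: when count is not a multiple of size, the last group is shorter.
    """
    return [list(range(start, min(start + size, count + 1)))
            for start in range(1, count + 1, size)]


print(block_groups(20, 8))   # observation groups for NOBS=8 and 20 observations
print(block_groups(7, 3))    # variable groups for NVAR=3 and 7 variables
```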
CV=SPLIT requests split-sample cross validation, in which observation groups are formed by selecting every nobs-th observation and variable groups are formed by selecting every nvar-th variable. You can specify the following cv-split-options in parentheses after the CV=SPLIT option:

NOBS=nobs

specifies that observation groups be created by selecting every nobs-th observation from the input data. For example, if you specify NOBS=8, the first group contains observations {1, 9, 17, …}, the second group contains observations {2, 10, 18, …}, and so on. The
default is 7.

NVAR=nvar

specifies that variable groups be created by selecting every nvar-th variable from the input data. For example, if you specify NVAR=5, the first group contains variables {1, 6, 11, …}, the second group contains variables {2, 7, 12, …}, and so on. The default
is 7.
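The split-sample rule can be sketched the same way. This Python illustration uses a hypothetical helper name. Group k collects items k, k + step, k + 2 × step, and so on, which reproduces the {1, 9, 17, …} and {1, 6, 11, …} examples above.

```python
def split_groups(count, step=7):
    """CV=SPLIT-style groups: group k holds items k, k + step, k + 2*step, ... (1-based)."""
    return [list(range(first, count + 1, step))
            for first in range(1, min(step, count) + 1)]


print(split_groups(20, 8))   # NOBS=8 with 20 observations
print(split_groups(12, 5))   # NVAR=5 with 12 variables
```

Each item lands in exactly one group, so the groups partition the observations (or variables).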
CV=RANDOM requests that observations and variables be assigned to groups randomly. You can specify the following cv-random-options in parentheses after the CV=RANDOM option:

NITEROBS=nogrp

specifies the number of observation groups. The default is 10.

NITERVAR=nvgrp

specifies the number of variable groups. The default is 10.

NTESTOBS=nobs

specifies the number of observations in each observation group. The default is one-tenth the total number of observations.

NTESTVAR=nvar

specifies the number of variables in each variable group. The default is one-tenth the total number of variables.

SEED=n

specifies an integer used to start the pseudorandom number generator for selecting the random test sets. If you do not specify
a seed, or if you specify a value less than or equal to zero, the seed is generated from the time of day on
the computer’s clock.
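A rough Python sketch of random group assignment follows. Treat it as an assumption-laden illustration rather than a description of the procedure. Several details are guesses: each group is drawn as an independent sample without replacement (so groups can overlap), "one-tenth" is rounded down to at least 1, and Python's `random.Random` stands in for the SAS pseudorandom number generator. The groups SAS actually forms will therefore differ. The helper name `random_groups` is hypothetical.

```python
import random
import time


def random_groups(count, ngroups=10, size=None, seed=0):
    """Draw `ngroups` random test groups, each holding `size` of the items 1..count.

    Assumptions for illustration only: groups are independent samples drawn
    without replacement (they may overlap), the default size is
    max(1, count // 10), and Python's Mersenne Twister replaces the SAS generator.
    """
    if size is None:
        size = max(1, count // 10)          # "one-tenth" of the items, rounded down
    if seed <= 0:
        seed = int(time.time())             # fall back to the clock, as for SEED<=0
    rng = random.Random(seed)
    return [sorted(rng.sample(range(1, count + 1), size)) for _ in range(ngroups)]


obs_groups = random_groups(50, seed=123)              # 10 groups of 5 observations
var_groups = random_groups(30, ngroups=10, seed=456)  # 10 groups of 3 variables
print(len(obs_groups) * len(var_groups))              # test sets = product of the two
```

A fixed positive seed makes the assignment reproducible from run to run, which is the practical reason to specify SEED=.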
Note: You cannot specify the CV= option together with the NCOMP= option.

DATA=SAS-data-set

specifies the input SAS data set to be analyzed. If the DATA= option is omitted, the procedure uses the most recently created
SAS data set.

MISSING=AVG | NONE

specifies how observations with missing values are to be handled in computing the fit. MISSING=AVG specifies that the fit
be computed by replacing missing values of a process variable with the average of its nonmissing values. The default is MISSING=NONE,
which excludes observations with missing values for any process variables from the analysis.
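The two missing-value policies can be contrasted in a short Python sketch. The function name is hypothetical, and `None` stands in for a SAS missing value. MISSING=NONE drops incomplete observations. MISSING=AVG fills each gap with the mean of that variable's nonmissing values.

```python
def handle_missing(rows, method="NONE"):
    """Apply MISSING=NONE or MISSING=AVG semantics to rows; None marks a missing value."""
    if method == "NONE":
        # Drop any observation with a missing value in any process variable.
        return [r for r in rows if all(v is not None for v in r)]
    if method == "AVG":
        # Replace each missing value with the mean of that variable's nonmissing values.
        means = []
        for j in range(len(rows[0])):
            present = [r[j] for r in rows if r[j] is not None]
            means.append(sum(present) / len(present))
        return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]
    raise ValueError("method must be 'NONE' or 'AVG'")


data = [[1.0, 2.0], [None, 4.0], [3.0, None]]
print(handle_missing(data, "NONE"))  # only the complete first row survives
print(handle_missing(data, "AVG"))   # gaps filled with the column means 2.0 and 3.0
```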

NCOMP=n | ALL

specifies the number of principal components to extract. The default is min(15, p, N), where p is the number of process variables and N is the number of observations (runs). You can specify NCOMP=ALL to override the limit of 15 principal components. You cannot
specify the NCOMP= option together with the CV= option. If the number of nonzero eigenvalues of the correlation matrix is less than the requested number of components n, then n is reset to the number of nonzero eigenvalues.
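The component-count rule can be summarized in a small Python function. The function name and the numeric tolerance that counts as "nonzero" are hypothetical. The sketch also assumes that the default is min(15, p, N) and that NCOMP=ALL asks for all p components before the nonzero-eigenvalue reset is applied.

```python
def effective_ncomp(p, n_obs, eigenvalues, ncomp=None, tol=1e-10):
    """Number of components actually extracted (a sketch under stated assumptions).

    ncomp=None  -> assumed default min(15, p, n_obs)
    ncomp="ALL" -> assumed to request all p components (no cap of 15)
    The result is then reset to the number of eigenvalues above `tol`.
    """
    if ncomp is None:
        ncomp = min(15, p, n_obs)
    elif ncomp == "ALL":
        ncomp = p
    nonzero = sum(1 for ev in eigenvalues if ev > tol)
    return min(ncomp, nonzero)


print(effective_ncomp(20, 100, [1.0] * 20))                    # capped at 15
print(effective_ncomp(20, 100, [1.0] * 20, "ALL"))             # all 20
print(effective_ncomp(5, 100, [2.0, 1.0, 0.5, 0.0, 0.0], 4))   # reset to 3 nonzero
```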

NOCENTER

suppresses centering of the process variables before fitting. This is useful if the variables are already centered and scaled.
See the section Centering and Scaling for more information.

NOCVSTDIZE

suppresses recentering and rescaling of the process variables before each model is fit in the cross validation. See the section
Centering and Scaling for more information.

NOPRINT

suppresses the display of all results, both tabular and graphical. This is useful when you want to produce only output data
sets.

NOSCALE

suppresses scaling of the process variables before fitting. This is useful if the variables are already centered and scaled.
Note: Specifying the NOSCALE option has the same effect as specifying the COV option.
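The centering and scaling controlled by NOCENTER and NOSCALE can be sketched per variable in Python. The function name is hypothetical. The sketch assumes that scaling divides by the sample standard deviation (divisor n - 1). Setting `scale=False` corresponds to NOSCALE, which is why it matches the COV option: the unscaled, centered data yield the covariance matrix rather than the correlation matrix.

```python
import statistics


def preprocess(column, center=True, scale=True):
    """Center and/or scale one process variable before fitting.

    center=False mimics NOCENTER, scale=False mimics NOSCALE (equivalently COV).
    Assumption: scaling divides by the sample standard deviation (divisor n - 1).
    """
    mean = statistics.fmean(column) if center else 0.0
    sd = statistics.stdev(column) if scale else 1.0
    return [(x - mean) / sd for x in column]


x = [2.0, 4.0, 6.0]
print(preprocess(x))                            # centered and scaled
print(preprocess(x, scale=False))               # NOSCALE: centered only
print(preprocess(x, center=False, scale=False)) # NOCENTER NOSCALE: unchanged
```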

OUT=SAS-data-set

creates an output data set that contains all the original data from the input data set, principal component scores, and multivariate
summary statistics. See the section Output Data Sets for details.

OUTLOADINGS=SAS-data-set

creates an output data set that contains the loadings for the principal components and the eigenvalues of the correlation
(or covariance) matrix. See the section Output Data Sets for details.

PLOTS <(global-plot-options)> <= plot-request <(options)>>
PLOTS <(global-plot-options)> <= (plot-request <(options)> <... plot-request <(options)>>)>

controls the plots produced through ODS Graphics. When you specify only one plot request, you can omit the parentheses around
the plot request. For example:
plots=none
plots=score
plots=loadings
ODS Graphics must be enabled before you request plots. For general information about ODS Graphics, see Chapter 21: Statistical Graphics Using ODS in SAS/STAT 12.1 User's Guide.
You can specify the following global-plot-options:

FLIP

interchanges the X-axis and Y-axis dimensions for all score and loading plots.

NCOMP=n

specifies that pairwise score and loading plots be produced for the first n principal components. The default is 5 or the total number of components j, whichever is smaller; that is, if j < 5, then the default is NCOMP=j. Be aware that the number of score or loading plots produced, n(n-1)/2, grows quadratically as n increases.

ONLY

suppresses the default plots. Only plots specifically requested are displayed. The default plots are the CV plot, when you
specify the CV= option, and the scree and variance-explained plots otherwise.
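The growth in the number of pairwise plots is easy to check in Python. The helper names are hypothetical. The sketch assumes one plot per unordered pair of components, which gives the n(n-1)/2 count. FLIP only swaps the axes within each plot, so it does not change the count.

```python
from itertools import combinations


def plot_ncomp(j, requested=None):
    """Components shown in pairwise plots: the request (default 5), capped at j."""
    return min(5 if requested is None else requested, j)


def plot_pairs(n):
    """Each unordered pair of the first n components gets one scatter plot."""
    return list(combinations(range(1, n + 1), 2))


for n in (3, 5, 10):
    print(n, len(plot_pairs(n)), n * (n - 1) // 2)  # count grows roughly like n^2 / 2
```

Raising NCOMP= from 5 to 10 raises the number of plots from 10 to 45.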
You can specify the following plot-requests:

ALL

produces all appropriate plots.

CVPLOT

produces a plot that displays the results of the cross validation and R-square analysis. This plot requires that the CV= option be specified and in that case is displayed by default.

LOADINGS <(loading-options)>

produces a matrix of pairwise scatter plots of the principal component loadings. Use NCOMP=n to specify the number of principal components for which plots are produced, and use the FLIP option to interchange the default
X-axis and Y-axis dimensions.
You can specify the following loading-options:

FLIP

flips or interchanges the X-axis and Y-axis dimensions of the loading plots. Specify PLOTS=LOADINGS(FLIP) to flip the X-axis
and Y-axis dimensions.

NCOMP=n

specifies that pairwise loading plots be produced for the first n principal components. The default is the value specified by the NCOMP= global-plot-option. If that value exceeds j, the total number of components, then the default is NCOMP=j. Be aware that the number of loading plots produced, n(n-1)/2, grows quadratically as n increases.

UNPACKPANEL
UNPACK

suppresses paneling of loading plots. By default, all the loading plots appear in a single output panel. Specify UNPACKPANEL
to display each loading plot in a separate panel.

NONE

suppresses the display of all plots.

SCORES <(score-options)>

produces pairwise scatter plots of the principal component scores. You can use the NCOMP= option to control the number of plots that are displayed.
You can specify the following score-options:

ALPHA=value

specifies the probability used to compute a prediction ellipse that is overlaid on the score plot. The default is 0.05. If
you specify the ALPHA= option, you do not need to specify the ELLIPSE option.

ELLIPSE

requests that a prediction ellipse be overlaid on the principal component score plots. The probability that a new observation
falls outside the prediction ellipse is specified by the ALPHA= option.

FLIP

flips or interchanges the X-axis and Y-axis dimensions of the score plots. Specify PLOTS=SCORES(FLIP) to flip the X-axis and
Y-axis dimensions.

GROUP=variable

specifies a variable in the input data set used to group the points on the score plots. Points with different GROUP= variable
values are plotted using different markers and colors to distinguish the groups.

LABELS=ON | OFF | OUTSIDE

specifies which points in the score plots to label. Specify LABELS=ON to label all points and LABELS=OFF to label none of
the points. Points are labeled with the values of the first variable listed in the ID statement, or the observation number if no ID statement is specified.
If you specify the ELLIPSE and UNPACKPANEL options, you can specify LABELS=OUTSIDE to label only the points outside the prediction ellipse.
The default is ON if you specify UNPACKPANEL and OFF otherwise.

NCOMP=n

specifies that pairwise score plots be produced for the first n principal components. The default is the value specified by the NCOMP= global-plot-option. If that value exceeds j, the total number of components, then the default is NCOMP=j. Be aware that the number of score plots produced, n(n-1)/2, grows quadratically as n increases.

UNPACKPANEL

suppresses paneling of score plots. By default, all the score plots appear in a single output panel. Specify UNPACKPANEL to
display each score plot in a separate panel.
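The idea behind the ellipse and LABELS=OUTSIDE can be sketched in Python under a plainly stated substitution. The sketch uses a large-sample chi-square approximation, which is not necessarily the formula the procedure uses (an F-based Hotelling T² bound is another common choice). A pair of scores (t1, t2) is treated as outside the 1 - ALPHA ellipse when t1²/λ1 + t2²/λ2 exceeds -2 ln(ALPHA), the 1 - ALPHA quantile of a chi-square distribution with 2 degrees of freedom. The function names are hypothetical.

```python
import math


def outside_ellipse(t1, t2, lam1, lam2, alpha=0.05):
    """True if the score pair (t1, t2) lies outside the 1 - alpha ellipse.

    Chi-square(2) approximation: t1^2/lam1 + t2^2/lam2 is compared with
    -2*ln(alpha), the 1 - alpha quantile of chi-square with 2 degrees of freedom.
    """
    return t1 * t1 / lam1 + t2 * t2 / lam2 > -2.0 * math.log(alpha)


def points_to_label(scores, lam1, lam2, labels="OFF", alpha=0.05):
    """Indices of the points to label under LABELS=ON, OFF, or OUTSIDE."""
    if labels == "ON":
        return list(range(len(scores)))
    if labels == "OFF":
        return []
    return [i for i, (t1, t2) in enumerate(scores)
            if outside_ellipse(t1, t2, lam1, lam2, alpha)]


scores = [(0.0, 0.0), (3.0, 0.0), (0.0, 2.5), (1.0, 1.0)]
print(points_to_label(scores, lam1=1.0, lam2=1.0, labels="OUTSIDE"))
```

Dividing each squared score by its eigenvalue stretches the ellipse along components with larger variance.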

SCREE <UNPACK>
EIGEN
EIGENVALUE

produces a scree plot of eigenvalues and a variance-explained plot. By default, both plots are produced in a panel. Specify
PLOTS=SCREE(UNPACKPANEL) to display each plot in a separate panel. This plot is produced by default unless you specify the
CV= option.

PREFIX=name

specifies a prefix for naming the principal component scores in the OUT= data set. By default, the names are Prin1, Prin2, …, Prinj, where j is the number of principal components. If you specify PREFIX=ABC, the components are named ABC1, ABC2, ABC3, and so on. The number of characters in the prefix plus the number of digits in j should not exceed the current name length defined by the VALIDVARNAME= system option.
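The naming rule and its length check can be sketched in Python. The function name is hypothetical. The limit of 32 characters is an assumption that matches the RPREFIX= example and is not read from the VALIDVARNAME= setting.

```python
def score_names(ncomp, prefix="Prin", max_len=32):
    """Names of the score variables: Prin1, Prin2, ..., or PREFIX1, PREFIX2, ...

    max_len=32 is an assumed VALIDVARNAME= limit; the check mirrors the rule that
    the prefix plus the digits of the last component number must fit.
    """
    if len(prefix) + len(str(ncomp)) > max_len:
        raise ValueError("prefix too long for the variable-name limit")
    return [f"{prefix}{k}" for k in range(1, ncomp + 1)]


print(score_names(3, prefix="ABC"))  # ['ABC1', 'ABC2', 'ABC3']
```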

RPREFIX=name

specifies a prefix for naming the residual variables in the OUT= data set. The default is R_. Residual variable names are formed by appending process variable names to the prefix.
If the length of the resulting residual variable name exceeds the maximum name length defined by the VALIDVARNAME= system option,
characters are removed from the middle of the process variable name before it is appended to the residual prefix. For example,
if you specify RPREFIX=Residual_, the maximum variable name length is 32, and there is a process variable named PrimaryThermometerReading, then the corresponding residual variable name is Residual_PrimaryThermeterReading.
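One way to remove characters "from the middle" is sketched below in Python. This is a guess at the rule, not the documented algorithm. The sketch cuts the excess characters from a window centered on the name. It does reproduce the Residual_PrimaryThermeterReading example above. The function name is hypothetical.

```python
def residual_name(var, prefix="R_", max_len=32):
    """Prefix + variable name, trimming characters from the middle of the name if too long.

    Assumption: the `excess` removed characters start at (len(var) - excess) // 2.
    """
    excess = len(prefix) + len(var) - max_len
    if excess > 0:
        if excess >= len(var):
            raise ValueError("prefix leaves no room for the variable name")
        start = (len(var) - excess) // 2
        var = var[:start] + var[start + excess:]
    return prefix + var


print(residual_name("PrimaryThermometerReading", prefix="Residual_"))
print(residual_name("Temp"))  # short names are used unchanged: R_Temp
```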

STDSCORES

standardizes the principal component scores in the OUT= data set to unit variance. If you omit the STDSCORES option, the variances of the scores are equal to the corresponding eigenvalues.
STDSCORES has no effect on the eigenvalues themselves.
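Standardizing the scores amounts to dividing each component's scores by the square root of its eigenvalue. The following Python sketch uses hypothetical names and data. It assumes that the eigenvalues and the score variances use the same n - 1 divisor, so that the resulting variance is exactly 1.

```python
import math
import statistics


def standardize_scores(scores, eigenvalues):
    """Divide each component's scores by the square root of its eigenvalue."""
    return [[t / math.sqrt(lam) for t, lam in zip(row, eigenvalues)]
            for row in scores]


# Hypothetical scores for one component whose eigenvalue (score variance) is 4.
raw = [[-2.0], [0.0], [2.0]]
std = standardize_scores(raw, [4.0])
print(statistics.variance(r[0] for r in raw), statistics.variance(r[0] for r in std))
```

The eigenvalue itself is only used as a divisor and is left unchanged, consistent with the statement that STDSCORES does not affect the eigenvalues.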