Input Data Sets :: SAS/QC(R) 12.1 User's Guide

Input Data Sets

DATA= Data Set
HISTORY= Data Set
LOADINGS= Data Set
TABLE= Data Set

The MVPMONITOR procedure accepts a single primary input data set of one of three types.

A DATA= data set contains new process data to be analyzed by using an existing PCA model (Phase II analysis).
A HISTORY= data set contains process data and the accompanying scores, residuals, and statistics produced by applying a PCA model. The process data can be the original data that was used to create the model (Phase I analysis) or subsequent data that was analyzed by using a previously created model (Phase II analysis).
A TABLE= data set contains a summary of an SPE or T² chart, consisting of the statistics, control limits, and other information.

The DATA=, HISTORY=, and TABLE= options are mutually exclusive. If you do not specify any of them, PROC MVPMONITOR uses the most recently created SAS data set as a DATA= data set.

When you specify a DATA= data set, you must also specify a LOADINGS= data set that contains loadings and other information describing the PCA model. When you specify a HISTORY= data set, you must also specify a LOADINGS= data set if you specify the CONTRIBUTIONS option in a TSQUARECHART statement.
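The selection rules above can be summarized as a small validation routine. The following is a minimal Python sketch, not SAS code. The function `resolve_inputs` and its keyword arguments are hypothetical stand-ins for the DATA=, HISTORY=, TABLE=, and LOADINGS= options and for the CONTRIBUTIONS option of the TSQUARECHART statement, and `last_created` stands for the most recently created SAS data set.

```python
def resolve_inputs(data=None, history=None, table=None, loadings=None,
                   contributions=False, last_created=None):
    """Hypothetical check of the primary-input rules for PROC MVPMONITOR.

    Returns (kind, name) for the primary input data set, or raises
    ValueError when the combination of options is not allowed.
    """
    given = {kind: name for kind, name in
             (("DATA", data), ("HISTORY", history), ("TABLE", table))
             if name is not None}
    if len(given) > 1:
        raise ValueError("DATA=, HISTORY=, and TABLE= are mutually exclusive")
    if not given:
        # No primary option: fall back to the most recently created
        # data set and treat it as a DATA= data set.
        if last_created is None:
            raise ValueError("no primary input data set is available")
        given = {"DATA": last_created}
    kind, name = next(iter(given.items()))
    if kind == "DATA" and loadings is None:
        raise ValueError("a DATA= data set requires a LOADINGS= data set")
    if kind == "HISTORY" and contributions and loadings is None:
        raise ValueError("CONTRIBUTIONS with HISTORY= requires LOADINGS=")
    return kind, name
```

For example, `resolve_inputs(history="hist")` succeeds, while adding `contributions=True` without `loadings` raises an error.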

DATA= Data Set

A DATA= data set provides the process measurement data for a Phase II analysis. In addition to the process variables, a DATA= data set can include the following:

BY variables
ID variables
a SERIES variable
a TIME variable

When you specify a DATA= data set, you must also specify a LOADINGS= data set that contains the loadings for the principal component model that describes the variation of the process. These loadings are used to score the new data from the DATA= data set. The process variables in the LOADINGS= data set must have the same names as those in the DATA= data set.
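To illustrate how loadings score new data, the following minimal Python sketch centers and scales one new observation with the MEAN and STD values from the model and then projects it onto each loading vector. It assumes the usual PCA projection (score k is the inner product of the standardized observation with loading vector k). The dictionary layout and the function `score_observation` are hypothetical and do not describe how PROC MVPMONITOR stores or reads data. Matching values by variable name mirrors the requirement that process variable names agree between the two data sets.

```python
def score_observation(obs, loadings):
    """Compute principal component scores for one new observation.

    obs:      dict mapping process variable name -> measured value
    loadings: dict with optional "MEAN" and "STD" dicts and a "LOADING"
              list (one dict per component), loosely mirroring the
              _VALUE_ rows of a LOADINGS= data set (hypothetical layout)
    """
    variables = list(loadings["LOADING"][0])
    missing = [v for v in variables if v not in obs]
    if missing:
        raise KeyError(f"process variables not found in new data: {missing}")
    z = {}
    for v in variables:
        x = obs[v]
        if "MEAN" in loadings:
            x -= loadings["MEAN"][v]   # center only if the model was centered
        if "STD" in loadings:
            x /= loadings["STD"][v]    # scale only if the model was scaled
        z[v] = x
    # One score per LOADING row: standardized values times the loadings.
    return [sum(z[v] * p[v] for v in variables) for p in loadings["LOADING"]]
```

With MEAN values of 10 and 20, STD values of 2 and 4, and identity loadings, the observation A=12, B=16 scores as [1.0, -1.0].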

HISTORY= Data Set

A HISTORY= data set provides the input data for a Phase I or Phase II analysis. In addition to the original process variables, it contains principal component scores, residuals, SPE and T² statistics, and a count of the observations that are used to construct the principal component model, as summarized in Table 13.4.

Table 13.4: Variables in the HISTORY= Data Set

Variable	Description
Prin1–Prinj	Principal component scores
R_var1–R_varp	Residuals, where var1–varp are the process variable names
_NOBS_	Number of observations used to build the principal component model
_SPE_	Squared prediction error (SPE)
_TSQUARE_	T² statistic computed from principal component scores
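The _SPE_ and _TSQUARE_ variables summarize the residuals and the scores, respectively. The following minimal Python sketch assumes the textbook definitions: SPE as the sum of squared residuals, and T² as the sum of squared scores, each divided by the eigenvalue of its component. The function name is hypothetical, and you should check these definitions against the procedure's computational details before relying on them.

```python
def spe_and_tsquare(scores, residuals, eigenvalues):
    """Textbook PCA monitoring statistics for one observation (assumed).

    scores:      [t_1, ..., t_j], the Prin1-Prinj values
    residuals:   one residual per process variable (the R_ variables)
    eigenvalues: [lambda_1, ..., lambda_j] for the retained components
    """
    if len(scores) != len(eigenvalues):
        raise ValueError("need exactly one eigenvalue per retained component")
    spe = sum(r * r for r in residuals)
    tsquare = sum(t * t / lam for t, lam in zip(scores, eigenvalues))
    return spe, tsquare
```

For scores [2, 1], eigenvalues [4, 1], and residuals [0.5, -0.5, 1], this gives SPE = 1.5 and T² = 2.0.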

A HISTORY= data set must include variables that contain principal component scores. The score variable names must consist of a common prefix followed by the numbers 1, 2, …, j, where j is the number of principal components. By default, the common prefix is Prin. You can use the PREFIX= option to specify another prefix for score variables.

If the number of principal components is less than the total number of process variables, the HISTORY= data set should also contain residual variables. A residual variable name consists of a common prefix followed by the corresponding process variable name. The default residual variable prefix is R_. For example, if the process variables are A, B, and C, the default residual variable names are R_A, R_B, and R_C. You can use the RPREFIX= option to specify a different residual variable prefix.
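The naming conventions for score and residual variables can be expressed as a short Python sketch. The function `history_variable_names` is hypothetical; its `prefix` and `rprefix` arguments mirror the PREFIX= and RPREFIX= options and use the documented defaults Prin and R_.

```python
def history_variable_names(process_vars, n_components, prefix="Prin", rprefix="R_"):
    """Expected score and residual variable names in a HISTORY= data set."""
    # Score variables: common prefix followed by 1, 2, ..., j.
    scores = [f"{prefix}{k}" for k in range(1, n_components + 1)]
    # Residual variables are expected only when fewer components than
    # process variables are retained.
    if n_components < len(process_vars):
        residuals = [f"{rprefix}{v}" for v in process_vars]
    else:
        residuals = []
    return scores, residuals
```

With process variables A, B, and C and two components, the defaults yield Prin1 and Prin2 plus R_A, R_B, and R_C.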

Note: Usually you create a HISTORY= data set by specifying the PROC MVPMODEL OUT= option or the PROC MVPMONITOR OUTHISTORY= option. If the PREFIX= or RPREFIX= option is used when such an output data set is created, you must specify the same prefixes to identify the score and residual variables when you read it as a HISTORY= data set.

LOADINGS= Data Set

The LOADINGS= data set contains the following information about the principal component model:

eigenvalues of the correlation or covariance matrix used to construct the model
principal component loadings
process variable means used to center the variable values
process variable standard deviations used to scale the variable values

You can produce a LOADINGS= data set by using the PROC MVPMODEL OUTLOADINGS= option. Table 13.5 lists the variables that are required in a LOADINGS= data set.

Table 13.5: Variables in the LOADINGS= Data Set

Variable	Description
_VALUE_	Type of value contained in the process variables for a given observation (EIGEN, LOADING, MEAN, or STD)
_NOBS_	Number of observations used to build the principal component model
_PC_	Principal component number; 0 for the observation that contains eigenvalues
process variables	Values associated with the process variables

Valid values for the _VALUE_ variable are as follows:

EIGEN: eigenvalues from the principal component analysis
LOADING: principal component loadings
MEAN: process variable means
STD: process variable standard deviations

The LOADINGS= data set contains one EIGEN observation and j LOADING observations, where j is the number of principal components in the model. The presence of a MEAN observation indicates that the process variables were centered when the principal component model was constructed, and the presence of a STD observation indicates that the process variables were scaled when the principal component model was constructed. The means and standard deviations are used to center and scale new data in a Phase II analysis.
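The structural rules for the _VALUE_ observations can be checked with a short Python sketch. It assumes that each observation is a dictionary with a `_VALUE_` key (a hypothetical in-memory layout, not a SAS interface), and it reports the number of components together with whether the model was centered and scaled.

```python
from collections import Counter


def describe_loadings(rows):
    """Summarize a LOADINGS= data set given as a list of row dicts."""
    counts = Counter(row["_VALUE_"] for row in rows)
    unknown = set(counts) - {"EIGEN", "LOADING", "MEAN", "STD"}
    if unknown:
        raise ValueError(f"invalid _VALUE_ entries: {sorted(unknown)}")
    if counts["EIGEN"] != 1:
        raise ValueError("expected exactly one EIGEN observation")
    return {
        "n_components": counts["LOADING"],  # j LOADING observations
        "centered": counts["MEAN"] > 0,     # MEAN row present -> centered
        "scaled": counts["STD"] > 0,        # STD row present -> scaled
    }
```

A data set with one EIGEN row, two LOADING rows, and a MEAN row but no STD row describes a two-component model that was centered but not scaled.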

TABLE= Data Set

A TABLE= data set contains a summary of an SPE or T² chart. Usually, you create a TABLE= data set by specifying the OUTTABLE= option in an SPECHART or TSQUARECHART statement. A TABLE= data set contains the variables listed in Table 13.6.

Table 13.6: Variables in an OUTTABLE= Data Set

Variable	Description
_ALPHA_	Probability (α) of exceeding the control limits
_EXLIM_	Flag indicating that a control limit was exceeded
_LCL_	Lower control limit
_MEDIAN_	Center line
_SPE_	Squared prediction error (SPE) statistic (SPECHART statement only)
time	Optional TIME variable
_TSQUARE_	T² statistic (TSQUARECHART statement only)
_UCL_	Upper control limit

A TABLE= data set must contain either an _SPE_ or _TSQUARE_ variable but not both. When you use a TABLE= input data set, you can specify only chart statements that correspond to the statistic in the data set.
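The one-statistic rule can be sketched as a Python function that inspects the column names of a TABLE= data set and reports which chart statement the data set can drive. The function name and the use of a plain collection of column names are assumptions for illustration.

```python
def chart_statement_for_table(columns):
    """Return the chart statement that a TABLE= data set can be used with."""
    has_spe = "_SPE_" in columns
    has_tsquare = "_TSQUARE_" in columns
    # Exactly one of the two statistic variables must be present.
    if has_spe == has_tsquare:
        raise ValueError("TABLE= needs exactly one of _SPE_ or _TSQUARE_")
    return "SPECHART" if has_spe else "TSQUARECHART"
```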

You can use a TABLE= data set to display a previously created control chart or to specify custom control limits by computing your own _LCL_ and _UCL_ values.
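As an illustration of supplying custom limits, the following Python sketch overwrites _LCL_ and _UCL_ in each row of a TABLE= data set and recomputes an exceedance flag for your own inspection. The function name is hypothetical, and the encoding of the flag as "UPPER", "LOWER", or an empty string is an assumption modeled on other SAS/QC chart tables; this section does not state how _EXLIM_ is coded.

```python
def apply_custom_limits(rows, stat, lcl, ucl):
    """Return copies of TABLE= rows with custom limits and a recomputed flag.

    stat: "_SPE_" or "_TSQUARE_", the statistic variable in the rows.
    The "UPPER"/"LOWER"/"" flag values are an assumption.
    """
    out = []
    for row in rows:
        new = dict(row, _LCL_=lcl, _UCL_=ucl)  # copy; input rows stay unchanged
        value = row[stat]
        if value > ucl:
            new["_EXLIM_"] = "UPPER"
        elif value < lcl:
            new["_EXLIM_"] = "LOWER"
        else:
            new["_EXLIM_"] = ""
        out.append(new)
    return out
```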
