Standardization of Raw Data
PROC SCORE automatically standardizes or centers the DATA= variables for you. It bases this on the statistics of the original variables that are stored in the SCORE= data set.
If the SCORE= scoring coefficients data set contains observations with _TYPE_='MEAN' and _TYPE_='STD', then PROC SCORE standardizes the raw data before scoring. For example, this type of SCORE= data set can come from PROC PRINCOMP without the COV option.
If the SCORE= scoring coefficients data set contains observations with _TYPE_='MEAN' but none with _TYPE_='STD', then PROC SCORE centers the raw data (subtracts the means) before scoring. For example, this type of SCORE= data set can come from PROC PRINCOMP with the COV option.
If the SCORE= scoring coefficients data set contains neither _TYPE_='MEAN' nor _TYPE_='STD' observations, or if you use the NOSTD option, then PROC SCORE neither centers nor standardizes the raw data.
If the SCORE= scoring coefficients come from observations with _TYPE_='USCORE', then PROC SCORE "standardizes" the raw data by dividing by the uncorrected standard deviations in the _TYPE_='USTD' observations, without subtracting the means. For example, this type of SCORE= data set can come from PROC PRINCOMP with the NOINT option. For more information about _TYPE_='USCORE' scoring coefficients in TYPE=UCORR or TYPE=UCOV output data sets, see Appendix A, Special SAS Data Sets.
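The four rules above amount to a small decision procedure applied before the coefficients are multiplied in. The following Python sketch models that procedure under stated assumptions. It is not SAS code. The SCORE= data set is represented as a hypothetical dictionary keyed by _TYPE_, and the variable names `x` and `y`, the score name `Prin1`, and all numeric values are made up for illustration. The text does not describe STD rows present without MEAN rows, so the sketch simply leaves the data unchanged in that case.

```python
# Hypothetical in-memory model of a SCORE= data set: each _TYPE_ maps either to
# one row {variable: value} (MEAN, STD, USTD) or to named coefficient rows
# {_NAME_: {variable: coefficient}} (SCORE, USCORE). Illustration only, not a SAS API.


def prepare(row, score_ds, coef_type="SCORE", nostd=False):
    """Return the values that get multiplied by the scoring coefficients."""
    if nostd:
        # NOSTD option: use the raw values as they are
        return dict(row)
    if coef_type == "USCORE":
        # Divide by the uncorrected standard deviations; do not subtract means
        ustd = score_ds["USTD"]
        return {v: x / ustd[v] for v, x in row.items()}
    mean = score_ds.get("MEAN")
    std = score_ds.get("STD")
    if mean is not None and std is not None:
        # MEAN and STD present: standardize
        return {v: (x - mean[v]) / std[v] for v, x in row.items()}
    if mean is not None:
        # MEAN only (for example, PROC PRINCOMP with COV): center
        return {v: x - mean[v] for v, x in row.items()}
    # No MEAN row (assumption: STD without MEAN is treated the same): raw data
    return dict(row)


def score(row, score_ds, coef_type="SCORE", nostd=False):
    """Linear score per coefficient row: sum of coefficient * prepared value."""
    z = prepare(row, score_ds, coef_type, nostd)
    return {
        name: sum(c * z[v] for v, c in coefs.items())
        for name, coefs in score_ds[coef_type].items()
    }


coef = {"Prin1": {"x": 0.5, "y": 0.5}}
row = {"x": 14.0, "y": 5.0}

standardized = {"MEAN": {"x": 10.0, "y": 4.0}, "STD": {"x": 2.0, "y": 0.5}, "SCORE": coef}
centered = {"MEAN": {"x": 10.0, "y": 4.0}, "SCORE": coef}
uncorrected = {"USTD": {"x": 4.0, "y": 2.0}, "USCORE": coef}

print(score(row, standardized))              # {'Prin1': 2.0}
print(score(row, centered))                  # {'Prin1': 2.5}
print(score(row, standardized, nostd=True))  # {'Prin1': 9.5}
print(score(row, uncorrected, "USCORE"))     # {'Prin1': 3.0}
```

The same observation yields four different scores depending only on which _TYPE_ rows are present and whether NOSTD is in effect. This is why the contents of the SCORE= data set matter as much as the coefficients themselves.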
You can use PROC SCORE to score the same data that were used to generate the scoring coefficients. However, it is more common to take the scores directly from the OUT= data set of the procedure that computes the coefficients. When you score new data, keep in mind that PROC SCORE assumes the new data are on approximately the same scales as the original data. For example, if you specify the COV option in PROC PRINCOMP for the original analysis, the scoring coefficients in the PROC PRINCOMP OUTSTAT= data set are not appropriate for standardized data. With the COV option, PROC PRINCOMP does not write _TYPE_='STD' observations to the OUTSTAT= data set. PROC SCORE therefore subtracts only the means of the original (not the new) variables from the new variables before multiplying by the coefficients. Without the COV option, the OUTSTAT= data set contains both the means and the standard deviations of the original variables. PROC SCORE then subtracts the original means from the new variables and divides the results by the original standard deviations before multiplying.
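To see why the stored statistics of the original variables matter when you score new data, consider the following minimal Python sketch (not SAS). It assumes a single hypothetical variable whose original mean (10) and standard deviation (2) would sit in the _TYPE_='MEAN' and _TYPE_='STD' rows, with a made-up coefficient of 0.5. It contrasts standardizing with those original statistics against the pitfall of standardizing the new batch by its own statistics.

```python
from statistics import mean, stdev

# Hypothetical values: mean and standard deviation of the ORIGINAL variable x
# (as stored in _TYPE_='MEAN' and _TYPE_='STD' rows) and one scoring coefficient.
ORIG_MEAN, ORIG_STD = 10.0, 2.0
COEF = 0.5


def score_with_original_stats(xs):
    """Standardize with the original statistics, then apply the coefficient."""
    return [COEF * (x - ORIG_MEAN) / ORIG_STD for x in xs]


def score_with_own_stats(xs):
    """The pitfall: standardizing new data by its own mean and standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [COEF * (x - m) / s for x in xs]


new_x = [12.0, 14.0, 16.0]  # new observations that lie above the original mean
print(score_with_original_stats(new_x))  # [0.5, 1.0, 1.5]
print(score_with_own_stats(new_x))       # [-0.5, 0.0, 0.5]
```

The first set of scores shows that the new observations lie above the original mean. Re-standardizing with the new data's own statistics recenters them at zero and hides that shift.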
In general, procedures that output scoring coefficients in their OUTSTAT= data sets provide the information that PROC SCORE needs to determine the appropriate standardization. However, you might build a scoring coefficients data set yourself without _TYPE_='MEAN' and _TYPE_='STD' observations and then use it with PROC SCORE. In that case, you might have to center or standardize the new data first. If you do, use the means and standard deviations of the original variables (the variables that were used to generate the coefficients), not those of the variables to be scored.
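When a hand-built coefficient set has no MEAN or STD rows, the standardization has to happen before scoring. The following minimal Python sketch (not SAS) assumes hypothetical training data, hypothetical coefficients, and hypothetical helper names `standardize_with` and `linear_scores`. It computes the original means and standard deviations, applies them to the new observations, and then forms the linear scores from the already-standardized values. It uses the n - 1 divisor of `statistics.stdev`. Whatever divisor you use should match the one used when the coefficients were derived.

```python
from statistics import mean, stdev

# Hypothetical original (training) data and hand-built coefficients.
# The coefficient set has no MEAN or STD rows, so standardization is done here.
train = {"x": [8.0, 10.0, 12.0], "y": [3.0, 4.0, 5.0]}
coefs = {"x": 0.5, "y": 0.25}

# (mean, std) of the ORIGINAL variables; statistics.stdev uses the n - 1 divisor.
orig_stats = {v: (mean(vals), stdev(vals)) for v, vals in train.items()}


def standardize_with(stats, data):
    """Standardize every variable in data using the supplied (mean, std) pairs."""
    return {v: [(x - stats[v][0]) / stats[v][1] for x in xs] for v, xs in data.items()}


def linear_scores(data, coefs):
    """One score per observation: sum of coefficient * value over the variables."""
    n_obs = len(next(iter(data.values())))
    return [sum(c * data[v][i] for v, c in coefs.items()) for i in range(n_obs)]


new = {"x": [14.0, 10.0], "y": [6.0, 4.0]}
print(linear_scores(standardize_with(orig_stats, new), coefs))  # [1.5, 0.0]
```

Because the statistics come from `train` rather than `new`, the procedure works even for a single new observation. A batch's own standard deviation would be undefined in that case.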
See the section Getting Started: SCORE Procedure for further illustration.