PROC SCORE: Regression Parameter Estimates :: SAS/STAT(R) 9.3 User's Guide

Example 79.2 Regression Parameter Estimates

In this example, PROC REG computes regression parameter estimates for the Fitness data. (See Example 79.1 for more information about how to create the Fitness data set.) The parameter estimates are output to a data set and used as scoring coefficients. In the first part of this example, PROC SCORE is used to score the Fitness data, which are the same data used in the regression.

In the second part of this example, PROC SCORE is used to score a new data set, Fitness2. In both parts, the TYPE= specification for PROC SCORE is PARMS, and the names of the score variables are taken from the variable _MODEL_, which gets its values from the model label. The following statements produce Output 79.2.1 through Output 79.2.4:

proc reg data=Fitness outest=RegOut;
   OxyHat: model Oxygen=Age Weight RunTime RunPulse RestPulse;
   title 'Regression Scoring Example';
run;

proc print data=RegOut;
   title2 'OUTEST= Data Set from PROC REG';
run;

proc score data=Fitness score=RegOut out=RScoreP type=parms;
   var Age Weight RunTime RunPulse RestPulse;
run;

proc print data=RScoreP;
   title2 'Predicted Scores for Regression';
run;

proc score data=Fitness score=RegOut out=RScoreR type=parms;
   var Oxygen Age Weight RunTime RunPulse RestPulse;
run;

proc print data=RScoreR;
   title2 'Negative Residual Scores for Regression';
run;

Output 79.2.1 shows the PROC REG output. The Parameter Estimate column of the Parameter Estimates table lists the estimates, which are written to the RegOut data set.
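For reference, the same estimates written out as the fitted regression equation (coefficients rounded as displayed in Output 79.2.1) are:

```latex
\begin{aligned}
\widehat{\text{Oxygen}} = {} & 151.91550 - 0.63045\,\text{Age} - 0.10586\,\text{Weight} \\
 & - 1.75698\,\text{RunTime} - 0.22891\,\text{RunPulse} - 0.17910\,\text{RestPulse}
\end{aligned}
```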

Output 79.2.1 Creating an OUTEST= Data Set with PROC REG

Regression Scoring Example

The REG Procedure

Model: OxyHat

Dependent Variable: Oxygen

Number of Observations Read	12
Number of Observations Used	12

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	5	509.62201	101.92440	15.80	0.0021
Error	6	38.70060	6.45010
Corrected Total	11	548.32261

Root MSE	2.53970	R-Square	0.9294
Dependent Mean	48.38942	Adj R-Sq	0.8706
Coeff Var	5.24847

Parameter Estimates
Variable	DF	Parameter Estimate	Standard Error	t Value	Pr > |t|
Intercept	1	151.91550	31.04738	4.89	0.0027
Age	1	-0.63045	0.42503	-1.48	0.1885
Weight	1	-0.10586	0.11869	-0.89	0.4068
RunTime	1	-1.75698	0.93844	-1.87	0.1103
RunPulse	1	-0.22891	0.12169	-1.88	0.1090
RestPulse	1	-0.17910	0.13005	-1.38	0.2176

Output 79.2.2 lists the RegOut data set. Note that _TYPE_=’PARMS’ and _MODEL_=’OxyHat’; the _MODEL_ value comes from the label in the MODEL statement in PROC REG.
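One way to read this single observation is as a vector of scoring coefficients. Using notation introduced here only for illustration, let $c_j$ be the value stored in column $j$ (so $c_{\text{Oxygen}} = -1$, as listed in Output 79.2.2) and let $V$ be the set of variables named in the PROC SCORE VAR statement. When no standardization is performed, as is the case throughout this example, the score for an observation with values $x_j$ is:

```latex
\text{OxyHat} = c_{\text{Intercept}} + \sum_{j \in V} c_j\, x_j
```

If $V$ contains only the five regressors, this sum is the predicted value $\widehat{\text{Oxygen}}$. If $V$ also contains Oxygen, the extra term $c_{\text{Oxygen}}\,\text{Oxygen} = -\text{Oxygen}$ turns the score into $\widehat{\text{Oxygen}} - \text{Oxygen}$.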

Output 79.2.2 OUTEST= Data Set from PROC REG Reproduced with PROC PRINT

Regression Scoring Example

OUTEST= Data Set from PROC REG

Obs	_MODEL_	_TYPE_	_DEPVAR_	_RMSE_	Intercept	Age	Weight	RunTime	RunPulse	RestPulse	Oxygen
1	OxyHat	PARMS	Oxygen	2.53970	151.916	-0.63045	-0.10586	-1.75698	-0.22891	-0.17910	-1

Output 79.2.3 and Output 79.2.4 list the data sets created by PROC SCORE. Because the SCORE= data set does not contain observations with _TYPE_=’MEAN’ or _TYPE_=’STD’, the data in the Fitness data set are not standardized before scoring. The SCORE= data set contains the variable Intercept, so this intercept value is used in computing the score. To produce the RScoreP data set, the VAR statement in PROC SCORE includes only the independent variables from the model in PROC REG. As a result, the OxyHat variable contains predicted values, as shown in Output 79.2.3. To produce the RScoreR data set, the VAR statement in PROC SCORE includes both the dependent variable and the independent variables from the model in PROC REG. As a result, the OxyHat variable contains negative residuals ($\text{PREDICT} - \text{ACTUAL}$), as shown in Output 79.2.4. If the RESIDUAL option is specified, the variable OxyHat contains positive residuals ($\text{ACTUAL} - \text{PREDICT}$). If the PREDICT option is specified, the OxyHat variable contains predicted values.
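As a hand check, plugging the first Fitness observation (Age=44, Weight=89.47, RunTime=11.37, RunPulse=178, RestPulse=62, Oxygen=44.609) into the estimates displayed in Output 79.2.1 reproduces the first rows of Output 79.2.3 and Output 79.2.4, up to the rounding of the printed coefficients:

```latex
\begin{aligned}
\widehat{\text{Oxygen}}_1 &= 151.91550 - 0.63045(44) - 0.10586(89.47) - 1.75698(11.37) \\
 &\quad - 0.22891(178) - 0.17910(62) \\
 &= 151.91550 - 109.03814 \approx 42.8774 \quad (\text{printed: } 42.8771) \\[4pt]
\widehat{\text{Oxygen}}_1 - \text{Oxygen}_1 &= 42.8771 - 44.609 = -1.7319 \quad (\text{printed: } -1.73195)
\end{aligned}
```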

Output 79.2.3 Predicted Scores from the OUT= Data Set Created by PROC SCORE

Regression Scoring Example

Predicted Scores for Regression

Obs	Age	Weight	Oxygen	RunTime	RestPulse	RunPulse	OxyHat
1	44	89.47	44.609	11.37	62	178	42.8771
2	40	75.07	45.313	10.07	62	185	47.6050
3	44	85.84	54.297	8.65	45	156	56.1211
4	42	68.15	59.571	8.17	40	166	58.7044
5	38	89.02	49.874	9.22	55	178	51.7386
6	47	77.45	44.811	11.63	58	176	42.9756
7	40	75.98	45.681	11.95	70	176	44.8329
8	43	81.19	49.091	10.85	64	162	48.6020
9	44	81.42	39.442	13.08	63	174	41.4613
10	38	81.87	60.055	8.63	48	170	56.6171
11	44	73.03	50.541	10.13	45	168	52.1299
12	45	87.66	37.388	14.03	56	186	37.0080

Output 79.2.4 Residual Scores from the OUT= Data Set Created by PROC SCORE

Regression Scoring Example

Negative Residual Scores for Regression

Obs	Age	Weight	Oxygen	RunTime	RestPulse	RunPulse	OxyHat
1	44	89.47	44.609	11.37	62	178	-1.73195
2	40	75.07	45.313	10.07	62	185	2.29197
3	44	85.84	54.297	8.65	45	156	1.82407
4	42	68.15	59.571	8.17	40	166	-0.86657
5	38	89.02	49.874	9.22	55	178	1.86460
6	47	77.45	44.811	11.63	58	176	-1.83542
7	40	75.98	45.681	11.95	70	176	-0.84811
8	43	81.19	49.091	10.85	64	162	-0.48897
9	44	81.42	39.442	13.08	63	174	2.01935
10	38	81.87	60.055	8.63	48	170	-3.43787
11	44	73.03	50.541	10.13	45	168	1.58892
12	45	87.66	37.388	14.03	56	186	-0.38002

The second part of this example uses the parameter estimates to score a new data set. The following statements produce Output 79.2.5 and Output 79.2.6:

   /* The FITNESS2 data set contains observations 13-16 from */
   /* the FITNESS data set used in EXAMPLE 2 in the PROC REG */
   /* chapter.                                               */
data Fitness2;
   input Age Weight Oxygen RunTime RestPulse RunPulse;
   datalines;
45  66.45  44.754  11.12  51  176
47  79.15  47.273  10.60  47  162
54  83.12  51.855  10.33  50  166
49  81.42  49.156   8.95  44  180
;

proc print data=Fitness2;
   title 'Regression Scoring Example';
   title2 'New Raw Data Set to be Scored';
run;

proc score data=Fitness2 score=RegOut out=NewPred type=parms
           nostd predict;
   var Oxygen Age Weight RunTime RunPulse RestPulse;
run;

proc print data=NewPred;
   title2 'Predicted Scores for Regression';
   title3 'for Additional Data from FITNESS2';
run;

Output 79.2.5 lists the Fitness2 data set.

Output 79.2.5 Listing of the Fitness2 Data Set

Regression Scoring Example

New Raw Data Set to be Scored

Obs	Age	Weight	Oxygen	RunTime	RestPulse	RunPulse
1	45	66.45	44.754	11.12	51	176
2	47	79.15	47.273	10.60	47	162
3	54	83.12	51.855	10.33	50	166
4	49	81.42	49.156	8.95	44	180

PROC SCORE scores the Fitness2 data set by using the parameter estimates in the RegOut data set. These parameter estimates result from fitting a regression equation to the Fitness data set. The NOSTD option is specified, so the raw data are not standardized before scoring. (However, the NOSTD option is not necessary here. The SCORE= data set does not contain observations with _TYPE_=’MEAN’ or _TYPE_=’STD’, so standardization is not performed.) The VAR statement contains the dependent variable and the independent variables used in PROC REG. In addition, the PREDICT option is specified. This combination gives predicted values for the new score variable. The new score variable is named OxyHat, after the value of the _MODEL_ variable in the SCORE= data set. Output 79.2.6 shows the data set produced by PROC SCORE.
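The same arithmetic reproduces the first row of Output 79.2.6. With the PREDICT option, the −1 coefficient on Oxygen is treated as 0, so only the intercept and the five regressors contribute. Using the first Fitness2 observation and the coefficients as displayed in Output 79.2.1:

```latex
\begin{aligned}
\text{OxyHat}_1 &= 151.91550 - 0.63045(45) - 0.10586(66.45) - 1.75698(11.12) \\
 &\quad - 0.22891(176) - 0.17910(51) \\
 &= 151.91550 - 104.36452 \approx 47.5510 \quad (\text{printed: } 47.5507)
\end{aligned}
```

The small difference from the printed value comes from the rounding of the displayed coefficients.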

Output 79.2.6 Predicted Scores from the OUT= Data Set Created by PROC SCORE and Reproduced Using PROC PRINT

Regression Scoring Example

Predicted Scores for Regression

for Additional Data from FITNESS2

Obs	Age	Weight	Oxygen	RunTime	RestPulse	RunPulse	OxyHat
1	45	66.45	44.754	11.12	51	176	47.5507
2	47	79.15	47.273	10.60	47	162	49.7802
3	54	83.12	51.855	10.33	50	166	43.9682
4	49	81.42	49.156	8.95	44	180	47.5949
