Usage Note 33307: Scoring (computing predicted values) for new observations or a validation data set

Contents: Scoring methods and examples

1. Use the STORE Statement and PROC PLM
2. Use Built-In Scoring Capabilities — PROC SCORE, SCORE and CODE statements: PROC SCORE; SCORE Statement; CODE Statement
3. Augment the Training Data Set: Example 1: Logistic Model Validation Using PROC GENMOD
4. Use the Saved Parameter Estimates to Score Generalized Linear Models: Example 2: A Poisson Model with Offset; Example 3: A Probit Model; Example 4: Scoring a model containing spline effects

Four ways to score (compute predicted values for) new observations using a previously fitted model are discussed below. Note that several conditions can make it impossible to score a new observation, resulting in a missing predicted value. These conditions are described in this note.

1. Use the STORE Statement and PROC PLM

Beginning with SAS/STAT® 9.22 in SAS 9.2 TS2M3, many procedures provide a STORE statement to save the fitted model. You can then use the SCORE statement in PROC PLM to score a data set using the saved model. This is illustrated in the example titled "Scoring with PROC PLM" in the Examples section of the PLM documentation and at the end of Example 1 below. For more on the STORE statement, see "STORE statement" in the Shared Concepts and Topics chapter of the SAS/STAT User's Guide.

2. Use Built-In Scoring Capabilities — PROC SCORE, SCORE and CODE statements

Some procedures include features that make scoring new observations easier:

PROC SCORE

For ordinary regression models fit using PROC REG, you can use PROC SCORE to compute predicted values for new observations. See the example titled "Regression Parameter Estimates" in the SCORE documentation. It is not necessary to refit the model. However, PROC SCORE does not directly provide scoring for other types of models such as logistic or other generalized linear models. It also does not provide standard error estimates or confidence limits.
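The following is a minimal sketch of that workflow, not the documentation example itself. The data sets TRAIN and NEWOBS, the variables X1, X2, and Y, and all data values are hypothetical and exist only for illustration. PROC REG saves the estimates with OUTEST=, and PROC SCORE applies them to the new observations.

```sas
/* Hypothetical training data (values are illustrative only) */
data train;
  input x1 x2 y;
  datalines;
1 2 5.1
2 1 6.9
3 4 12.2
4 3 13.8
5 5 18.1
;

/* Hypothetical new observations to score (no response needed) */
data newobs;
  input x1 x2;
  datalines;
2 2
6 4
;

/* Fit once and save the parameter estimates in EST */
proc reg data=train outest=est noprint;
  model y = x1 x2;
run;
quit;

/* Apply the saved estimates; TYPE=PARMS selects the estimates row */
proc score data=newobs score=est out=scored type=parms predict;
  var x1 x2;
run;

proc print data=scored noobs;
run;
```

The scored data set contains the original variables plus one new variable holding the predicted values. Its name is taken from the model label in the OUTEST= data set (MODEL1 when the MODEL statement has no label).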

SCORE Statement

For a logistic or probit model, the scoring process is greatly simplified in PROC LOGISTIC. Its SCORE statement enables you to score a data set of new observations. The FITSTAT and OUTROC= options in the SCORE statement enable you to evaluate the model applied to the new data set. The FITSTAT option provides fit statistics such as the area under the ROC curve (AUC) and R-square (beginning in SAS 9.3). The OUTROC= option produces a data set for plotting the ROC curve. An ROC plot and analysis for validation data can be obtained as described in this note. As with ordinary regression models, refitting the model is not necessary if the model is saved using the OUTMODEL= option and then retrieved during scoring by the INMODEL= option. The example titled "Scoring Data Sets with the SCORE Statement" in the LOGISTIC documentation illustrates the use of the SCORE statement with a nominal logistic model.
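As a rough sketch of the OUTMODEL=/INMODEL= workflow, the following steps assume the REMISS and NEW data sets created in Example 1 below. The names LOGIT_MODEL, NEW_SCORED, and NEW_ROC are hypothetical and chosen only for this illustration.

```sas
/* Fit the model once and save it (LOGIT_MODEL is a hypothetical name) */
proc logistic data=remiss outmodel=logit_model;
  model remiss(event='1') = smear blast;
run;

/* Later: score new data from the saved model without refitting.     */
/* FITSTAT needs the observed response in the scored data set, which */
/* NEW contains.                                                     */
proc logistic inmodel=logit_model;
  score data=new out=new_scored fitstat outroc=new_roc;
run;
```

The second step can be repeated for any number of data sets, which is the main reason to save the model rather than refit it.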

A SCORE statement is also available in several other modeling procedures such as GLMSELECT, GAM, LOESS, TPSPLINE, and ADAPTIVEREG. See the procedure documentation for discussion and examples.

PROC DISCRIM provides a TESTDATA= option that enables you to specify a data set to be scored, and a TESTOUT= option that includes posterior probabilities and predicted classifications. See the example titled "Linear Discriminant Analysis of Remote-Sensing Data on Crops" in the DISCRIM documentation.
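A small sketch of the TESTDATA= and TESTOUT= options follows. It is not the documentation example: the data sets CROPS_TRAIN and CROPS_TEST, the variable names, and the values are all hypothetical.

```sas
/* Hypothetical training data with a known class for each observation */
data crops_train;
  input crop $ x1 x2;
  datalines;
corn 16 27
corn 15 23
corn 16 29
corn 18 25
soy  20 23
soy  24 24
soy  21 25
soy  27 30
;

/* Hypothetical observations to classify (class unknown) */
data crops_test;
  input x1 x2;
  datalines;
17 26
25 26
;

/* Linear discriminant analysis; TESTOUT= receives the scored test data */
proc discrim data=crops_train testdata=crops_test testout=crops_scored
             method=normal pool=yes noprint;
  class crop;
  var x1 x2;
run;

proc print data=crops_scored noobs;
run;
```

The TESTOUT= data set holds the posterior probabilities and the predicted classification for each test observation.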

CODE Statement

Beginning with SAS/STAT 12.1 in SAS 9.3 TS1M2, the CODE statement is available in several modeling procedures. The CODE statement generates SAS code that can be used in a DATA step to score a data set. See "CODE statement" in the Shared Concepts and Topics chapter of the SAS/STAT User's Guide for an example of using the CODE statement.
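One way this might look in practice is sketched below, assuming the REMISS and NEW data sets from Example 1 below. The file name score_remiss.sas and the output data set name NEW_SCORED are hypothetical, and the file is assumed to be written to and read from the current working directory.

```sas
/* Fit the model and write DATA step scoring code to a file */
proc logistic data=remiss;
  model remiss(event='1') = smear blast;
  code file='score_remiss.sas';   /* hypothetical file name */
run;

/* Score new data by including the generated code in a DATA step */
data new_scored;
  set new;
  %include 'score_remiss.sas';
run;

proc print data=new_scored noobs;
run;
```

The generated code names the predicted-value variables itself, so it is worth printing the scored data set once to see which variables were added.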

3. Augment the Training Data Set

You can get predicted values for one or more settings of your model predictors by adding observations to the input data that you use to fit (train) the model. The predictors in these new observations should be set to the values for which you want predicted values. For the added observations, either the response variable should be set to missing, or if the new observations have observed values then a WEIGHT variable should be created with value 1 for the training observations and value 0 (or missing) for the new observations.

With these new observations appended to your training data set, the fitted model should be identical to the model fit using only the training data. This is because any observation that has a missing response value or zero (or missing) weight is ignored when fitting the model. (The exception to this is when the model includes spline effects defined in the EFFECT statement. See the Extrapolation section of this note for details.) The procedure can compute predicted values for such observations as long as they have nonmissing values for all of the model predictors and have values for CLASS predictors that existed in the training data set. This is further explained and illustrated in this note. In many procedures, you can request predicted values by specifying the P= option in the OUTPUT statement, but some procedures use other syntax. See the procedure's documentation.
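The missing-response variant can be sketched as follows, assuming the REMISS training data set from Example 1 below. The data set names TO_SCORE, AUGMENTED, and NEW_PREDS, the flag IS_NEW, and the predictor values are hypothetical.

```sas
/* Hypothetical predictor settings to score; response is set to missing */
data to_score;
  input smear blast;
  remiss = .;
  datalines;
0.50 0.5
0.90 1.5
;

/* Append the new observations and flag them */
data augmented;
  set remiss to_score(in=added);
  is_new = added;   /* 1 for appended observations, 0 for training data */
run;

/* Observations with a missing response are ignored in the fit */
/* but still receive predicted values in the OUTPUT data set.  */
proc genmod data=augmented descending;
  model remiss = smear blast / dist=binomial;
  output out=new_preds(where=(is_new=1)) p=pred;
run;

proc print data=new_preds noobs;
  var smear blast pred;
run;
```

Because the appended observations have no response, no WEIGHT statement is needed in this variant.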

Example 1: Logistic Model Validation Using PROC GENMOD

Model validation often involves getting predictions for a potentially large number of observations that were held out from the original data. That is, the original data set is split into a data set to train the model and a data set to validate the model. Validation is done by comparing the values predicted under the model to the observed values in the validation data set (often called a hold-out data set). One way this can be done is by concatenating the training and validation data sets and using the combined data set as the input data set to the modeling procedure. It is often convenient for the output data set to contain only the validation observations, excluding the observations used to train the model. To do this, add a variable to the combined data set that indicates which observations are the training data set and which observations are the validation data set. You can use this indicator variable in a WHERE= data set option in the OUTPUT statement to select only the validation observations for output.

The following DATA step creates a SAS data set named REMISS that contains the training data for a logistic model to be fit by PROC GENMOD.

      data remiss;
        input remiss cell smear infil li blast temp;
        datalines;
        1 .8 .83 .66 1.9 1.1 .996
        1 .9 .36 .32 1.4 .74 .992
        0 .8 .88 .7 .8 .176 .982
        0 1 .87 .87 .7 1.053 .986
        1 .9 .75 .68 1.3 .519 .98
        0 1 .65 .65 .6 .519 .982
        1 .95 .97 .92 1 1.23 .992
        0 .95 .87 .83 1.9 1.354 1.02
        0 1 .45 .45 .8 .322 .999
        0 .95 .36 .34 .5 0 1.038
        0 .85 .39 .33 .7 .279 .988
        0 .7 .76 .53 1.2 .146 .982
        0 .8 .46 .37 .4 .38 1.006
        0 .2 .39 .08 .8 .114 .99
        0 1 .9 .9 1.1 1.037 .99
        1 1 .84 .84 1.9 2.064 1.02
        0 .65 .42 .27 .5 .114 1.014
        0 1 .75 .75 1 1.322 1.004
        0 .5 .44 .22 .6 .114 .99
        1 1 .63 .63 1.1 1.072 .986
        0 1 .33 .33 .4 .176 1.01
        0 .9 .93 .84 .6 1.591 1.02
        1 1 .58 .58 1 .531 1.002
        0 .95 .32 .3 1.6 .886 .988
        1 1 .6 .6 1.7 .964 .99
        1 1 .69 .69 .9 .398 .986
        0 1 .73 .73 .7 .398 .986
        ;

This DATA step creates a validation data set, NEW. For purposes of illustration, the first eight observations of the training data set are used.

      data new;
        input remiss cell smear infil li blast temp;
        cards;
        1 .8 .83 .66 1.9 1.1 .996
        1 .9 .36 .32 1.4 .74 .992
        0 .8 .88 .7 .8 .176 .982
        0 1 .87 .87 .7 1.053 .986
        1 .9 .75 .68 1.3 .519 .98
        0 1 .65 .65 .6 .519 .982
        1 .95 .97 .92 1 1.23 .992
        0 .95 .87 .83 1.9 1.354 1.02
        ;

The following DATA step concatenates the training and validation data sets into a single data set, BOTH, for input to PROC GENMOD. The IN= option in the SET statement creates a temporary variable, InNew, which equals 1 when the observation comes from the validation data set (NEW) and equals 0 when it comes from the training data set (REMISS). The logical negation of this variable, W, is created for use as a weight variable. W equals 1 for the training observations and 0 for the validation observations.

      data both;
        set remiss new (in=InNew);
        w=not(InNew);
        run;
      proc print noobs; 
        var remiss w smear blast;
        run;

The combined data set is shown below.

remiss	w	smear	blast
1	1	0.83	1.100
1	1	0.36	0.740
0	1	0.88	0.176
0	1	0.87	1.053
1	1	0.75	0.519
0	1	0.65	0.519
1	1	0.97	1.230
0	1	0.87	1.354
0	1	0.45	0.322
0	1	0.36	0.000
0	1	0.39	0.279
0	1	0.76	0.146
0	1	0.46	0.380
0	1	0.39	0.114
0	1	0.90	1.037
1	1	0.84	2.064
0	1	0.42	0.114
0	1	0.75	1.322
0	1	0.44	0.114
1	1	0.63	1.072
0	1	0.33	0.176
0	1	0.93	1.591
1	1	0.58	0.531
0	1	0.32	0.886
1	1	0.60	0.964
1	1	0.69	0.398
0	1	0.73	0.398
1	0	0.83	1.100
1	0	0.36	0.740
0	0	0.88	0.176
0	0	0.87	1.053
1	0	0.75	0.519
0	0	0.65	0.519
1	0	0.97	1.230
0	0	0.87	1.354

These statements fit the model using the combined data set, BOTH. The training indicator variable, W, is used in the WEIGHT statement. The results are identical to a GENMOD analysis on just the training data set because observations in the validation data set have zero weight and are ignored in the model fitting process. The OUTPUT statement produces a data set, PREDS, of predicted values. The WHERE= data set option after the OUT= data set name causes only those observations from the validation data set to be written to the data set. The L= and U= options request that 95% confidence limits be computed and output in addition to the predicted values requested by the P= option.

      proc genmod data=both descending;
        weight w;
        model remiss = smear blast / dist=binomial;
        output out=preds(where=(w=0)) p=pred l=lower u=upper;
        run;
      proc print data=preds noobs; 
        var remiss pred lower upper smear blast;
        run;

remiss	pred	lower	upper	smear	blast
1	0.45371	0.21252	0.71879	0.83	1.100
1	0.36042	0.09194	0.75824	0.36	0.740
0	0.15204	0.01607	0.66317	0.88	0.176
0	0.43053	0.18092	0.72127	0.87	1.053
1	0.24938	0.08427	0.54533	0.75	0.519
0	0.25747	0.11118	0.49010	0.65	0.519
1	0.49178	0.17110	0.81937	0.97	1.230
0	0.55299	0.24316	0.82649	0.87	1.354

For each data set that you want to score, you would need to use this same process that involves refitting the model to the training data set. This can be avoided by using the STORE statement in PROC GENMOD and the SCORE statement in PROC PLM. The following GENMOD step fits the model and the STORE statement saves the model. To score each new data set, only a PLM step is required. Two data sets (NEW and NEW2) are scored in the following example. The ILINK option in the SCORE statement uses the inverse of the link function (logit, in this case) to obtain estimates on the mean (probability) scale.

      proc genmod data=remiss descending;
        model remiss = smear blast / dist=binomial;
        store out=logmod;
        run;
      proc plm source=logmod;
        score data=new out=preds pred=pred lclm=lower uclm=upper / ilink;
        run;
      proc plm source=logmod;
        /* separate OUT= name so the scores for NEW are not overwritten */
        score data=new2 out=preds2 pred=pred lclm=lower uclm=upper / ilink;
        run;
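As a possible shortcut, several data sets can be scored in one PROC PLM step by repeating the SCORE statement. This sketch assumes the LOGMOD item store saved by the GENMOD step above, uses the documented RESTORE= option to name the item store, and assumes a second data set NEW2 exists. The output names PREDS_NEW and PREDS_NEW2 are hypothetical.

```sas
/* One PLM step, one SCORE statement per data set to be scored */
proc plm restore=logmod;
  score data=new  out=preds_new  pred=pred lclm=lower uclm=upper / ilink;
  score data=new2 out=preds_new2 pred=pred lclm=lower uclm=upper / ilink;
run;
```

Giving each SCORE statement its own OUT= data set keeps the two sets of predictions separate.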

4. Use the Saved Parameter Estimates to Score Generalized Linear Models

Some important issues must be remembered in order to correctly and accurately compute predicted values:

Use parameter estimate values recorded to high precision to avoid round-off error.
When CLASS variables are in the model, you must use the same coding of design variables as was used to represent the CLASS predictors when the model was trained. The coding of design variables when training the model is controlled by the PARAM= option in the CLASS statement.
Use the proper inverse link function to produce the estimated mean from x'β. For example, exponentiate (EXP function) for a log-linked model such as a Poisson model; use the PROBNORM function for a probit model; use the LOGISTIC function for a logistic model.
Predictors defined in the EFFECT statement are also represented in the model by a set of design variables. As with CLASS predictors, values on these predictors must be properly translated into the set of design variables in the same way as the EFFECT statement.
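The inverse link functions named above can be compared side by side in a short DATA step. This is only an illustration: the values of xbeta are arbitrary, and the variable names are hypothetical.

```sas
data _null_;
  do xbeta = -1, 0, 1;
    mu_log   = exp(xbeta);           /* log link (for example, Poisson)   */
    p_probit = probnorm(xbeta);      /* probit link                       */
    p_logit  = logistic(xbeta);      /* logit link                        */
    p_logit2 = 1/(1 + exp(-xbeta));  /* same inverse logit, written out   */
    put xbeta= mu_log= p_probit= p_logit= p_logit2=;
  end;
run;
```

At xbeta=0 the log link gives a mean of 1 and both the probit and logit links give a probability of 0.5, which is a quick way to confirm that the right function is being used.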

Note that if the value of a CLASS variable in an observation to be scored does not appear in the training data set, then that observation cannot be scored. This is because, unlike a continuous predictor, there is no parameter corresponding to that value in the trained model as explained in this note and in Example 3 below.

Predicted values can be obtained by this method, but computing their standard errors is generally more complex and is not done here. As a result, confidence limits for the predicted values are also not available from this method.

Example 2: A Poisson Model with Offset

The following Poisson model is based on the data in the "Getting Started" section of the GENMOD documentation. Note that the model includes a continuous variable (age), a CLASS variable (car), and their interaction. The CLASS variable uses GENMOD's default coding method (PARAM=GLM). The model also includes an offset variable (ln). The following statements create the training data set and fit the desired model. The XVARS and P options in the MODEL statement display the predictor values and the predicted counts (Pred) for the observations in the training data set, shown below. In this example, the specified model happens to be a saturated model, so the predicted values equal the actual values. But this has no influence on the manner of scoring.

   data insure;
     input n c car $ age;
     ln = log(n);
     datalines;
     500   42  small  1
     1200  37  medium 1
     100    1  large  1
     400  101  small  2
     500   73  medium 2
     300   14  large  2
     ;
   proc genmod data=insure;
     class car;
     model c = car age car*age/ dist=poisson link=log offset=ln xvars p;
     ods output parameterestimates=pe;
     run;

Observation Statistics
Observation	c	ln	age	car	Pred	Xbeta	Std	HessWgt
1	42	6.2146081	1	small	42	3.7376696	0.1543033	42
2	37	7.0900768	1	medium	37	3.6109179	0.164399	37
3	1	4.6051702	1	large	1	-1.07E-14	1	1
4	101	5.9914645	2	small	101	4.6151205	0.0995037	101
5	73	6.2146081	2	medium	73	4.2904594	0.1170411	73
6	14	5.7037825	2	large	14	2.6390573	0.2672612	14

Notice that the parameter estimates table was saved via the ODS OUTPUT statement. The variable containing the parameter estimates (Estimate) is displayed to high precision by using a FORMAT statement in the following PROC PRINT step. The 12.10 format displays the estimates in a field 12 characters wide with up to 10 decimal places (fewer when the sign or integer part needs the space). These more precise values are used in the scoring computations below to more closely match what GENMOD does internally with full-precision values.

   proc print data=pe; 
     format estimate 12.10; 
     var parameter level: estimate; 
     run;

Obs	Parameter	Level1	Estimate
1	Intercept		-3.577532930
2	car	large	-2.568082297
3	car	medium	-1.456636259
4	car	small	0.0000000000
5	age		1.1005944499
6	age*car	large	0.4398505911
7	age*car	medium	0.4544158160
8	age*car	small	0.0000000000
9	Scale		1.0000000000

The following step does the scoring. In this example, the training data set is scored, so it is specified in the SET statement. A SELECT group should appear for each predictor in the CLASS statement to create the appropriately coded design variables. Since the PARAM= option was not specified in the CLASS statement, the default GLM coding is used. If a different coding method were requested via the PARAM= option in the CLASS statement, the coding of the design variables (named carlarge, carmedium, and carsmall in this example) would change. This is discussed further below. The parameter estimates from the preceding PROC PRINT step are used in the computation of the linear predictor, x'β, which is computed by multiplying each parameter estimate by its predictor (or design) variable and summing the products. By definition, the parameter associated with an offset variable equals 1. Finally, the inverse link function is applied to get a predicted mean. Since this Poisson model uses the log link, the inverse link function is exponentiation, which can be done with the EXP function in SAS. For ordinary regression models, such as those fit by PROC REG or PROC GLM, the link is the identity link and x'β is the predicted mean.

   data scores;
     set insure;
     select (car);
       when ("large") do;
          carlarge=1; carmedium=0; carsmall=0; end;
       when ("medium") do;
          carlarge=0; carmedium=1; carsmall=0; end;
       when ("small") do;
          carlarge=0; carmedium=0; carsmall=1; end;
       otherwise;
       end;
     xbeta=-3.577532930 +
           -2.568082297*carlarge +
           -1.456636259*carmedium +
           0*carsmall +
           1.1005944499*age +
           0.4398505911*age*carlarge +
           0.4544158160*age*carmedium +
           0*age*carsmall +
           1*ln
           ;
     mu_hat=exp(xbeta);
     run;

Notice that the computed scores (mu_hat) match the predicted values computed by the P option (Pred) in PROC GENMOD.

   proc print noobs;
     var car age c xbeta mu_hat;
     run;

car	age	c	xbeta	mu_hat
small	1	42	3.73767	42.000
medium	1	37	3.61092	37.000
large	1	1	-0.00000	1.000
small	2	101	4.61512	101.000
medium	2	73	4.29046	73.000
large	2	14	2.63906	14.000
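As a hand check of the first row, the linear predictor for a small car with age=1 and n=500 reduces to the intercept, the age term, and the offset, because the design variables carlarge and carmedium are both 0. The following sketch uses only the estimates printed above. The DATA _NULL_ step is just an illustration.

```sas
data _null_;
  /* small car, age=1, n=500: carlarge=0 and carmedium=0 */
  xbeta  = -3.577532930      /* intercept                       */
           + 1.1005944499*1  /* age coefficient times age       */
           + 1*log(500);     /* offset ln, coefficient fixed at 1 */
  mu_hat = exp(xbeta);       /* inverse of the log link         */
  put xbeta= 8.5 mu_hat= 8.3;
run;
```

The values written to the log should agree with the first row of the table above (xbeta of about 3.73767 and mu_hat of about 42).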

Had effects coding (PARAM=EFFECT) been specified, the following SELECT group would properly code the design variables for use in scoring:

     select (car);
       when ("large") do;
          carlarge=1;  carmedium=0;  end;
       when ("medium") do;
          carlarge=0;  carmedium=1;  end;
       when ("small") do;
          carlarge=-1; carmedium=-1; end;
       otherwise;
       end;

For reference coding (PARAM=REF), this SELECT group would be used:

     select (car);
       when ("large") do;
          carlarge=1; carmedium=0; end;
       when ("medium") do;
          carlarge=0; carmedium=1; end;
       when ("small") do;
          carlarge=0; carmedium=0; end;
       otherwise;
       end;

For more on the various types of CLASS variable coding, see "CLASS Variable Parameterization" in the Details section of the LOGISTIC procedure documentation.

Example 3: A Probit Model

The following uses data from the example titled "Logistic Modeling with Categorical Predictors" in the LOGISTIC procedure documentation. PROC GENMOD is used to fit a probit model to the data to model the probability of no pain. Effects coding is used for the categorical predictor Treatment (A, B, or P) and reference coding is used for Sex (F or M) with males (M) as the reference category.

   data Neuralgia;
     input Treatment $ Sex $ Age Duration Pain $ @@;
     datalines;
   P  F  68   1  No   B  M  74  16  No  P  F  67  30  No
   P  M  66  26  Yes  B  F  67  28  No  B  F  77  16  No
   A  F  71  12  No   B  F  72  50  No  B  F  76   9  Yes
   A  M  71  17  Yes  A  F  63  27  No  A  F  69  18  Yes
   B  F  66  12  No   A  M  62  42  No  P  F  64   1  Yes
   A  F  64  17  No   P  M  74   4  No  A  F  72  25  No
   P  M  70   1  Yes  B  M  66  19  No  B  M  59  29  No
   A  F  64  30  No   A  M  70  28  No  A  M  69   1  No
   B  F  78   1  No   P  M  83   1  Yes B  F  69  42  No
   B  M  75  30  Yes  P  M  77  29  Yes P  F  79  20  Yes
   A  M  70  12  No   A  F  69  12  No  B  F  65  14  No
   B  M  70   1  No   B  M  67  23  No  A  M  76  25  Yes
   P  M  78  12  Yes  B  M  77   1  Yes B  F  69  24  No
   P  M  66   4  Yes  P  F  65  29  No  P  M  60  26  Yes
   A  M  78  15  Yes  B  M  75  21  Yes A  F  67  11  No
   P  F  72  27  No   P  F  70  13  Yes A  M  75   6  Yes
   B  F  65   7  No   P  F  68  27  Yes P  M  68  11  Yes
   P  M  67  17  Yes  B  M  70  22  No  A  M  65  15  No
   P  F  67   1  Yes  A  M  67  10  No  P  F  72  11  Yes
   A  F  74   1  No   B  M  80  21  Yes A  F  69   3  No
   ;

   proc genmod data=Neuralgia;
     class Treatment (param=effect) Sex (param=ref ref="M");
     model Pain = Treatment Sex Treatment*Sex Age Duration / dist=binomial link=probit;
     output out=preds p=PrNoPain;
     ods output parameterestimates=parms;
     run;

Class Level Information
Class	Value	Design Variables
Treatment	A	1	0
	B	0	1
	P	-1	-1
Sex	F	1
	M	0

Analysis Of Parameter Estimates
Parameter			DF	Estimate	Standard Error	Wald 95% Confidence Limits		Chi-Square	Pr > ChiSq
Intercept			1	9.9221	3.7669	2.5392	17.3051	6.94	0.0084
Treatment	A		1	0.5139	0.3870	-0.2445	1.2724	1.76	0.1841
Treatment	B		1	0.7200	0.4125	-0.0885	1.5285	3.05	0.0809
Sex	F		1	0.9404	0.4204	0.1165	1.7644	5.00	0.0253
Treatment*Sex	A	F	1	-0.1622	0.5987	-1.3355	1.0112	0.07	0.7865
Treatment*Sex	B	F	1	0.1580	0.6185	-1.0542	1.3702	0.07	0.7984
Age			1	-0.1440	0.0521	-0.2462	-0.0418	7.62	0.0058
Duration			1	0.0006	0.0191	-0.0367	0.0380	0.00	0.9733
Scale			0	1.0000	0.0000	1.0000	1.0000

Below are the scores (predicted probabilities of no pain) for the first six observations, which the scoring step below reproduces.

   proc print data=preds(obs=6) noobs;
     run;

Treatment	Sex	Age	Duration	Pain	PrNoPain
P	F	68	1	No	0.43771
B	M	74	16	No	0.49933
P	F	67	30	No	0.50227
P	M	66	26	Yes	0.21259
B	F	67	28	No	0.98267
B	F	77	16	No	0.74692

These statements display the parameter estimates of the probit model with more precision for use in scoring.

   proc print data=parms noobs;
     format estimate 12.10; 
     var parameter level: estimate; 
     run;

Parameter	Level1	Level2	Estimate
Intercept			9.9221347360
Treatment	A		0.5139396491
Treatment	B		0.7200040408
Sex	F		0.9404279566
Treatment*Sex	A	F	-.1621550070
Treatment*Sex	B	F	0.1580056737
Age			-.1439733988
Duration			0.0006380643
Scale			1.0000000000

To illustrate scoring, the first six observations of the training data set are used as a validation data set. The scores for these observations should equal the predicted values computed by the GENMOD procedure above. Two additional observations are included: one with an invalid Treatment code (X) and one with a missing value for Sex. The first of these cannot be scored because there is no parameter for Treatment X in the model. In order to score this point, the training data would need to include some subjects who were given Treatment X. The second cannot be scored because values for all predictors in the model must be nonmissing in order to make a valid computation. See this usage note for more discussion.

   data valid;
     input Treatment $ Sex $ Age Duration Pain $ @@;
     datalines;
   P  F  68   1  No   B  M  74  16  No  P  F  67  30  No
   P  M  66  26  Yes  B  F  67  28  No  B  F  77  16  No
   X  F  50  10  .    B  .  32  15  .
   ;
        
   proc print noobs;
     run;

Treatment	Sex	Age	Duration	Pain
P	F	68	1	No
B	M	74	16	No
P	F	67	30	No
P	M	66	26	Yes
B	F	67	28	No
B	F	77	16	No
X	F	50	10
B		32	15

In the scoring step below, a SELECT group is included for each of the two categorical predictors, Treatment and Sex, using coding that matches the coding used when training the model: effects coding for Treatment and reference coding for Sex. The "Class Level Information" table (above) produced by PROC GENMOD shows you how the design variables are coded. x'β is computed using the high-precision parameter estimates displayed above. Since the inverse of the probit link function is the cumulative probability from the standard normal distribution, you can use the PROBNORM function in SAS. Had the logit link been used to produce a logistic model, you would use the inverse logit function, 1/(1+exp(-x'β)), which can also be computed using the LOGISTIC function: logistic(xbeta).

   data scores;
     set valid;
     select (Treatment);
       when ("A") do;
          TrtA=1; TrtB=0; end;
       when ("B") do;
          TrtA=0; TrtB=1; end;
       when ("P") do;
          TrtA=-1; TrtB=-1; end;
       otherwise;
       end;
     select (Sex);
       when ("F") SexF=1;
       when ("M") SexF=0;
       otherwise;
       end;
     xbeta=9.9221347360 +
           0.5139396491*TrtA +
           0.7200040408*TrtB +
           0.9404279566*SexF +
           -.1621550070*TrtA*SexF +
           0.1580056737*TrtB*SexF +
           -.1439733988*age +
           0.0006380643*duration
           ;
     PrNoPain=probnorm(xbeta);
     run;

Notice that the predicted probabilities for the first six observations match those computed by PROC GENMOD above, and the predicted probabilities for the last two observations are missing as expected.

   proc print noobs;
     var treatment sex age duration pain xbeta PrNoPain;
     run;

Treatment	Sex	Age	Duration	Pain	xbeta	PrNoPain
P	F	68	1	No	-0.15678	0.43771
B	M	74	16	No	-0.00168	0.49933
P	F	67	30	No	0.00569	0.50227
P	M	66	26	Yes	-0.79746	0.21259
B	F	67	28	No	2.11222	0.98267
B	F	77	16	No	0.66483	0.74692
X	F	50	10		.	.
B		32	15		.	.
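To double-check hand-coded design variables such as TrtA, TrtB, and SexF, one option is to have a procedure write out its own design matrix and compare. The sketch below uses the OUTDESIGN= and OUTDESIGNONLY options of PROC LOGISTIC with the same CLASS coding as the model above. The data set name DESIGN is hypothetical, and the design column names are whatever the procedure generates.

```sas
/* Write the design matrix only; OUTDESIGNONLY suppresses model fitting */
proc logistic data=Neuralgia outdesign=design outdesignonly;
  class Treatment (param=effect) Sex (param=ref ref="M");
  model Pain = Treatment Sex Treatment*Sex Age Duration;
run;

/* Compare these columns with the values set in the SELECT groups */
proc print data=design(obs=6) noobs;
run;
```

If the printed columns differ from the values assigned in the SELECT groups, the hand-coded scores will not match the procedure's predictions.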

Note that PROC LOGISTIC can fit a probit model and also provides effects and reference coding. Since it has built-in scoring capability via its SCORE statement, you can fit the model and score the validation data all in a single step. Any slight differences are due to minor differences in starting values and iteration methods used by GENMOD and LOGISTIC.

   proc logistic data=Neuralgia;
     class Treatment (param=effect) Sex (param=ref ref="M");
     model Pain = Treatment Sex Treatment*Sex Age Duration / link=probit;
     score data=valid out=validscore;
     run;
     
   proc print data=validscore noobs;
     var treatment sex age duration pain P_No;
     run;

Treatment	Sex	Age	Duration	Pain	P_No
P	F	68	1	No	0.43771
B	M	74	16	No	0.49933
P	F	67	30	No	0.50226
P	M	66	26	Yes	0.21262
B	F	67	28	No	0.98266
B	F	77	16	No	0.74693
X	F	50	10		.
B		32	15		.

Example 4: Scoring a model containing spline effects

See this example that discusses the types of spline transformations available in the EFFECT statement and illustrates reproducing the spline basis functions and scoring data.

Type:	Usage Note

Date Modified:	2016-08-26 10:11:38
Date Created:	2008-09-15 16:21:59
