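SAS® modeling procedures provide several ways of scoring new observations using a fitted model, as described in this note. When scoring new data, the predicted value for an observation is missing (see NOTE 1) under certain conditions. The example below illustrates two of them: a predictor in the model has a missing value in the observation, or a CLASS variable has a value that does not appear in the training data.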
Example
To illustrate, consider a study of the analgesic effects of treatments on elderly patients with neuralgia. Two test treatments and a placebo are compared. The presence or absence of pain is recorded. The probability of pain is to be modeled using logistic regression.
Researchers recorded age and gender of the patients and the duration of complaint before the treatment began. The training data consisting of 60 patients are contained in the data set Neuralgia. The binary variable Pain is the response variable. A specification of Pain=Yes indicates that pain was present, and Pain=No indicates no pain. The variable Treatment is a categorical variable with three levels: A and B represent the two test treatments, and P represents the placebo treatment. The variable Age is the age of the patients, in years, when treatment began.
/* Training data set */
data Neuralgia;
   input Treatment $ Sex $ Age Duration Pain $ @@;
   datalines;
P F 68  1 No   B M 74 16 No   P F 67 30 No
P M 66 26 Yes  B F 67 28 No   B F 77 16 No
A F 71 12 No   B F 72 50 No   B F 76  9 Yes
A M 71 17 Yes  A F 63 27 No   A F 69 18 Yes
B F 66 12 No   A M 62 42 No   P F 64  1 Yes
A F 64 17 No   P M 74  4 No   A F 72 25 No
P M 70  1 Yes  B M 66 19 No   B M 59 29 No
A F 64 30 No   A M 70 28 No   A M 69  1 No
B F 78  1 No   P M 83  1 Yes  B F 69 42 No
B M 75 30 Yes  P M 77 29 Yes  P F 79 20 Yes
A M 70 12 No   A F 69 12 No   B F 65 14 No
B M 70  1 No   B M 67 23 No   A M 76 25 Yes
P M 78 12 Yes  B M 77  1 Yes  B F 69 24 No
P M 66  4 Yes  P F 65 29 No   P M 60 26 Yes
A M 78 15 Yes  B M 75 21 Yes  A F 67 11 No
P F 72 27 No   P F 70 13 Yes  A M 75  6 Yes
B F 65  7 No   P F 68 27 Yes  P M 68 11 Yes
P M 67 17 Yes  B M 70 22 No   A M 65 15 No
P F 67  1 Yes  A M 67 10 No   P F 72 11 Yes
A F 74  1 No   B M 80 21 Yes  A F 69  3 No
;
The following validation data set is used in the SCORE statement in PROC LOGISTIC to obtain predicted probabilities for the specified combinations of Treatment and Age. Notice that the first three observations use Treatments A, B, and P, all of which appear in the training data set, and have nonmissing values of Age. However, the fourth observation contains a missing value (.) for Age in Treatment A. In the fifth observation, a nonmissing value of Age is specified, but the specified treatment, Z, is not one that appears in the training data set.
/* Validation data set */
data Validate;
   input Treatment $ Age;
   datalines;
A 65
B 72
P 80
A .
Z 68
;
The following statements train the logistic model using the training data set and then score the validation data set. The EVENT="No" option specifies that the probability of Pain=No is to be modeled.
proc logistic data=Neuralgia;
   class Treatment;
   model Pain (event="No") = Treatment Age;
   score data=Validate out=Preds;
run;
Notice that predictions are given for the first three observations in the validation data set, but not for the fourth because of the missing value of Age, and not for the fifth because the Treatment value does not appear in the training data set.
proc print data=Preds;
   id Treatment Age;
run;
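One way to see which validation rows cannot be scored is to compare them against the training data. The following is a minimal sketch, not part of the original example; the table name TrainLevels and the column name Reason are hypothetical, and only the two conditions shown in this example are checked.

```sas
proc sql;
   /* Distinct Treatment levels seen in the training data */
   create table TrainLevels as
   select distinct Treatment from Neuralgia;

   /* Flag validation rows that the fitted model cannot score */
   select v.Treatment, v.Age,
          case
             when missing(v.Age)      then 'Age is missing'
             when t.Treatment is null then 'Treatment level not in training data'
             else 'OK to score'
          end as Reason
   from Validate as v
        left join TrainLevels as t
        on v.Treatment = t.Treatment;
quit;
```

For the Validate data set above, this should flag the observation with Age missing and the observation with Treatment=Z, which are the same two rows that receive missing predictions from the SCORE statement.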
To see why this occurs, it helps to know what the fitted model is. From the table of parameter estimates for the trained model, the model can be written as follows:
Logit(p) = 18.5356 + 0.7033*TA + 1.2759*TB - 0.2581*Age ,
where Logit(p) is the log odds of Pain=No, that is, log(Pr(Pain=No)/Pr(Pain=Yes)). TA and TB are design variables representing the CLASS predictor, Treatment, and are coded as shown in the "Class Level Information" table. The first Design Variable column is TA, the second column is TB. Under the effect coding used in this example, Treatment=A is coded as TA=1 and TB=0, Treatment=B as TA=0 and TB=1, and Treatment=P as TA=-1 and TB=-1.
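To display just the two tables referred to here when running the example, one option is an ODS SELECT statement before the procedure step. This is a sketch that assumes the Neuralgia data set created above; ClassLevelInfo and ParameterEstimates are the ODS table names that PROC LOGISTIC uses for the "Class Level Information" and parameter estimates tables.

```sas
/* Show only the design-variable coding and the parameter estimates */
ods select ClassLevelInfo ParameterEstimates;
proc logistic data=Neuralgia;
   class Treatment;
   model Pain (event="No") = Treatment Age;
run;
```

The ODS SELECT statement applies only to the next procedure step, so later steps produce their full output again.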
Using the model, the first observation in the Validate data set can be scored as follows. From the "Class Level Information" table, Treatment=A is represented in the model by TA=1 and TB=0.
Logit(p) = 18.5356 + 0.7033*1 + 1.2759*0 - 0.2581*65 = 2.4624
The probability of Pain=No can be obtained from the logit by the following transformation:
Pr(Pain=No) = 1 / (1+exp(-logit))
For the first observation, the predicted probability of Pain=No is 1 / (1+exp(-2.4624)) = 0.9215 and therefore the predicted probability of Pain=Yes is 1-0.9215 = 0.0785. (The slight difference from the SAS results is due to using rounded values here. The results from PROC LOGISTIC are more precise.)
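The hand calculation for the first observation can be reproduced in a short DATA step. This is a sketch using the rounded estimates from the model equation above; the variable names (b0, bA, bB, bAge, pNo, pYes) are chosen here for illustration.

```sas
data _null_;
   /* Rounded estimates copied from the model equation above */
   b0 = 18.5356;  bA = 0.7033;  bB = 1.2759;  bAge = -0.2581;

   /* First observation in Validate: Treatment=A (TA=1, TB=0), Age=65 */
   TA = 1;  TB = 0;  Age = 65;

   logit = b0 + bA*TA + bB*TB + bAge*Age;   /* log odds of Pain=No */
   pNo   = 1 / (1 + exp(-logit));           /* Pr(Pain=No)         */
   pYes  = 1 - pNo;                         /* Pr(Pain=Yes)        */

   /* Write the results to the SAS log */
   put logit= 8.4 pNo= 8.4 pYes= 8.4;
run;
```

The values written to the log should agree with the figures above (2.4624, 0.9215, and 0.0785) to four decimal places.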
For observation 2:
Logit(p) = 18.5356 + 0.7033*0 + 1.2759*1 - 0.2581*72 = 1.2283,
Pr(Pain=No) = 0.7735 and Pr(Pain=Yes) = 0.2265 .
For observation 3:
Logit(p) = 18.5356 + 0.7033*(-1) + 1.2759*(-1) - 0.2581*80 = -4.0916,
Pr(Pain=No) = 0.0164 and Pr(Pain=Yes) = 0.9836.
For the fourth observation:
Logit(p) = 18.5356 + 0.7033*1 + 1.2759*0 - 0.2581*.
Because the value of Age is missing, the model equation is incomplete and the logit and predicted probabilities cannot be computed. Note that simply ignoring the Age term in the model and computing the logit as 18.5356 + 0.7033*1 + 1.2759*0 is not valid, because this is equivalent to setting Age=0, which is almost certainly not intended.
For the fifth observation:
Logit(p) = 18.5356 + 0.7033*. + 1.2759*. - 0.2581*68
Because Treatment Z does not appear in the training data set, there are no corresponding values of the design variables, TA and TB, so again the model equation is incomplete and the logit and predicted probabilities cannot be computed. Simply ignoring the two treatment terms and computing the logit as 18.5356 - 0.2581*68 is not valid, because this is equivalent to setting TA=TB=0, which represents no known Treatment. The only valid treatments are coded as shown in the "Class Level Information" table.
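The five calculations above can be collected into one DATA step that applies the model equation to every row of Validate. This is an illustrative sketch using the rounded estimates, not how PROC LOGISTIC scores internally; the data set name HandScore and the variables TA, TB, Logit, PrNo, and PrYes are hypothetical names introduced here.

```sas
data HandScore;
   set Validate;

   /* Design variables for Treatment, coded as in the calculations above */
   select (Treatment);
      when ('A') do; TA =  1; TB =  0; end;
      when ('B') do; TA =  0; TB =  1; end;
      when ('P') do; TA = -1; TB = -1; end;
      otherwise call missing(TA, TB);   /* level not in the training data */
   end;

   /* Rounded estimates from the model equation above.       */
   /* A missing TA, TB, or Age makes every result missing.   */
   Logit = 18.5356 + 0.7033*TA + 1.2759*TB - 0.2581*Age;
   PrNo  = 1 / (1 + exp(-Logit));
   PrYes = 1 - PrNo;
run;

proc print data=HandScore;
   id Treatment Age;
   format Logit PrNo PrYes 8.4;
run;
```

Because SAS propagates missing values through arithmetic, the fourth and fifth rows come out with missing Logit, PrNo, and PrYes, and the log notes that missing values were generated. This mirrors the reasoning above: there is no valid number to substitute for the missing Age or for the design variables of an unknown Treatment.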
__________
NOTE 1: Some previous problems caused predicted values to incorrectly be set to missing.
NOTE 2: GLM, GENMOD, PROBIT, PHREG, LIFEREG, QUANTREG, QUANTSELECT, ROBUSTREG, SURVEYREG, SURVEYPHREG, HPLOGISTIC, HPMIXED, COUNTREG, QLIM, and possibly others.
Type: Usage Note
Topic: Analytics ==> Categorical Data Analysis
SAS Reference ==> Procedures ==> ADAPTIVEREG, ANOVA, COUNTREG, DISCRIM, FMM, GAM, GEE, GENMOD, GLM, GLMSELECT, HPCOUNTREG, HPGENSELECT, HPLMIXED, HPLOGISTIC, HPMIXED, HPQLIM, HPQUANTSELECT, HPREG, ICPHREG, LIFEREG, LOESS, LOGISTIC, NLIN, ORTHOREG, PHREG, PLS, PROBIT, QLIM, QUANTREG, QUANTSELECT, REG, ROBUSTREG, SURVEYLOGISTIC, SURVEYPHREG, SURVEYREG
Date Modified: 2016-03-11 14:29:47
Date Created: 2008-06-02 12:17:42