The Receiver Operating Characteristic (ROC) curve is a popular way to summarize the predictive ability of a binary logistic model. You can produce a plot of the ROC curve for the fitted model (and a data set containing the ROC plot data) by specifying the OUTROC= option in the MODEL statement. This ROC curve summarizes the model as applied to the data used to fit the model (the training data). ODS Graphics must be on in order for PROC LOGISTIC to produce the graph. In the same PROC LOGISTIC step, you can obtain the ROC curve for the fitted model as applied to a separate, validation data set. To do this, add a SCORE statement that also includes the OUTROC= option. If the model, trained on one data sample, is used to score multiple independent samples, the same can be done using multiple SCORE statements.
To make visual comparison easier, you might want to produce a single graph that overlays the two ROC curves. This can be done by combining the OUTROC= data sets and then producing the overlaid plot using PROC SGPLOT.
In the discussion below, a plot is produced that allows visual comparison of the ROC curves. If a statistical comparison among such independent ROC curves is desired, tests can be done as illustrated in this note.
The method shown below can also be used to overlay separate ROC curves produced when the BY statement is used as illustrated in this note.
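As a rough sketch of the BY-statement case, the steps might look like the following. The data set name AllData and the variables Block, y, and x are hypothetical placeholders, not taken from the referenced note. Unlike the SCORE approach, a BY statement fits a separate model in each BY group, and the single OUTROC= data set carries the BY variable, so it can serve directly as the GROUP= variable.

```sas
/* Hypothetical input: AllData with BY variable Block, binary response y, predictor x */
proc sort data=AllData;
   by Block;
run;

/* A separate model is fit for each value of Block */
proc logistic data=AllData;
   by Block;
   model y(event="1") = x / outroc=ByROC;
run;

/* One (0,0) starting point per BY group */
data Zero;
   set ByROC(keep=Block);
   by Block;
   if first.Block;
   _1mspec_ = 0;
   _sensit_ = 0;
run;

/* Interleave so each group's (0,0) point precedes its ROC points */
data ByPlot;
   set Zero ByROC;
   by Block;
run;

proc sgplot data=ByPlot aspect=1;
   lineparm x=0 y=0 slope=1 / transparency=.5 lineattrs=(color=black);
   series x=_1mspec_ y=_sensit_ / group=Block;
run;
```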
Overlaid plot of ROC curves fit to training and validation data
The following uses the example shown in this note. As shown in that note, these statements fit the model and produce two separate ROC graphs – one for the training data and one for the validation data.
   proc logistic data=train;
      model y(event="1") = entry / outroc=troc;
      score data=valid out=valpred outroc=vroc;
   run;
The following statements concatenate the training and validation OUTROC= data sets. Because the OUTROC= data sets omit the (0,0) point of each curve, the Zero data set supplies that point for both curves and is included in the concatenation. A character variable, DATA, is added to identify the blocks of observations that come from the training and the validation data.
   data Zero;
      input data $ _1mspec_ _sensit_;
      datalines;
   train 0 0
   valid 0 0
   ;

   data Plotdata;
      set zero troc(in=train) vroc(in=valid);
      if valid then data="valid";
      if train then data="train";
   run;
The graph of overlaid ROC curves is produced using the statements below. Because the sensitivity and 1-specificity axes both range from zero to one, an ROC plot is conventionally square. Beginning in SAS® 9.4 TS1M0, you can use the ASPECT=1 option to produce a square plot (for earlier releases, see the Note at the end). The LINEPARM statement produces the diagonal line that represents a model with no predictive ability. The GROUP= option is specified in the SERIES statement to produce separate curves for the training and validation data as identified by the DATA variable produced above.
The code below also illustrates how a few alterations can be made to the appearance of the ROC graph produced by PROC LOGISTIC. The STYLEATTRS statement with the WALLCOLOR= option allows the plot area background to be colored as desired. A light shade of gray is used. Similarly, the color and pattern of the diagonal line are specified using the LINEATTRS= option in the LINEPARM statement. The line is also made semi-transparent using the TRANSPARENCY= option. And by omitting the GRID option in the YAXIS and XAXIS statements, the usual set of light grid lines is removed. Other aspects of the axes could be altered using options in the axis statements. If the legend identifying the curves is not desired, add the NOAUTOLEGEND option in the PROC SGPLOT statement. The INSET statement writes the AUC (area under the ROC curve) values for the training and validation data inside the plot area. Finally, a custom title is specified in the TITLE statement.
   proc sgplot data=Plotdata aspect=1;
      styleattrs wallcolor=grayEE;
      xaxis values=(0 to 1 by 0.25) offsetmin=.05 offsetmax=.05;
      yaxis values=(0 to 1 by 0.25) offsetmin=.05 offsetmax=.05;
      lineparm x=0 y=0 slope=1 / transparency=.5 lineattrs=(color=black pattern=longdash);
      series x=_1mspec_ y=_sensit_ / group=data;
      inset ("Training AUC" = "0.7193" "Validation AUC" = "0.6350") / border position=bottomright;
      title "ROC curves for training and validation data";
   run;
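The AUC values in the INSET statement above are typed in by hand. As one possible alternative, the FITSTAT and ODS OUTPUT approach used for the multiple-sample example later in this note could be applied here as well. The following is only a sketch: it assumes the same TRAIN and VALID data sets, scores the training data a second time solely to obtain its AUC, and uses illustrative macro variable names (auc1, auc2). The FORMAT=6.4 specification is an added assumption to display four decimal places.

```sas
proc logistic data=train;
   model y(event="1") = entry;
   /* FITSTAT adds each scored data set's AUC to the ScoreFitStat table */
   score data=train out=_null_ fitstat;
   score data=valid out=_null_ fitstat;
   ods output scorefitstat=AUC;
run;

/* Rows are ordered by data set name, so auc1 = training, auc2 = validation */
proc sql noprint;
   select AUC format=6.4 into :auc1 - :auc2
      from AUC
      order by dataset;
quit;

proc sgplot data=Plotdata aspect=1;
   lineparm x=0 y=0 slope=1 / transparency=.5 lineattrs=(color=black pattern=longdash);
   series x=_1mspec_ y=_sensit_ / group=data;
   inset ("Training AUC" = "&auc1" "Validation AUC" = "&auc2") / border position=bottomright;
run;
```

With this arrangement the displayed AUC values stay in step with the data if the model or the samples change.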
Overlaid plot of ROC curves fit to multiple, independent samples
This same method can be used to overlay the ROC curves from multiple data sets scored by the same model. Add a SCORE statement in the PROC LOGISTIC step for each data set. Combine the data sets using a DATA step similar to the one above, adding a variable that identifies each block of ROC data. PROC SGPLOT can then be used to produce the plot of overlaid ROC curves.
In the DATA step below, a separate data set is created for each Block in the input data. Additionally, a categorized version of the ENTRY variable is created and named ECAT.
   data Block1 Block2 Block3 Block4;
      label Y = 'No. of damaged plants'
            n = 'No. of plants';
      input block entry lat lng n Y @@;
      ecat=1;
      if 6<entry<=10 then ecat=2;
      if 11<entry<=16 then ecat=3;
      if block=1 then output Block1;
      else if block=2 then output Block2;
      else if block=3 then output Block3;
      else if block=4 then output Block4;
      datalines;
   1 14 1 1  8  2   1 16 1 2  9  1   1  7 1 3 13  9   1  6 1 4  9  9
   1 13 2 1  9  2   1 15 2 2 14  7   1  8 2 3  8  6   1  5 2 4 11  8
   1 11 3 1 12  7   1 12 3 2 11  8   1  2 3 3 10  8   1  3 3 4 12  5
   1 10 4 1  9  7   1  9 4 2 15  8   1  4 4 3 19  6   1  1 4 4  8  7
   2 15 5 1 15  6   2  3 5 2 11  9   2 10 5 3 12  5   2  2 5 4  9  9
   2 11 6 1 20 10   2  7 6 2 10  8   2 14 6 3 12  4   2  6 6 4 10  7
   2  5 7 1  8  8   2 13 7 2  6  0   2 12 7 3  9  2   2 16 7 4  9  0
   2  9 8 1 14  9   2  1 8 2 13 12   2  8 8 3 12  3   2  4 8 4 14  7
   3  7 1 5  7  7   3 13 1 6  7  0   3  8 1 7 13  3   3 14 1 8  9  0
   3  4 2 5 15 11   3 10 2 6  9  7   3  3 2 7 15 11   3  9 2 8 13  5
   3  6 3 5 16  9   3  1 3 6  8  8   3 15 3 7  7  0   3 12 3 8 12  8
   3 11 4 5  8  1   3 16 4 6 15  1   3  5 4 7 12  7   3  2 4 8 16 12
   4  9 5 5 15  8   4  4 5 6 10  6   4 12 5 7 13  5   4  1 5 8 15  9
   4 15 6 5 17  6   4  6 6 6  8  2   4 14 6 7 12  5   4  7 6 8 15  8
   4 13 7 5 13  2   4  8 7 6 13  9   4  3 7 7  9  9   4 10 7 8  6  6
   4  2 8 5 12  8   4 11 8 6  9  7   4  5 8 7 11 10   4 16 8 8 15  7
   ;
Using the ECAT categorical predictor, the PROC LOGISTIC statements below fit a logistic model to only the data in Block 1. This model is then used to score the data in each of the four Blocks, and an ROC curve is produced for each. Since the data set of predicted values is not needed, OUT=_NULL_ is specified in each SCORE statement to suppress creation of the OUT= data set, and the ROC data are saved using the OUTROC= option. The FITSTAT option is included in each SCORE statement to produce a table containing the areas (AUCs) under the four ROC curves. This table is saved in data set AUC by the ODS OUTPUT statement.
The three DATA steps that follow the PROC LOGISTIC step create the (0,0) points for the curves (these are not included in the OUTROC= data sets), concatenate the ROC data sets from all of the Blocks while adding a Block identifier, and then interleave the (0,0) points with the ROC data by Block to give a single data set for plotting. The SQL step extracts the AUCs from the AUC data set and stores them in macro variables.
Finally, the SGPLOT step produces the plot of overlaid ROC curves and also displays the associated AUC values.
   proc logistic data=Block1;
      class ecat / param=ref;
      model y/n = ecat;
      score data=Block1 out=_null_ outroc=Block1ROC fitstat;
      score data=Block2 out=_null_ outroc=Block2ROC fitstat;
      score data=Block3 out=_null_ outroc=Block3ROC fitstat;
      score data=Block4 out=_null_ outroc=Block4ROC fitstat;
      ods output scorefitstat=AUC;
   run;

   data Zero;
      do Block=1 to 4;
         _1mspec_=0;
         _sensit_=0;
         output;
      end;
   run;

   data Plotdata;
      set Block1ROC(in=b1) Block2ROC(in=b2) Block3ROC(in=b3) Block4ROC(in=b4);
      if b1 then Block=1;
      if b2 then Block=2;
      if b3 then Block=3;
      if b4 then Block=4;
   run;

   data PlotData;
      set zero PlotData;
      by Block;
   run;

   proc sql noprint;
      select distinct(AUC) into :auc1 - :auc4
         from AUC
         order by dataset;
   quit;

   proc sgplot data=PlotData aspect=1;
      xaxis values=(0 to 1 by 0.25) grid offsetmin=.05 offsetmax=.05;
      yaxis values=(0 to 1 by 0.25) grid offsetmin=.05 offsetmax=.05;
      lineparm x=0 y=0 slope=1 / transparency=.5 lineattrs=(color=black);
      series x=_1mspec_ y=_sensit_ / group=Block;
      inset ("AUC block 1" = "&auc1" "AUC block 2" = "&auc2"
             "AUC block 3" = "&auc3" "AUC block 4" = "&auc4") / opaque position=bottomright;
      title "ROC Curves scored from Block 1 model";
   run;
______________
Note: In earlier releases, you can use equal values in the HEIGHT= and WIDTH= options of the ODS GRAPHICS statement. Specify this statement prior to the PROC SGPLOT statements. For example, the following will produce a square plot that has 480 pixels on each side.
ods graphics / height=480px width=480px;
Type: | Usage Note |
Topic: | Analytics ==> Categorical Data Analysis Analytics ==> Statistical Graphics SAS Reference ==> Procedures ==> LOGISTIC SAS Reference ==> Procedures ==> SGPLOT |
Date Modified: | 2020-03-11 14:30:28 |
Date Created: | 2014-05-13 14:57:12 |