-
ALPHA=number
-
specifies the level of significance α for 100(1 − α)% confidence intervals. The value number must be between 0 and 1; the default value is 0.05, which results in 95% intervals. This value is used as the default confidence level for limits computed by options in other statements of the procedure.
You can override the default in most of these cases by specifying the ALPHA= option in the separate statements.
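The interaction between the procedure-level default and a statement-level override can be sketched as follows. This is a minimal illustration, not an example from the original text: it assumes the SASHELP.HEART sample data set is available, and WORK.SCORED is a hypothetical output data set name.

```sas
/* Assumption: SASHELP.HEART is installed; WORK.SCORED is an illustrative name. */
proc logistic data=sashelp.heart alpha=0.1;
   /* CLPARM=WALD requests Wald limits for the parameters. With ALPHA=0.1
      in the PROC LOGISTIC statement, these are 90% limits. */
   model Status(event='Dead') = AgeAtStart Cholesterol / clparm=wald;
   /* ALPHA=0.05 in the SCORE statement overrides the default, so the
      CLM limits written to WORK.SCORED are 95% limits. */
   score data=sashelp.heart out=work.scored clm alpha=0.05;
run;
```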
-
COVOUT
-
adds the estimated covariance matrix to the OUTEST= data set. For the COVOUT option to have an effect, the OUTEST= option must be specified. See the section OUTEST= Output Data Set for more information.
-
DATA=SAS-data-set
-
names the SAS data set containing the data to be analyzed. If you omit the DATA= option, the procedure uses the most recently created SAS data set. The INMODEL= option cannot be specified with this option.
-
DESCENDING
DESC
-
reverses the sort order for the levels of the response variable. If both the DESCENDING and ORDER= options are specified, PROC LOGISTIC orders the levels according to the ORDER= option and then reverses that order. This option has the same effect as the response variable option DESCENDING in the MODEL statement. See the section Response Level Ordering for more detail.
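As an illustration of the reversed ordering, the following sketch assumes the SASHELP.HEART sample data set, whose response variable Status takes the values 'Alive' and 'Dead'. Without DESCENDING, the levels sort as Alive, Dead; with it, the order is reversed. The "Response Profile" table in the output shows the resulting order and which level is modeled.

```sas
/* Assumption: SASHELP.HEART is installed. */
proc logistic data=sashelp.heart descending;
   /* Levels of Status are ordered Dead, Alive instead of Alive, Dead. */
   model Status = AgeAtStart;
run;

/* Same effect, using the response variable option in the MODEL statement. */
proc logistic data=sashelp.heart;
   model Status(descending) = AgeAtStart;
run;
```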
-
EXACTONLY
-
requests only the exact analyses. The asymptotic analysis that PROC LOGISTIC usually performs is suppressed.
-
EXACTOPTIONS (options)
-
specifies options that apply to every EXACT statement in the program. The available options are summarized here, and full descriptions are available in the EXACTOPTIONS statement.
Option | Description
ABSFCONV | Specifies the absolute function convergence criterion
ADDTOBS | Adds the observed sufficient statistic to the sampled exact distribution
BUILDSUBSETS | Builds every distribution for sampling
EPSILON= | Specifies the comparison fuzz for partial sums of sufficient statistics
FCONV | Specifies the relative function convergence criterion
MAXTIME= | Specifies the maximum time allowed in seconds
METHOD= | Specifies the DIRECT, NETWORK, NETWORKMC, or MCMC algorithm
N= | Specifies the number of Monte Carlo samples
ONDISK | Uses disk space
SEED= | Specifies the initial seed for sampling
STATUSN= | Specifies the sampling interval for printing a status line
STATUSTIME= | Specifies the time interval for printing a status line
XCONV | Specifies the relative parameter convergence criterion
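A hedged sketch of how several of these options might be combined with EXACTONLY follows. The data set WORK.SMALL and its variables y and dose are hypothetical and exist only to keep the example small enough for an exact analysis; the option values are arbitrary illustrations.

```sas
/* Hypothetical toy data: binary response y and an ordinal dose. */
data work.small;
   input y dose @@;
   datalines;
0 1  0 1  0 1  1 1  0 2  0 2  1 2  1 2  0 3  1 3  1 3  1 3
;
run;

/* EXACTONLY suppresses the asymptotic analysis. EXACTOPTIONS applies a
   Monte Carlo network algorithm with a fixed seed and a 60-second time
   limit to every EXACT statement in the step. */
proc logistic data=work.small exactonly
              exactoptions(method=networkmc n=10000 seed=12345 maxtime=60);
   model y(event='1') = dose;
   exact dose / estimate=both;
run;
```

Fixing SEED= makes the sampled exact distribution reproducible from run to run.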
-
INEST=SAS-data-set
-
names the SAS data set that contains initial estimates for all the parameters in the model. If BY-group processing is used,
it must be accommodated in setting up the INEST= data set. See the section INEST= Input Data Set for more information.
-
INMODEL=SAS-data-set
-
specifies the name of the SAS data set that contains the model information needed for scoring new data. This INMODEL= data set is the OUTMODEL= data set saved in a previous PROC LOGISTIC call. The OUTMODEL= data set should not be modified before its use as an INMODEL= data set.
The DATA= option cannot be specified with this option; instead, specify the data sets to be scored in the SCORE statements. FORMAT statements are not allowed when the INMODEL= data set is specified; variables in the DATA= and PRIOR= data sets in the SCORE statement should be formatted within the data sets.
You can specify the BY statement provided that the INMODEL= data set is created under the same BY-group processing.
The CLASS, EFFECT, EFFECTPLOT, ESTIMATE, EXACT, LSMEANS, LSMESTIMATE, MODEL, OUTPUT, ROC, ROCCONTRAST, SLICE, STORE, TEST, and UNIT statements are not available with the INMODEL= option.
-
MULTIPASS
-
forces the procedure to reread the DATA= data set as needed rather than require its storage in memory or in a temporary file on disk. By default, the data set is cleaned up and stored in memory or in a temporary file. This option can be useful for large data sets. All exact analyses are ignored in the presence of the MULTIPASS option. If a STRATA statement is specified, then the data set must first be grouped or sorted by the strata variables.
-
NAMELEN=number
-
specifies the maximum length of effect names in tables and output data sets to be number characters, where number is a value between 20 and 200. The default length is 20 characters.
-
NOCOV
-
specifies that the covariance matrix not be saved in the OUTMODEL= data set. The covariance matrix is needed for computing the confidence intervals for the posterior probabilities in the OUT= data set in the SCORE statement. Specifying this option will reduce the size of the OUTMODEL= data set.
-
NOPRINT
-
suppresses all displayed output. Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 20: Using the Output Delivery System, for more information.
-
ORDER=DATA | FORMATTED | FREQ | INTERNAL
RORDER=DATA | FORMATTED | INTERNAL
-
specifies the sort order for the levels of the response variable. See the response variable option ORDER= in the MODEL statement for more information. For ordering of CLASS variable levels, see the ORDER= option in the CLASS statement.
-
OUTDESIGN=SAS-data-set
-
specifies the name of the data set that contains the design matrix for the model. The data set contains the same number of observations as the corresponding DATA= data set and includes the response variable (with the same format as in the DATA= data set), the FREQ variable, the WEIGHT variable, the OFFSET= variable, and the design variables for the covariates, including the Intercept variable of constant value 1 unless the NOINT option in the MODEL statement is specified.
-
OUTDESIGNONLY
-
suppresses the model fitting and creates only the OUTDESIGN= data set. This option is ignored if the OUTDESIGN= option is not specified.
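The two design-matrix options are typically used together when only the coded covariates are of interest. The following sketch assumes the SASHELP.HEART sample data set; WORK.DESIGN is a hypothetical output name, and the reference-cell coding is chosen only for illustration.

```sas
/* Assumption: SASHELP.HEART is installed; WORK.DESIGN is an illustrative name. */
proc logistic data=sashelp.heart outdesign=work.design outdesignonly;
   class Sex / param=ref;                        /* design columns for Sex */
   model Status(event='Dead') = Sex AgeAtStart;  /* no model is fit */
run;

/* Inspect the first few rows: response, Intercept, and design variables. */
proc print data=work.design(obs=5);
run;
```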
-
OUTEST=SAS-data-set
-
creates an output SAS data set that contains the final parameter estimates and, optionally, their estimated covariances (see the preceding COVOUT option). The output data set also includes a variable named _LNLIKE_, which contains the log likelihood.
See the section OUTEST= Output Data Set for more information.
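A minimal sketch of saving the estimates together with their covariance matrix follows. It assumes the SASHELP.HEART sample data set; WORK.EST is a hypothetical output name.

```sas
/* Assumption: SASHELP.HEART is installed; WORK.EST is an illustrative name. */
proc logistic data=sashelp.heart outest=work.est covout;
   model Status(event='Dead') = AgeAtStart Cholesterol;
run;

/* Without COVOUT the data set holds only the parameter estimates;
   with COVOUT it also holds the rows of the covariance matrix. */
proc print data=work.est;
run;
```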
-
OUTMODEL=SAS-data-set
-
specifies the name of the SAS data set that contains the information about the fitted model. This data set contains sufficient information to score new data without having to refit the model. It is solely used as the input to the INMODEL= option in a subsequent PROC LOGISTIC call. The OUTMODEL= option is not available with the STRATA statement. Information in this data set is stored in a very compact form, so you should not modify it manually.
Note: The STORE statement can also be used to save your model. See the section STORE Statement for more information.
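The fit-once, score-later workflow that OUTMODEL= and INMODEL= support can be sketched as follows. The example assumes the SASHELP.HEART sample data set; WORK.HEARTMODEL and WORK.SCORED are hypothetical names, and the fitting data are rescored here only to keep the example self-contained.

```sas
/* Step 1: fit the model and save it. */
proc logistic data=sashelp.heart outmodel=work.heartmodel;
   model Status(event='Dead') = AgeAtStart Cholesterol;
run;

/* Step 2: score without refitting. DATA= is not allowed in the
   PROC LOGISTIC statement here; the data to score go in SCORE. */
proc logistic inmodel=work.heartmodel;
   score data=sashelp.heart out=work.scored;
run;
```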
-
PLOTS <(global-plot-options)> <=plot-request <(options)>>
PLOTS <(global-plot-options)> =(plot-request <(options)> <…plot-request <(options)>>)
-
controls the plots produced through ODS Graphics. When you specify only one plot-request, you can omit the parentheses from around the plot-request. For example:
PLOTS = ALL
PLOTS = (ROC EFFECT INFLUENCE(UNPACK))
PLOTS(ONLY) = EFFECT(CLBAR SHOWOBS)
ODS Graphics must be enabled before plots can be requested. For example:
ods graphics on;
proc logistic plots=all;
model y=x;
run;
For more information about enabling and disabling ODS Graphics, see the section Enabling and Disabling ODS Graphics in Chapter 21: Statistical Graphics Using ODS.
If the PLOTS option is not specified or is specified with no plot-requests, then graphics are produced by default in the following situations:
-
If the INFLUENCE or IPLOTS option is specified in the MODEL statement, then the INFLUENCE plots are produced unless the MAXPOINTS= cutoff is exceeded.
-
If you specify the OUTROC= option in the MODEL statement, then ROC curves are produced. If you also specify a SELECTION= method, then an overlaid plot of all the ROC curves for each step of the selection process is displayed.
-
If the OUTROC= option is specified in a SCORE statement, then the ROC curve for the scored data set is displayed.
-
If you specify ROC statements, then an overlaid plot of the ROC curves for the model (or the selected model if a SELECTION= method is specified) and for all the ROC statement models is displayed.
-
If you specify the CLODDS= option in the MODEL statement or if you specify the ODDSRATIO statement, then a plot of the odds ratios and their confidence limits is displayed.
For general information about ODS Graphics, see Chapter 21: Statistical Graphics Using ODS.
The following global-plot-options are available:
-
LABEL
-
displays a label on diagnostic plots to aid in identifying the outlying observations. This option enhances the plots produced by the DFBETAS, DPC, INFLUENCE, LEVERAGE, and PHAT options. If an ID statement is specified, then the plots are labeled with the ID variables. Otherwise, the observation number is displayed.
-
MAXPOINTS=NONE | number
-
suppresses the plots produced by the DFBETAS, DPC, INFLUENCE, LEVERAGE, and PHAT options if there are more than number observations. Also, observations are not displayed on the EFFECT plots when the cutoff is exceeded. The default is MAXPOINTS=5000. The cutoff is ignored if you specify MAXPOINTS=NONE.
-
ONLY
-
displays only the specifically requested plot-requests; plots that would otherwise be produced by default are suppressed.
-
UNPACKPANELS | UNPACK
-
suppresses paneling. By default, multiple plots can appear in some output panels. Specify UNPACKPANELS to display each plot separately.
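The global-plot-options can be combined inside one set of parentheses before the equal sign. The following sketch assumes the SASHELP.HEART sample data set, which is assumed here to exceed the default cutoff of 5000 observations, so MAXPOINTS=NONE is needed for the influence plots to appear.

```sas
ods graphics on;

/* ONLY: suppress default plots; LABEL: label outlying points;
   MAXPOINTS=NONE: do not suppress diagnostic plots for large data. */
proc logistic data=sashelp.heart
              plots(only label maxpoints=none)=(roc influence(unpack));
   model Status(event='Dead') = AgeAtStart Cholesterol;
run;
```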
The following plot-requests are available:
-
ALL
-
produces all appropriate plots. You can specify other options with ALL. For example, to display all plots and unpack the DFBETAS plots you can specify plots=(all dfbetas(unpack)).
-
DFBETAS <(UNPACK)>
-
displays plots of DFBETAS versus the case (observation) number. This displays the statistics generated by the DFBETAS=_ALL_ option in the OUTPUT statement. The UNPACK option displays the plots separately. See Output 72.6.5 for an example of this plot.
-
DPC<(dpc-options)>
-
displays plots of DIFCHISQ and DIFDEV versus the predicted event probability, and displays the markers according to the value of the confidence interval displacement C. See Output 72.6.8 for an example of this plot. You can specify the following dpc-options:
-
MAXSIZE=Smax
-
specifies the maximum size when TYPE=BUBBLE or TYPE=LABEL. For TYPE=BUBBLE, the size is the bubble radius and MAXSIZE=21 by
default; for TYPE=LABEL, the size is the font size and MAXSIZE=20 by default. This dpc-option is ignored if TYPE=GRADIENT.
-
MAXVALUE=Cmax
-
displays all observations for which C ≥ Cmax at the value of the MAXSIZE= option when TYPE=BUBBLE or TYPE=LABEL. By default, Cmax is the largest value of C among the observations. This dpc-option is ignored if TYPE=GRADIENT.
-
MINSIZE=Smin
-
specifies the minimum size when TYPE=BUBBLE or TYPE=LABEL. Any observation that maps to a smaller size is displayed at this
size. For TYPE=BUBBLE, the size is the bubble radius and MINSIZE=3.5 by default; for TYPE=LABEL, the size is the font size
and MINSIZE=2 by default. This dpc-option is ignored if TYPE=GRADIENT.
-
TYPE=BUBBLE | GRADIENT | LABEL
-
specifies how the C statistic is displayed. You can specify the following values:
- BUBBLE
-
displays circular markers whose areas are proportional to C and whose colors are determined by their response.
- GRADIENT
-
colors the markers according to the value of C.
- LABEL
-
displays the ID variables (if an ID statement is specified) or the observation number. The colors of the ID variable or observation numbers are determined by their response, and their font sizes are proportional to C.
By default, TYPE=GRADIENT.
-
UNPACKPANELS | UNPACK
-
displays the plots separately.
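A sketch of how the dpc-options described above might be combined follows. It assumes the SASHELP.HEART sample data set, restricted to its first 300 observations so that the default MAXPOINTS= cutoff is not exceeded; the size values are arbitrary illustrations.

```sas
ods graphics on;

/* TYPE=BUBBLE sizes each marker by C; MINSIZE= and MAXSIZE= bound the
   bubble radius; UNPACK shows the DIFCHISQ and DIFDEV plots separately. */
proc logistic data=sashelp.heart(obs=300)
              plots(only)=dpc(type=bubble minsize=2 maxsize=15 unpack);
   model Status(event='Dead') = AgeAtStart Cholesterol;
run;
```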
-
EFFECT<(effect-options)>
-
displays and enhances the effect plots for the model. For more information about effect plots and the available effect-options, see the section PLOTS=EFFECT Plots.
Note: The EFFECTPLOT statement provides much of the same functionality and more options for creating effect plots. See Output 72.2.11, Output 72.3.5, Output 72.4.8, Output 72.7.4, and Output 72.16.4 for examples of effect plots.
-
INFLUENCE<(UNPACK | STDRES)>
-
displays index plots of RESCHI, RESDEV, leverage, confidence interval displacements C and CBar, DIFCHISQ, and DIFDEV. These plots are produced by default when any plot-request is specified and the MAXPOINTS= cutoff is not exceeded. The UNPACK option displays the plots separately. The STDRES option also displays index plots of STDRESCHI, STDRESDEV, and RESLIK. See Output 72.6.3 and Output 72.6.4 for examples of these plots.
-
LEVERAGE<(UNPACK)>
-
displays plots of DIFCHISQ, DIFDEV, confidence interval displacement C, and the predicted probability versus the leverage. The UNPACK option displays the plots separately. See Output 72.6.7 for an example of this plot.
-
NONE
-
suppresses all plots.
-
ODDSRATIO <(oddsratio-options)>
-
displays and enhances the odds ratio plots for the model. For more information about odds ratio plots and the available oddsratio-options, see the section Odds Ratio Plots. See Figure 72.7, Output 72.2.9, Output 72.3.3, and Output 72.4.5 for examples of this plot.
-
PHAT<(UNPACK)>
-
displays plots of DIFCHISQ, DIFDEV, confidence interval displacement C, and leverage versus the predicted event probability. The UNPACK option displays the plots separately. See Output 72.6.6 for an example of this plot.
-
ROC<(ID<=keyword>)>
-
displays the ROC curve. If you also specify a SELECTION= method, then an overlaid plot of all the ROC curves for each step of the selection process is displayed. If you specify ROC statements, then an overlaid plot of the model (or the selected model if a SELECTION= method is specified) and the ROC statement models is displayed. If the OUTROC= option is specified in a SCORE statement, then the ROC curve for the scored data set is displayed.
The ID= option labels certain points on the ROC curve. Typically, the labeled points are closest to the upper left corner of the plot, and points directly below or to the right of a labeled point are suppressed. This option is identical to, and has the same keywords as, the ID= suboption of the ROCOPTIONS option.
You can define the following macro variables to modify the labels and titles on the graphic:
- _ROC_ENTRY_ID
-
sets the note for the ID= option on the ROC plot.
- _ROC_ENTRYTITLE
-
sets the first title line on the ROC plot.
- _ROC_ENTRYTITLE2
-
sets the second title line on the ROC plot.
- _ROC_XAXISOPTS_LABEL
-
sets the X-axis label on the ROC and overlaid ROC plots.
- _ROC_YAXISOPTS_LABEL
-
sets the Y-axis label on the ROC and overlaid ROC plots.
- _ROCOVERLAY_ENTRYTITLE
-
sets the title on the overlaid ROC plot.
To revert to the default labels and titles, you can specify the macro variables in a %SYMDEL statement. For example:
%let _ROC_ENTRYTITLE=New Title;
/* submit the PROC LOGISTIC statements here */
%symdel _ROC_ENTRYTITLE;
See Output 72.7.3 and Example 72.8 for examples of these ROC plots.
-
ROCOPTIONS (options)
-
specifies options that apply to every model specified in a ROC statement. Some of these options also apply to the SCORE statement. The following options are available:
-
ALPHA=number
-
sets the significance level for creating confidence limits of the areas and the pairwise differences. The ALPHA= value specified in the PROC LOGISTIC statement is the default. If neither ALPHA= value is specified, then ALPHA=0.05 by default.
-
CROSSVALIDATE | X
-
uses cross validated predicted probabilities instead of the model-predicted probabilities for all ROC and area under the ROC curve (AUC) computations; for more information, see the section Classification Table. The cross validated probabilities are also used in computations for the "Association of Predicted Probabilities and Observed Responses" table. If you use a SCORE statement, then the OUTROC= data set and the AUC statistic from the FITSTAT option use the cross validated probabilities only when you score the original data set; otherwise, the model-predicted probabilities are used.
-
EPS=value
-
is an alias for the ROCEPS= option in the MODEL statement. This value is used to determine which predicted probabilities are equal. The default value is the square root of the machine epsilon, which is about 1E–8.
-
ID<=keyword>
-
displays labels on certain points on the individual ROC curves and also on the SCORE statement’s ROC curve. This option overrides the ID= suboption of the PLOTS=ROC option. If several observations lie at the same place on the ROC curve, the value for the last observation is displayed. If you specify the ID option with no keyword, any variables that are listed in the ID statement are used. If no ID statement is specified, the observation number is displayed. The following keywords are available:
- PROB
-
displays the model predicted probability.
- OBS
-
displays the (last) observation number.
- SENSIT
-
displays the true positive fraction (sensitivity).
- 1MSPEC
-
displays the false positive fraction (1–specificity).
- FALPOS
-
displays the fraction of nonevents that are predicted as events.
- FALNEG
-
displays the fraction of events that are predicted as nonevents.
- POSPRED
-
displays the positive predictive value (1–FALPOS).
- NEGPRED
-
displays the negative predictive value (1–FALNEG).
- MISCLASS
-
displays the misclassification rate.
- ID
-
displays the ID variables.
The SENSIT, 1MSPEC, FALPOS, and FALNEG statistics are defined in the section Receiver Operating Characteristic Curves. The misclassification rate is the number of events that are predicted as nonevents plus the number of nonevents that are predicted as events, both calculated by using the given cutpoint (predicted probability), divided by the number of observations.
If the PEVENT= option is also specified, then FALPOS and FALNEG are computed using the first PEVENT= value and Bayes’ theorem, as discussed in the section False Positive, False Negative, and Correct Classification Rates Using Bayes’ Theorem.
-
NODETAILS
-
suppresses the display of the model fitting information for the models specified in the ROC statements.
-
OUT=SAS-data-set-name
-
is an alias for the OUTROC= option in the MODEL statement.
-
WEIGHTED
-
uses frequency × weight in the ROC computations (Izrael et al. 2002) instead of just frequency. Typically, weights are considered in the fit of the model only, and hence are accounted for in the parameter estimates. The "Association of Predicted Probabilities and Observed Responses" table uses frequency (unless the BINWIDTH=0 option is also specified in the MODEL statement), and is suppressed when ROC comparisons are performed. This option also affects SCORE statement ROC and area under the ROC curve (AUC) computations.
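A hedged sketch of ROCOPTIONS applied to a fitted model and one additional ROC statement model follows. It assumes the SASHELP.HEART sample data set; the label 'Age only' and the option values are illustrative.

```sas
ods graphics on;

/* ALPHA=0.1: 90% limits for the areas; ID=PROB: label points with the
   predicted probability; NODETAILS: omit fit details for ROC models. */
proc logistic data=sashelp.heart plots(only)=roc
              rocoptions(alpha=0.1 id=prob nodetails);
   model Status(event='Dead') = AgeAtStart Cholesterol;
   roc 'Age only' AgeAtStart;   /* comparison model from a ROC statement */
run;
```

Because a ROC statement is present, the plot overlays the curve for the full model and the curve for the 'Age only' model.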
-
SIMPLE
-
displays simple descriptive statistics (mean, standard deviation, minimum and maximum) for each continuous explanatory variable.
For each CLASS variable involved in the modeling, the frequency counts of the classification levels are displayed. The SIMPLE
option generates a breakdown of the simple descriptive statistics or frequency counts for the entire data set and also for
individual response categories.
-
TRUNCATE
-
determines class levels by using no more than the first 16 characters of the formatted values of CLASS, response, and strata variables. When formatted values are longer than 16 characters, you can use this option to revert to the levels as determined in releases previous to SAS 9.0. This option invokes the same option in the CLASS statement.