As part of your analysis, you want to include some parametric models for comparison with the decision trees that you built in Build Decision Trees. Because it is familiar to the management of your organization, you have decided to include a logistic regression as one of the parametric models.
To use the Regression node to fit a logistic regression model, complete the following steps:
-
Select the Model tab on the Toolbar.
-
Select the Regression node icon. Drag the node into the Diagram Workspace.
-
Connect the Transform Variables node to the Regression node.
-
To examine histograms of the imputed and transformed input variables, select the Regression node. In the Properties Panel, scroll down to view the Train properties, and click the ellipsis button that represents the value of Variables. The Variables - Reg window opens.
-
Select all variables that have the prefix LOG_. Click Explore, and then click OK in the confirmation window that opens. The Explore window opens.
You can select a bar in any histogram, and the observations that fall in that bin are highlighted in the EMWS.Trans_TRAIN data set window and in the other histograms. Close the Explore window to return to the Variables - Reg window.
-
(Optional) Explore the histograms of other input variables.
-
Close the Variables - Reg window.
-
In the Properties Panel, scroll down to view the Train properties. In the model selection properties, click the value of Selection Model, and select Stepwise from the drop-down menu that appears. This specification causes SAS Enterprise Miner to use stepwise variable selection to build the logistic regression model.
Note: The Regression node automatically performs logistic regression if the target variable is a class variable that takes one of two values. If the target variable is a continuous variable, then the Regression node performs linear regression.
-
In the Diagram Workspace, right-click the Regression node, and select Run from the resulting menu. Click Yes in the confirmation window that opens.
-
In the window that appears when processing completes, click Results. The Results window opens.
-
Maximize the Output window. This window details the variable selection process. Lines 1711–1727 list a summary of the steps that were taken. Line 1732 lists the variables that are in the final model.
Note: The variables that were selected are all class variables (recall that OPT_MEDIAN_HOME_VALUE is a binned version of the continuous MEDIAN_HOME_VALUE variable). Therefore, use caution when interpreting the parameter estimates. SAS Enterprise Miner uses reference-cell coding in the design matrix, so each parameter estimate reflects the difference between one level of a variable and that variable's reference group.
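To illustrate why such estimates are read as differences from a reference group, here is a minimal Python sketch of reference-cell (dummy) coding. It is not SAS Enterprise Miner code and does not reproduce the Regression node's design matrix. The function name `reference_cell_code`, the level labels, and the default choice of the last sorted level as the reference are all assumptions made for illustration.

```python
def reference_cell_code(values, reference=None):
    """Build 0/1 indicator columns for a class variable, omitting the reference level.

    Returns (kept_levels, rows), where each row has one indicator per kept level.
    An observation in the reference level is coded as all zeros.
    """
    levels = sorted(set(values))
    if reference is None:
        # Assumption for this sketch: use the last sorted level as the reference.
        reference = levels[-1]
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")

    kept_levels = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept_levels] for value in values]
    return kept_levels, rows


# Hypothetical binned levels, standing in for a variable such as a binned home value.
bins = ["low", "mid", "high", "mid"]
columns, design = reference_cell_code(bins, reference="low")

for value, row in zip(bins, design):
    print(value, dict(zip(columns, row)))
```

Because the reference level has no column of its own, a coefficient on any remaining column measures how that level differs from the reference level, which is why the choice of reference matters when you read the estimates.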
-
Minimize the Output window and maximize the Score Rankings Overlay window. From the drop-down menu, select Cumulative Total Expected Profit.
The data that is used to construct this plot is ordered by expected profit. For this example, you have defined a profit matrix. Therefore, expected profit is a function of both the probability of donation for an individual and the profit associated with the corresponding outcome. For each decision, a value is computed as the sum of the decision matrix values multiplied by the classification probabilities, minus any defined cost. The decision with the greatest value is selected, and the value of that selected decision for each observation is used to compute overall profit measures.
The plot represents the cumulative total expected profit that results from soliciting the best n% of the individuals (as determined by expected profit) on your mailing list. For example, if you were to solicit the best 40% of the individuals, the total expected profit from the validation data would be around $1850. If you were to solicit everyone on the list, then based on the validation data you could expect a profit of about $2250 on the campaign.
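The per-observation decision rule described in this step (weight each decision's matrix values by the classification probabilities, subtract any cost, and keep the best decision) can be sketched in Python as follows. This is an illustration under stated assumptions, not the code that SAS Enterprise Miner runs: the function name `expected_decision_value`, the decision and outcome labels, and the profit values are hypothetical.

```python
def expected_decision_value(probabilities, profit_matrix, costs=None):
    """Pick the decision with the greatest expected value for one observation.

    probabilities: dict mapping outcome -> predicted probability
    profit_matrix: dict mapping decision -> dict mapping outcome -> profit
    costs:         optional dict mapping decision -> fixed cost of that decision
    """
    costs = costs or {}
    values = {}
    for decision, profits in profit_matrix.items():
        # Sum of decision matrix values weighted by the classification probabilities.
        gross = sum(profits[outcome] * p for outcome, p in probabilities.items())
        # Subtract any cost defined for this decision.
        values[decision] = gross - costs.get(decision, 0.0)

    best = max(values, key=values.get)
    return best, values[best]


# Hypothetical profit matrix: values are illustrative, not taken from the project.
profit_matrix = {
    "solicit": {"donor": 14.5, "non_donor": -0.5},
    "ignore": {"donor": 0.0, "non_donor": 0.0},
}

decision, value = expected_decision_value(
    {"donor": 0.10, "non_donor": 0.90}, profit_matrix
)
print(decision, round(value, 2))
```

With these illustrative numbers, an individual with a low enough donation probability is better left unsolicited, so that observation contributes the value of the "ignore" decision to the overall profit measures.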
-
Close the Results window.
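The Cumulative Total Expected Profit curve described for the Score Rankings Overlay window can be approximated with a short Python sketch: rank observations by expected profit, then sum the top n%. The function name `cumulative_expected_profit`, the sample values, and the rounding rule (round the group size up) are assumptions for illustration. SAS Enterprise Miner may group observations differently, so this sketch is not expected to reproduce the plotted figures.

```python
import math


def cumulative_expected_profit(expected_profits, depth_percent):
    """Total expected profit from the best depth_percent% of observations."""
    if not 0 <= depth_percent <= 100:
        raise ValueError("depth_percent must be between 0 and 100")

    # Order observations from highest to lowest expected profit.
    ranked = sorted(expected_profits, reverse=True)
    # Assumption for this sketch: round the number of selected observations up.
    n_selected = math.ceil(len(ranked) * depth_percent / 100)
    return sum(ranked[:n_selected])


# Hypothetical per-observation expected profits (value of the selected decision).
profits = [2.0, 0.0, 1.0, 0.5, 0.0]

curve = {depth: cumulative_expected_profit(profits, depth) for depth in (20, 40, 100)}
for depth, total in curve.items():
    print(f"best {depth}%: {total}")
```

Evaluating the function at a depth of 40 and at a depth of 100 corresponds to the two readings that the text takes from the plot: the profit from soliciting the best 40% and the profit from soliciting everyone.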