As part of your analysis, you want to include some parametric models for comparison with the
decision trees that you built in Build Decision Trees. Because logistic regression is familiar
to the management of your organization, you have decided to include it as
one of the parametric models.
To use the Regression node to fit a logistic regression model:
-
Select the
Model tab on the Toolbar.
-
Select the Regression
node icon. Drag the node into the Diagram Workspace.
-
Connect the Transform
Variables node to the Regression node.
-
To examine histograms of the imputed and transformed input variables, right-click the Regression
node and select Update. In the Diagram Workspace,
select the Regression node. In the Properties Panel, scroll down to
view the Train properties, and click the ellipsis button that represents
the value of Variables. The
Variables - Reg window appears.
-
Select all variables
that have the prefix LG10_. Click
Explore.
The
Explore window appears.
You can select a bar in any histogram, and the observations that fall in that bin are
highlighted in the EMWS.Trans_TRAIN data set window and in the other
histograms. Close the Explore window to return to the
Variables - Reg window.
-
(Optional) You can explore
the histograms of other input variables.
-
Close the
Variables - Reg window.
-
In the Properties Panel, scroll down to view the Train properties. Click the
Selection Model property in the Model Selection subgroup, and select
Stepwise from the drop-down menu that appears. This setting causes SAS Enterprise Miner
to use stepwise variable selection to build the logistic regression
model.
Note: The Regression node automatically
performs logistic regression if the target variable is a class variable
that takes one of two values. If the target variable is a continuous
variable, then the Regression node performs linear regression.
-
In the Diagram Workspace,
right-click the Regression node, and select
Run from the resulting menu. Click
Yes in the
confirmation window that opens.
-
In the window that appears
when processing completes, click
Results. The
Results window appears.
-
Maximize the
Output window. This window details the variable selection
process. Lines 402–423 list a summary of the steps that were
taken.
-
Minimize the
Output window and maximize the
Score Rankings
Overlay window. From the drop-down menu, select
Cumulative Total Expected Profit.
The data that is used to construct this plot is ordered by expected profit.
Because you have defined a profit matrix for this example, expected profit is a
function of both the probability of donation for an individual and
the profit associated with the corresponding outcome. For each decision, a value
is computed as the sum of the decision matrix values multiplied
by the classification probabilities, minus any defined cost. The
decision with the greatest value is selected, and the value of that
selected decision for each observation is used to compute overall
profit measures.
The plot represents the cumulative total expected profit that results from soliciting
the best n% of the individuals (as determined
by expected profit) on your mailing list. For example, if you were
to solicit the best 40% of the individuals, the total expected profit
from the validation data would be approximately $850. If you were to solicit
everyone on the list, then based on the validation data, you could
expect a profit of approximately $2,150 on the campaign.
-
Close the
Results window.
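
The expected-profit calculation behind the Cumulative Total Expected Profit plot can be illustrated outside SAS Enterprise Miner. The following Python sketch is not the code that the Regression node runs. It is a minimal illustration of the description above, under stated assumptions: the profit matrix values, the decision names (solicit, ignore), and the function names best_decision and cumulative_expected_profit are all hypothetical, and the donation probabilities are made-up numbers rather than scores from the example data.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical profit matrix (illustrative values, NOT taken from the example):
# PROFIT[decision][outcome] is the profit of a decision given the true outcome.
PROFIT: Dict[str, Dict[str, float]] = {
    "solicit": {"donor": 10.0, "non_donor": -1.0},
    "ignore": {"donor": 0.0, "non_donor": 0.0},
}


def best_decision(
    p_donor: float,
    profit: Dict[str, Dict[str, float]] = PROFIT,
    cost: Optional[Dict[str, float]] = None,
) -> Tuple[str, float]:
    """Return the decision with the greatest expected value and that value.

    Each decision's value is the sum of its profit matrix entries weighted by
    the classification probabilities, minus any cost defined for the decision.
    """
    probs = {"donor": p_donor, "non_donor": 1.0 - p_donor}
    costs = cost or {}
    values = {
        decision: sum(row[outcome] * probs[outcome] for outcome in probs)
        - costs.get(decision, 0.0)
        for decision, row in profit.items()
    }
    chosen = max(values, key=lambda decision: values[decision])
    return chosen, values[chosen]


def cumulative_expected_profit(
    p_donors: Sequence[float], depths: Sequence[float]
) -> Dict[float, float]:
    """Total expected profit from the best fraction of observations.

    Observations are ranked by the expected profit of their selected decision,
    and the top `depth` fraction is summed for each requested depth.
    """
    ranked = sorted((best_decision(p)[1] for p in p_donors), reverse=True)
    totals: Dict[float, float] = {}
    for depth in depths:
        n = round(depth * len(ranked))
        totals[depth] = sum(ranked[:n])
    return totals


if __name__ == "__main__":
    # Made-up predicted donation probabilities for four individuals.
    scores = [0.5, 0.25, 0.05, 0.75]
    print(best_decision(0.5))
    print(cumulative_expected_profit(scores, [0.5, 1.0]))
```

In this sketch, an individual whose expected profit from soliciting is negative is assigned the ignore decision, which has a value of zero. Such an individual adds nothing to the cumulative total at greater depths.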