The Gradient Boosting node uses a partitioning algorithm to search for an optimal partition of the data for a single target variable. Gradient boosting is an approach that resamples the analysis data several times to generate results that form a weighted average of the resampled data set. Tree boosting creates a series of decision trees that together form a single predictive model. Like decision trees, boosting makes no assumptions about the distribution of the data. Boosting is less prone to overfitting the data than a single decision tree, and if a decision tree fits the data fairly well, then boosting often improves the fit. For more information about the Gradient Boosting node, see the SAS Enterprise Miner help documentation.
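The node performs all of this inside SAS Enterprise Miner, with no programming required. For readers who want a concrete picture of how a boosted series of trees behaves, the following sketch uses Python's scikit-learn library and a synthetic data set as stand-ins; it is an illustrative analogue under those assumptions, not the procedure that the Gradient Boosting node runs.

```python
# Illustrative only: a scikit-learn analogue of tree boosting, not the SAS
# Gradient Boosting node itself. The data set here is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the analysis data: 20 inputs and a binary target.
X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Boosting fits a series of decision trees; each new tree is fit to the
# shortcomings of the trees before it, and the final prediction is a
# weighted combination of the whole series.
model = GradientBoostingClassifier(n_estimators=100,
                                   learning_rate=0.1,
                                   subsample=0.8,   # resample the data at each iteration
                                   random_state=1)
model.fit(X_train, y_train)

# staged_predict shows the fit improving as trees are added to the series.
for i, y_hat in enumerate(model.staged_predict(X_test), start=1):
    if i % 25 == 0:
        print(f"{i:3d} trees: accuracy = {accuracy_score(y_test, y_hat):.3f}")
```

Each pass through staged_predict adds one more tree from the series, so the printed accuracies show the combined model improving as trees accumulate.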
To create a gradient boosting model of the data:
- Select the Model tab on the Toolbar.
- Select the Gradient Boosting node icon. Drag the node into the Diagram Workspace.
- Connect the Control Point node to the Gradient Boosting node.
- Select the Gradient Boosting node. In the Properties Panel, set the following properties:
- Click on the value for the Maximum Depth property, in the Splitting Rule subgroup, and enter 10. This property sets the maximum depth, that is, the number of generations of splits, in each decision tree that the Gradient Boosting node creates. A short code sketch after these steps illustrates the effect of this setting.
- Click on the value for the Number of Surrogate Rules property, in the Node subgroup, and enter 2. Surrogate rules are backup rules that are used in the event of missing data. For example, if your primary splitting rule sorts donors based on their ZIP codes, then a reasonable surrogate rule would sort based on the donor’s city of residence. A second sketch after these steps illustrates how a surrogate rule takes over when a value is missing.
- In the Diagram Workspace, right-click the Gradient Boosting node, and select Run from the resulting menu. Click Yes in the Confirmation window that opens.
- In the Run Status window, select OK.
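As noted in the Maximum Depth step above, a deeper tree lets each member of the boosted series capture more elaborate splits. The sketch below compares a shallow setting with a depth of 10 like the one entered above; it again uses scikit-learn and synthetic data as an illustrative analogue of the SAS node, so the library, data, and parameter names are assumptions of the example, not part of SAS Enterprise Miner.

```python
# Illustrative only: the effect of tree depth on a boosted model, using
# scikit-learn as a stand-in for the SAS Gradient Boosting node.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# max_depth plays the role of the node's Maximum Depth property: it caps the
# number of generations of splits in each tree in the series.
for depth in (2, 10):
    model = GradientBoostingClassifier(max_depth=depth, random_state=1)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"max_depth={depth:2d}: test accuracy = {acc:.3f}")
```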
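Surrogate rules have no direct counterpart in the scikit-learn sketch above, but the idea of a backup rule that takes over when the primary splitting variable is missing can be shown with a few lines of plain Python. The donor fields, ZIP code prefix, and city used here are hypothetical and exist only for illustration.

```python
# Hypothetical illustration of a surrogate rule: a backup rule that routes a
# record when the value needed by the primary splitting rule is missing.
def route_donor(donor):
    """Send a donor down the left or right branch of one split."""
    if donor.get("zip_code") is not None:              # primary rule: ZIP code
        return "left" if donor["zip_code"].startswith("02") else "right"
    if donor.get("city") is not None:                  # surrogate rule: city
        return "left" if donor["city"] == "Boston" else "right"
    return "left"                                      # default branch

print(route_donor({"zip_code": "02134"}))                  # left, primary rule
print(route_donor({"zip_code": None, "city": "Denver"}))   # right, surrogate rule
print(route_donor({"zip_code": None, "city": None}))       # left, default branch
```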
Tip
The book “Decision Trees for Analytics Using SAS Enterprise Miner” offers additional information about alternative measures of the effectiveness of a split, options for training and pruning, suggestions for guiding tree growth, and examples of multiple tree and gradient boosting models.