The Gradient Boosting node uses a partitioning algorithm to search for an optimal partition of the data for a single target variable. Gradient boosting is an approach that resamples the analysis data several times to generate results that form a weighted average of the resampled data set. Tree boosting creates a series of decision trees that together form a single predictive model.
Like decision trees, boosting makes no assumptions about the distribution of the data. Boosting is less prone to overfitting the data than a single decision tree. If a decision tree fits the data fairly well, then boosting often improves the fit. For more information about the Gradient Boosting node, see the SAS Enterprise Miner help documentation.
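To make the idea of a series of trees forming one model concrete, here is a minimal Python sketch. It is not the algorithm that the Gradient Boosting node runs; it assumes a single numeric input, a squared-error loss, and one-split trees (stumps). The names `n_trees`, `shrinkage`, and `sample_fraction` are illustrative parameters chosen for this sketch, not node properties.

```python
import random


def fit_stump(xs, residuals):
    """Fit a one-split tree (a stump) to the residuals by least squares."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), xs[0], mean, mean)  # fallback when no split is possible
    for threshold in sorted(set(xs))[:-1]:  # the largest x cannot be a split point
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left value, right value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, n_trees=100, shrinkage=0.1, sample_fraction=0.6, seed=0):
    """Build a series of stumps, each fit to the current residuals on a resample."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)  # start from the mean of the target
    preds = [base] * len(xs)
    stumps = []
    sample_size = max(2, int(sample_fraction * len(xs)))
    for _ in range(n_trees):
        idx = rng.sample(range(len(xs)), sample_size)  # resample the data
        stump = fit_stump([xs[i] for i in idx], [ys[i] - preds[i] for i in idx])
        stumps.append(stump)
        # Add a damped copy of the new stump to the running prediction.
        preds = [p + shrinkage * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, shrinkage, stumps


def predict_boosted(model, x):
    base, shrinkage, stumps = model
    return base + shrinkage * sum(predict_stump(s, x) for s in stumps)


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data for illustration only: a smooth curve that one split fits poorly.
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]

single = fit_stump(xs, ys)
model = fit_boosted(xs, ys)
single_mse = mse(lambda x: predict_stump(single, x), xs, ys)
boosted_mse = mse(lambda x: predict_boosted(model, x), xs, ys)
print(f"single stump MSE: {single_mse:.3f}, boosted MSE: {boosted_mse:.3f}")
```

Each stump on its own is a weak model, but the damped sum of many stumps follows the curve far more closely than a single split can.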
To create a gradient boosting model of the data:
- Select the Model tab on the Toolbar.
- Select the Gradient Boosting node icon. Drag the node into the Diagram Workspace.
- Connect the Control Point node to the Gradient Boosting node.
- Select the Gradient Boosting node. In the Properties Panel, set the following properties:
  - Click the value for the Maximum Depth property, in the Splitting Rule subgroup, and enter 10. This property determines the maximum number of generations in each decision tree created by the Gradient Boosting node.
  - Click the value for the Number of Surrogate Rules property, in the Node subgroup, and enter 2. Surrogate rules are backup rules that are used in the event of missing data. For example, if your primary splitting rule sorts donors based on their ZIP codes, then a reasonable surrogate rule would sort based on the donor’s city of residence.
- In the Diagram Workspace, right-click the Gradient Boosting node, and select Run from the resulting menu. Click Yes in the Confirmation window that opens.
- In the Run Status window, select OK.
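The surrogate rule idea from the property settings above can be sketched in a few lines of Python. This is an illustration only, not how the node stores or selects its rules: the field names, the ZIP codes, the city, and the default branch are all hypothetical, and in practice surrogate rules are typically chosen by how well they agree with the primary split rather than written by hand.

```python
def route(record, rules, default="right"):
    """Send a record left or right using the first rule whose field is present.

    rules is an ordered list of (field, values_sent_left) pairs: the primary
    rule first, then the surrogate rules as backups.
    """
    for field, left_values in rules:
        value = record.get(field)
        if value is not None:  # this rule can be evaluated, so it decides
            return "left" if value in left_values else "right"
    return default  # every rule hit missing data (assumed fallback branch)


# Hypothetical primary rule on ZIP code with one surrogate rule on city.
rules = [
    ("zip", {"27513", "27518"}),
    ("city", {"Cary"}),
]

print(route({"zip": "27513", "city": "Cary"}, rules))  # primary rule decides
print(route({"zip": None, "city": "Cary"}, rules))     # surrogate rule decides
print(route({"zip": None, "city": None}, rules))       # default branch
```

With Number of Surrogate Rules set to 2, the `rules` list in this sketch would hold the primary rule followed by up to two backups.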
Tip: The book “Decision Trees for Analytics Using SAS Enterprise Miner” offers additional information about alternative measures of the effectiveness of a split, options for training and pruning, suggestions for guiding tree growth, and examples of multiple tree and gradient boosting models.