The DTREE Procedure

Overview: DTREE Procedure

The DTREE procedure in SAS/OR software is an interactive procedure for decision analysis. The procedure interprets a decision problem represented in SAS data sets, finds the optimal decisions, and plots the decision tree, with the optimal decisions shown, on a line printer or a graphics device.

To use PROC DTREE, you first construct a decision model to represent your problem. This model, called a generic decision tree model, is made up of stages.^[1] Every stage has a stage name, which identifies the stage, and a type. There are three types of stages: decision stages, chance stages, and end stages. In addition, every decision stage and chance stage has possible outcomes.

A decision stage represents a particular decision you have to make. The outcomes of a decision stage are the possible alternatives (or actions) of the decision. A chance stage represents an uncertain factor in the decision problem (a statistician might call it a random variable; here it is called an uncertainty). The outcomes of a chance stage are events, one of which will occur according to a given probability distribution. An end stage terminates a particular scenario (a sequence of alternatives and events). It is not necessary to include an end stage in your model; the DTREE procedure adds an end stage to your model if one is needed.

Each outcome of a decision or chance stage has several attributes: an outcome name, which identifies the outcome; a reward, which gives the immediate reward of the outcome; and a successor, which names the stage that comes next when this outcome is realized. For chance stages, a probability attribute is also needed. It gives the relative likelihood of the outcome. Every decision stage should have at least two alternatives, and every chance stage should have at least two events. The probabilities of the events for a chance stage must sum to 1. End stages do not have any outcomes.
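The stage and outcome attributes described above can be pictured as a small data structure. The following Python sketch is illustrative only and is not how PROC DTREE stores its model; the names `Outcome`, `Stage`, and `validate` are hypothetical, and the checks simply restate the rules given in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import math


@dataclass
class Outcome:
    name: str                            # identifies the outcome
    reward: float = 0.0                  # immediate reward of the outcome
    successor: Optional[str] = None      # name of the stage that comes next
    probability: Optional[float] = None  # needed for chance stages only


@dataclass
class Stage:
    name: str
    kind: str                            # "decision", "chance", or "end"
    outcomes: List[Outcome] = field(default_factory=list)


def validate(stage: Stage) -> List[str]:
    """Return a list of rule violations for one stage (empty if none)."""
    problems = []
    if stage.kind not in ("decision", "chance", "end"):
        problems.append(f"{stage.name}: unknown stage type {stage.kind!r}")
    elif stage.kind == "end":
        if stage.outcomes:
            problems.append(f"{stage.name}: end stages have no outcomes")
    else:
        if len(stage.outcomes) < 2:
            problems.append(f"{stage.name}: needs at least two outcomes")
        if stage.kind == "chance":
            probs = [o.probability for o in stage.outcomes]
            if any(p is None for p in probs):
                problems.append(f"{stage.name}: every event needs a probability")
            elif not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
                problems.append(f"{stage.name}: probabilities must sum to 1")
    return problems


# Hypothetical example: a decision stage followed by a chance stage.
drill = Stage("Drill", "decision",
              [Outcome("Yes", reward=-100.0, successor="Oil"), Outcome("No")])
oil = Stage("Oil", "chance",
            [Outcome("Dry", probability=0.7), Outcome("Wet", probability=0.3)])
print(validate(drill), validate(oil))
```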

The structure of a decision model is given in the STAGEIN= data set. It contains the stage name, the type, and the attributes (except probability) of all outcomes for each stage in your model. You can specify each stage in one observation or across several observations. If a diagrammatic representation of a decision problem is all you want, you probably do not need any other data sets.

If you want to evaluate and analyze your decision problem, you need another SAS data set, called the PROBIN= data set. This data set describes the probabilities or conditional probabilities for every event in your model. Each observation in the data set contains a list of given conditions (list of outcomes), if there are any, and at least one combination of event and probability. Each event and probability combination identifies the probability that the event occurs given that all the outcomes specified in the list occur. If no conditions are given, then the probabilities are unconditional probabilities.
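To make the idea of conditions plus event and probability combinations concrete, here is a minimal Python sketch. It is an analogy, not the layout of a PROBIN= data set: the record structure, the outcome names, and the function `conditional_probability` are all hypothetical. Each record pairs a set of given outcomes (empty for unconditional probabilities) with the events and probabilities that apply when all of those outcomes occur.

```python
# Each record: (given conditions, {event: probability}).
# An empty condition set means the probabilities are unconditional.
RECORDS = [
    (frozenset(), {"Rain": 0.3, "Dry": 0.7}),
    (frozenset({"Rain"}), {"Late": 0.6, "OnTime": 0.4}),
    (frozenset({"Dry"}), {"Late": 0.1, "OnTime": 0.9}),
]


def conditional_probability(event, occurred=frozenset()):
    """Probability of `event` given the set of outcomes that have occurred.

    Assumption of this sketch: the first record that lists the event and
    whose conditions are all satisfied is the one that applies.
    """
    for conditions, events in RECORDS:
        if event in events and conditions <= set(occurred):
            return events[event]
    raise KeyError(f"no probability recorded for {event!r} given {set(occurred)!r}")


# Probability of the scenario (Rain, Late): P(Rain) * P(Late | Rain).
p_scenario = conditional_probability("Rain") * conditional_probability("Late", {"Rain"})
print(p_scenario)
```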

The third data set, called the PAYOFFS= data set, contains the value of each possible scenario. You can specify one or more scenarios and the associated values in one observation. If the PAYOFFS= data set is omitted, the DTREE procedure assumes that all values are zero and uses rewards for outcomes to evaluate the decision problem.
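The way rewards and scenario values combine during evaluation can be sketched with a simple backward recursion (often called rollback). The Python example below assumes that the goal is to maximize expected value and that event probabilities are unconditional; PROC DTREE's own evaluation is not described in this overview, so this is only an illustration. The model, the numbers, and the names `STAGES`, `PROBS`, `PAYOFFS`, `evaluate`, and `best_alternative` are hypothetical.

```python
# stage name -> (type, [(outcome name, reward, successor)])
# A successor of None stands for the end stage that terminates a scenario.
STAGES = {
    "Invest": ("decision", [("Yes", -50.0, "Market"), ("No", 0.0, None)]),
    "Market": ("chance", [("Up", 0.0, None), ("Down", 0.0, None)]),
}
PROBS = {"Up": 0.5, "Down": 0.5}                       # event probabilities
PAYOFFS = {("Yes", "Up"): 200.0, ("Yes", "Down"): -20.0}  # value per scenario


def evaluate(stage_name, scenario=()):
    """Expected value of the subtree rooted at `stage_name`."""
    if stage_name is None:
        # End of a scenario: scenarios without a listed value count as zero.
        return PAYOFFS.get(scenario, 0.0)
    kind, outcomes = STAGES[stage_name]
    values = {
        name: reward + evaluate(successor, scenario + (name,))
        for name, reward, successor in outcomes
    }
    if kind == "decision":
        return max(values.values())        # choose the best alternative
    return sum(PROBS[name] * value for name, value in values.items())


def best_alternative(stage_name, scenario=()):
    """Name of the alternative that maximizes expected value."""
    _, outcomes = STAGES[stage_name]
    best = max(outcomes,
               key=lambda o: o[1] + evaluate(o[2], scenario + (o[0],)))
    return best[0]


print(evaluate("Invest"), best_alternative("Invest"))  # 40.0 Yes
```

With an empty `PAYOFFS` dictionary, the same recursion evaluates the tree from the outcome rewards alone, which mirrors the behavior described for an omitted PAYOFFS= data set.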

You can use PROC DTREE to display, evaluate, and analyze your decision problem. In the PROC DTREE statement, you specify input data sets and other options. A VARIABLES statement identifies the variables in the input data sets that describe the model. This statement can be used only once and must appear immediately after the PROC DTREE statement. The EVALUATE statement evaluates the decision tree. You can display the optimal decisions by using the SUMMARY statement, or you can plot the complete tree with the TREEPLOT statement. You can also associate HTML pages with decision tree nodes and create Web-enabled decision tree diagrams.

It is also possible to interactively modify some attributes of your model with the MODIFY statement and to change the order of decisions by using the MOVE statement. Before making any changes to the model, you should save the current model with the SAVE statement so that you can call it back later by using the RECALL statement. Questions about the value of perfect information or the value of perfect control are answered using the VPI and VPC statements. Moreover, any options that can be specified in the PROC DTREE statement can be reset at any time with the RESET statement.
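The questions behind the value of perfect information and the value of perfect control can be illustrated with a small calculation. The Python sketch below uses the standard textbook definitions for a single decision followed by a single uncertainty and assumes expected-value maximization; it does not reproduce the VPI or VPC statements, and the payoff table and all names are hypothetical.

```python
PROBS = {"Up": 0.5, "Down": 0.5}
# PAYOFF[alternative][event]: value of each scenario.
PAYOFF = {
    "Invest": {"Up": 150.0, "Down": -70.0},
    "Hold": {"Up": 0.0, "Down": 0.0},
}


def expected(alternative):
    """Expected value of an alternative chosen before the event is known."""
    return sum(PROBS[e] * PAYOFF[alternative][e] for e in PROBS)


# Current situation: decide first, then the uncertainty resolves.
value_now = max(expected(a) for a in PAYOFF)

# Perfect information: the event is known before deciding, which is
# equivalent to moving the chance stage in front of the decision stage.
value_informed = sum(
    PROBS[e] * max(PAYOFF[a][e] for a in PAYOFF) for e in PROBS
)

# Perfect control: you may pick the event as well as the alternative.
value_controlled = max(PAYOFF[a][e] for a in PAYOFF for e in PROBS)

vpi = value_informed - value_now      # value of perfect information
vpc = value_controlled - value_now    # value of perfect control
print(vpi, vpc)  # 35.0 110.0
```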

Apart from the restrictions on the VARIABLES statement noted above, all statements can appear in any order and can be used as many times as desired, with one exception: the RECALL statement must be preceded by at least one SAVE statement. In addition, only one model can be saved at any time; the SAVE statement overwrites the previously saved model. Finally, you can use the QUIT statement to stop processing and exit the procedure.
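The save and recall rules amount to a single storage slot. The following Python sketch mimics that behavior with a hypothetical `ModelSession` class; it is an analogy for the rules stated above, not a description of the procedure's internals.

```python
import copy


class ModelSession:
    """Holds a working model plus at most one saved copy."""

    def __init__(self, model):
        self.model = model
        self._saved = None          # the single save slot

    def save(self):
        # Overwrites any previously saved model.
        self._saved = copy.deepcopy(self.model)

    def recall(self):
        # Recalling is only meaningful after at least one save.
        if self._saved is None:
            raise RuntimeError("recall requires a prior save")
        self.model = copy.deepcopy(self._saved)


session = ModelSession({"Invest": {"Yes": -50.0, "No": 0.0}})
session.save()
session.model["Invest"]["Yes"] = -80.0   # modify the working model
session.recall()                          # restore the saved model
print(session.model["Invest"]["Yes"])     # -50.0
```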

The DTREE procedure produces one output data set. The IMAGEMAP= data set contains the outline coordinates for the nodes in the decision tree that can be used to generate HTML MAP tags.

PROC DTREE uses the Output Delivery System (ODS), a SAS subsystem that provides capabilities for displaying and controlling the output from SAS procedures. ODS enables you to convert any of the output from PROC DTREE into a SAS data set. For further details, refer to the chapter on ODS in the SAS/STAT User’s Guide.

^[1]Stages are often referred to as variables in decision analysis articles.