### Input Data Sets

A decision problem is normally constructed in three steps:

1. Structuring the problem in terms of decisions, uncertainties, and consequences.

2. Assessment of probabilities for the events.

3. Assessment of values (payoffs, losses, or preferences) for each consequence or scenario.

PROC DTREE represents these three steps with three SAS data sets. The STAGEIN= data set describes the structure of the problem. In it, you define all decisions and all key uncertainties, as well as the order in which decisions are made and uncertainties are resolved (the planning horizon). The PROBIN= data set assigns probabilities to the uncertain events, and the PAYOFFS= data set contains the values (or utility measures) for each consequence or scenario. See the section Overview: DTREE Procedure and the section Getting Started: DTREE Procedure for a description of these three data sets.

PROC DTREE is designed to minimize the rules for describing a problem. For example, the PROBIN= data set is required only when the decision problem must be evaluated and analyzed. Similarly, if the PAYOFFS= data set is not specified, PROC DTREE assumes that all payoff values are 0. The order of the observations is not important in any of the input data sets. Because a decision problem can be structured in many different ways and the data format is so flexible, not every way of describing a given problem can be shown here; however, some alternative ways of supplying the same problem are demonstrated. For example, the following statements show another way to input the oil wildcatter's problem described in the section Introductory Example.

```
/* -- create STAGEIN= data set (structure and rewards) -- */
data Dtoils3;
   format _STNAME_ $12. _STTYPE_ $2. _OUTCOM_ $10.
          _REWARD_ dollar12.0  _SUCCES_ $12.;
   /* formatted input: each field is read from fixed column widths */
   input _STNAME_ $12. _STTYPE_  $4. _OUTCOM_ $12.
         _REWARD_ dollar12.0  _SUCCES_ $12.;
   datalines;
Drill       D   Drill              .    Cost
.           .   Not_drill          .    .
Cost        C   Low           -$150,000 Oil_deposit
.           .   Fair          -$300,000 Oil_deposit
.           .   High          -$500,000 Oil_deposit
Oil_deposit C   Dry                .    .
.           .   Wet            $700,000 .
.           .   Soaking      $1,200,000 .
;

/* -- create PROBIN= data set                      -- */
data Dtoilp3;
   input _EVENT1 $ _PROB1 _EVENT2 $ _PROB2;
   datalines;
Low    0.2   Dry       0.5
Fair   0.6   Wet       0.3
High   0.2   Soaking   0.2
;

/* -- PROC DTREE statements                       -- */
title "Oil Wildcatter's Problem";
proc dtree stagein=Dtoils3
           probin=Dtoilp3
           nowarning;
   evaluate / summary;
```

Note that the STAGEIN= data set now describes both the problem structure and the payoffs (through the REWARD= variable), so the PAYOFFS= data set is no longer needed. Note also the changes made to the PROBIN= data set. The results, shown in Figure 7.8, are the same as those shown in Figure 7.2. However, rewards and payoffs are entirely different entities in decision tree models. Recall that the reward of an outcome is the immediate return received when that outcome is realized, whereas a payoff is the return from an entire scenario. In other words, the decision tree model described in the previous code and the model described in the section Introductory Example are not equivalent, even though they have the same optimal decision.

Figure 7.8: Optimal Decision Summary of the Oil Wildcatter’s Problem

Oil Wildcatter's Problem

The DTREE Procedure

Optimal Decision Summary

Order of Stages

| Stage | Type |
|---|---|
| Drill | Decision |
| Cost | Chance |
| Oil_deposit | Chance |
| \_ENDST\_ | End |

Decision Parameters

Decision Criterion: Maximize Expected Value (MAXEV)

Optimal Decision Yields: 140000

Optimal Decision Policy

Up to Stage Drill

| Alternatives or Outcomes | Cumulative Reward | Evaluating Value |
|---|---|---|
| Drill | \$0 | 140000\* |
| Not_drill | \$0 | 0 |
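As a hand check of the 140000 reported in Figure 7.8, the rollback can be worked out directly from the rewards in Dtoils3 and the probabilities in Dtoilp3. This sketch assumes that a missing reward value (such as the one for Dry) counts as 0 and that Cost and Oil_deposit are independent, since Dtoilp3 gives unconditional probabilities for both; these assumptions are consistent with the reported result.

```latex
\begin{aligned}
E[\text{Cost reward}]        &= 0.2(-150{,}000) + 0.6(-300{,}000) + 0.2(-500{,}000) = -310{,}000 \\
E[\text{Oil\_deposit reward}] &= 0.5(0) + 0.3(700{,}000) + 0.2(1{,}200{,}000) = 450{,}000 \\
EV(\text{Drill})             &= -310{,}000 + 450{,}000 = 140{,}000 \\
EV(\text{Not\_drill})        &= 0
\end{aligned}
```

Because Drill (140,000) exceeds Not_drill (0) under the MAXEV criterion, Drill is the optimal alternative.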

You can try many alternative ways to specify your decision problem. Then you can choose the model that is most convenient and closest to your real problem. If PROC DTREE cannot interpret the input data, it writes a message to that effect to the SAS log unless the NOWARNING option is specified. However, there are mistakes that PROC DTREE cannot detect. These often occur after the model has been modified with either the MOVE statement or the MODIFY statement. After a MOVE statement is specified, it is a good idea to display the decision tree (using the TREEPLOT statement) and check the probabilities and value assessments to make sure they are reasonable.

For example, using the REWARD= variable in the STAGEIN= data set to input the payoff information as shown in the previous code may cause problems if you change the order of the stages. Suppose you move the stage 'Cost' to the beginning of the tree, as was done in the section Sensitivity Analysis and Value of Perfect Information:

```
   move Cost before Drill;
   evaluate / summary;
```

The optimal decision yields \$140,000, as shown in the optimal decision summary in Figure 7.9.

Figure 7.9: Optimal Decision Summary of the Oil Wildcatter’s Problem

Oil Wildcatter's Problem

The DTREE Procedure

Optimal Decision Summary

Order of Stages

| Stage | Type |
|---|---|
| Cost | Chance |
| Drill | Decision |
| Oil_deposit | Chance |
| \_ENDST\_ | End |

Decision Parameters

Decision Criterion: Maximize Expected Value (MAXEV)

Optimal Decision Yields: 140000

Optimal Decision Policy

Up to Stage Drill

| Cost | Drill | Cumulative Reward | Evaluating Value |
|---|---|---|---|
| Low | Drill | \$-150,000 | 450000\* |
| Low | Not_drill | \$-150,000 | 0 |
| Fair | Drill | \$-300,000 | 450000\* |
| Fair | Not_drill | \$-300,000 | 0 |
| High | Drill | \$-500,000 | 450000\* |
| High | Not_drill | \$-500,000 | 0 |

Recall that when this was done in the section Sensitivity Analysis and Value of Perfect Information, the optimal decision yielded \$150,000. The discrepancy arises because the drilling cost, implemented here as a (negative) instant reward of the Cost stage, is imposed on every scenario, including those that contain the outcome 'Not_drill'. This mistake is easy to spot in the Cumulative Reward column of the optimal decision summary shown in Figure 7.9, where the Not_drill rows carry the full drilling cost.
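The size of the discrepancy can be checked by hand. This sketch takes the evaluating value of 450,000 for Drill from Figure 7.9 and assumes, as before, that Cost and Oil_deposit are independent. When the cost is an instant reward, it is paid on every branch, so the decision at Drill cannot avoid it. When the cost is charged only if the decision maker drills, declining to drill becomes the better choice when the cost is High.

```latex
\begin{aligned}
\text{cost as instant reward:}\quad
  & 0.2(-150{,}000) + 0.6(-300{,}000) + 0.2(-500{,}000) + \max(450{,}000,\ 0) = 140{,}000 \\
\text{cost only when drilling:}\quad
  & 0.2\max(300{,}000,\ 0) + 0.6\max(150{,}000,\ 0) + 0.2\max(-50{,}000,\ 0) \\
  &\quad = 60{,}000 + 90{,}000 + 0 = 150{,}000
\end{aligned}
```

The 10,000 difference is 0.2 × 50,000: the probability of a High cost times the loss avoided by not drilling in that case.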

Changing a decision stage to a chance stage is another case where careless use of the MODIFY statement can cause problems. PROC DTREE cannot determine the probabilities of outcomes for the new chance stage unless they are included in the PROBIN= data set. Moreover, unlike changing a chance stage to a decision stage (which yields insight into the value of gaining control over an uncertainty), changing a decision stage to a chance stage is unlikely to yield any valuable insight, even if the needed probability data are included in the PROBIN= data set, so it should be avoided.