IMSTAT Procedure (Analytics)

Example 8: Training and Validating a Decision Tree

Details

This PROC IMSTAT example demonstrates how to use the DECISIONTREE statement to train a decision tree and then score a validation data set against that tree.
The data for this example is available from the Machine Learning Repository of the University of California at Irvine.
Frank, A. & Asuncion, A. 2010. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. Available at http://archive.ics.uci.edu/ml. Accessed on December 4, 2012.

Program

libname mylib 'path-to-datasets';
libname example sasiola host="grid001.example.com" port=10010 tag='hps';

data example.bank_train_1; set mylib.bank_train_1; run;
data example.bank_valid_1; set mylib.bank_valid_1; run;

proc imstat data=example.bank_train_1;   
    decisiontree subscribe_term_deposit / 
        nbins      =10 
        maxlevel   =7
        maxbranches=4    
        input      =(age job marital_status education 
                     default balance housing loan contact 
                     day month duration campaign previous 
                     poutcome)
        nominal    =(contact default education housing 
                     job loan marital_status month 
                     poutcome)     
        multvar 
        prune 
        leafsize   =5 
        save       =DTreeTab; /* 1 */

/*   ods output dtree=example.banktree_train_1; 2 */ 
run;

   decisiontree subscribe_term_deposit / 
        treetab   =DTreeTab 
        scoredata =example.bank_valid_1
        detail  
        save      =DTreeScoreTab;
run;
/*
    table example.bank_valid_1;run; 3  
    decisiontree subscribe_term_deposit / 
        treedata=example.banktree_train_1; 4
*/

    free DTreeTab DTreeScoreTab;
quit;

Program Description

  1. The SAVE= option stores the result table under the name DTreeTab so that subsequent statements can use it.
  2. As an alternative to the SAVE= option, the ODS OUTPUT statement can also be used to save the result table.
  3. The TABLE statement makes bank_valid_1 the active table so that it can be scored against the tree that was stored with the ODS OUTPUT statement.
  4. The TREEDATA= option specifies the decision tree that was saved with the ODS OUTPUT statement.
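
The alternative path described in items 2 through 4 appears in the program only as commented-out lines. The following sketch writes it out as its own PROC IMSTAT step. It is assembled only from statements and options that appear in the program above. Running it as a separate step, and leaving out the SAVE= option and the FREE statement, are assumptions made for illustration and are not part of the original example.

```sas
/* Sketch of the ODS OUTPUT alternative, assuming the same librefs    */
/* (mylib, example) and in-memory tables that are loaded above.       */
proc imstat data=example.bank_train_1;
    /* Train the tree with the same options as the main program,      */
    /* but without SAVE=.                                             */
    decisiontree subscribe_term_deposit /
        nbins      =10
        maxlevel   =7
        maxbranches=4
        input      =(age job marital_status education
                     default balance housing loan contact
                     day month duration campaign previous
                     poutcome)
        nominal    =(contact default education housing
                     job loan marital_status month
                     poutcome)
        multvar
        prune
        leafsize   =5;

    /* Item 2: capture the tree with ODS OUTPUT instead of SAVE=.     */
    ods output dtree=example.banktree_train_1;
run;

    /* Item 3: make the validation table the active table.            */
    table example.bank_valid_1;
run;

    /* Item 4: score the active table against the stored tree.        */
    decisiontree subscribe_term_deposit /
        treedata=example.banktree_train_1;
run;
quit;
```

In this form the tree is referenced by its two-level table name through the TREEDATA= option rather than by a temporary name through the TREETAB= option, and the data to score is selected with the TABLE statement rather than with the SCOREDATA= option.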

Output

Partial Results for the Decision Tree Created from the Training Data
[Figure: tree node information for the bank_train_1 data set]

Partial Results for Classification Information Generated with the DETAIL Option
[Figure: tree scoring results for the bank_valid_1 data set]