IMSTAT Procedure (Analytics)

Example 8: Storing and Scoring a Decision Tree

Details

You can store the representation of a decision tree in an in-memory table on the server and score the input table at the same time. This process generates two temporary tables: one that holds the tree representation and one that holds the scoring results.
This enables you to compute decision trees for high-cardinality problems. The results from tree building and tree scoring are available to you without transferring large amounts of data between the SAS client and the server. You can query the tree (for a drill-down, for example) and query the scoring information efficiently by using the storing and querying features of the server. Because both results are stored as temporary tables, you can also process them with other IMSTAT procedure statements.

Program

libname example sasiola host="grid001.example.com" port=10010 tag='hps';

data example.heart; 
    set sashelp.heart;
run;

proc imstat;
  table example.heart;
  decisiontree Weight / input=(Sex DeathCause 
                               Chol_Status 
                               BP_Status Weight 
                               Smoking_Status) 
                        nbinstarget=5 
                        temptable   /* 1 */
                        vars=(Sex DeathCause 
                              Chol_Status 
                              BP_Status Weight 
                              Smoking_Status)
                        nomissobs;
run;

  table example.&_temptree_;  /* 2 */
  /* tableinfo; */
  /* columninfo; */
  fetch _CI0_ _CI1_ _Val0_ _Val1_ _Parent_ -- _TargetUpperbd_ 
        / from=1 to=10 format;
run;
  
  table example.&_tempscore_;   /* 3 */
  /* tableinfo; */
  /* columninfo; */
  where _NodeList2_=6;
  fetch / from=1 to=5 format;
run;

  table example.heart;
  decisiontree Weight / treelasr=example.&_temptree_;  /* 4 */
run;

Program Description

  1. The TEMPTABLE option saves the decision tree and the scoring results in temporary in-memory tables on the server.
  2. The &_TEMPTREE_ macro variable is used to access the table that holds the representation of the decision tree. The FETCH statement that follows prints a subset of the variables from the first ten rows.
  3. The &_TEMPSCORE_ macro variable is used to access the scoring table. The FETCH statement that follows prints the first five rows that satisfy the WHERE clause.
  4. The TREELASR= option shows how to explicitly score an input table that is already in memory by using the stored tree.
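
Because the scoring results are kept in a temporary in-memory table, other IMSTAT statements can work on them directly. The following is a minimal sketch, not part of the original example. It assumes that it is submitted in the same interactive PROC IMSTAT session as the program above (so that the temporary tables still exist and &_TEMPSCORE_ still resolves), and it uses the FREQUENCY statement on the _NodeList2_ column only because that column already appears in the WHERE clause of the example.

```sas
  /* Assumption: the same PROC IMSTAT session as the program above is still active. */
  table example.&_tempscore_;
  /* Count the scored rows for each value of _NodeList2_ on the server. */
  frequency _NodeList2_;
run;
```

Only the frequency table is returned to the client, which keeps with the point of the example: the scored rows themselves stay on the server.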

Output

Partial Results for the Decision Tree Created from the Heart Data Set
[Output table: tree node information for the Heart data set]

Partial Results for the Scoring Table
[Output table: scoring information for the Heart data set]

When the tree is scored with the DECISIONTREE statement (the last statement in the example), the misclassification rate information is printed to the SAS log.
Misclassification Rate Information
NOTE: The misclassification rate for scoring the decision tree is 0.368401 
      using table EXAMPLE.HEART with 5209 records out of 5209.
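
As a quick sanity check on the log note, the reported rate can be converted back into a record count. This sketch assumes that the misclassification rate is the fraction of scored records whose predicted target bin differs from the observed one; the variable names are illustrative. Submitting a DATA step ends an interactive procedure, so run it after you have finished working with the temporary tables.

```sas
data _null_;
  rate = 0.368401;   /* misclassification rate from the log note */
  nobs = 5209;       /* number of records scored */
  /* Convert the rate into a whole number of records. */
  misclassified = round(rate * nobs);
  put misclassified=;
run;
```

Under that assumption, the rate corresponds to about 1,919 of the 5,209 records.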