During the manufacture of a semiconductor device, the levels of temperature, atomic composition, and other parameters are vital to ensuring that the final device is usable. This example creates a decision tree model for the performance of finished devices.
The following statements create a data set named MBE_DATA, which contains measurements for 20 devices:
data mbe_data;
   label gtemp  = 'Growth Temperature of Substrate';
   label atemp  = 'Anneal Temperature';
   label rot    = 'Rotation Speed';
   label dopant = 'Dopant Atom';
   label usable = 'Experiment Could be Performed';
   input gtemp atemp rot dopant $ 38-40 usable $ 46-54;
   datalines;
384.614    633.172    1.01933        C       Unusable
363.874    512.942    0.72057        C       Unusable
397.395    671.179    0.90419        C       Unusable
389.962    653.940    1.01417        C       Unusable
387.763    612.545    1.00417        C       Unusable
394.206    617.021    1.07188        Si      Usable
387.135    616.035    0.94740        Si      Usable
428.783    745.345    0.99087        Si      Unusable
399.365    600.932    1.23307        Si      Unusable
455.502    648.821    1.01703        Si      Unusable
387.362    697.589    1.01623        Ge      Usable
408.872    640.406    0.94543        Ge      Usable
407.734    628.196    1.05137        Ge      Usable
417.343    612.328    1.03960        Ge      Usable
482.539    669.392    0.84249        Ge      Unusable
367.116    564.246    0.99642        Sn      Unusable
398.594    733.839    1.08744        Sn      Unusable
378.032    619.561    1.06137        Sn      Usable
357.544    606.871    0.85205        Sn      Unusable
384.578    635.858    1.12215        Sn      Unusable
;
The variables GTEMP and ATEMP are temperatures, ROT is a rotation speed, and DOPANT is the atom that is used during device growth. The variable USABLE indicates whether the device is usable.
The following statements create the tree model:
proc hpsplit data=mbe_data maxdepth=1;
   target usable;
   input gtemp atemp rot dopant;
   output importance=import;
   prune none;
run;
There is only one INPUT statement because all of the numeric variables are interval inputs.
The MAXDEPTH=1 option specifies that the tree is to stop splitting when the maximum specified depth of one is reached. In
other words, PROC HPSPLIT tries to split the data by each input variable and then chooses the best variable on which to split
the data. The split that is chosen divides the data into higher and lower incidences of the target variable (USABLE).
The PRUNE statement suppresses pruning because there is only one split.
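As a rough illustration of how a candidate split can be scored under an entropy criterion, the following DATA step is a sketch only and is not PROC HPSPLIT's internal computation. It assumes base-2 logarithms, takes the class counts directly from the data lines above (7 usable and 13 unusable devices), and uses a hypothetical grouping of dopants, {Si, Ge} versus {C, Sn}, which is not necessarily the split that the procedure selects.

```sas
data _null_;
   /* Root node: 7 usable and 13 unusable devices out of 20 */
   p_root = 7/20;
   h_root = -(p_root*log2(p_root) + (1-p_root)*log2(1-p_root));

   /* Hypothetical candidate split: {Si, Ge} versus {C, Sn} */
   p_a = 6/10;   /* usable fraction among the Si and Ge devices */
   p_b = 1/10;   /* usable fraction among the C and Sn devices  */
   h_a = -(p_a*log2(p_a) + (1-p_a)*log2(1-p_a));
   h_b = -(p_b*log2(p_b) + (1-p_b)*log2(1-p_b));

   /* Weighted entropy of the two children and the resulting reduction */
   h_split = (10/20)*h_a + (10/20)*h_b;
   gain    = h_root - h_split;

   /* Write the three values to the SAS log */
   put h_root= 8.4 h_split= 8.4 gain= 8.4;
run;
```

Under these assumptions, the root entropy is about 0.934 and the weighted entropy after the hypothetical split is about 0.720, a reduction of about 0.214. A split is preferable when this reduction is larger.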
The OUTPUT statement saves information about variable importance in a data set named IMPORT. The following statements list the relevant observation in IMPORT:
proc print data=import(where=(_itype_='Import'));
run;
The result of these statements is provided in Output 13.2.1.
Output 13.2.1: Variable Importance of the One-Split Decision Tree
Obs | _TREENUM_ | _CRITERION_ | _OBSMISS_ | _OBSUSED_ | _OBSVALID_ | _OBSTMISS_ | _ITYPE_ | gtemp | atemp | rot | dopant |
---|---|---|---|---|---|---|---|---|---|---|---|
3 | 1 | Entropy | 0 | 20 | 0 | 0 | Import | 0 | 0 | 0 | 1 |
The dopant atom is the most important consideration in determining the usability of the sample: DOPANT is the only input used in the one-split decision tree (its importance is 1, whereas the other input variables have an importance of 0 and are not used at all).
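One way to see the pattern behind this result is to tabulate usability by dopant. The following is a minimal sketch that uses PROC FREQ, which is not part of the original example; the NOROW, NOCOL, and NOPERCENT options simply reduce the table to raw counts.

```sas
/* Cross-tabulate dopant atom against usability, showing counts only */
proc freq data=mbe_data;
   tables dopant*usable / norow nocol nopercent;
run;
```

Based on the data lines above, the table should show no usable devices for C, two of five for Si, four of five for Ge, and one of five for Sn, so the proportion of usable devices differs markedly across dopants.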