Classification Variables and the SPLIT Option

PROC HPQUANTSELECT can split classification variables during model selection. When you specify the SPLIT option for a variable in the CLASS statement, the design matrix columns that correspond to any effect that contains that variable can enter or leave a model independently of the other design columns of that effect. The following statements illustrate the use of the SPLIT option:

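data splitExample;
   length C2 $6;
   drop i;
   do i=1 to 1000;
     C1 = 1 + mod(i,6);
     if      i < 250 then C2 = 'Low';
     else if i < 500 then C2 = 'Medium';
     else                 C2 = 'High';
     x1 = ranuni(1);
     x2 = ranuni(1);
     /* Character comparison is case-sensitive, so (C2 ='low') is 0 */
     /* for every row and C2 does not contribute to y.              */
     y = x1+3*(C2 ='low')  + 10*(C1=3) +5*(C1=5) + rannor(1);
     output;
   end;
run;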

proc hpquantselect data=splitExample;
   class C1(split) C2(order=data);
   model y = C1 C2 x1 x2/orderselect clb;
   selection method=forward;
run;

The "Class Levels" table in Figure 59.14 is produced by default whenever you specify a CLASS statement.

Figure 59.14: Class Levels


Class Level Information

Class  Levels     Values
C1          6  *  1 2 3 4 5 6
C2          3     Low Medium High

* Associated Parameters Split

The SPLIT option has been specified for the classification variable C1. This permits the parameters that are associated with the effect C1 to enter or leave the model individually. The "Parameter Estimates" table in Figure 59.15 shows that for this example the parameters that correspond to only levels 3 and 5 of C1 are in the selected model. Finally, note that the ORDERSELECT option in the MODEL statement displays the parameters in the order in which they first entered the model.

Figure 59.15: Parameter Estimates

Parameter Estimates

Parameter  DF  Estimate  Standard Error  95% Confidence Limits  t Value  Pr > |t|
Intercept   1  -0.21596         0.09024    -0.39304   -0.03887    -2.39    0.0169
C1_3        1  10.08952         0.09852     9.89619   10.28285   102.41    <.0001
C1_5        1   5.04115         0.10835     4.82854    5.25376    46.53    <.0001
x1          1   1.29863         0.14014     1.02363    1.57363     9.27    <.0001
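
For comparison, the following statements are a minimal sketch, not part of the original example, of the same analysis with the SPLIT option removed from C1. Assuming nothing else changes, the effect C1 is then treated as a single unit, so all of its design columns must enter or leave the model together.

```sas
/* Sketch for comparison only: same data and model as above,   */
/* but C1 is not split, so its design columns move as a group. */
proc hpquantselect data=splitExample;
   class C1 C2(order=data);
   model y = C1 C2 x1 x2 / orderselect clb;
   selection method=forward;
run;
```

If C1 is selected in this run, you would expect its parameters to appear in the "Parameter Estimates" table as a group rather than only the parameters for levels 3 and 5 that are shown in Figure 59.15.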