IMSTAT Procedure (Analytics)

Example 17: Transforming Variables with Imputation and Binning

Details

This IMSTAT procedure example demonstrates using the TRANSFORM statement to impute values for missing variables, bin the variables into equal weight bins, and then use the FREQUENCY statement to see the distribution of values in the bins.

Program

libname example sasiola host="grid001.example.com" port=10010 tag='hps';

data example.fitness1;  1
   input Oxygen RunTime RunPulse @@;
   datalines;
44.609  11.37  178     45.313  10.07  185
54.297   8.65  156     59.571    .      .
49.874   9.22    .     44.811  11.63  176
  .     11.95  176          .  10.85    .
39.442  13.08  174     60.055   8.63  170
50.541    .      .     37.388  14.03  186
44.754  11.12  176     47.273    .      .
51.855  10.33  166     49.156   8.95  180
40.836  10.95  168     46.672  10.00    .
46.774  10.25    .     50.388  10.08  168
39.407  12.63  174     46.080  11.17  156
45.441   9.63  164       .      8.92    .
45.118  11.08    .     39.203  12.88  168
45.790  10.47  186     50.545   9.93  148
48.673   9.40  186     47.920  11.50  170
47.467  10.50  170
;
run;


proc imstat;
    table ml.fitness1;  2
    transform
      (impute=mean input=(oxygen runtime) name="meanBinned"   bin=bucket(10))  3
      (impute=median input=(runpulse)     name="medianBinned" bin=bucket(10))
       / details score code(file="/data/scorefitness1.sas" comment replace);  4
run;
  
  table example._&TEMPLAST_;
  fetch / format to=5;   5
run;
  
  frequency me:;  6
quit;

Program Description

  1. The data are loaded serially into the server with a DATA step.
  2. The TABLE statement references the in-memory table that was created with the DATA step.
  3. The first transform request is to impute values of the Oxygen and RunTime variables using the mean values for each variable. The imputed values and nonmissing original values are binned into 10 equal width bins. The second transform request imputes values of the RunPulse variable using the median value. Values for that variable are also binned into 10 equal width bins.
  4. The DETAILS option specifies to display information about the bins. The SCORE option specifies to create an in-memory temporary table that has results of imputing and binning the variables. The CODE option, with the FILE= suboption, writes DATA step scoring code to a file on the SAS client.
  5. The TABLE statement references the temporary in-memory table that has the scored results. The FETCH statement displays the first five rows from the table.
  6. The FREQUENCY statement displays the distribution of the three binned variables. Because the NAME= options for both transformation requests begin with "me," meanBinned and medianBinned, the names of the scored variables can be referenced with the colon wildcard operator.

Output

The first display shows the results of the TRANSFORM statement.
Transform statement results
The second display shows the binning information that was requested with the DETAILS option.
Binning information
The third display shows the first five records from the temporary in-memory table that has the scored values. In this case, the original variables, Oxygen, RunPulse, and RunTime, are paired with the imputed and binned values. The NAME= option from each transform request is used as a prefix to the new variables.
Scoring information
The following display shows the results of the FREQUENCY statement for the meanBinned_Oxygen variable. The results for the meanBinned_RunTime and medianBinned_RunPulse variables is similar.
Frequency distribution