The TRANSREG Procedure

Example 15.2: Graphical Displays of Box-Cox Transformations

This example shows how to make graphical displays of the Box-Cox transformation results. The plots include the log likelihood function with its confidence interval, the root mean square error as a function of the power parameter, R² as a function of the power parameter, the Box-Cox transformation of the variable y, the original scatter plot based on the untransformed data, and the new scatter plot based on the transformed data. A condensed version of the log likelihood table, including the confidence interval, is also printed.

   title h=1 'Box-Cox Graphical Displays';

   data x;
      input y x @@;
      datalines;
   10.0  3.0  72.6  8.3  59.7  8.1  20.1  4.8  90.1  9.8   1.1  0.9
   78.2  8.5  87.4  9.0   9.5  3.4   0.1  1.4   0.1  1.1  42.5  5.1
   57.0  7.5   9.9  1.9   0.5  1.0 121.1  9.9  37.5  5.9  49.5  6.7
    8.3  1.8   0.6  1.8  53.0  6.7 112.8 10.0  40.7  6.4   5.1  2.4
   73.3  9.5 122.4  9.9  87.2  9.4 121.2  9.9  23.1  4.3   7.1  3.5
   12.4  3.3   5.6  2.7 113.0  9.6 110.5 10.0   3.1  1.5  52.4  7.9
   80.4  8.1   0.6  1.6 115.1  9.1  15.9  3.1  56.5  7.3  85.4  9.8
   32.5  5.8  43.0  6.2   0.1  0.8  21.8  5.2  15.2  3.5   5.2  3.0
    0.2  0.8  73.5  8.2   4.9  3.2   0.2  0.3  69.0  9.2   3.6  3.5
    0.2  0.9 101.3  9.9  10.0  3.7  16.9  3.0  11.2  5.0   0.2  0.4
   80.8  9.4  24.9  5.7 113.5  9.7   6.2  2.1  12.5  3.2   4.8  1.8
   80.1  8.3  26.4  4.8  13.4  3.8  99.8  9.7  44.1  6.2  15.3  3.8
    2.2  1.5  10.3  2.7  13.8  4.7  38.6  4.5  79.1  9.8  33.6  5.8
    9.1  4.5  89.3  9.1   5.5  2.6  20.0  4.8   2.9  2.9  82.9  8.4
    7.0  3.5  14.5  2.9  16.0  3.7  29.3  6.1  48.9  6.3   1.6  1.9
   34.7  6.2  33.5  6.5  26.0  5.6  12.7  3.1   0.1  0.3  15.4  4.2
    2.6  1.8  58.6  7.9  81.2  8.1  37.2  6.9
   ;

The TRANSREG procedure is run to find the Box-Cox transformation. The lambda list is -2 TO 2 BY 0.01, which produces 401 lambdas. This many power parameters make a nice graphical display with plenty of detail around the confidence interval. However, 401 rows are too many to print, so the usual Box-Cox transformation information table is excluded from the printed output. Instead, it is written to a SAS data set by using ODS so that a sample of it can be printed: only the rows in the confidence interval and the rows whose power parameters are multiples of 0.5. Null labels are provided for the columns that need to be printed without headers. The details table is also written to a SAS data set by using ODS, because it contains information that is incorporated into some of the plots.

   * Fit Box-Cox model, output results to output data sets;
   ods output boxcox=b details=d;
   ods exclude boxcox;
   proc transreg details data=x;
      model boxcox(y / convenient lambda=-2 to 2 by 0.01) = identity(x);
      output out=trans;
      run;

   proc print noobs label data=b(drop=rmse);
      title2 'Confidence Interval';
      where ci ne ' ' or abs(lambda - round(lambda, 0.5)) < 1e-6;
      label convenient = '00'x ci = '00'x;
      run;

Output 15.2.1: Box-Cox Graphical Displays
 
Box-Cox Graphical Displays

The TRANSREG Procedure

TRANSREG Univariate Algorithm Iteration History for BoxCox(y)

Iteration    Average    Maximum                Criterion
   Number     Change     Change    R-Square       Change    Note
        1    0.00000    0.00000     0.95396                 Converged
 
Algorithm converged.
 
Model Statement Specification Details
Type  DF  Variable     Description       Value
Dep    1  BoxCox(y)    Lambda Used       0.5
                       Lambda            0.46
                       Log Likelihood    -167.0
                       Conv. Lambda      0.5
                       Conv. Lambda LL   -168.3
                       CI Limit          -169.0
                       Alpha             0.05
                       Options           Convenient Lambda Used
Ind    1  Identity(x)  DF                1

 
Box-Cox Graphical Displays
Confidence Interval

Lambda         R-Square    Log Like
 -2.00             0.14    -1030.56
 -1.50             0.17     -810.50
 -1.00             0.22     -602.53
 -0.50             0.39     -415.56
  0.00             0.78     -257.92
  0.41             0.95     -168.40    *
  0.42             0.95     -167.86    *
  0.43             0.95     -167.46    *
  0.44             0.95     -167.19    *
  0.45             0.95     -167.05    *
  0.46             0.95     -167.04    <
  0.47             0.95     -167.16    *
  0.48             0.95     -167.41    *
  0.49             0.95     -167.79    *
  0.50    +        0.95     -168.28    *
  0.51             0.95     -168.89    *
  1.00             0.89     -253.09
  1.50             0.79     -345.35
  2.00             0.70     -435.01

The next steps extract information from the Box-Cox transformation table and the details table and store it in macro variables. The confidence interval limit from the details table provides a vertical axis reference line for the log likelihood plot. The convenient power parameter ('Lambda Used') is also extracted from the details table, for use in the footnote. The confidence interval is extracted from the confidence interval observations of the Box-Cox transformation table; it is used in the footnote and for the horizontal axis reference lines in the log likelihood plot.

   * Store values for reference lines;
   data _null_;
      set d;
      if description = 'CI Limit'
         then call symput('vref',   formattedvalue);
      if description = 'Lambda Used'
         then call symput('lambda', formattedvalue);
      run;

   data _null_;
      set b end=eof;
      where ci ne ' ';
      if _n_ = 1
         then call symput('href1', compress(put(lambda, best12.)));
      if ci  = '<'
         then call symput('href2', compress(put(lambda, best12.)));
      if eof
         then call symput('href3', compress(put(lambda, best12.)));
      run;
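
Before plotting, it can help to confirm that the macro variables hold the values you expect. The following is a minimal check, assuming that the two DATA steps above have already run in the same session. It is not part of the original example. Judging from Output 15.2.1, href1, href2, and href3 should correspond to the lambdas 0.41, 0.46, and 0.51, vref to the CI limit, and lambda to the convenient value 0.5.

   * Write the stored reference line values to the SAS log;
   %put vref=&vref lambda=&lambda;
   %put href1=&href1 href2=&href2 href3=&href3;

If a value appears with extra blanks, the macro variable was probably created from a padded character value. This is harmless in the PLOT statement options used here.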

These steps plot the log likelihood, the root mean square error, and R². The input data set is the Box-Cox transformation table, which was output by using ODS.

   * Plot log likelihood, confidence interval;
   axis1 label=(angle=90 rotate=0) minor=none;
   axis2 minor=none;
   proc gplot data=b;
      title2 'Log Likelihood';
      plot loglike * lambda / vref=&vref href=&href1 &href2 &href3
                              vaxis=axis1 haxis=axis2 frame cframe=ligr;
      footnote "Confidence Interval: &href1 - &href2 - &href3, "
               "Lambda = &lambda";
      symbol v=none i=spline c=blue;
      run;

      footnote;
      title2 'RMSE';
      plot rmse * lambda / vaxis=axis1 haxis=axis2 frame cframe=ligr;
      run;

      title2 'R-Square';
      axis1 order=(0 to 1 by 0.1) label=(angle=90 rotate=0) minor=none;
      plot rsquare * lambda / vaxis=axis1 haxis=axis2 frame cframe=ligr;
      run; quit;

Output 15.2.2: Box-Cox Graphical Displays
[Plot: Log Likelihood as a function of lambda, with confidence interval reference lines]

[Plot: RMSE as a function of lambda]

[Plot: R-Square as a function of lambda]

The optimal power parameter is 0.46, but since 0.5 is in the confidence interval, and since the CONVENIENT option was specified, the procedure chooses a square root transformation.
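
The CI limit in the details table can be reproduced by hand. The sketch below assumes that the interval is the usual likelihood-ratio interval, so that the limit is the maximum log likelihood minus half of the 0.95 quantile of a chi-square distribution with 1 degree of freedom (Alpha = 0.05). That formula is an assumption about how the limit is computed, not something stated in this example. The value -167.04 is copied from the printed table.

   * Sketch: recompute the confidence interval limit by hand;
   data _null_;
      maxll = -167.04;                     /* log likelihood at lambda = 0.46 */
      limit = maxll - cinv(0.95, 1) / 2;   /* subtract half the chi-square quantile */
      put 'CI limit: ' limit 8.2;
      run;

The result, about -168.96, agrees with the CI Limit of -169.0 shown in the details table. Every lambda whose log likelihood is above this limit, here 0.41 through 0.51, lies in the confidence interval, which is why 0.5 qualifies as a convenient value.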

The next steps plot the transformation of y, the original scatter plot based on the untransformed data, and the new scatter plot based on the transformed data. The input data set is the ordinary output data set from PROC TRANSREG. By default, the transformed version of the variable y is named ty.

   axis1 label=(angle=90 rotate=0) minor=none;
   axis2 minor=none;
   proc gplot data=trans;
      title2 'Transformation';
      symbol i=splines v=star c=blue;
      plot ty * y / vaxis=axis1 haxis=axis2 frame cframe=ligr; 
      run;

      title2 'Original Scatter Plot';
      symbol i=none v=star c=blue;
      plot y * x / vaxis=axis1 haxis=axis2 frame cframe=ligr;
      run;

      title2 'Transformed Scatter Plot';
      symbol i=none v=star c=blue;
      plot ty * x / vaxis=axis1 haxis=axis2 frame cframe=ligr;
      run; quit;

Output 15.2.3: Box-Cox Graphical Displays
[Plot: Transformation, ty as a function of y]

[Plot: Original Scatter Plot, y as a function of x]

[Plot: Transformed Scatter Plot, ty as a function of x]

The square root transformation makes the scatter plot more linear.
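
To see what the chosen transformation does to each observation, you can recompute it in a DATA step and compare it with ty. This is a hedged sketch: it assumes that the Box-Cox transformation is applied in its standard form, (y**lambda - 1) / lambda, with no shift or geometric mean scaling, and that the macro variable lambda still holds the 'Lambda Used' value of 0.5. The data set name check and the variables ty_hand and diff are hypothetical names introduced for illustration.

   * Hypothetical check: recompute the transformation by hand;
   data check;
      set trans;
      ty_hand = (y ** &lambda - 1) / &lambda;   /* standard Box-Cox form, lambda ne 0 */
      diff    = abs(ty - ty_hand);              /* discrepancy from PROC TRANSREG */
      run;

   proc means data=check max;
      var diff;
      run;

If the maximum difference is not close to zero, the assumed form does not match what the procedure used, and the formula in the sketch should be adjusted.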


Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.