The TRANSREG Procedure

Example 15.1: Basic Box-Cox Transformations

This example illustrates finding a Box-Cox transformation of some artificial data. Data were generated from the model

y=e^{x + \epsilon}

where \epsilon \sim {\rm N}(0, 1). Taking logarithms makes the model linear in x, so the transformed data can be fit with the linear model

\log(y)=x + \epsilon
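For reference, the Box-Cox family that BOXCOX(y) searches over is usually written as follows. Here c is the constant that the PARAMETER= option adds to y; the first example does not specify it, so c = 0 there. This is the textbook form of the transformation. The \lambda = 0 member is the log transformation used in the model above, and it is the limit of the \lambda \neq 0 expression as \lambda \to 0.

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{(y + c)^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[6pt]
  \log(y + c), & \lambda = 0.
\end{cases}
```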

   title 'Basic Box-Cox Example';

   data x;
      do x = 1 to 8 by 0.025;       /* 281 equally spaced values of x */
         y = exp(x + normal(7));    /* N(0,1) error, random seed 7    */
         output;
         end;
      run;

   proc transreg data=x ss2 details;
      title2 'Defaults';
      model boxcox(y) = identity(x);
      run;
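The following Python sketch imitates what BOXCOX(y) = IDENTITY(x) does in this example. It simulates data like the DATA step, evaluates a profile log likelihood on the grid -3, -2.75, ..., 3 that appears in Output 15.1.1, and keeps the best value. This is an illustration, not TRANSREG's implementation. Python's random.gauss stands in for SAS NORMAL(7), so the numbers will differ from the printed ones. Using the error mean square SSE/(n - 2) as the variance estimate is an assumption, chosen because it reproduces the printed log likelihoods.

```python
import math
import random


def simulate(seed=7):
    """Mimic the DATA step: x = 1, 1.025, ..., 8 (281 values), y = exp(x + e).

    random.gauss stands in for SAS NORMAL(7), so the draws differ from SAS's.
    """
    rng = random.Random(seed)
    xs = [1 + 0.025 * i for i in range(281)]
    ys = [math.exp(x + rng.gauss(0.0, 1.0)) for x in xs]
    return xs, ys


def boxcox(y, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam == 0."""
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam


def sse_simple_regression(xs, zs):
    """Error sum of squares from an ordinary least-squares fit of zs on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return sum((z - intercept - slope * x) ** 2 for x, z in zip(xs, zs))


def log_likelihood(xs, ys, lam):
    """Profile log likelihood (constant terms dropped) for one power parameter."""
    n = len(ys)
    zs = [boxcox(y, lam) for y in ys]
    # Assumption: variance estimated by the error mean square SSE / (n - 2).
    mse = sse_simple_regression(xs, zs) / (n - 2)
    jacobian = (lam - 1) * sum(math.log(y) for y in ys)
    return -0.5 * n * math.log(mse) + jacobian


xs, ys = simulate()
grid = [-3 + 0.25 * k for k in range(25)]  # -3.00, -2.75, ..., 3.00
profile = [(lam, log_likelihood(xs, ys, lam)) for lam in grid]
best_lam, best_ll = max(profile, key=lambda pair: pair[1])
print(f"best lambda = {best_lam:.2f}, log likelihood = {best_ll:.2f}")
```

The log y values span several orders of magnitude, so the profile is sharply peaked. Neighboring grid values such as \lambda = \pm 0.25 lose roughly a hundred log-likelihood units, which is why the search lands on the log transformation.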

Output 15.1.1: Basic Box-Cox Example, Default Output
 
Basic Box-Cox Example
Defaults

The TRANSREG Procedure

Transformation Information for BoxCox(y)

 Lambda  R-Square    Log Like
  -3.00      0.03    -4601.01
  -2.75      0.04    -4266.08
  -2.50      0.04    -3934.11
  -2.25      0.05    -3605.75
  -2.00      0.06    -3281.88
  -1.75      0.07    -2963.74
  -1.50      0.10    -2653.14
  -1.25      0.14    -2352.72
  -1.00      0.21    -2066.32
  -0.75      0.34    -1799.25
  -0.50      0.52    -1558.55
  -0.25      0.71    -1360.28
   0.00 +    0.79    -1275.31 <
   0.25      0.70    -1382.62
   0.50      0.51    -1589.03
   0.75      0.34    -1834.53
   1.00      0.22    -2105.88
   1.25      0.15    -2397.35
   1.50      0.11    -2704.64
   1.75      0.08    -3024.24
   2.00      0.06    -3353.38
   2.25      0.05    -3689.91
   2.50      0.04    -4032.18
   2.75      0.03    -4378.97
   3.00      0.03    -4729.37

< - Best Lambda
* - Confidence Interval
+ - Convenient Lambda

PROC TRANSREG correctly selects the log transformation, \lambda=0, with a narrow confidence interval. The maximum of the log likelihood function is flagged with the less-than sign (<). The convenient power parameter \lambda=0, which lies in the confidence interval, is flagged with the plus sign (+). The rest of the output is shown next.
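The flags follow the usual likelihood-ratio construction for \lambda, and this construction is consistent with the CI Limit printed in Output 15.1.2. A grid value lies in the interval when its log likelihood is no more than half of a \chi^2_1 critical value below the maximum. With the default \alpha = 0.05:

```latex
\begin{aligned}
\text{CI} &= \{\lambda : L(\lambda) \ge L(\hat\lambda) - \tfrac{1}{2}\chi^2_{1,\,1-\alpha}\},\\
L(\hat\lambda) - \tfrac{1}{2}\chi^2_{1,\,0.95} &= -1275.31 - \tfrac{1}{2}(3.841) \approx -1277.23.
\end{aligned}
```

The nearest grid neighbors, L(-0.25) = -1360.28 and L(0.25) = -1382.62, fall far below this limit. As a result, no value other than \lambda = 0 is inside the interval.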

Output 15.1.2: Basic Box-Cox Example, Default Output
 
Basic Box-Cox Example
Defaults

The TRANSREG Procedure

TRANSREG Univariate Algorithm Iteration History for BoxCox(y)

Iteration    Average    Maximum              Criterion
   Number     Change     Change   R-Square      Change   Note
        1    0.00000    0.00000    0.79064               Converged
 
Algorithm converged.
 
Model Statement Specification Details

Type  DF  Variable     Description      Value
Dep    1  BoxCox(y)    Lambda Used      0
                       Lambda           0
                       Log Likelihood   -1275.3
                       Conv. Lambda     0
                       Conv. Lambda LL  -1275.3
                       CI Limit         -1277.2
                       Alpha            0.05
Ind    1  Identity(x)  DF               1
 
Univariate ANOVA Table Based on the Usual Degrees of Freedom

                         Sum of       Mean
Source            DF    Squares     Square    F Value   Liberal p
Model              1   1145.884   1145.884    1053.66   >= <.0001
Error            279    303.421      1.088
Corrected Total  280   1449.305

The above statistics are not adjusted for the fact that the
dependent variable was transformed and so are generally liberal.

Root MSE         1.04285    R-Square   0.7906
Dependent Mean   4.49653    Adj R-Sq   0.7899
Coeff Var       23.19225    Lambda     0.0000
 
Univariate Regression Table Based on the Usual Degrees of Freedom

                                   Type II
                                    Sum of      Mean
Variable      DF   Coefficient     Squares    Square    F Value   Liberal p
Intercept      1    0.01551366        0.01      0.01       0.01   >= 0.9185
Identity(x)    1    0.99578183     1145.88   1145.88    1053.66   >= <.0001

The above statistics are not adjusted for the fact that the dependent variable was transformed and so are generally liberal.
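The printed Log Likelihood can be reproduced from the ANOVA table. The expression below is the Box-Cox profile log likelihood with constant terms dropped. Using the error mean square \mathrm{SSE}_\lambda/(n-2) as the variance estimate is an assumption, chosen because it matches the printed value. Here n = 281 and c = 0. At \lambda = 0 the Dependent Mean 4.49653 is the mean of \log y, so n times it equals \sum \log y_i.

```latex
\begin{aligned}
L(\lambda) &= -\frac{n}{2}\log\!\left(\frac{\mathrm{SSE}_\lambda}{n-2}\right)
             + (\lambda - 1)\sum_{i=1}^{n}\log(y_i + c),\\
L(0) &= -\frac{281}{2}\log\!\left(\frac{303.421}{279}\right) - 281 \times 4.49653
      \approx -11.789 - 1263.525 \approx -1275.31.
\end{aligned}
```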


The next example uses several options:

- The LAMBDA= option specifies power parameters sparsely from -2 to -0.5 and from 0.5 to 2, just to get the general shape of the log likelihood function in those regions. Between -0.5 and 0.5, power parameters are tried in steps of 0.05.
- The CONVENIENT option is specified so that a convenient power parameter such as \lambda=1 or \lambda=0 is used instead of the optimal power parameter whenever it falls in the confidence interval.
- PARAMETER=2 adds 2 to each y before the transformation is performed.
- ALPHA=0.00001 requests a 99.999% confidence interval, which is much wider than the default 95% interval (ALPHA=0.05).

   proc transreg data=x ss2 details;
      title2 'Several Options Demonstrated';
      model boxcox(y / lambda=-2 -1 -0.5 to 0.5 by 0.05 1 2
                       convenient
                       parameter=2
                       alpha=0.00001)
          = identity(x);
      run;

Output 15.1.3: Basic Box-Cox Example, Several Options Demonstrated
 
Basic Box-Cox Example
Several Options Demonstrated

The TRANSREG Procedure

Transformation Information for BoxCox(y)

 Lambda  R-Square    Log Like
 -2.000      0.22    -2583.73
 -1.000      0.45    -1779.35
 -0.500      0.67    -1439.82
 -0.450      0.70    -1410.51
 -0.400      0.72    -1382.74
 -0.350      0.74    -1356.92
 -0.300      0.76    -1333.59
 -0.250      0.77    -1313.42
 -0.200      0.79    -1297.21
 -0.150      0.79    -1285.83 *
 -0.100      0.80    -1280.09 <
 -0.050      0.80    -1280.63 *
  0.000 +    0.79    -1287.71 *
  0.050      0.78    -1301.19
  0.100      0.76    -1320.56
  0.150      0.74    -1345.09
  0.200      0.72    -1373.99
  0.250      0.69    -1406.51
  0.300      0.65    -1442.02
  0.350      0.62    -1480.02
  0.400      0.58    -1520.13
  0.450      0.54    -1562.05
  0.500      0.50    -1605.57
  1.000      0.22    -2105.88
  2.000      0.06    -3320.36

< - Best Lambda
* - Confidence Interval
+ - Convenient Lambda

The results show that the optimal power parameter is \lambda=-0.1. Because \lambda=0 lies in the confidence interval and CONVENIENT is specified, the log transformation \log(y+2) is used instead. The rest of the output is shown next.
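The selection rule can be replayed on the printed profile. The Python sketch below takes the (\lambda, log likelihood) pairs from Output 15.1.3 and forms the likelihood-ratio confidence interval for a given ALPHA= value. It then applies a CONVENIENT-style rule. That rule is a simplification and an assumption: the hypothetical CONVENIENT_VALUES tuple considers only \lambda=1 and \lambda=0, in that order, whereas TRANSREG may recognize other convenient values.

```python
from statistics import NormalDist

# (lambda, log likelihood) pairs copied from Output 15.1.3 (PARAMETER=2).
PROFILE = [
    (-2.0, -2583.73), (-1.0, -1779.35), (-0.5, -1439.82), (-0.45, -1410.51),
    (-0.4, -1382.74), (-0.35, -1356.92), (-0.3, -1333.59), (-0.25, -1313.42),
    (-0.2, -1297.21), (-0.15, -1285.83), (-0.1, -1280.09), (-0.05, -1280.63),
    (0.0, -1287.71), (0.05, -1301.19), (0.1, -1320.56), (0.15, -1345.09),
    (0.2, -1373.99), (0.25, -1406.51), (0.3, -1442.02), (0.35, -1480.02),
    (0.4, -1520.13), (0.45, -1562.05), (0.5, -1605.57), (1.0, -2105.88),
    (2.0, -3320.36),
]

# Hypothetical: only these convenient values, checked in this order.
CONVENIENT_VALUES = (1.0, 0.0)


def chi2_1_quantile(p):
    """p-quantile of a chi-square with 1 df, as the square of a normal quantile."""
    z = NormalDist().inv_cdf(0.5 + p / 2.0)
    return z * z


def choose_lambda(profile, alpha, convenient=CONVENIENT_VALUES):
    """Return (lambda used, best lambda, CI cutoff, lambdas inside the CI)."""
    best_lam, best_ll = max(profile, key=lambda pair: pair[1])
    cutoff = best_ll - 0.5 * chi2_1_quantile(1.0 - alpha)
    in_ci = [lam for lam, ll in profile if ll >= cutoff]
    for lam in convenient:
        if lam in in_ci:
            return lam, best_lam, cutoff, in_ci
    return best_lam, best_lam, cutoff, in_ci


used, best, cutoff, in_ci = choose_lambda(PROFILE, alpha=0.00001)
print(f"best={best}, used={used}, cutoff={cutoff:.2f}, in CI={in_ci}")
```

With ALPHA=0.00001, the interval covers -0.15 through 0, so the sketch switches to \lambda=0, matching the printed choice. Rerunning it with the default alpha=0.05 leaves only -0.1 and -0.05 inside the interval, and the optimal -0.1 would be kept.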

Output 15.1.4: Basic Box-Cox Example, Several Options Demonstrated
 
Basic Box-Cox Example
Several Options Demonstrated

The TRANSREG Procedure

TRANSREG Univariate Algorithm Iteration History for BoxCox(y)

Iteration    Average    Maximum              Criterion
   Number     Change     Change   R-Square      Change   Note
        1    0.00000    0.00000    0.79238               Converged
 
Algorithm converged.
 
Model Statement Specification Details

Type  DF  Variable     Description      Value
Dep    1  BoxCox(y)    Lambda Used      0
                       Lambda           -0.1
                       Log Likelihood   -1280.1
                       Conv. Lambda     0
                       Conv. Lambda LL  -1287.7
                       CI Limit         -1289.9
                       Alpha            0.00001
                       Parameter        2
                       Options          Convenient Lambda Used
Ind    1  Identity(x)  DF               1

 

Univariate ANOVA Table Based on the Usual Degrees of Freedom

                         Sum of       Mean
Source            DF    Squares     Square    F Value   Liberal p
Model              1    999.438   999.4381    1064.82   >= <.0001
Error            279    261.868     0.9386
Corrected Total  280   1261.306

The above statistics are not adjusted for the fact that the
dependent variable was transformed and so are generally liberal.

Root MSE         0.96881    R-Square   0.7924
Dependent Mean   4.61429    Adj R-Sq   0.7916
Coeff Var       20.99591    Lambda     0.0000
 
Univariate Regression Table Based on the Usual Degrees of Freedom

                                   Type II
                                    Sum of      Mean
Variable      DF   Coefficient     Squares    Square    F Value   Liberal p
Intercept      1    0.42939328       8.746     8.746       9.32   >= 0.0025
Identity(x)    1    0.92997620     999.438   999.438    1064.82   >= <.0001

The above statistics are not adjusted for the fact that the dependent variable was transformed and so are generally liberal.
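The printed values for this model can be checked by hand. PARAMETER=2 enters the usual Box-Cox Jacobian term as (\lambda - 1)\sum\log(y_i + 2). At \lambda = 0, the Dependent Mean 4.61429 is the mean of \log(y + 2), and the error mean square comes from the ANOVA table. Using SSE/(n - 2) as the variance estimate is an assumption that happens to reproduce the printed Conv. Lambda LL. The CI Limit follows from the maximum, -1280.09 at \lambda = -0.1, and \chi^2_{1,\,0.99999} \approx 19.51:

```latex
\begin{aligned}
L(0) &= -\frac{281}{2}\log\!\left(\frac{261.868}{279}\right) - 281 \times 4.61429
      \approx 8.904 - 1296.615 \approx -1287.71,\\
\text{CI limit} &\approx -1280.09 - \tfrac{1}{2}(19.51) \approx -1289.85.
\end{aligned}
```

The printed values are -1287.7 and -1289.9. Because -1287.71 is above the limit, \lambda = 0 lies inside the interval.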


Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.