 The TRANSREG Procedure

## Example 15.1: Basic Box-Cox Transformations

This example illustrates finding a Box-Cox transformation of some artificial data. Data were generated from the model

$$y = e^{x + \epsilon}$$

where $\epsilon \sim N(0, 1)$. The transformed data can be fit with a linear model

$$\log(y) = x + \epsilon$$

    title 'Basic Box-Cox Example';

    data x;
       do x = 1 to 8 by 0.025;
          y = exp(x + normal(7));
          output;
       end;
    run;

    proc transreg data=x ss2 details;
       title2 'Defaults';
       model boxcox(y) = identity(x);
    run;


Output 15.1.1: Basic Box-Cox Example, Default Output

 Basic Box-Cox Example Defaults

Transformation Information for BoxCox(y)

| Lambda | R-Square | Log Like |
| ---: | ---: | ---: |
| -3.00 | 0.03 | -4601.01 |
| -2.75 | 0.04 | -4266.08 |
| -2.50 | 0.04 | -3934.11 |
| -2.25 | 0.05 | -3605.75 |
| -2.00 | 0.06 | -3281.88 |
| -1.75 | 0.07 | -2963.74 |
| -1.50 | 0.10 | -2653.14 |
| -1.25 | 0.14 | -2352.72 |
| -1.00 | 0.21 | -2066.32 |
| -0.75 | 0.34 | -1799.25 |
| -0.50 | 0.52 | -1558.55 |
| -0.25 | 0.71 | -1360.28 |
| 0.00 + | 0.79 | -1275.31 < |
| 0.25 | 0.70 | -1382.62 |
| 0.50 | 0.51 | -1589.03 |
| 0.75 | 0.34 | -1834.53 |
| 1.00 | 0.22 | -2105.88 |
| 1.25 | 0.15 | -2397.35 |
| 1.50 | 0.11 | -2704.64 |
| 1.75 | 0.08 | -3024.24 |
| 2.00 | 0.06 | -3353.38 |
| 2.25 | 0.05 | -3689.91 |
| 2.50 | 0.04 | -4032.18 |
| 2.75 | 0.03 | -4378.97 |
| 3.00 | 0.03 | -4729.37 |

Legend: < Best Lambda; * Confidence Interval; + Convenient Lambda

PROC TRANSREG correctly selects the log transformation ($\lambda = 0$), with a narrow confidence interval. The maximum of the log likelihood function is flagged with the less-than sign (<), and the convenient power parameter of $\lambda = 0$ in the confidence interval is flagged by the plus sign (+). The rest of the output is shown next.

TRANSREG Univariate Algorithm Iteration History for BoxCox(y)

| Iteration Number | Average Change | Maximum Change | R-Square | Criterion Change | Note |
| ---: | ---: | ---: | ---: | ---: | :--- |
| 1 | 0.00000 | 0.00000 | 0.79064 | | Converged |

 Algorithm converged.

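Model Statement Specification Details

| Type | DF | Variable | Description | Value |
| :--- | ---: | :--- | :--- | :--- |
| Dep | 1 | BoxCox(y) | Lambda Used | 0 |
| | | | Lambda | 0 |
| | | | Log Likelihood | -1275.3 |
| | | | Conv. Lambda | 0 |
| | | | Conv. Lambda LL | -1275.3 |
| | | | CI Limit | -1277.2 |
| | | | Alpha | 0.05 |
| Ind | 1 | Identity(x) | DF | 1 |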

Univariate ANOVA Table Based on the Usual Degrees of Freedom

| Source | DF | Sum of Squares | Mean Square | F Value | Liberal p |
| :--- | ---: | ---: | ---: | ---: | :--- |
| Model | 1 | 1145.884 | 1145.884 | 1053.66 | >= <.0001 |
| Error | 279 | 303.421 | 1.088 | | |
| Corrected Total | 280 | 1449.305 | | | |

The above statistics are not adjusted for the fact that the dependent variable was transformed and so are generally liberal.

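| Statistic | Value | Statistic | Value |
| :--- | ---: | :--- | ---: |
| Root MSE | 1.04285 | R-Square | 0.7906 |
| Dependent Mean | 4.49653 | Adj R-Sq | 0.7899 |
| Coeff Var | 23.1923 | Lambda | 0 |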

Univariate Regression Table Based on the Usual Degrees of Freedom

| Variable | DF | Coefficient | Type II Sum of Squares | Mean Square | F Value | Liberal p |
| :--- | ---: | ---: | ---: | ---: | ---: | :--- |
| Intercept | 1 | 0.01551366 | 0.01 | 0.01 | 0.01 | >= 0.9185 |
| Identity(x) | 1 | 0.99578183 | 1145.88 | 1145.88 | 1053.66 | >= <.0001 |

 The above statistics are not adjusted for the fact that the dependent variable was transformed and so are generally liberal.
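Because the details table reports that a power parameter of 0 was used, the regression results above should be reproducible by taking logs by hand and running an ordinary regression. The following is a minimal sketch of that check, assuming the data set x created earlier still exists. The names logx and logy are illustrative and do not appear in the original example.

```sas
/* Sketch: refit the model with an explicit log transformation. */
data logx;               /* logx and logy are illustrative names */
   set x;
   logy = log(y);        /* textbook Box-Cox with lambda = 0 is the natural log */
run;

proc reg data=logx;
   model logy = x;       /* ordinary least squares on the transformed response */
run;
quit;
```

If the transformation at a power parameter of 0 is the plain natural log, as in the textbook definition, the PROC REG estimates should agree with the intercept and slope reported above (about 0.0155 and 0.9958).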

This next example uses several options. The LAMBDA= option specifies power parameters sparsely from -2 to -0.5 and from 0.5 to 2, just to get the general shape of the log likelihood function in those regions. Between -0.5 and 0.5, more power parameters are tried. The CONVENIENT option is specified so that if a power parameter such as $\lambda = 1$ or $\lambda = 0$ is found in the confidence interval, it is used instead of the optimal power parameter. PARAMETER=2 is specified to add 2 to each y before performing the transformations. ALPHA=0.00001 specifies a wide confidence interval.

    proc transreg data=x ss2 details;
       title2 'Several Options Demonstrated';
       model boxcox(y / lambda=-2 -1 -0.5 to 0.5 by 0.05 1 2
                        convenient
                        parameter=2
                        alpha=0.00001)
             = identity(x);
    run;
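For orientation, the Box-Cox family can be written in its common textbook form as follows, where $\lambda$ is the power parameter and $c$ stands for the PARAMETER= value (2 in this run, and assumed to be 0 when the option is omitted). This is a sketch of the standard definition only. Any additional scaling that PROC TRANSREG applies is not stated in this example and should be checked against the procedure's documentation.

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{(y + c)^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[1.5ex]
  \log(y + c),                            & \lambda = 0
\end{cases}
```

Under this form, a power parameter of 0 corresponds to the log transformation and a power parameter of 1 leaves the response linear in $y$, which is why such values are natural "convenient" choices.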


Output 15.1.3: Basic Box-Cox Example, Several Options Demonstrated

 Basic Box-Cox Example Several Options Demonstrated

Transformation Information for BoxCox(y)

| Lambda | R-Square | Log Like |
| ---: | ---: | ---: |
| -2.000 | 0.22 | -2583.73 |
| -1.000 | 0.45 | -1779.35 |
| -0.500 | 0.67 | -1439.82 |
| -0.450 | 0.70 | -1410.51 |
| -0.400 | 0.72 | -1382.74 |
| -0.350 | 0.74 | -1356.92 |
| -0.300 | 0.76 | -1333.59 |
| -0.250 | 0.77 | -1313.42 |
| -0.200 | 0.79 | -1297.21 |
| -0.150 | 0.79 | -1285.83 * |
| -0.100 | 0.80 | -1280.09 < |
| -0.050 | 0.80 | -1280.63 * |
| 0.000 + | 0.79 | -1287.71 * |
| 0.050 | 0.78 | -1301.19 |
| 0.100 | 0.76 | -1320.56 |
| 0.150 | 0.74 | -1345.09 |
| 0.200 | 0.72 | -1373.99 |
| 0.250 | 0.69 | -1406.51 |
| 0.300 | 0.65 | -1442.02 |
| 0.350 | 0.62 | -1480.02 |
| 0.400 | 0.58 | -1520.13 |
| 0.450 | 0.54 | -1562.05 |
| 0.500 | 0.50 | -1605.57 |
| 1.000 | 0.22 | -2105.88 |
| 2.000 | 0.06 | -3320.36 |

Legend: < Best Lambda; * Confidence Interval; + Convenient Lambda

The results show that the optimal power parameter is -0.1, but 0 is in the confidence interval, so a log transformation is chosen. The rest of the output is shown next.

TRANSREG Univariate Algorithm Iteration History for BoxCox(y)

| Iteration Number | Average Change | Maximum Change | R-Square | Criterion Change | Note |
| ---: | ---: | ---: | ---: | ---: | :--- |
| 1 | 0.00000 | 0.00000 | 0.79238 | | Converged |

 Algorithm converged.

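Model Statement Specification Details

| Type | DF | Variable | Description | Value |
| :--- | ---: | :--- | :--- | :--- |
| Dep | 1 | BoxCox(y) | Lambda Used | 0 |
| | | | Lambda | -0.1 |
| | | | Log Likelihood | -1280.1 |
| | | | Conv. Lambda | 0 |
| | | | Conv. Lambda LL | -1287.7 |
| | | | CI Limit | -1289.9 |
| | | | Alpha | 0.00001 |
| | | | Parameter | 2 |
| | | | Options | Convenient Lambda Used |
| Ind | 1 | Identity(x) | DF | 1 |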

Univariate ANOVA Table Based on the Usual Degrees of Freedom

| Source | DF | Sum of Squares | Mean Square | F Value | Liberal p |
| :--- | ---: | ---: | ---: | ---: | :--- |
| Model | 1 | 999.438 | 999.4381 | 1064.82 | >= <.0001 |
| Error | 279 | 261.868 | 0.9386 | | |
| Corrected Total | 280 | 1261.306 | | | |

The above statistics are not adjusted for the fact that the dependent variable was transformed and so are generally liberal.

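| Statistic | Value | Statistic | Value |
| :--- | ---: | :--- | ---: |
| Root MSE | 0.96881 | R-Square | 0.7924 |
| Dependent Mean | 4.61429 | Adj R-Sq | 0.7916 |
| Coeff Var | 20.9959 | Lambda | 0 |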

Univariate Regression Table Based on the Usual Degrees of Freedom

| Variable | DF | Coefficient | Type II Sum of Squares | Mean Square | F Value | Liberal p |
| :--- | ---: | ---: | ---: | ---: | ---: | :--- |
| Intercept | 1 | 0.42939328 | 8.746 | 8.746 | 9.32 | >= 0.0025 |
| Identity(x) | 1 | 0.92997620 | 999.438 | 999.438 | 1064.82 | >= <.0001 |

 The above statistics are not adjusted for the fact that the dependent variable was transformed and so are generally liberal.
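The CI Limit values in the two details tables appear consistent with the usual likelihood-ratio rule, in which a power parameter lies inside the confidence interval when its log likelihood is within half a chi-square quantile (1 degree of freedom) of the maximum. The following DATA step is a minimal sketch of that arithmetic, assuming this is the rule the procedure applies. The variable names are illustrative, and the maxima are the rounded values printed in the tables above.

```sas
/* Sketch: likelihood-ratio cutoff = maximum log likelihood - chi-square quantile / 2 */
data _null_;
   /* Defaults run: maximum log likelihood -1275.31, Alpha 0.05 */
   limit_default = -1275.31 - cinv(1 - 0.05, 1) / 2;

   /* Several-options run: maximum log likelihood -1280.09, ALPHA=0.00001 */
   limit_wide = -1280.09 - cinv(1 - 0.00001, 1) / 2;

   /* Write both cutoffs to the SAS log */
   put limit_default= limit_wide=;
run;
```

The two results should land close to the reported limits of -1277.2 and -1289.9. Small differences are expected because the inputs are rounded. The much smaller ALPHA= value in the second run lowers the cutoff by several log likelihood units, which is what lets 0 fall inside the interval.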

