Example 15.1: Basic Box-Cox Transformations
This example illustrates finding a Box-Cox transformation of some artificial
data. Data were generated from the model $y = e^{x + \epsilon}$, where
$\epsilon \sim N(0, 1)$. The transformed data can be fit with the
linear model $\log(y) = x + \epsilon$.

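title 'Basic Box-Cox Example';

/* Simulate data from y = exp(x + e), where e is standard normal */
data x;
   do x = 1 to 8 by 0.025;
      y = exp(x + normal(7));   /* 7 is the random-number seed */
      output;
   end;
run;

/* Box-Cox transformation of y, modeled as a linear function of x */
proc transreg data=x ss2 details;
   title2 'Defaults';
   model boxcox(y) = identity(x);
run;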
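
For orientation, the family of transformations that boxcox(y) searches over is commonly written as shown below. This is the standard textbook form, given here as an assumption rather than something stated in this example. The symbol $c$ is a shift constant: it is 0 in this first run, and the PARAMETER= option used later in this example sets it.

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{(y + c)^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[1ex]
  \log(y + c), & \lambda = 0.
\end{cases}
```

With $\lambda = 0$ and $c = 0$ this reduces to $\log(y)$, which is the transformation that makes the simulated data linear in x.
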
Output 15.1.1: Basic Box-Cox Example, Default Output
Basic Box-Cox Example
Defaults

Transformation Information for BoxCox(y)

| Lambda |   | R-Square | Log Like |   |
|--------|---|----------|----------|---|
| -3.00 |   | 0.03 | -4601.01 |   |
| -2.75 |   | 0.04 | -4266.08 |   |
| -2.50 |   | 0.04 | -3934.11 |   |
| -2.25 |   | 0.05 | -3605.75 |   |
| -2.00 |   | 0.06 | -3281.88 |   |
| -1.75 |   | 0.07 | -2963.74 |   |
| -1.50 |   | 0.10 | -2653.14 |   |
| -1.25 |   | 0.14 | -2352.72 |   |
| -1.00 |   | 0.21 | -2066.32 |   |
| -0.75 |   | 0.34 | -1799.25 |   |
| -0.50 |   | 0.52 | -1558.55 |   |
| -0.25 |   | 0.71 | -1360.28 |   |
| 0.00 | + | 0.79 | -1275.31 | < |
| 0.25 |   | 0.70 | -1382.62 |   |
| 0.50 |   | 0.51 | -1589.03 |   |
| 0.75 |   | 0.34 | -1834.53 |   |
| 1.00 |   | 0.22 | -2105.88 |   |
| 1.25 |   | 0.15 | -2397.35 |   |
| 1.50 |   | 0.11 | -2704.64 |   |
| 1.75 |   | 0.08 | -3024.24 |   |
| 2.00 |   | 0.06 | -3353.38 |   |
| 2.25 |   | 0.05 | -3689.91 |   |
| 2.50 |   | 0.04 | -4032.18 |   |
| 2.75 |   | 0.03 | -4378.97 |   |
| 3.00 |   | 0.03 | -4729.37 |   |

< - Best Lambda   * - Confidence Interval   + - Convenient Lambda

PROC TRANSREG correctly selects the log transformation ($\lambda = 0$),
with a narrow confidence interval. The maximum of the log likelihood
function is flagged with the less-than sign (<), and the convenient power
parameter of $\lambda = 0$ in the confidence interval is flagged by the plus
sign (+). The rest of the output is shown next.
Output 15.1.2: Basic Box-Cox Example, Default Output
Basic Box-Cox Example
Defaults

TRANSREG Univariate Algorithm Iteration History for BoxCox(y)

| Iteration Number | Average Change | Maximum Change | R-Square | Criterion Change | Note |
|------------------|----------------|----------------|----------|------------------|------|
| 1 | 0.00000 | 0.00000 | 0.79064 |   | Converged |

Model Statement Specification Details

| Type | DF | Variable | Description | Value |
|------|----|----------|-------------|-------|
| Dep | 1 | BoxCox(y) | Lambda Used | 0 |
|   |   |   | Lambda | 0 |
|   |   |   | Log Likelihood | -1275.3 |
|   |   |   | Conv. Lambda | 0 |
|   |   |   | Conv. Lambda LL | -1275.3 |
|   |   |   | CI Limit | -1277.2 |
|   |   |   | Alpha | 0.05 |
| Ind | 1 | Identity(x) | DF | 1 |

Univariate ANOVA Table Based on the Usual Degrees of Freedom

| Source | DF | Sum of Squares | Mean Square | F Value | Liberal p |
|--------|----|----------------|-------------|---------|-----------|
| Model | 1 | 1145.884 | 1145.884 | 1053.66 | >= <.0001 |
| Error | 279 | 303.421 | 1.088 |   |   |
| Corrected Total | 280 | 1449.305 |   |   |   |

The above statistics are not adjusted for the fact that the dependent variable was transformed and so are generally liberal.

| Statistic | Value | Statistic | Value |
|-----------|-------|-----------|-------|
| Root MSE | 1.04285 | R-Square | 0.7906 |
| Dependent Mean | 4.49653 | Adj R-Sq | 0.7899 |
| Coeff Var | 23.19225 | Lambda | 0.0000 |

Univariate Regression Table Based on the Usual Degrees of Freedom

| Variable | DF | Coefficient | Type II Sum of Squares | Mean Square | F Value | Liberal p |
|----------|----|-------------|------------------------|-------------|---------|-----------|
| Intercept | 1 | 0.01551366 | 0.01 | 0.01 | 0.01 | >= 0.9185 |
| Identity(x) | 1 | 0.99578183 | 1145.88 | 1145.88 | 1053.66 | >= <.0001 |

The above statistics are not adjusted for the fact that the dependent variable was transformed and so are generally liberal.

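As a rough cross-check of the default run, the log likelihood reported for a lambda of 0 can be recomputed from the ANOVA quantities. The sketch below assumes the log likelihood has the form $-(n/2)\log(\mathrm{SSE}/\mathrm{DFE}) + (\lambda - 1)\sum \log y$. That form is an assumption inferred from the printed numbers, not something stated in this example, and the variable names are illustrative only.

```sas
data _null_;
   n        = 281;            /* observations: Corrected Total DF (280) + 1     */
   mse      = 303.421 / 279;  /* Error sum of squares divided by Error DF       */
   mean_log = 4.49653;        /* Dependent Mean: mean of log(y) when lambda = 0 */
   lambda   = 0;

   /* assumed form: -(n/2) * log(MSE) + (lambda - 1) * sum of log(y) */
   loglike = -(n / 2) * log(mse) + (lambda - 1) * n * mean_log;
   put loglike= 10.2;
run;
```

Under that assumption, the value written to the log should land close to the Log Likelihood of -1275.3 reported in the specification details.
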
This next example uses several options.
The LAMBDA= option specifies power parameters
sparsely from -2 to -0.5 and from 0.5 to 2, just to get the general shape of
the log likelihood function in those regions. Between -0.5 and 0.5, more
power parameters are tried, in steps of 0.05. The CONVENIENT option is
specified so that if a power parameter such as $\lambda = 1$ or
$\lambda = 0$ is found in the confidence interval, it is used instead of
the optimal power parameter. PARAMETER=2 is specified to add 2 to each y
before performing the transformations. ALPHA=0.00001 specifies a wide
confidence interval.

proc transreg data=x ss2 details;
   title2 'Several Options Demonstrated';
   model boxcox(y / lambda=-2 -1 -0.5 to 0.5 by 0.05 1 2
                    convenient
                    parameter=2
                    alpha=0.00001)
         = identity(x);
run;
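
To see why ALPHA=0.00001 gives a wide interval, it can help to compare how far below the maximum log likelihood a power parameter may fall and still be inside the interval. The sketch below assumes the usual likelihood-ratio rule, in which the allowed drop is half the upper chi-square quantile with 1 degree of freedom. That rule is an assumption, although it agrees with the default run, where -1275.31 minus about 1.92 gives the reported CI Limit of -1277.2. The variable name drop_allowed is illustrative.

```sas
data _null_;
   do alpha = 0.05, 0.00001;
      /* half the chi-square(1) quantile: the largest drop from the maximum
         log likelihood that still keeps a lambda inside the interval */
      drop_allowed = cinv(1 - alpha, 1) / 2;
      put alpha= drop_allowed= 8.3;
   end;
run;
```

The allowed drop is roughly 1.92 for the default ALPHA=0.05 and roughly 9.76 for ALPHA=0.00001, so many more power parameters near the maximum qualify in the second run.
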
Output 15.1.3: Basic Box-Cox Example, Several Options Demonstrated
Basic Box-Cox Example
Several Options Demonstrated

Transformation Information for BoxCox(y)

| Lambda |   | R-Square | Log Like |   |
|--------|---|----------|----------|---|
| -2.000 |   | 0.22 | -2583.73 |   |
| -1.000 |   | 0.45 | -1779.35 |   |
| -0.500 |   | 0.67 | -1439.82 |   |
| -0.450 |   | 0.70 | -1410.51 |   |
| -0.400 |   | 0.72 | -1382.74 |   |
| -0.350 |   | 0.74 | -1356.92 |   |
| -0.300 |   | 0.76 | -1333.59 |   |
| -0.250 |   | 0.77 | -1313.42 |   |
| -0.200 |   | 0.79 | -1297.21 |   |
| -0.150 |   | 0.79 | -1285.83 | * |
| -0.100 |   | 0.80 | -1280.09 | < |
| -0.050 |   | 0.80 | -1280.63 | * |
| 0.000 | + | 0.79 | -1287.71 | * |
| 0.050 |   | 0.78 | -1301.19 |   |
| 0.100 |   | 0.76 | -1320.56 |   |
| 0.150 |   | 0.74 | -1345.09 |   |
| 0.200 |   | 0.72 | -1373.99 |   |
| 0.250 |   | 0.69 | -1406.51 |   |
| 0.300 |   | 0.65 | -1442.02 |   |
| 0.350 |   | 0.62 | -1480.02 |   |
| 0.400 |   | 0.58 | -1520.13 |   |
| 0.450 |   | 0.54 | -1562.05 |   |
| 0.500 |   | 0.50 | -1605.57 |   |
| 1.000 |   | 0.22 | -2105.88 |   |
| 2.000 |   | 0.06 | -3320.36 |   |

< - Best Lambda   * - Confidence Interval   + - Convenient Lambda

The results show that the optimal power parameter is -0.1, but 0 is in
the confidence interval, so the CONVENIENT option selects a power
parameter of 0 and a log transformation (of y + 2, because of
PARAMETER=2) is used. The rest of the output is shown next.
Output 15.1.4: Basic Box-Cox Example, Several Options Demonstrated
Basic Box-Cox Example
Several Options Demonstrated

TRANSREG Univariate Algorithm Iteration History for BoxCox(y)

| Iteration Number | Average Change | Maximum Change | R-Square | Criterion Change | Note |
|------------------|----------------|----------------|----------|------------------|------|
| 1 | 0.00000 | 0.00000 | 0.79238 |   | Converged |

Model Statement Specification Details

| Type | DF | Variable | Description | Value |
|------|----|----------|-------------|-------|
| Dep | 1 | BoxCox(y) | Lambda Used | 0 |
|   |   |   | Lambda | -0.1 |
|   |   |   | Log Likelihood | -1280.1 |
|   |   |   | Conv. Lambda | 0 |
|   |   |   | Conv. Lambda LL | -1287.7 |
|   |   |   | CI Limit | -1289.9 |
|   |   |   | Alpha | 0.00001 |
|   |   |   | Parameter | 2 |
|   |   |   | Options | Convenient Lambda Used |
| Ind | 1 | Identity(x) | DF | 1 |

Univariate ANOVA Table Based on the Usual Degrees of Freedom

| Source | DF | Sum of Squares | Mean Square | F Value | Liberal p |
|--------|----|----------------|-------------|---------|-----------|
| Model | 1 | 999.438 | 999.4381 | 1064.82 | >= <.0001 |
| Error | 279 | 261.868 | 0.9386 |   |   |
| Corrected Total | 280 | 1261.306 |   |   |   |

The above statistics are not adjusted for the fact that the dependent variable was transformed and so are generally liberal.

| Statistic | Value | Statistic | Value |
|-----------|-------|-----------|-------|
| Root MSE | 0.96881 | R-Square | 0.7924 |
| Dependent Mean | 4.61429 | Adj R-Sq | 0.7916 |
| Coeff Var | 20.99591 | Lambda | 0.0000 |

Univariate Regression Table Based on the Usual Degrees of Freedom

| Variable | DF | Coefficient | Type II Sum of Squares | Mean Square | F Value | Liberal p |
|----------|----|-------------|------------------------|-------------|---------|-----------|
| Intercept | 1 | 0.42939328 | 8.746 | 8.746 | 9.32 | >= 0.0025 |
| Identity(x) | 1 | 0.92997620 | 999.438 | 999.438 | 1064.82 | >= <.0001 |

The above statistics are not adjusted for the fact that the dependent variable was transformed and so are generally liberal.

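To make the effect of CONVENIENT and PARAMETER=2 concrete, the chosen transformation can also be applied by hand: with a power parameter of 0 and a shift of 2, the dependent variable becomes log(y + 2). The following is a minimal sketch, assuming the data set x created at the start of this example is still available. The names xlog and logy2 are hypothetical.

```sas
/* Apply the convenient transformation manually */
data xlog;
   set x;
   logy2 = log(y + 2);   /* lambda = 0 after adding the PARAMETER=2 shift */
run;

/* Ordinary least squares on the transformed response */
proc reg data=xlog;
   model logy2 = x;
run;
quit;
```

If this matches what PROC TRANSREG did internally, the intercept, slope, and R-square should be close to the values in the regression table above. PROC REG does not flag its p-values as liberal, so the same caution about a data-chosen transformation still applies when reading them.
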
Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.