Variable Transformations
Many statistical analyses assume that the data are normally distributed. If a variable is not normally distributed, it is often possible to improve normality by using an appropriate transformation of the variable. The three transformations used most often for this purpose are the logarithmic, square root, and inverse transformations.
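As a rough illustration of the three transformations named above, the following Python sketch applies each one to a small list of numbers. The function names and the sample values are made up for this illustration, and the natural logarithm is assumed; none of this comes from Stat Studio itself.

```python
import math

def log_transform(values):
    """Natural logarithm; defined only for strictly positive values."""
    return [math.log(v) for v in values]

def sqrt_transform(values):
    """Square root; defined only for nonnegative values."""
    return [math.sqrt(v) for v in values]

def inverse_transform(values):
    """Reciprocal 1/y; defined only for nonzero values."""
    return [1.0 / v for v in values]

# Made-up, right-skewed sample values (not taken from any real data set).
sample = [1.0, 2.0, 4.0, 8.0, 64.0]

logged = log_transform(sample)
rooted = sqrt_transform(sample)
inverted = inverse_transform(sample)
```

Each transformation compresses large values more strongly than small ones, which is why these transformations can reduce right skewness.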
In this example, you apply a logarithmic transformation to the driltime variable of the Miningx data set. Note that the driltime variable takes only positive values, so a logarithmic transformation is well-defined.
Open the Miningx data set.
Create a histogram of the driltime variable.
The histogram is shown in Figure 32.1.
Figure 32.1: Histogram of Drilling Time
Clearly, the driltime variable is not normally distributed. You might explore whether some transformation of driltime is approximately normal. To begin, you might try a logarithmic transformation.
Select Analysis > Variable Transformation from the main menu.
The Variable Transformation Wizard in Figure 32.2 appears.
Figure 32.2: Selecting a Transformation
The first page of the wizard enables you to select a transformation family and a specific transformation within that family. The logarithmic transformation is available from several items in the Family list, including the Common family. This transformation is of the form log(Y+a), so you need to specify the variable Y and the parameter a.
The transformation log(Y+a) is highlighted by default. Since this is the desired transformation, you can proceed to the next page of the wizard.
Click Next.
The wizard displays the page shown in Figure 32.3. Note that the transformation appears on the page's title bar.
Figure 32.3: Selecting a Variable and a Parameter
Select the driltime variable, and click Set Y.
The parameter a is an offset that is useful if your variable contains nonpositive values. For these data, you can accept the default value of a.
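To make the role of the offset concrete, here is a minimal Python sketch of the transformation log(Y+a). The function name log_offset and the sample values are hypothetical, and the natural logarithm is assumed; this is not the wizard's implementation.

```python
import math

def log_offset(values, a):
    """Return log(y + a) for each y, using the natural logarithm.

    Raises ValueError if any shifted value y + a is not strictly positive.
    """
    shifted = [y + a for y in values]
    if any(s <= 0 for s in shifted):
        raise ValueError("every y + a must be strictly positive")
    return [math.log(s) for s in shifted]

# Hypothetical values that include a zero, so an offset is required.
y = [0.0, 1.0, 3.0]
log_y = log_offset(y, a=1.0)
```

With an offset of 0 the call above would fail, because log(0) is undefined; choosing a so that every Y+a is positive avoids the problem.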
Click Next.
The wizard displays the page shown in Figure 32.4. You can use this page to specify a variable name (and, optionally, a label) for the new variable.
Figure 32.4: Specifying the Variable Name and Label
For this example, you can accept the default variable name.
Click Finish.
Stat Studio adds the new variable, Log_driltime, as the last variable in the data set. You can scroll horizontally through the data table to see the variable.
To complete this example, you can visualize the distribution of the new variable.
Create a histogram of the Log_driltime variable.
The histogram (Figure 32.5) shows improved normality, but the distribution of the transformed data is still skewed to the right.
Figure 32.5: A Histogram of the Transformed Data
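A histogram gives a visual impression of skewness; a numeric check can complement it. The following Python sketch computes a moment-based skewness coefficient before and after a log transformation. The function name sample_skewness and the data values are made up for this illustration and are not the Miningx data.

```python
import math

def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using moments divided by n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Made-up, right-skewed values standing in for a variable such as driltime.
raw = [1.0, 2.0, 2.0, 3.0, 4.0, 6.0, 9.0, 20.0]
logged = [math.log(v) for v in raw]

skew_raw = sample_skewness(raw)
skew_log = sample_skewness(logged)
```

A positive coefficient indicates a longer right tail. For this made-up sample the coefficient is smaller after the log transformation but remains positive, which mirrors the pattern described for the transformed data.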
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.