Transform Data Task

About the Transform Data Task

The Transform Data task enables you to transform one or more variables in the input data set. These transformed variables are saved to an output data set.

Example: Transforming the Data in the BASEBALL Data Set

  1. In the Tasks section, expand the Data folder and double-click Transform Data. The user interface for the Transform Data task opens.
  2. On the Data tab, select SASHELP.BASEBALL as the input data set.
    This figure shows a subset of the data for the Name, nRuns, and Salary columns.
    Figure: Name, nRuns, and Salary Columns in the Sashelp.Baseball Data Set
  3. To transform the data in the nRuns column, complete these steps under the Transform 1 heading:
    1. Assign the nRuns column to the Variable 1 role.
    2. From the Transform drop-down list, select Natural log.
  4. To convert the values in the Salary column from thousands of dollars to dollars, complete these steps under the Transform 2 heading:
    1. Assign the Salary column to the Variable 2 role.
    2. From the Transform drop-down list, select Specify custom transformation.
    3. In the Custom transform box, enter Salary*1000.
  5. To run the task, click Submit SAS Code.
The output data set contains two additional columns. The log_nRuns column contains the natural log of each value in the nRuns column. The tr2_Salary column contains the values from the Salary column multiplied by 1,000.
Figure: Subset of Work.Transform Data Set
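The result of this example can be approximated with a short DATA step. This is a minimal sketch, not the code that the task generates. It assumes that nRuns values that are not positive should be left missing rather than passed to the LOG function, and it limits the printed listing to 10 rows purely for illustration.

```sas
/* Sketch only: approximates the example's output, not the task's generated code. */
data work.transform;
    set sashelp.baseball;

    /* Natural log of nRuns; stays missing when nRuns is not positive (assumption). */
    if nRuns > 0 then log_nRuns = log(nRuns);

    /* Custom transformation from step 4: thousands of dollars to dollars. */
    tr2_Salary = Salary * 1000;
run;

/* List a subset of the original and transformed columns. */
proc print data=work.transform(obs=10);
    var Name nRuns log_nRuns Salary tr2_Salary;
run;
```

Rows with a missing Salary value also have a missing tr2_Salary value, because arithmetic on a missing value yields a missing result.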

Transforming Columns from the Input Data Set

Using the Transform Data task, you can transform up to three columns from your input data set. To run the Transform Data task, you must assign a column to the Variable 1 role.
Role: Description
Transform n
  Variable n: specifies the variable to transform.
  Transform: specifies the transform to use. Here are the available transforms:
    • Inverse square
    • Inverse
    • Inverse square root
    • Natural log
    • Square root
    • Square
    To create your own transformation, select Specify custom transformation. An example of a custom transformation is Salary*1000.
Output Data Set
  Show output data: specifies whether to include the output data in the results that appear on the Results tab. You can include all or a subset of the output data. The task always creates the output data set that appears on the Output Data tab. This data set is also saved to the specified location.
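The built-in transforms listed above correspond to simple arithmetic expressions. The following DATA step is a hedged sketch of how each one might be written by hand, applied to the nRuns column from Sashelp.Baseball. The output data set name and every new column name other than log_nRuns are hypothetical. The formulas are the usual mathematical readings of the transform names, and the guards against zero and negative values are an assumption, not documented task behavior.

```sas
/* Sketch only: hand-written equivalents of the built-in transforms. */
data work.transform_sketch;               /* hypothetical data set name */
    set sashelp.baseball(keep=Name nRuns);

    /* Transforms that divide by the value: skip zero to avoid division by zero. */
    if nRuns ne 0 then do;
        inv_sq_nRuns = 1 / (nRuns**2);    /* Inverse square */
        inv_nRuns    = 1 / nRuns;         /* Inverse */
    end;

    /* Transforms that are defined only for positive values. */
    if nRuns > 0 then do;
        inv_sqrt_nRuns = 1 / sqrt(nRuns); /* Inverse square root */
        log_nRuns      = log(nRuns);      /* Natural log */
    end;

    /* Square root is defined for zero and positive values. */
    if nRuns >= 0 then sqrt_nRuns = sqrt(nRuns);

    /* Square is defined for every nonmissing value. */
    sq_nRuns = nRuns**2;
run;
```

A custom transformation such as Salary*1000 would be one more assignment statement of the same form.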