Standardize Data Task

About the Standardize Data Task

The Standardize Data task enables you to center or standardize one or more numeric variables by using a variety of methods. The standardized variables are saved in an output data set.

Example: Standardizing Variables in the SASHELP.BASEBALL Data Set

To create this example:
  1. In the Tasks section, expand the Data folder and double-click Standardize Data. The user interface for the Standardize Data task opens.
  2. On the Data tab, select the SASHELP.BASEBALL data set.
  3. Assign the nHits column to the Variables to standardize role.
  4. To run the task, click Submit SAS Code.
Here is a subset of the output data:
[Figure: A Subset of the Output Data for the Standardize Data Task]

Assigning Data to Roles

To run the Standardize Data task, you must assign a column to the Variables to standardize role.
Roles
  • Variables to standardize: lists the numeric variables to be standardized.
Additional Roles
  • Frequency count: specifies a numeric variable whose value is the frequency of the observation. The task treats the data set as if each observation appeared n times, where n is the value of the Frequency count variable for the observation.
  • Weight: specifies a numeric variable in the input data set with values that are used to weight each observation. These values can be nonintegers. An observation is used in the analysis only if the value of the Weight variable is greater than zero.
  • Group analysis by: creates a separate analysis for each group of observations that is defined by the BY variables.
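The Frequency count and Weight roles can be illustrated with a minimal Python sketch. It mirrors only the semantics described above (an observation counted n times, and observations with a nonpositive weight left out). It is not the task's generated code. The function names and sample values are hypothetical.

```python
from statistics import fmean

def expand_by_frequency(values, freqs):
    """Repeat each value n times, where n is its frequency count."""
    expanded = []
    for value, n in zip(values, freqs):
        expanded.extend([value] * n)
    return expanded

def weighted_mean(values, weights):
    """Weighted mean over observations whose weight is greater than zero."""
    kept = [(v, w) for v, w in zip(values, weights) if w > 0]
    total_weight = sum(w for _, w in kept)
    return sum(v * w for v, w in kept) / total_weight

# Illustrative values only, not taken from SASHELP.BASEBALL.
hits = [10.0, 20.0, 40.0]

# Frequency count: the second observation is treated as appearing twice.
freq_mean = fmean(expand_by_frequency(hits, [1, 2, 1]))

# Weight: the third observation has weight 0, so it is not used.
wmean = weighted_mean(hits, [0.5, 1.5, 0.0])
```

With these inputs, the frequency-expanded mean is taken over 10, 20, 20, and 40, and the weighted mean uses only the first two observations.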

Setting the Options

Methods
  • Center data only: centers the variables without rescaling them. You can center by either the mean or the median.
  • Standardization method: specifies which of these standardization methods to use:
      • Standard deviation (the default, and the method most often associated with standardization)
      • Andrews’ wave estimate. The tuning constant for this method must be greater than 0. The default value is 4.7.
      • Euclidean length
      • Huber’s estimate. The tuning constant for this method must be greater than 0. The default value is 1.
      • Interquartile range
      • Range
      • Sum
      • Tukey’s biweight estimate. The tuning constant for this method must be greater than 0. The default value is 6. (Goodall 1983)
Treatment of Missing Values
  • Missing values method: specifies whether to omit observations that have a missing value or to replace the missing value. You can replace the missing value with one of these options:
      • Default location measure, which is the location measure used by the selected centering or standardization method
      • Mean
      • Median
      • Minimum
      • Specify custom value, which enables you to specify the value for all variables that are being standardized
Statistics
  • Display location and scale measures: displays the location and scale measures in the results. These measures show how each variable was shifted (location) and rescaled (scale) by the standardization.
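To make the options above concrete, here is a minimal Python sketch of standardization as (value - location) / scale. The location and scale pairs in `METHODS` are the ones commonly documented for these methods, but this section does not state them, so treat them as assumptions. The interquartile range and the robust estimates (Andrews, Huber, Tukey) are omitted. The `standardize` function, its `replace_missing` parameter, and the choice to standardize filled-in values like any other value are all hypothetical.

```python
import math
from statistics import fmean, median, stdev

# Assumed (location, scale) pair for each method; see the note above.
METHODS = {
    "mean":   lambda x: (fmean(x), 1.0),               # center only
    "median": lambda x: (median(x), 1.0),              # center only
    "std":    lambda x: (fmean(x), stdev(x)),          # sample std (n - 1)
    "range":  lambda x: (min(x), max(x) - min(x)),
    "sum":    lambda x: (0.0, sum(x)),
    "euclen": lambda x: (0.0, math.sqrt(sum(v * v for v in x))),
}

def standardize(values, method="std", replace_missing=None):
    """Return (standardized values, location, scale).

    Missing values are represented by None. Location and scale are
    computed from the nonmissing values only. replace_missing can be:
      None        missing values stay missing
      "location"  missing values are filled with the location measure
      a number    missing values are filled with that custom value
    """
    present = [v for v in values if v is not None]
    location, scale = METHODS[method](present)
    fill = location if replace_missing == "location" else replace_missing
    result = []
    for v in values:
        if v is None:
            v = fill
        result.append(None if v is None else (v - location) / scale)
    return result, location, scale

# Illustrative values only; None marks a missing value.
hits = [2.0, 4.0, None, 6.0]
z, loc, scl = standardize(hits, method="std", replace_missing="location")
# loc is 4.0 and scl is 2.0, so the filled-in value standardizes to 0.
```

Returning the location and scale alongside the values is a design choice here: it plays the role of the Display location and scale measures option, showing how the data were shifted and rescaled.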

Setting the Output Options

By default, the Standardize Data task creates an output data set that includes both the original and standardized variables. You can add a prefix to the variable names to differentiate between the original and standardized variables. By default, the task adds the Standardize_ prefix to the name of each standardized variable.
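The naming convention can be sketched in Python as follows. This only illustrates how a prefix keeps the original and standardized columns side by side. The function name, the row layout, and the sample values are hypothetical, and this is not the task's generated code.

```python
def add_standardized_columns(rows, standardized, prefix="Standardize_"):
    """Return new rows with the original columns plus prefixed standardized ones.

    rows: list of dicts, one per observation
    standardized: dict mapping a column name to its standardized values
    """
    output = []
    for i, row in enumerate(rows):
        new_row = dict(row)  # copy, so the input rows are left unchanged
        for name, values in standardized.items():
            new_row[prefix + name] = values[i]
        output.append(new_row)
    return output

# Illustrative values only, not taken from SASHELP.BASEBALL.
rows = [{"nHits": 2.0}, {"nHits": 4.0}, {"nHits": 6.0}]
out = add_standardized_columns(rows, {"nHits": [-1.0, 0.0, 1.0]})
# Each output row now holds both nHits and Standardize_nHits.
```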
The Show output data option specifies whether to include the output data in the results that appear on the Results tab. You can include all or a subset of the output data. The task always creates the output data set, which appears on the Output Data tab and is saved to the specified location.