Bin Continuous Data Task

About the Bin Continuous Data Task

The Bin Continuous Data task is a data preparation task. This task divides the data values of a continuous variable into intervals and replaces the values for each interval with a single value that is representative of the interval.
Note: This task is available only if you are running SAS 9.4 or later and if you have SAS/STAT.
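The core idea of binning can be shown with a minimal Python sketch. This is not the code that the task generates. It assigns each value to an interval defined by sorted cut points and replaces the value with a 1-based bin number. The function name assign_bins and the use of the bin number as the representative value are assumptions made for illustration. The interval midpoint would be another reasonable representative.

```python
from bisect import bisect_right

def assign_bins(values, cut_points):
    """Map each value to a 1-based bin number (hypothetical helper).

    With k sorted interior cut points there are k + 1 intervals:
    (-inf, c1), [c1, c2), ..., [ck, +inf).
    """
    return [bisect_right(cut_points, v) + 1 for v in values]

# Three intervals: below 10, from 10 up to (not including) 20, and 20 or above.
print(assign_bins([3.2, 10.0, 15.7, 42.0], [10.0, 20.0]))  # [1, 2, 2, 3]
```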

Example: Winsorized Binning

In this example, the task uses Winsorized binning to divide the x1 and x2 variables into 10 bins and reports basic binning information for the input data.
To create this example:
  1. To create the Work.Ex12 data set, enter this code into a Program tab:
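    data ex12;
       length id 8;
       do id=1 to 10000;
          x1 = ranuni(101);        /* uniform values between 0 and 1   */
          x2 = 10*ranuni(201);     /* uniform values between 0 and 10  */
          x3 = 100*ranuni(301);    /* uniform values between 0 and 100 */
          output;                  /* write one observation per ID     */
       end;
    run;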
    Click Submit SAS code.
  2. In the Tasks section, expand the High-Performance Statistics folder and double-click Bin Continuous Data. The user interface for the Bin Continuous Data task opens.
  3. On the Data tab, select the WORK.EX12 data set.
  4. To the Variables to bin role, assign the x1 and x2 columns.
  5. Select the Options tab and set these options:
    • In the Number of bins box, enter 10.
    • From the Method drop-down list, select Winsorized binning.
  6. To run the task, click Submit SAS code.
Here is a subset of the results:
Figure: Performance Information, Binning Information, and Mapping tables

Assigning Data to Roles

To run the Bin Continuous Data task, you must assign at least one variable to the Variables to bin role.
Roles
  Variables to bin
    specifies one or more variables as input variables for binning. The specified variables must be interval variables.
Additional Roles
  Frequency count
    specifies a numeric variable that contains the frequency of occurrence for each observation. If the frequency value is less than 1 or is missing, the observation is not used in the analysis. If no variable is assigned to the Frequency count role, each observation is assigned a frequency of 1.
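The Frequency count rules can be sketched in Python. This is an illustration only. The function name effective_frequencies, the one-dictionary-per-row layout, and the use of None or NaN to stand for a SAS missing value are all assumptions.

```python
import math

def effective_frequencies(rows, freq_var=None):
    """Return (row, frequency) pairs that follow the Frequency count rules.

    rows: list of dicts, one per observation (assumed layout).
    freq_var: name of the frequency column, or None if no variable is assigned.
    """
    result = []
    for row in rows:
        if freq_var is None:
            result.append((row, 1))  # no Frequency count variable: frequency is 1
            continue
        f = row.get(freq_var)
        missing = f is None or (isinstance(f, float) and math.isnan(f))
        if missing or f < 1:
            continue  # missing or less than 1: the observation is not used
        result.append((row, f))
    return result

rows = [{"x1": 0.2, "n": 3}, {"x1": 0.5, "n": 0}, {"x1": 0.9, "n": None}]
print([f for _, f in effective_frequencies(rows, "n")])  # [3]
print([f for _, f in effective_frequencies(rows)])       # [1, 1, 1]
```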

Setting Options

Methods
  Number of bins
    specifies the global number of binning levels for all binning variables. The value can be any integer from 2 to 1,000, inclusive. The default number of binning levels is 16.
  Method
    specifies which binning method to use:
      • Bucket binning creates equal-length bins and assigns each value to one of these bins. The number of bins (the binning level) is set by the Number of bins option, which defaults to 16.
      • Winsorized binning is similar to bucket binning except that both tails of the distribution are cut off to obtain a smooth binning result. This technique is often used to remove outliers during the data preparation stage.
        When you select this method, you must specify a value for the Winsor rate option. Valid values range from 0.0 to 0.5 (exclusive). The default value is 0.05.
      • Pseudo-quantile binning approximates the results of the quantile binning method but uses less CPU time and memory.
Statistics
  Select statistics to display
    specifies whether to include additional statistics in the results. You can include these statistics:
      • Basic statistics displays the mean, pseudo-median, standard deviation, minimum, maximum, and number of bins for each binning variable.
      • Quantile statistics displays the table of estimated quantiles and extreme values.
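The following Python sketch illustrates the binning methods and statistics described above on a small sample that contains one outlier. It is not the algorithm that SAS uses internally. Several assumptions apply:

- The Winsor rate is treated as the fraction of observations trimmed from each tail.
- Pseudo-quantile binning is not reproduced. Exact quantile binning, which it approximates, is shown instead.
- The exact median stands in for the pseudo-median.
- All function names are hypothetical.

```python
import statistics
from bisect import bisect_right

def bucket_cut_points(values, num_bins=16):
    """Interior cut points of num_bins equal-width bins over [min, max]."""
    if not isinstance(num_bins, int) or not 2 <= num_bins <= 1000:
        raise ValueError("num_bins must be an integer from 2 to 1000")
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(1, num_bins)]

def winsorized_cut_points(values, num_bins=16, winsor_rate=0.05):
    """Equal-width bins over the range that remains after trimming both tails.

    Assumption: winsor_rate is the fraction of observations trimmed from
    EACH tail. Trimmed values are not dropped; they land in the first or
    last bin because they lie outside the interior cut points.
    """
    if not 0.0 < winsor_rate < 0.5:  # assumed: both endpoints excluded
        raise ValueError("winsor_rate must be between 0.0 and 0.5, exclusive")
    s = sorted(values)
    k = int(winsor_rate * len(s))  # number of observations trimmed per tail
    return bucket_cut_points([s[k], s[len(s) - 1 - k]], num_bins)

def quantile_cut_points(values, num_bins=16):
    """Exact quantile binning: about len(values) / num_bins values per bin.

    Pseudo-quantile binning approximates this result; the exact version
    sorts all of the data, which is what makes it costly on large tables.
    """
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // num_bins] for i in range(1, num_bins)]

def bin_counts(values, cut_points):
    """Count the values in each of the len(cut_points) + 1 bins."""
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect_right(cut_points, v)] += 1
    return counts

def basic_statistics(values, num_bins):
    """Summary in the spirit of the Basic statistics table.

    The exact median stands in for the pseudo-median, and stdev uses the
    n - 1 divisor; both are assumptions of this sketch.
    """
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "num_bins": num_bins,
    }

def quantiles_and_extremes(values, percents=(1, 5, 25, 50, 75, 95, 99), k=5):
    """Selected percentiles plus the k smallest and k largest values.

    statistics.quantiles needs at least two data points.
    """
    pct = statistics.quantiles(values, n=100, method="inclusive")
    s = sorted(values)
    return {"quantiles": {p: pct[p - 1] for p in percents},
            "lowest": s[:k], "highest": s[-k:]}

data = list(range(1, 21)) + [1000]  # 20 regular values and one outlier
print(bin_counts(data, bucket_cut_points(data, 4)))           # [20, 0, 0, 1]
print(bin_counts(data, winsorized_cut_points(data, 4, 0.1)))  # [6, 4, 4, 7]
print(bin_counts(data, quantile_cut_points(data, 4)))         # [5, 5, 5, 6]
```

With equal-width bins, the single outlier stretches the range so much that 20 of the 21 values share the first bin. Trimming the tails before computing the bin width, or cutting at quantiles, spreads the values across all four bins.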

Creating an Output Data Set

You can specify whether to save the results to an output data set. In the Additional variables to include in the output data set role, specify any columns from the input data set that you want to include in the output data set.
To view all or a subset of the output data set in the results, select Show output data.
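The following Python sketch gives a rough picture of what an output data set combines. It copies the selected additional columns from each input row and adds one bin-number column for each binned variable. The function name build_output, the BIN_ column prefix, and the cut points are hypothetical. Check the actual output data set for the column names that SAS uses.

```python
from bisect import bisect_right

def build_output(rows, bin_vars, cut_points, keep_vars=()):
    """Return output rows: kept input columns plus one bin column per variable.

    cut_points maps each binning variable to its sorted interior cut points.
    The "BIN_<name>" column naming is hypothetical, chosen for illustration.
    """
    out = []
    for row in rows:
        new_row = {name: row[name] for name in keep_vars}  # additional variables
        for var in bin_vars:
            new_row["BIN_" + var] = bisect_right(cut_points[var], row[var]) + 1
        out.append(new_row)
    return out

rows = [{"id": 1, "x1": 0.12, "x2": 7.5}, {"id": 2, "x1": 0.81, "x2": 2.2}]
cuts = {"x1": [0.5], "x2": [5.0]}
print(build_output(rows, ["x1", "x2"], cuts, keep_vars=["id"]))
# [{'id': 1, 'BIN_x1': 1, 'BIN_x2': 2}, {'id': 2, 'BIN_x1': 2, 'BIN_x2': 1}]
```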