Usage Note 24205: Rare event oversampling for model fitting in SAS Enterprise Miner
One way to bias classification toward the rare event is to oversample it. That is, the sample contains a higher proportion of the rare event than the population does.
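The arithmetic behind oversampling can be sketched briefly. Suppose you keep every rare-event observation and want events to make up a fraction p of the sample. You then keep n_events × (1 − p) / p non-event observations. The helper name and the population counts below are hypothetical and are used only for illustration. They are not part of SAS Enterprise Miner.

```python
def nonevents_to_keep(n_events: int, target_event_rate: float) -> int:
    """Hypothetical helper: number of non-event rows to keep so that, when
    every event row is kept, events make up `target_event_rate` of the sample."""
    if not 0.0 < target_event_rate < 1.0:
        raise ValueError("target_event_rate must be strictly between 0 and 1")
    # events / (events + nonevents) = p  =>  nonevents = events * (1 - p) / p
    return round(n_events * (1.0 - target_event_rate) / target_event_rate)


# Hypothetical population: 1,000 events among 100,000 rows (a 1% event rate).
population_rate = 1_000 / 100_000
kept = nonevents_to_keep(1_000, 0.10)    # oversample to a 10% event rate
sample_rate = 1_000 / (1_000 + kept)
print(population_rate, kept, sample_rate)
```

A target rate of 0.5 yields as many non-events as events. This is the balanced sample that the Equal Size criterion described below produces.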
The following steps outline how to create a flow depicting a rare event case analysis in the SAS Enterprise Miner 4.x series:
- In the Input Data Source node, specify the data set to use, and change the model role of the target variable from input to target.
- To get correct lift charts for the data set, open the target profiler (right-click the target variable and select Edit Target Profile). Go to the Prior tab, select Proportional to Data, right-click, and select Set to use. Save your changes and close the Input Data Source node.
- To oversample the rare event, add a Sampling node. On the General tab, which lists the sampling methods, select the Stratified method.
- On the Stratification tab, select the target variable as the variable used to stratify the data for oversampling.
- On the Options subtab, select the Equal Size criterion. With this setting, the sample includes all of the observations with target variable = 1 and an equal number of randomly sampled observations with target variable = 0. Do not select the Adjust frequency for oversampling check box. Save the settings and close the Sampling node.
- *Optional*: Add a Data Partition node. Splitting the data into training and validation sets helps you build a model that generalizes to new data.
- Add a modeling node, such as a Regression or Tree node.
- Run the flow.
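The Equal Size stratified sample from the steps above can be sketched outside Enterprise Miner. The sketch assumes that the data is a list of dictionaries, that the target column is named `y`, and that `1` marks the rare event. The function name, seed, and data are all hypothetical. Following the note's advice not to adjust frequency for oversampling, the sketch adds no weight or frequency variable. It only selects rows.

```python
import random
from typing import Any, Dict, List


def equal_size_sample(rows: List[Dict[str, Any]], target: str,
                      event: Any = 1, seed: int = 12345) -> List[Dict[str, Any]]:
    """Hypothetical sketch: keep every row whose target equals `event`, plus an
    equal-sized simple random sample (without replacement) of the other rows."""
    events = [r for r in rows if r[target] == event]
    nonevents = [r for r in rows if r[target] != event]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # If there are fewer non-events than events, all non-events are kept.
    k = min(len(events), len(nonevents))
    sample = events + rng.sample(nonevents, k)
    rng.shuffle(sample)  # mix events and non-events in the output order
    return sample


# Hypothetical data: 50 events (every 100th row) among 5,000 rows.
data = [{"id": i, "y": 1 if i % 100 == 0 else 0} for i in range(5_000)]
balanced = equal_size_sample(data, target="y")
print(len(balanced), sum(r["y"] for r in balanced))
```

All 50 events survive, and 50 of the 4,950 non-events are drawn at random. The balanced sample therefore has 100 rows with a 50% event rate.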
To create a flow that depicts a rare event case analysis in the SAS Enterprise Miner 5.x series, complete the following steps:
- Create the data source. Be sure to define a target variable.
- To oversample the rare event, add a Sampling node and attach it to the Input Data Source node. Set the Sample Method property to Stratify.
- Click the ... button beside the Variables property and choose your target variable as the stratification variable.
- In the Stratified property section, set the Criterion property to Equal. With this setting, the sample includes all of the observations with target variable = 1 and an equal number of randomly sampled observations with target variable = 0. Do not set the Adjust frequency for oversampling property to Yes.
- *Optional*: Add a Data Partition node. Splitting the data into training and validation sets helps you build a model that generalizes to new data.
- Add your modeling node and change its settings as needed.
- Run the flow.
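The optional Data Partition step in either flow can be illustrated with a stratified split. The sketch assumes a hypothetical 70/30 training/validation split of an already balanced sample. It draws the same fraction from each target level so that both partitions keep the sample's event rate. The split fractions and function name are assumptions, and the Data Partition node's own options and defaults may differ.

```python
import random
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def stratified_partition(rows: List[Row], target: str, train_frac: float = 0.7,
                         seed: int = 2024) -> Tuple[List[Row], List[Row]]:
    """Hypothetical sketch: split rows into training and validation sets,
    taking `train_frac` of each target level for training."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[Row] = []
    valid: List[Row] = []
    for level in sorted({r[target] for r in rows}):
        stratum = [r for r in rows if r[target] == level]
        rng.shuffle(stratum)
        cut = round(len(stratum) * train_frac)
        train.extend(stratum[:cut])   # first part of each stratum -> training
        valid.extend(stratum[cut:])   # remainder of each stratum -> validation
    return train, valid


# Hypothetical balanced sample: 50 events and 50 non-events.
sample = [{"id": i, "y": i % 2} for i in range(100)]
train, valid = stratified_partition(sample, target="y")
print(len(train), len(valid), sum(r["y"] for r in train))
```

Both partitions keep the 50% event rate of the balanced sample. For this reason, the validation results describe the oversampled data rather than the original population.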
Operating System and Release Information
| SAS System | SAS Enterprise Miner | All | n/a | |
\* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
| Type: | Usage Note |
| Priority: | low |
| Topic: | Analytics ==> Power and Sample Size Analytics ==> Data Mining |
| Date Modified: | 2007-10-30 16:14:27 |
| Date Created: | 2005-01-17 13:21:31 |