Usage Note 24205: Rare-event oversampling for model fitting in SAS® Enterprise Miner™
In SAS Enterprise Miner, one way to bias classification toward a rare event is to oversample that event: build a sample that contains a higher proportion of rare-event observations than exists in the population.
To create a flow that depicts a rare-event-case analysis in SAS Enterprise Miner, complete the following steps:
- Create the data source. Define a target variable.
- To oversample the rare event, add a Sample node, and connect it to the Input Data node. Set the Sample Method property to Stratify.
- Click the ... (ellipsis) button beside the Variables property, and choose your target variable as the stratification variable.
- In the Stratified property section, set the Criterion property. Depending on the percentage of events for the target variable, the most likely choices are Level Based and Equal:
  - Level Based. This technique is becoming more common. For binary target variables, set the Level Selection property to Event, and set the Level Proportion property to 100.0. This combination ensures that all events are part of the created sample. Then set the Sample Proportion property to the proportion of the sample that should consist of the selected level.
  - Equal. This setting causes the sample to use all of the event observations and an equal number of randomly selected non-event observations. Set the Oversampling Adjust Frequency property to No.
- *Optional* - Add a Data Partition node. Partitioning the sample into training and validation data helps produce a model that generalizes beyond the training data.
- Add your modeling node. Make any necessary changes.
- Run the flow.
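The arithmetic behind the two Criterion settings can be illustrated outside SAS Enterprise Miner. The following Python sketch is not the Sample node's implementation; it only mimics the selection described in the steps above. The function name `oversample`, its parameters, the seed, and the 10,000-row population with 200 events are hypothetical, and proportions are written as fractions (1.0 rather than 100.0).

```python
import random

def oversample(rows, is_event, level_proportion=1.0, sample_proportion=0.5, seed=12345):
    """Keep a share of the events plus enough random non-events to reach the target mix.

    level_proportion: fraction of the event observations to keep (1.0 keeps all).
    sample_proportion: fraction of the final sample that should be events.
    """
    if not 0 < level_proportion <= 1 or not 0 < sample_proportion <= 1:
        raise ValueError("proportions must be in the interval (0, 1]")
    rng = random.Random(seed)
    events = [r for r in rows if is_event(r)]
    non_events = [r for r in rows if not is_event(r)]

    n_events = round(len(events) * level_proportion)
    kept_events = rng.sample(events, n_events)

    # Number of non-events needed so that events form sample_proportion of the sample.
    n_non = round(n_events * (1 - sample_proportion) / sample_proportion)
    n_non = min(n_non, len(non_events))  # cannot draw more non-events than exist
    kept_non = rng.sample(non_events, n_non)

    sample = kept_events + kept_non
    rng.shuffle(sample)
    return sample

# Hypothetical population: 10,000 rows, 200 of them events (2%).
population = [{"id": i, "target": 1 if i < 200 else 0} for i in range(10_000)]

def event(row):
    return row["target"] == 1

# Analogue of Criterion = Equal: all events, equal number of non-events.
equal_sample = oversample(population, event, sample_proportion=0.5)

# Analogue of Criterion = Level Based: all events, events are 20% of the sample.
level_sample = oversample(population, event, level_proportion=1.0, sample_proportion=0.2)

for name, s in (("equal", equal_sample), ("level based", level_sample)):
    n_ev = sum(event(r) for r in s)
    print(f"{name}: {len(s)} rows, {n_ev} events, {n_ev / len(s):.0%} event rate")
```

In this sketch, an event share of 0.5 reproduces the Equal behavior, so Equal can be read as a special case of the Level Based selection when every event is kept.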
For a video presentation, see SAS Note 34270, "Oversampling techniques in SAS® Enterprise Miner™".
See also SAS KB0036282: "How to model a rare target using an oversample approach in SAS® Enterprise Miner™".
The following steps outline how to create a flow that depicts a rare-event-case analysis in the SAS Enterprise Miner 4.x series:
- In the Input Data Source node, specify the data to use, and change the role of the target variable from input to target.
- To get the correct lift charts for the data set, open the target profiler (right-click the target variable and select Edit Target Profile). Go to the Prior tab, and select Proportional to Data, right-click, and select Set to use. Save your changes and exit the Input Data Source node.
- To oversample the rare event, add a Sampling node. The General tab has options for sampling methods. Select the Stratified method.
- Go to the Stratification tab to set the target variable to be used for stratifying the data for oversampling.
- On the Options subtab, select the Equal Size criterion. This setting causes the sample to use all of the observations with the event and an equal number of randomly sampled observations with the non-event. Do not select the Adjust frequency for oversampling check box. Save the settings and close the Sampling node.
- *Optional* - Add a Data Partition node. Partitioning the sample into training and validation data helps produce a model that generalizes beyond the training data.
- Add a modeling node, such as Regression or Tree.
- Run the flow.
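The steps above leave the frequency adjustment off, so the sample keeps its oversampled event rate. As a conceptual illustration only, and not a description of how SAS Enterprise Miner computes anything, the following Python sketch shows the kind of per-level weight that would map an equal-size sample back to the population proportions. The function name `frequency_weights` and the counts (200 events among 10,000 observations) are hypothetical.

```python
def frequency_weights(pop_counts, sample_counts):
    """Weight per target level = population share / sample share (illustrative only)."""
    pop_total = sum(pop_counts.values())
    sample_total = sum(sample_counts.values())
    return {
        level: (pop_counts[level] / pop_total) / (sample_counts[level] / sample_total)
        for level in sample_counts
    }

# Hypothetical counts: 2% events in the population, 50% events in the sample.
pop_counts = {"event": 200, "non_event": 9800}
sample_counts = {"event": 200, "non_event": 200}

weights = frequency_weights(pop_counts, sample_counts)

# Weighted counts: each sampled observation counts as `weight` observations.
weighted = {level: sample_counts[level] * weights[level] for level in sample_counts}
weighted_event_share = weighted["event"] / sum(weighted.values())

print(weights)
print(f"unweighted event share: {sample_counts['event'] / sum(sample_counts.values()):.2%}")
print(f"weighted event share:   {weighted_event_share:.2%}")
```

With these assumed counts, the unweighted sample is 50% events, while the weighted sample returns to the 2% population rate. That contrast is why the choice of adjustment affects how the model sees the rare event.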
NOTE: When the target variable has a measurement level of interval, oversampling based on the target variable is not possible. The stratification variable role is unavailable (grayed out) for an interval target variable.
Operating System and Release Information
SAS System | SAS Enterprise Miner | All | n/a | |
Type: | Usage Note |
Priority: | low |
Topic: | Analytics ==> Power and Sample Size; Analytics ==> Data Mining |
Date Modified: | 2018-06-15 15:45:30 |
Date Created: | 2005-01-17 13:21:31 |