
Setting Up Your Project

Define the Donor Data Source


Overview of the Enterprise Miner Data Source

To access the example data in Enterprise Miner, you must define the imported data as an Enterprise Miner data source. An Enterprise Miner data source stores all of the data set's metadata: the data set's name, location, and library path, as well as variable role assignments, measurement levels, and other attributes that guide the data mining process. This metadata is required before you can start data mining. Note that an Enterprise Miner data source is not the actual training data; it is the metadata that describes that data to Enterprise Miner.
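To make the distinction between data and metadata concrete, the following Python sketch models a data source as a small record that describes a table without holding any of its rows. This is only an illustration: the class names, field names, and the "Unknown" default are hypothetical and do not correspond to how Enterprise Miner stores metadata internally. The library, path, table, and role values are taken from the walkthrough on this page.

```python
from dataclasses import dataclass, field


@dataclass
class VariableMeta:
    """Metadata for one variable (hypothetical structure, for illustration)."""
    name: str
    role: str = "Input"      # for example: Input, Target, ID, Rejected
    level: str = "Unknown"   # measurement level, for example: Interval
    report: bool = False     # whether the variable is a report variable


@dataclass
class DataSourceMeta:
    """Describes a table; it never contains the table's observations."""
    library: str
    table: str
    path: str
    variables: dict = field(default_factory=dict)

    @property
    def two_level_name(self) -> str:
        # SAS refers to a table as LIBRARY.TABLE
        return f"{self.library.upper()}.{self.table.upper()}"

    def add(self, var: VariableMeta) -> None:
        self.variables[var.name] = var


donor = DataSourceMeta(library="Donor", table="DONOR_RAW_DATA",
                       path=r"C:\EM53\GS\Data")
donor.add(VariableMeta("TARGET_B", role="Target"))
donor.add(VariableMeta("CONTROL_NUMBER", role="ID"))
donor.add(VariableMeta("WEALTH_RATING", level="Interval"))

print(donor.two_level_name)
for var in donor.variables.values():
    print(var.name, var.role, var.level)
```

The record holds only names, roles, and levels, which is why defining a data source is quick even for a large table.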

The data source must reside in an allocated library. You assigned the libname Donor to the data that is found in C:\EM53\GS\Data when you created the SAS Library for this example.

The following tasks use the Data Source wizard in order to define the data source that you will use for this example.


Specify the Data Type

In this task you open the Data Source wizard and identify the type of data that you will use.

  1. Right-click the Data Sources folder in the Project Navigator and select Create Data Source to open the Data Source wizard. Alternatively, you can select File > New > Data Source from the main menu, or you can click the Create Data Source button on the Shortcut Toolbar.

    [untitled graphic]

  2. In the Source box of the Data Source Wizard Metadata Source window, select SAS Table to tell SAS Enterprise Miner that the data is formatted as a SAS table.

    [untitled graphic]

  3. Click Next. The Data Source Wizard Select a SAS Table window opens.


Select a SAS Table

In this task, you specify the data set that you will use, and view the table metadata.

  1. Click Browse in the Data Source Wizard - Select a SAS Table window.

    [untitled graphic]

    The Select a SAS Table window opens.

  2. Click the SAS library named Donor in the list of libraries on the left. The Donor library folder expands to show all the data sets that are in the library.

    [untitled graphic]

  3. Select the DONOR_RAW_DATA table and click OK. The two-level name DONOR.DONOR_RAW_DATA appears in the Table box of the Select a SAS Table window.

    [untitled graphic]

  4. Click Next. The Table Information window opens. Examine the metadata in the Table Properties section. Notice that the DONOR_RAW_DATA data set has 50 variables and 19,372 observations.

    [untitled graphic]

  5. After you finish examining the table metadata, click Next. The Data Source Wizard Metadata Advisor Options window opens.


Configure the Metadata

The Metadata Configuration step activates the Metadata Advisor, which you can use to control how Enterprise Miner organizes metadata for the variables in your data source.

In this task, you generate and examine metadata about the variables in your data set.

  1. Select Advanced and click Customize.

    [untitled graphic]

    The Advanced Advisor Options window opens.

    In the Advanced Advisor Options window, you can view or set additional metadata properties. When you select a property, the property description appears in the bottom half of the window.

    [untitled graphic]

    Notice that the threshold value for class variables is 20 levels. You will see the effects of this setting when you view the Column Metadata window in the next step. Click OK to use the defaults for this example.

  2. Click Next in the Data Source Wizard Metadata Advisor Options window to generate the metadata for the table. The Data Source Wizard Column Metadata window opens.

    Note: In the Column Metadata window, you can view and, if necessary, adjust the metadata that has been defined for the variables in your SAS table. Scroll through the table and examine the metadata. In this window, columns that have a white background are editable, and columns that have a gray background are not editable.

  3. Select the Names column header to sort the variables alphabetically.

    Note that the roles for the variables CLUSTER_CODE and CONTROL_NUMBER are set to Rejected because each variable has more levels than the maximum class count threshold of 20. This is a direct result of the threshold values that were set in the Data Source Wizard Metadata Advisor Options window in the previous step. To see all of the levels of data, select the columns of interest and then click Explore in the upper right-hand corner of the window.

  4. Redefine these variable roles and measurement levels:

    • Set the role for the CONTROL_NUMBER variable to ID.

    • Set these variables to the Interval measurement level:

      • CARD_PROM_12

      • INCOME_GROUP

      • RECENT_CARD_RESPONSE_COUNT

      • RECENT_RESPONSE_COUNT

      • WEALTH_RATING

  5. Set the role for the variable TARGET_D to Rejected, since you will not model this variable. Note that Enterprise Miner correctly identified TARGET_D and TARGET_B as targets since they start with the prefix TARGET.

  6. Select the TARGET_B variable and click Explore to view the distribution of TARGET_B. As an exercise, select additional variables and explore their distributions.

    [untitled graphic]

  7. In the Sample Properties window, set Fetch Size to Max and then click Apply.

  8. Select the bar that corresponds to donors (TARGET_B = '1') on the TARGET_B histogram and note that the donors are highlighted in the DONOR.DONOR_RAW_DATA table.

    [untitled graphic]

  9. Close the Explore window.

  10. Sort the Metadata table by Level and check your customized metadata assignments.

    [untitled graphic]

  11. Select the Report column and select Yes for URBANICITY and DONOR_AGE to define them as report variables. These variables will be used as additional profiling variables in results such as assessment tables and cluster profiles plots.

    [untitled graphic]

  12. Click Next to open the Data Source Wizard Decision Configuration window.

    [untitled graphic]

    To end this task, select Yes and click Next in order to open the Decision Configuration window.
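The two advisor behaviors observed in this task (variables whose names start with TARGET become targets, and class variables with more than 20 levels are rejected) can be sketched as a pair of simple rules. The Python below is a simplified illustration only, not the Metadata Advisor's actual algorithm: the function name, its parameters, and the level counts in the examples are hypothetical.

```python
CLASS_LEVEL_THRESHOLD = 20  # default shown in the Advanced Advisor Options window


def suggest_role(name: str, n_levels: int, is_class: bool,
                 threshold: int = CLASS_LEVEL_THRESHOLD) -> str:
    """Suggest a role from two simplified rules (illustrative only)."""
    # Rule 1: a name that starts with TARGET is treated as a target.
    if name.upper().startswith("TARGET"):
        return "Target"
    # Rule 2: a class variable with too many levels is rejected.
    if is_class and n_levels > threshold:
        return "Rejected"
    return "Input"


# The level counts below are made-up placeholders, not values from the donor data.
examples = [
    ("TARGET_B", 2, True),
    ("CLUSTER_CODE", 50, True),
    ("URBANICITY", 5, True),
]
for name, n_levels, is_class in examples:
    print(f"{name:<14} {suggest_role(name, n_levels, is_class)}")
```

Rules like these only produce a starting point, which is why the task has you review the Column Metadata window and override assignments such as the role for CONTROL_NUMBER by hand.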


Define Prior Probabilities and a Profit Matrix

The Data Source Wizard Decision Configuration window enables you to define a target profile that produces optimal decisions from a model. You can specify target profile information such as the profit or loss of each possible decision, prior probabilities, and cost functions. In order to create a target profile in the Decision Configuration window, you must have a variable that has a role of Target in your data source. You cannot define decisions for an interval level target variable.

In this task, you specify whether to implement decision processing when you build your models.

[untitled graphic]

  1. Select the Prior Probabilities tab. Click Yes to reveal the Adjusted Prior column and enter the following adjusted probabilities, which are representative of the underlying population of donors.

    • Level 1 = 0.05

    • Level 0 = 0.95

    [Data Source Wizard Decision Configuration Prior Probabilities Tab]

  2. Select the Decision Weights tab and specify the following weight values:

    Level    Decision 1    Decision 2
    1        14.5          0
    0        -0.5          0

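    The profit value of $14.50 for a donor (Level 1) is what remains after accounting for the 50-cent mailing cost; the weight of -0.5 for a non-donor (Level 0) is that same 50-cent cost with no donation to offset it. The focus of this example is to develop models that maximize profit.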

    [Data Source Wizard Decision Configuration Decision Weights Tab]

  3. Click Next to open the Data Source Attributes window. In this window, you can specify a name, role, and segment for your data source.

    [Data Source Wizard Data Source Attributes Window]

  4. Click Finish to add the donor table to the Data Sources folder of the Project Navigator.
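To show how the prior probabilities and decision weights entered above can drive a decision, the following Python sketch computes the expected profit of each decision for one case and picks the larger. It is a sketch under stated assumptions, not Enterprise Miner's implementation: reading Decision 1 as "mail" and Decision 2 as "do not mail" is an interpretation of the weights, the function names are hypothetical, and the training-sample donor proportion passed to adjust_posterior is a made-up value because this page does not state it. The prior adjustment shown is the standard rescaling of a posterior by the ratio of population prior to sample proportion.

```python
# Profit matrix from the Decision Weights tab: keys are target levels,
# tuple positions are (Decision 1, Decision 2).
PROFIT = {"1": (14.5, 0.0), "0": (-0.5, 0.0)}

# Adjusted priors from the Prior Probabilities tab.
PRIORS = {"1": 0.05, "0": 0.95}


def adjust_posterior(p_donor, sample_prior_donor, priors=PRIORS):
    """Rescale a posterior estimated on a sample to the population priors."""
    w1 = p_donor * priors["1"] / sample_prior_donor
    w0 = (1.0 - p_donor) * priors["0"] / (1.0 - sample_prior_donor)
    return w1 / (w1 + w0)


def expected_profits(p_donor, profit=PROFIT):
    """Expected profit of each decision given the probability of a donor."""
    return tuple(
        p_donor * profit["1"][d] + (1.0 - p_donor) * profit["0"][d]
        for d in range(2)
    )


def best_decision(p_donor, profit=PROFIT):
    """Return 1 or 2, whichever decision has the higher expected profit."""
    ev = expected_profits(p_donor, profit)
    return 1 + max(range(2), key=lambda d: ev[d])


# Probability at which Decision 1 breaks even: 14.5*p - 0.5*(1 - p) = 0.
BREAK_EVEN = 0.5 / (14.5 + 0.5)

# Hypothetical case: the model reports 0.40 on a sample assumed to be 50% donors.
p_adjusted = adjust_posterior(0.40, sample_prior_donor=0.5)
print(p_adjusted, expected_profits(p_adjusted), best_decision(p_adjusted))
```

Under these weights, Decision 1 has a positive expected profit whenever the adjusted donor probability exceeds 1/30, so even cases with a fairly small donor probability can be worth targeting.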


Optional Steps
