Create a Data Source

In this section, you define the CS_ACCEPTS SAS data set as a SAS Enterprise Miner data source. A SAS Enterprise Miner data source holds all the information about a SAS table, or a view to another file type, that is needed for data mining. This information includes the name and location of the data set, variable roles, measurement levels, and other attributes that inform the data mining process. Once defined, a data source can be used in any diagram within a project and can be copied from one project to another.
Data sources are not the actual training data. They are the metadata that describes the source data. The source data itself must reside in an allocated library. This project uses data in the SAMPSIO library.
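As a rough illustration of the difference between a data source and the data it describes, the following Python sketch models a data source as a small metadata record. It is not how SAS Enterprise Miner stores data sources internally. The class and field names (`DataSourceMeta`, `VariableMeta`, `two_level_name`) are hypothetical and chosen only to mirror the attributes described in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VariableMeta:
    """Hypothetical per-variable metadata: a name, a role, and a level."""
    name: str
    role: str = "Input"
    level: str = "Unspecified"


@dataclass
class DataSourceMeta:
    """Hypothetical data source record: it points at a table but holds no rows."""
    library: str
    table: str
    role: str = "Raw"
    variables: List[VariableMeta] = field(default_factory=list)

    @property
    def two_level_name(self) -> str:
        # Library and table joined with a period, as in SAMPSIO.CS_ACCEPTS.
        return f"{self.library}.{self.table}"


# Only the library, table, and the two variable names come from this section.
accepts = DataSourceMeta(
    library="SAMPSIO",
    table="CS_ACCEPTS",
    variables=[
        VariableMeta("GB", role="Target"),
        VariableMeta("_freq_", role="Frequency"),
    ],
)
print(accepts.two_level_name)
```

The record contains names, roles, and a location, but no observations. The table itself stays in the library.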
To create a new data source for the sample data:
  1. On the File menu, select New, and then select Data Source. The Data Source Wizard opens.
  2. Proceed through the steps that are outlined in the wizard.
    1. SAS Table is automatically selected as the Source. Click Next.
    2. Enter SAMPSIO.CS_ACCEPTS as the two-level filename of the Table. Click Next.
    3. The Data Source Wizard — Table Information window appears. Metadata is data about data sets. Some metadata, such as field names, is stored with the data. Other metadata, such as how a particular variable in a data set should be used in a predictive model, must be manually specified. When you define modeling metadata, you are establishing relevant facts about the data set prior to model construction.
      Click Next.
    4. Select the Advanced option button. Use the Advanced option when you want SAS Enterprise Miner to automatically set the variable roles and measurement levels. Automatic initial roles and level values are based on the variable type, the variable format, and the number of distinct values contained in the variable.
      Click Next.
    5. In the Data Source Wizard — Column Metadata window, change the value of Role for the variables to match the description below.
      • _freq_ should have the Role Frequency.
      • GB should have the Role Target.
      • All other variables should have the Role Input.
      To change an attribute, click the value of that attribute and select from the drop-down menu that appears.
      You can use the Show code button to write SAS code that conditionally assigns variable attributes. This is especially useful when you want to apply a metadata rule to several variables.
      When the roles are set, click Next.
    6. In the Data Source Wizard — Decision Configuration window, click Next.
    7. In the Data Source Wizard — Create Sample window, click Next.
    8. The Role of the data source is automatically selected as Raw. Click Next.
    9. Click Finish.
The CS_ACCEPTS data source has been added to your project.
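The role assignments from the Column Metadata step amount to a simple rule: two named variables get special roles and everything else is an input. The following Python sketch expresses that rule. It is not the SAS code that the Show code button expects, and the function name `assign_roles` and the placeholder columns `var_a` and `var_b` are assumptions made for illustration. Only `GB` and `_freq_` come from the steps above.

```python
def assign_roles(variable_names):
    """Return a {name: role} mapping that follows the roles listed in the steps."""
    # SAS variable names are case-insensitive, so compare in lowercase.
    special_roles = {"_freq_": "Frequency", "gb": "Target"}
    return {
        name: special_roles.get(name.lower(), "Input")
        for name in variable_names
    }


# Placeholder column list; var_a and var_b are hypothetical names.
columns = ["GB", "_freq_", "var_a", "var_b"]
roles = assign_roles(columns)
for name, role in roles.items():
    print(f"{name}: {role}")
```

Writing the rule once and applying it to every column is the same idea as applying a metadata rule to several variables at a time rather than editing each row by hand.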
To add the CS_REJECTS data, complete the following steps:
  1. On the File menu, select New, and then select Data Source. The Data Source Wizard opens.
  2. Proceed through the steps that are outlined in the wizard.
    1. SAS Table is automatically selected as the Source. Click Next.
    2. Enter SAMPSIO.CS_REJECTS as the two-level filename of the Table. Click Next.
    3. The Data Source Wizard — Table Information window appears. Click Next.
    4. Select the Advanced option button. Click Next.
    5. In the Data Source Wizard — Column Metadata window, ensure the value of Role for all variables is set to Input. Click Next.
    6. In the Data Source Wizard — Decision Configuration window, click Next.
    7. In the Data Source Wizard — Create Sample window, click Next.
    8. Change the Role of the data source to Score. Click Next.
    9. Click Finish.
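Both procedures select the Advanced option, which sets initial measurement levels from the variable type, the variable format, and the number of distinct values. This section does not give the exact rules or thresholds, so the following Python sketch shows only one plausible heuristic of that kind. The function name `guess_level` and the cutoff of 20 distinct values are assumptions, and the sketch ignores formats entirely.

```python
def guess_level(values, max_class_levels=20):
    """Guess a measurement level from a column's values (illustrative only)."""
    # Ignore missing values when counting distinct values.
    distinct = {v for v in values if v is not None}
    if len(distinct) <= 1:
        return "Unary"
    if len(distinct) == 2:
        return "Binary"
    all_numeric = all(isinstance(v, (int, float)) for v in distinct)
    # Numeric columns with many distinct values are treated as continuous.
    if all_numeric and len(distinct) > max_class_levels:
        return "Interval"
    return "Nominal"


print(guess_level([0, 1, 1, 0, None]))
print(guess_level(list(range(100))))
print(guess_level(["a", "b", "c"]))
```

Whatever the advisor proposes is only a starting point. The Column Metadata window is where you review the result and correct it.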