Example Process Flow Diagram

Task 2. Defining the Input Data Set

  1. Drag an Input Data Source node from the Tools Palette of the Project Navigator or from the toolbox onto the Diagram Workspace.

  2. Double-click on the Input Data Source node icon in the Diagram Workspace to open the node's configuration interface. The Input Data Source node opens to the Data tab.

  3. Type SAMPSIO.DMAGECR in the Source Data text box, and then press the ENTER key. Alternatively, you can click Select to find and set SAMPSIO.DMAGECR as the input data source.

    [Data tab in Input Data Source window selecting SAMPSIO.DMAGECR as the Source Data, Raw Role.]

The node automatically creates the metadata sample, which is used in several Enterprise Miner nodes to determine metadata information about the analysis data set. By default, the metadata sample consists of a random sample of 2,000 cases. Because there are only 1,000 customer cases in this data set, the metadata sample contains all the customer cases in the SAMPSIO.DMAGECR data set. Enterprise Miner assigns a model role and measurement level to each variable that is listed in the Variables tab based on the metadata sample. You can reassign the model role or measurement level for a variable. The summary statistics for interval and class variables are also calculated from the metadata sample.
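The sizing rule for the metadata sample can be illustrated with a short Python sketch. This is not Enterprise Miner code and does not reflect how the node is implemented. The function name `metadata_sample`, the `seed` parameter, and the stand-in list of case numbers are all hypothetical. Only the default size of 2,000 cases and the 1,000-case size of SAMPSIO.DMAGECR come from the text above.

```python
import random


def metadata_sample(cases, sample_size=2000, seed=12345):
    """Return a simple random sample of at most sample_size cases.

    Hypothetical helper for illustration only. If the data set has no
    more cases than sample_size, every case is returned.
    """
    if len(cases) <= sample_size:
        return list(cases)
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(cases, sample_size)  # sampling without replacement


# Stand-in for the 1,000 customer cases in SAMPSIO.DMAGECR
credit_cases = list(range(1000))
sample = metadata_sample(credit_cases)
print(len(sample))  # 1000: the data set is smaller than the default sample size
```

With a larger input, for example 5,000 cases, the same call would return 2,000 randomly chosen cases, which mirrors the default behavior described above.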

You can control the size of the metadata sample by selecting Change in the Data tab.

Note:   The SAMPSIO.DMAGECR credit data set is small relative to many data marts, which often contain gigabytes or even terabytes of data. Therefore, there is no need to sample this data set with a Sampling node prior to creating the partitioned data sets with the Data Partition node.
