In this section, you
define the CS_ACCEPTS SAS data set as a SAS Enterprise Miner data
source. A SAS Enterprise Miner data source defines all the
information about a SAS table or a view to another file type that
is needed for data mining. This information includes the name and
location of the data set, variable roles, measurement levels, and
other attributes that inform the data mining process. After they are
defined, the data sources can be used in any diagram within a project
and can be copied from one project to another.
It is important to note
that data sources are not the actual training data. Instead, they are
the metadata that defines the source data. The source data itself
must reside in an allocated library. This project uses data in the
SAMPSIO library.
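As a rough illustration of the idea that a data source is metadata rather than data, the following Python sketch models a data source as a small record that points at a table and stores per-variable attributes. The class names `DataSource` and `VariableMeta` and their fields are hypothetical and are not part of SAS Enterprise Miner; only the library name, table name, and the GB target role come from this section.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VariableMeta:
    """Hypothetical per-variable metadata: a role and a measurement level."""
    name: str
    role: str = "Input"
    level: Optional[str] = None  # measurement level, left unset in this sketch


@dataclass
class DataSource:
    """Hypothetical metadata record; it holds no rows of training data."""
    library: str
    table: str
    role: str = "Raw"
    variables: Dict[str, VariableMeta] = field(default_factory=dict)

    @property
    def two_level_name(self) -> str:
        # A SAS two-level name has the form LIBRARY.TABLE.
        return f"{self.library}.{self.table}"


# Describe the sample table without loading any of its rows.
cs_accepts = DataSource(library="SAMPSIO", table="CS_ACCEPTS")
cs_accepts.variables["GB"] = VariableMeta(name="GB", role="Target")

print(cs_accepts.two_level_name)
```

The record only says where the table lives and how its variables should be used, which is why the table itself must still be available in an allocated library.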
To create a new data source for the sample data:
- On the File menu, select New > Data Source. The Data Source Wizard opens.
- Proceed through the steps that are outlined in the wizard.
- SAS Table is automatically selected as the Source. Click Next.
- Enter SAMPSIO.CS_ACCEPTS as the two-level filename of the Table. Click Next.
- The Data
Source Wizard — Table Information window appears.
Metadata is data about data sets. Some metadata, such as field
names, is stored with the data. Other metadata, such as how a particular
variable in a data set should be used in a predictive model, must
be manually specified. When you define modeling metadata, you are
establishing relevant facts about the data set prior to model construction.
- Click Advanced.
Use the Advanced option when you want SAS Enterprise Miner to automatically
set the variable roles and measurement levels. Automatic initial
roles and level values are based on the variable type, the variable
format, and the number of distinct values contained in the variable.
- In the Data
Source Wizard — Column Metadata window, change
the value of Role for the variables to match the description below.
- _freq_ should have the Role Frequency.
- GB should have the Role Target.
- All other variables should have the Role Input.
To change an attribute, click on the value of that
attribute and select from the drop-down menu that appears. Click
Next.
You can use the Show code option to write SAS code to conditionally assign
variable attributes. This is especially useful when you want to apply
a metadata rule to several variables.
- In the Data
Source Wizard — Decision Configuration window,
click
Next.
- In the Data
Source Wizard — Create Sample window, click
Next.
- The Role of the data source is automatically selected as Raw. Click Next.
- The CS_ACCEPTS data source has been added to your project.
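The role assignments made in the Column Metadata step amount to a simple rule over variable names. The Python sketch below illustrates that rule only. It is not the SAS code that the Show code option expects, the function name `assign_roles` is hypothetical, and the column names AGE and INCOME in the example call are placeholders used for illustration.

```python
def assign_roles(variable_names):
    """Return a {name: role} mapping that follows the rule in the steps above."""
    roles = {}
    for name in variable_names:
        key = name.lower()  # SAS variable names are not case-sensitive
        if key == "_freq_":
            roles[name] = "Frequency"
        elif key == "gb":
            roles[name] = "Target"
        else:
            roles[name] = "Input"
    return roles


# Placeholder column list; the real table has more variables.
example = assign_roles(["_freq_", "GB", "AGE", "INCOME"])
for name, role in example.items():
    print(f"{name}: {role}")
```

Expressing the assignment as a rule rather than as individual edits is the same motivation the wizard gives for conditional assignment: one condition can cover many variables at once.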
To add the CS_REJECTS
data, complete the following steps:
- On the File menu, select New > Data Source. The Data Source Wizard opens.
- Proceed through the steps that are outlined in the wizard.
- SAS Table is automatically selected as the Source. Click Next.
- Enter SAMPSIO.CS_REJECTS as the two-level filename of the Table. Click Next.
- The Data
Source Wizard — Table Information window appears.
Click Next.
- Click Advanced. Click Next.
- In the Data
Source Wizard — Column Metadata window, ensure
that the value of Role for all variables is set to Input. Click Next.
- In the Data
Source Wizard — Create Sample window, click
Next.
- Change the Role of the data source to Score. Click Next.
-