You have already set up the project and defined the input data source
that you will use in this example. Now, you will import the data and
perform the following tasks, which help you learn properties of the
input data and prepare it for subsequent modeling:
- You will explore the statistical properties of the variables in the
  input data set. The results that are generated in this step will give
  you an idea of which variables are most useful in predicting the
  target response (whether a person donates or not) in this data set.
- You will plot the input data to discover patterns and trends. Such
  patterns include the shape of the distributions of the variables, the
  number of missing values, and data entry errors.
- You will partition the data into two data sets, a training data set
  and a validation data set. Such partitioning is common practice in
  data mining: the model is fit on the training data and checked
  against the validation data, which helps you develop a model that is
  not overfitted to a particular set of data.
- You will specify how SAS Enterprise Miner should handle missing
  values of predictor variables.
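
The last two tasks can be illustrated outside the tool. The following is a minimal sketch in plain Python, not SAS Enterprise Miner's own method: the records, the column names (`age`, `donated`), the 70/30 split, and the helper functions `partition` and `impute_mean` are all hypothetical, and mean replacement is only one of several possible ways to handle missing values.

```python
import random
from statistics import mean


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into training and validation lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def impute_mean(train, validation, column):
    """Replace None in `column` with the mean of the observed training values."""
    observed = [r[column] for r in train if r[column] is not None]
    fill = mean(observed)

    def fix(rows):
        return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

    return fix(train), fix(validation)


# Hypothetical donor records (assumption: not taken from the example data set).
rows = [
    {"age": 34, "donated": 1}, {"age": None, "donated": 0},
    {"age": 51, "donated": 1}, {"age": 27, "donated": 0},
    {"age": 45, "donated": 0}, {"age": None, "donated": 1},
    {"age": 62, "donated": 1}, {"age": 38, "donated": 0},
    {"age": 29, "donated": 0}, {"age": 57, "donated": 1},
]

train, valid = partition(rows)
train_filled, valid_filled = impute_mean(train, valid, "age")
print(len(train_filled), "training rows,", len(valid_filled), "validation rows")
```

In this sketch the fill value is computed from the training rows only and then applied to both sets, so the validation data does not influence how the training data is prepared.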
Tip
It is always a good idea to plot the input data and to check it for
missing values before you proceed to model building. Knowing the
statistical properties of your input data is essential for building an
accurate and robust predictive model.
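
As a rough illustration of the kind of check this tip describes, the sketch below computes basic statistics and a missing-value count for one variable in plain Python. It is not output from SAS Enterprise Miner; the `summarize` helper, the `age` column, and the records (including the deliberately implausible value 340) are hypothetical.

```python
from statistics import mean, stdev


def summarize(rows, column):
    """Return count, missing count, range, mean, and standard deviation for a column."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "n": len(values),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "stdev": stdev(values) if len(values) > 1 else 0.0,
    }


# Hypothetical records; 340 stands in for a data entry error.
rows = [
    {"age": 34}, {"age": None}, {"age": 51}, {"age": 27},
    {"age": 340}, {"age": None}, {"age": 45}, {"age": 62},
]

summary = summarize(rows, "age")
for name, value in summary.items():
    print(f"{name}: {value}")
```

A maximum far outside the plausible range, or a large missing count, is the sort of signal that is worth catching before any model is built.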