You have already set up the
project and defined the input data source that you will use in this example. Now, you will
import the data and perform the following tasks, which help you learn the properties of
the input data and prepare it for subsequent modeling:
- You will explore the statistical properties of the
variables in the input data set. The results that are generated in
this step will give you an idea of which variables are most useful
in predicting the target response (whether a person donates or not)
in this data set.
- You will partition the data into two data sets, a
training data set and a validation data set. Such partitioning is
common practice in data mining: you fit the model on the training
data and then use the validation data to check that the model is not
overfitted to a particular set of data.
- You will specify how SAS Enterprise Miner should handle
missing values of predictor variables.
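To make the partitioning and missing-value tasks concrete, here is a minimal sketch in plain Python. It is not SAS Enterprise Miner code and does not reflect how the product implements these steps. The `donors` rows, the column names, the 70/30 split, the seed, and the choice of mean replacement are all assumptions made for illustration.

```python
import random
import statistics


def partition(rows, train_fraction=0.7, seed=12345):
    """Randomly split rows into a training list and a validation list."""
    shuffled = list(rows)                     # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)     # seeded for a repeatable split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def impute_mean(train, validation, column):
    """Fill missing (None) values of `column` with the training-data mean."""
    observed = [row[column] for row in train if row[column] is not None]
    fill = statistics.mean(observed)

    def fill_rows(rows):
        return [
            {**row, column: fill if row[column] is None else row[column]}
            for row in rows
        ]

    return fill_rows(train), fill_rows(validation), fill


# Hypothetical input data: one predictor (income) and the target (donated).
donors = [
    {"income": 52000.0, "donated": 1},
    {"income": None, "donated": 0},
    {"income": 61000.0, "donated": 1},
    {"income": 38000.0, "donated": 0},
    {"income": 45000.0, "donated": 0},
    {"income": None, "donated": 1},
    {"income": 72000.0, "donated": 1},
    {"income": 30000.0, "donated": 0},
    {"income": 56000.0, "donated": 1},
    {"income": 41000.0, "donated": 0},
]

train, valid = partition(donors)
train_filled, valid_filled, fill = impute_mean(train, valid, "income")

print("training rows:", len(train), "validation rows:", len(valid))
print("replacement value for income:", fill)
```

In this sketch the replacement value is computed from the training rows only and then applied to both data sets, so the validation data remains an independent check on the model.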
Tip
It is always a good idea to
plot the input data and to check it for missing values before you
proceed to model building. Knowing the statistical properties of your
input data is essential for building an accurate and robust predictive
model.
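As a rough illustration of the kind of check that the tip describes, the following Python sketch counts missing values and computes basic summary statistics for each variable. It is not SAS Enterprise Miner output or code. The `sample` rows and the `summarize` helper are hypothetical and exist only for this example.

```python
import statistics


def summarize(rows, column):
    """Return the row count, missing count, mean, minimum, and maximum of a column."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]   # None marks a missing value
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present) if present else None,
        "min": min(present, default=None),
        "max": max(present, default=None),
    }


# Hypothetical input rows with some missing values.
sample = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 48000.0},
    {"age": 51, "income": None},
    {"age": 45, "income": 60000.0},
]

for column in ("age", "income"):
    print(column, summarize(sample, column))
```

A variable with a large share of missing values, or with an implausible minimum or maximum, is worth a closer look before you build a model on it.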