SAS Enterprise Miner streamlines the data mining process to create highly accurate predictive
and descriptive models based on analysis of vast amounts of data from
across an enterprise. Data mining is applicable in a variety of industries
and provides methodologies for such diverse business problems as fraud
detection, householding, customer retention and attrition, database
marketing, market segmentation, risk analysis, affinity analysis,
customer satisfaction, bankruptcy prediction, and portfolio analysis.
In SAS Enterprise Miner, the data mining process follows the SEMMA steps
(Sample, Explore, Modify, Model, and Assess):
- Sample the data by creating one or more data sets. The sample should be
  large enough to contain significant information, yet small enough
  to process. This step includes the use of data preparation tools for
  data import, merge, append, and filter, as well as statistical sampling
  techniques.
- Explore the data by searching for relationships, trends, and anomalies in
  order to gain understanding and ideas. This step includes the use
  of tools for statistical reporting and graphical exploration, variable
  selection methods, and variable clustering.
- Modify the data by creating, selecting, and transforming the variables
  to focus the model selection process. This step includes the use of
  tools for defining transformations, missing value handling, value
  recoding, and interactive binning.
- Model the data by using the analytical tools to train a statistical or
  machine learning model to reliably predict a desired outcome. This
  step includes the use of techniques such as linear and logistic regression,
  decision trees, neural networks, partial least squares, LARS and LASSO,
  nearest neighbor, and importing models defined by other users or even
  outside SAS Enterprise Miner.
- Assess the data by evaluating the usefulness and reliability of the findings
  from the data mining process. This step includes the use of tools
  for comparing models and computing new fit statistics, cutoff analysis,
  decision support, report generation, and score code management.
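The SEMMA steps listed above can be illustrated with a small, self-contained Python sketch. This is not SAS Enterprise Miner code and does not reflect how the product implements its nodes. The synthetic customer table, the variable names (balance, churned), and the one-split decision stump are all assumptions chosen only to keep the example short.

```python
import random
import statistics

def make_records(n, seed=0):
    """Build a synthetic customer table (hypothetical data, for illustration only)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        balance = rng.uniform(0, 1000)
        # Churn is more likely for low balances; noise blurs the boundary.
        churned = 1 if balance + rng.gauss(0, 100) < 400 else 0
        if rng.random() < 0.05:
            balance = None  # about 5% of balances are recorded as missing
        records.append({"balance": balance, "churned": churned})
    return records

def sample(records, k, seed=1):
    """Sample: draw a simple random sample without replacement."""
    return random.Random(seed).sample(records, k)

def explore(records):
    """Explore: summarize the input variable and count missing values."""
    known = [r["balance"] for r in records if r["balance"] is not None]
    return {"mean": statistics.mean(known), "missing": len(records) - len(known)}

def modify(records, fill_value):
    """Modify: impute missing balances with a supplied value."""
    return [
        dict(r, balance=fill_value if r["balance"] is None else r["balance"])
        for r in records
    ]

def model(train):
    """Model: fit a decision stump that predicts churn when balance < cutoff."""
    def errors(cut):
        return sum((r["balance"] < cut) != bool(r["churned"]) for r in train)
    candidates = sorted({r["balance"] for r in train})
    return min(candidates, key=errors)

def assess(cutoff, holdout):
    """Assess: compute accuracy on records that were not used for training."""
    hits = sum((r["balance"] < cutoff) == bool(r["churned"]) for r in holdout)
    return hits / len(holdout)

data = make_records(5000)
drawn = sample(data, 1000)                       # Sample
train_raw, holdout_raw = drawn[:700], drawn[700:]
summary = explore(train_raw)                     # Explore
train = modify(train_raw, summary["mean"])       # Modify (fill value from training data)
holdout = modify(holdout_raw, summary["mean"])
cutoff = model(train)                            # Model
accuracy = assess(cutoff, holdout)               # Assess
print(f"missing={summary['missing']} cutoff={cutoff:.1f} accuracy={accuracy:.2f}")
```

The fill value for missing balances is computed from the training partition only, so the holdout accuracy is not influenced by holdout values.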
You do not have to include every SEMMA step in an analysis, and you
might need to repeat one or more of the steps several times before you
are satisfied with the results.
After you have completed
the SEMMA steps, you can apply a scoring formula from one or more
champion models to new data that might or might not contain the target
variable. Scoring new data that is not available at the time of model
training is the goal of most data mining problems.
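As a rough illustration of applying a scoring formula to new data, the following Python sketch scores records with a logistic formula. The coefficients, the input names (tenure_years, support_calls), and the record layout are hypothetical stand-ins for a champion model; they do not come from SAS Enterprise Miner. Note that the formula reads only the input variables, so it works whether or not a record carries the target.

```python
import math

# Hypothetical coefficients standing in for a trained champion model.
CHAMPION = {
    "intercept": -1.5,
    "weights": {"tenure_years": -0.4, "support_calls": 0.8},
}

def score(record, model=CHAMPION):
    """Return the predicted event probability for one record."""
    linear = model["intercept"] + sum(
        weight * record[name] for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-linear))

# New data: the last record happens to include the target, which is ignored.
new_customers = [
    {"id": "A", "tenure_years": 5.0, "support_calls": 0},
    {"id": "B", "tenure_years": 0.5, "support_calls": 4},
    {"id": "C", "tenure_years": 0.5, "support_calls": 4, "churned": 1},
]

scored = [dict(r, p_event=score(r)) for r in new_customers]
for row in scored:
    print(row["id"], round(row["p_event"], 4))
```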
Furthermore, advanced
visualization tools enable you to quickly and easily examine large
amounts of data in multidimensional histograms and to graphically
compare modeling results.
SAS Enterprise Miner includes tools for generating and testing
complete score code for the entire process flow diagram as SAS code,
C code, and Java code, as well as tools for interactively scoring
new data and examining the results. You can register your model to
a SAS Metadata Server to share your results with users of applications
such as SAS Enterprise Guide and SAS Data Integration Studio that
can integrate the score code into reporting and production processes.
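The idea of generating and testing score code can be sketched in a few lines of Python. This is only an analogy under stated assumptions: it renders a hypothetical logistic model as standalone Python source, whereas SAS Enterprise Miner produces SAS, C, and Java score code through its own tools. The function names and coefficients below are invented for the illustration.

```python
import math

def fit_params():
    """Stand-in for a trained model: hypothetical logistic coefficients."""
    return {
        "intercept": -1.5,
        "weights": {"tenure_years": -0.4, "support_calls": 0.8},
    }

def predict(params, record):
    """Reference prediction computed directly from the in-memory model."""
    z = params["intercept"] + sum(
        w * record[name] for name, w in params["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

def generate_score_code(params):
    """Render the model as standalone source text with no reference to params."""
    terms = " + ".join(
        f"({w!r} * row[{name!r}])" for name, w in params["weights"].items()
    )
    return (
        "import math\n"
        "def score(row):\n"
        f"    z = {params['intercept']!r} + {terms}\n"
        "    return 1.0 / (1.0 + math.exp(-z))\n"
    )

def load_score_function(source):
    """Execute generated source in a fresh namespace and return its score()."""
    namespace = {}
    exec(source, namespace)
    return namespace["score"]

params = fit_params()
source = generate_score_code(params)
generated_score = load_score_function(source)

# Test the generated code against the in-memory model on a few records.
test_rows = [
    {"tenure_years": 5.0, "support_calls": 0},
    {"tenure_years": 0.5, "support_calls": 4},
    {"tenure_years": 2.0, "support_calls": 1},
]
max_diff = max(abs(generated_score(r) - predict(params, r)) for r in test_rows)
print(source)
print("largest difference:", max_diff)
```

Comparing with a small tolerance rather than exact equality allows for the different order of floating-point additions in the generated expression.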
SAS Model Manager complements the data mining process by providing
a structure for managing projects through development, testing, and production environments,
and it is fully integrated with SAS Enterprise Miner.