SAS Enterprise Miner
streamlines the data mining process to create highly accurate predictive
and descriptive models based on analysis of vast amounts of data from
across an enterprise. Data mining is applicable in a variety of industries
and provides methodologies for such diverse business problems as fraud
detection, householding, customer retention and attrition, database
marketing, market segmentation, risk analysis, affinity analysis,
customer satisfaction, bankruptcy prediction, and portfolio analysis.
In SAS Enterprise Miner,
the data mining process has the following (SEMMA) steps:
- Sample the data by creating one or more data sets. The sample should be
large enough to contain significant information, yet small enough to process.
This step includes the use of data preparation tools for data import,
merge, append, and filter, as well as statistical sampling techniques.
- Explore the data by searching for relationships, trends, and anomalies
in order to gain understanding and ideas. This step includes the use of tools
for statistical reporting and graphical exploration, variable selection
methods, and variable clustering.
- Modify the data by creating, selecting, and transforming the variables
to focus the model selection process. This step includes the use of tools for
defining transformations, missing value handling, value recoding,
and interactive binning.
- Model the data by using the analytical tools to train a statistical or
machine learning model to reliably predict a desired outcome. This step
includes the use of techniques such as linear and logistic regression, decision
trees, neural networks, partial least squares, LARS and LASSO, nearest
neighbor, and importing models defined by other users or even outside SAS
Enterprise Miner.
- Assess the data by evaluating the usefulness and reliability of the
findings from the data mining process. This step includes the use of tools
for comparing models and computing new fit statistics, cutoff analysis,
decision support, report generation, and score code management.
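
The SEMMA steps above can be sketched outside the product in a few lines
of plain Python. This is only an illustration of the sequence of steps,
not SAS Enterprise Miner code: the synthetic data, the variable names
(monthly_spend, churned), and the one-split model are all assumptions
made for the example.

```python
import random
import statistics

# Hypothetical toy data: (monthly_spend, churned); None marks a missing value.
random.seed(0)
population = []
for _ in range(1000):
    churned = random.random() < 0.3
    spend = random.gauss(40 if churned else 70, 10)
    if random.random() < 0.05:
        spend = None
    population.append((spend, int(churned)))

# Sample: draw a manageable random subset and split it into train/validation.
sample = random.sample(population, 400)
train, valid = sample[:300], sample[300:]

# Explore: summarize the input variable and count missing values.
observed = [x for x, _ in train if x is not None]
mean_spend = statistics.mean(observed)
n_missing = len(train) - len(observed)

# Modify: impute missing values with the training mean.
def impute(rows):
    return [(mean_spend if x is None else x, y) for x, y in rows]

train, valid = impute(train), impute(valid)

# Model: a one-split "decision stump" that predicts churn below a cutoff.
def accuracy(rows, cutoff):
    return sum((x < cutoff) == bool(y) for x, y in rows) / len(rows)

candidates = sorted({round(x) for x, _ in train})
best_cutoff = max(candidates, key=lambda c: accuracy(train, c))

# Assess: evaluate the chosen cutoff on the held-out validation rows.
valid_accuracy = accuracy(valid, best_cutoff)
print(f"missing={n_missing} cutoff={best_cutoff} "
      f"validation accuracy={valid_accuracy:.2f}")
```

In practice each step would be repeated and refined, for example by
trying a different imputation rule in the Modify step and comparing the
validation accuracy again.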
You do not have to include every SEMMA step in an analysis, and you might
need to repeat one or more of the steps several times before you are
satisfied with the results.
After you have completed
the SEMMA steps, you can apply a scoring formula from one or more
champion models to new data that might or might not contain the target variable.
Scoring new data that is not available at the time of model training
is the goal of most data mining problems.
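
As a minimal sketch of what applying a scoring formula to new data can
look like, the following Python snippet scores records with and without
the target variable. The logistic formula, its coefficients, and the
field names are hypothetical; they are not output from SAS Enterprise
Miner.

```python
import math

# Hypothetical scoring formula from a champion logistic regression model.
# The coefficients are illustrative and were not estimated from real data.
INTERCEPT = 4.0
COEF_SPEND = -0.08

def score(record):
    """Return a copy of the record with a predicted probability attached."""
    logit = INTERCEPT + COEF_SPEND * record["monthly_spend"]
    scored = dict(record)
    scored["p_churn"] = 1.0 / (1.0 + math.exp(-logit))
    return scored

new_data = [
    {"id": 1, "monthly_spend": 35.0, "churned": 1},  # target is known
    {"id": 2, "monthly_spend": 80.0},                # target is not present
]

scored = [score(r) for r in new_data]
for row in scored:
    print(row["id"], round(row["p_churn"], 3))
```

The scoring function reads only the input variable, so it works whether
or not the target is present. When the target is present, the predicted
probability can be compared with it to monitor the model.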
Furthermore, advanced
visualization tools enable you to quickly and easily examine large
amounts of data in multidimensional histograms and to graphically
compare modeling results.
SAS Enterprise Miner includes tools for generating
and testing complete score code for the entire process flow diagram
as SAS code, C code, and Java code, as well as tools for interactively
scoring new data and examining the results. You can register your
model to a SAS Metadata Server to share your results with users of
applications such as SAS Enterprise Guide and SAS Data Integration
Studio, which can integrate the score code into reporting and production
processes.
SAS Model Manager complements the data mining process by providing
a structure for managing projects through development, testing, and
production environments and is fully integrated with SAS Enterprise
Miner.