SAS Enterprise Miner streamlines the data mining process to create highly accurate
predictive and descriptive models based on analysis of vast amounts
of data from across an enterprise. Data mining is applicable in a
variety of industries and provides methodologies for such diverse
business problems as fraud detection, householding, customer retention
and attrition, database marketing, market segmentation, risk analysis,
affinity analysis, customer satisfaction, bankruptcy prediction, and
portfolio analysis.

In SAS Enterprise Miner, the data mining process has the following (SEMMA)
steps:
- Sample the data by creating one or more data sets. The sample should be
large enough to contain significant information, yet small enough
to process. This step includes the use of data preparation tools for
data import, merge, append, and filter, as well as statistical sampling
techniques.
- Explore the data by searching for relationships, trends, and anomalies in
order to gain understanding and ideas. This step includes the use
of tools for statistical reporting and graphical exploration, variable
selection methods, and variable clustering.
- Modify the data by creating, selecting, and transforming the variables
to focus the model selection process. This step includes the use of
tools for defining transformations, missing value handling, value
recoding, and interactive binning.
- Model the data by using the analytical tools to train a statistical or
machine learning model to reliably predict a desired outcome. This
step includes the use of techniques such as linear and logistic regression,
decision trees, neural networks, partial least squares, LARS and LASSO,
nearest neighbor, and importing models defined by other users or even
outside SAS Enterprise Miner.
- Assess the data by evaluating the usefulness and reliability of the findings
from the data mining process. This step includes the use of tools
for comparing models and computing new fit statistics, cutoff analysis,
decision support, report generation, and score code management.
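
The SEMMA steps above can be illustrated outside SAS Enterprise Miner with a minimal Python sketch. It is not how SAS Enterprise Miner implements these steps. The toy data, the column names `x` and `y`, the function names, and the choice of mean imputation with a simple least-squares line are all assumptions made for illustration.

```python
import random
import statistics

def sample(rows, fraction, seed=0):
    """Sample: split the rows at random into a training sample and a holdout set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    k = max(1, int(len(shuffled) * fraction))
    return shuffled[:k], shuffled[k:]

def explore(rows):
    """Explore: report the row count, missing values, and the mean of the input."""
    xs = [r["x"] for r in rows if r["x"] is not None]
    return {"n": len(rows), "missing_x": len(rows) - len(xs),
            "mean_x": statistics.fmean(xs)}

def modify(rows, fill):
    """Modify: replace missing x values with a fill value (mean imputation here)."""
    return [{"x": fill if r["x"] is None else r["x"], "y": r["y"]} for r in rows]

def model(rows):
    """Model: fit y = a + b * x by ordinary least squares."""
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def assess(rows, a, b):
    """Assess: mean squared error of the fitted line on the given rows."""
    return statistics.fmean((r["y"] - (a + b * r["x"])) ** 2 for r in rows)

# Hypothetical toy data: y is roughly 1 + 2x, and every 20th x is missing.
rng = random.Random(42)
data = []
for i in range(200):
    x = rng.uniform(0, 10)
    y = 1.0 + 2.0 * x + rng.gauss(0, 0.5)
    data.append({"x": None if i % 20 == 0 else x, "y": y})

train, holdout = sample(data, 0.5, seed=1)
summary = explore(train)
clean = modify(train, summary["mean_x"])
a, b = model(clean)
# The holdout rows reuse the training mean so that no holdout information leaks in.
mse = assess(modify(holdout, summary["mean_x"]), a, b)
print(summary, round(a, 3), round(b, 3), round(mse, 3))
```

In practice any of these functions might be skipped or rerun with different settings, for example a different imputation rule in `modify`, before the assessment is acceptable.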

You might or might not include all of the SEMMA steps in an analysis, and it
might be necessary to repeat one or more of the steps several times
before you are satisfied with the results.

After you have completed the SEMMA steps, you can apply a scoring formula
from one or more champion models to new data that might or might not
contain the target variable. Scoring new data that is not available
at the time of model training is the goal of most data mining problems.
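
As a rough illustration of applying a scoring formula to new data, the following Python sketch scores records with a logistic formula. The coefficients, the variable names `balance` and `late_payments`, and the `score` function are hypothetical stand-ins for a trained champion model. They are not score code produced by SAS Enterprise Miner.

```python
import math

# Hypothetical coefficients standing in for a trained champion model.
CHAMPION = {"intercept": -1.5, "weights": {"balance": 0.8, "late_payments": 1.2}}

def score(record, model=CHAMPION):
    """Return the predicted event probability for one record.

    Only the model inputs are read, so a target column, if present, is ignored.
    """
    z = model["intercept"] + sum(
        weight * record[name] for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

# New data: the first two records have no target, the third happens to have one.
new_data = [
    {"id": 1, "balance": 0.0, "late_payments": 0.0},
    {"id": 2, "balance": 1.0, "late_payments": 2.0},
    {"id": 3, "balance": 1.0, "late_payments": 2.0, "target": 1},
]

scored = [{**record, "p_event": score(record)} for record in new_data]
for row in scored:
    print(row["id"], round(row["p_event"], 4))
```

Because `score` reads only the input variables, the same formula applies whether or not the new data contains the target variable.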

Furthermore, advanced visualization tools enable you to quickly and easily examine
large amounts of data in multidimensional histograms and to graphically
compare modeling results.

SAS Enterprise Miner includes
tools for generating and testing complete score code for the entire
process flow diagram as SAS Code, C code, and Java code, as well as
tools for interactively scoring new data and examining the results.
You can register your model to a SAS Metadata Server to share your
results with users of applications such as SAS Enterprise Guide and
SAS Data Integration Studio that can integrate the score code into
reporting and production processes. SAS Model Manager complements
the data mining process by providing a structure for managing projects
through development, test, and production environments and is fully
integrated with SAS Enterprise Miner.