SAS Enterprise Miner streamlines the data mining process to create highly accurate predictive
and descriptive models based on analysis of vast amounts of data from
across an enterprise. Data mining is applicable in a variety of industries
and provides methodologies for such diverse business problems as fraud
detection, householding, customer retention and attrition, database
marketing, market segmentation, risk analysis, affinity analysis,
customer satisfaction, bankruptcy prediction, and portfolio analysis.
In SAS Enterprise Miner, the data mining process has the following (SEMMA) steps:
- Sample the data by creating one or more data sets. The sample should be large enough to
  contain significant information, yet small enough to process. This step includes the
  use of data preparation tools for data import, merge, append, and filter, as well
  as statistical sampling techniques.
- Explore the data by searching for relationships, trends, and anomalies in order to gain
  understanding and ideas. This step includes the use of tools for statistical reporting
  and graphical exploration, variable selection methods, and variable clustering.
- Modify the data by creating, selecting, and transforming the variables to focus the
  model selection process. This step includes the use of tools for defining transformations,
  missing value handling, value recoding, and interactive binning.
- Model the data by using the analytical tools to train a statistical or machine learning
  model to reliably predict a desired outcome. This step includes the use of techniques
  such as linear and logistic regression, decision trees, neural networks, partial least
  squares, LARS and LASSO, nearest neighbor, and importing models defined by other users
  or even outside SAS Enterprise Miner.
- Assess the data by evaluating the usefulness and reliability of the findings
  from the data mining process. This step includes the use of tools
  for comparing models and computing new fit statistics, cutoff analysis,
  decision support, report generation, and score code management.
You might or might not include all of the SEMMA steps in an analysis, and it might
be necessary to repeat one or more of the steps several times before you are satisfied
with the results.
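The SEMMA steps can be illustrated outside the product with a minimal sketch. The Python below is not SAS Enterprise Miner code and does not use any SAS API; the function names, the synthetic data, and the choice of mean imputation and simple least squares are all assumptions made for illustration. It only shows how the five steps hand results to one another.

```python
import random
import statistics


def sample(rows, fraction, seed=0):
    """Sample: draw a simple random sample without replacement."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def explore(rows):
    """Explore: report row count, missing count, and mean of x."""
    xs = [r["x"] for r in rows if r["x"] is not None]
    return {
        "n": len(rows),
        "missing_x": len(rows) - len(xs),
        "mean_x": statistics.mean(xs),
    }


def modify(rows, fill_value):
    """Modify: replace missing x values with fill_value."""
    return [{**r, "x": fill_value if r["x"] is None else r["x"]} for r in rows]


def model(rows):
    """Model: fit y = a + b*x by ordinary least squares."""
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def assess(rows, params):
    """Assess: mean squared error of the fitted line on the given rows."""
    a, b = params
    return statistics.mean((a + b * r["x"] - r["y"]) ** 2 for r in rows)


# Hypothetical data: y = 2x + 1, with every tenth x value missing.
data = [
    {"x": None if i % 10 == 0 else float(i), "y": 2.0 * i + 1.0}
    for i in range(200)
]

drawn = sample(data, fraction=0.5, seed=42)               # Sample
summary = explore(drawn)                                  # Explore
prepared = modify(drawn, fill_value=summary["mean_x"])    # Modify
train, holdout = prepared[:70], prepared[70:]
params = model(train)                                     # Model
mse = assess(holdout, params)                             # Assess
print(summary, params, mse)
```

Because the imputed rows no longer follow the underlying line exactly, the holdout error is generally nonzero, which is the kind of finding that might send an analyst back to the Modify step.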
After you have completed the SEMMA steps, you can apply a scoring formula from one
or more champion models to new data that might or might not contain the
target variable. Scoring new data that is not available at the time of model training
is the goal of most data mining problems.
Furthermore, advanced
visualization tools enable you to quickly and easily examine large
amounts of data in multidimensional histograms and to graphically
compare modeling results.
SAS Enterprise Miner includes
tools for generating and testing complete score code for the entire
process flow diagram as SAS code, C code, and Java code, as well as tools for interactively scoring new
data and examining the results. You can register your model to a SAS Metadata Server
to share your results with users of applications such as SAS Enterprise Guide and
SAS Data Integration Studio that can integrate the score code into reporting and production
processes. SAS Model Manager complements the data mining process by providing a structure
for managing projects through development, testing, and production environments and
is fully integrated with SAS Enterprise Miner.
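The idea of applying a scoring formula from a champion model to new data can be sketched as follows. This is an illustration in Python, not score code generated by SAS Enterprise Miner; the coefficients, the variable names `age` and `balance`, and the 0.5 cutoff are hypothetical. The sketch assumes a logistic scoring formula and shows that scoring works whether or not the new records carry the target variable.

```python
import math

# Hypothetical coefficients standing in for a trained champion model.
CHAMPION = {"intercept": -1.0, "weights": {"age": 0.02, "balance": 0.001}}


def score(record, model=CHAMPION, cutoff=0.5):
    """Apply a logistic scoring formula to one record."""
    z = model["intercept"] + sum(
        weight * record[name] for name, weight in model["weights"].items()
    )
    p = 1.0 / (1.0 + math.exp(-z))
    # Return the input fields plus the posterior probability and decision.
    return {**record, "p_event": p, "predicted": int(p >= cutoff)}


# New data that was not available at training time (hypothetical values).
new_data = [
    {"age": 30, "balance": 100.0},                 # target unknown
    {"age": 60, "balance": 2000.0, "target": 1},   # target known
]

scored = [score(r) for r in new_data]

# Where the target is present, the scored output can also be checked.
labeled = [r for r in scored if "target" in r]
if labeled:
    accuracy = sum(r["predicted"] == r["target"] for r in labeled) / len(labeled)
    print("accuracy on labeled records:", accuracy)

for r in scored:
    print(r)
```

Keeping the scoring function free of any training logic mirrors the reason score code is exported separately: it can run in a reporting or production process without access to the original modeling environment.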