Introduction to SAS Enterprise Miner 5.3 Software
SAS defines data mining as the process of uncovering hidden patterns in large amounts of data. Many industries use data mining to address business problems and opportunities such as fraud detection, risk and affinity analyses, database marketing, householding, customer churn, bankruptcy prediction, and portfolio analysis.
The SAS data mining process is summarized in the acronym SEMMA, which stands for sampling, exploring, modifying, modeling, and assessing data:
Sample the data by creating one or more data tables. The sample should be large enough to contain the significant information, yet small enough to process.
Explore the data by searching for anticipated relationships, unanticipated trends, and anomalies in order to gain understanding and ideas.
Modify the data by creating, selecting, and transforming the variables to focus the model selection process.
Model the data by using the analytical tools to search for a combination of the data that reliably predicts a desired outcome.
Assess the data by evaluating the usefulness and reliability of the findings from the data mining process.
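The five SEMMA steps can be sketched in a few lines of ordinary code. The following Python example is only an illustration of the sequence and is not Enterprise Miner code: the function names, the synthetic data, the log transformation, and the least-squares model are all assumptions chosen to keep the sketch small.

```python
import math
import random
import statistics

def sample_rows(rows, n, seed=0):
    """Sample: draw n rows at random, without replacement."""
    return random.Random(seed).sample(rows, n)

def explore(rows):
    """Explore: summarize the variables to look for trends and anomalies."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {"x_mean": statistics.mean(xs), "x_max": max(xs),
            "y_mean": statistics.mean(ys)}

def modify(rows):
    """Modify: log-transform the input variable (assumed to be skewed)."""
    return [(math.log(x), y) for x, y in rows]

def model(rows):
    """Model: fit y = a + b*x by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def assess(fit, rows):
    """Assess: mean absolute error of the fitted line on the given rows."""
    a, b = fit
    return statistics.mean([abs(y - (a + b * x)) for x, y in rows])

# Synthetic (hypothetical) data: y depends on log(x), so the Modify step matters.
data = [(x, 2 + 3 * math.log(x)) for x in range(1, 1001)]
drawn = sample_rows(data, 200)
summary = explore(drawn)
fit = model(modify(drawn[:150]))            # fit on part of the sample
error = assess(fit, modify(drawn[150:]))    # assess on the rows held back
```

Each function corresponds to one letter of the acronym. Fitting on one part of the sample and assessing on the rest mirrors the idea that the Assess step judges how reliable the findings are, not how well the model reproduces the data it was built from.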
The SEMMA data mining process is driven by a process flow diagram, which you can modify and save. The graphical user interface is designed so that a business analyst who has little statistical expertise can navigate through the data mining methodology, while a quantitative expert can go "behind the scenes" to fine-tune the analytical process.
SAS Enterprise Miner 5.3 contains a collection of sophisticated analysis tools that share a common, user-friendly interface, which you can use to create and compare multiple models. Analytical tools include clustering, association and sequence discovery, market basket analysis, path analysis, self-organizing maps (Kohonen networks), variable selection, decision trees and gradient boosting, linear and logistic regression, two-stage modeling, partial least squares, support vector machines, and neural networks. Data preparation tools include outlier detection, variable transformations, variable clustering, interactive binning, principal components, rule building and induction, data imputation, random sampling, and the partitioning of data sets (into training, validation, and test data sets). Advanced visualization tools enable you to quickly and easily examine large amounts of data in multidimensional histograms and to graphically compare modeling results.
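To make the partitioning of data sets concrete, the following Python sketch splits a table of rows into training, validation, and test partitions by simple random assignment. It is not the Enterprise Miner implementation: the function name `partition`, the 60/20/20 default proportions, and the fixed seed are assumptions made for illustration.

```python
import random

def partition(rows, train=0.6, validate=0.2, seed=42):
    """Randomly split rows into (training, validation, test) lists.

    The test partition receives whatever remains after the training and
    validation shares are taken. The proportions are illustrative defaults.
    """
    if train < 0 or validate < 0 or train + validate > 1:
        raise ValueError("train and validate must be non-negative and sum to at most 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for repeatable splits
    n_train = round(len(shuffled) * train)
    n_validate = round(len(shuffled) * validate)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_validate],
            shuffled[n_train + n_validate:])

# Hypothetical usage with 100 row identifiers.
training, validation, test = partition(range(100))
```

Every row lands in exactly one partition, and fixing the seed means the same split can be reproduced when several models are compared against the same validation and test data.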
Enterprise Miner is designed for PCs or servers that are running under Windows XP, UNIX, Linux, or subsequent releases of those operating environments. The figures and screen captures that are presented in this document were taken on a PC that was running under Windows XP.
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.