What's New
SAS Enterprise Miner 5.1 introduces a new way to distribute SAS data mining across an organization. A three-tier architecture (SAS server, optional Java middleware, and Java client) separates the data mining computational server from the user interface workstations. With this architecture, sites can configure installations that scale from single-user systems to very large enterprise computing solutions. Powerful servers can be dedicated to computing while end users move from office to home to remote site without losing access to projects or services. Mining processes can run in parallel and in batch mode. Data miners can also distribute Web reports over the corporate intranet to product managers.
Customers should select the configuration that is appropriate for their operating environment. Together, the SAS System CDs and the SAS Ancillary CD contain all the components that are needed to install any configuration.
SAS Enterprise Miner 5.1 in SAS 9.1.3 contains the following new nodes:
The Segment Profile node helps you assess data sets by using a splitting algorithm to create segmented data. Segmented data is data that is grouped by a segment variable, based on common attributes or values among the specified input variables. Use the Segment Profile node to identify the variables by which you want to segment the data set, and the input variables that you want to use to discriminate between a segment and the entire data set. The node also computes summary statistics for designated report variables.
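To make "discriminating between a segment and the entire data set" concrete, the following minimal sketch ranks interval inputs by a standardized mean difference between each segment and the whole data set. The metric and the function name `profile_segments` are illustrative assumptions; this is not the splitting algorithm that the Segment Profile node uses.

```python
from statistics import mean, pstdev


def profile_segments(rows, segment_var, inputs):
    """Rank interval inputs by how far each segment's mean departs from the overall mean.

    Illustrative metric (an assumption, not the node's algorithm):
    (segment mean - overall mean) / overall population standard deviation.
    """
    # Overall mean and standard deviation of every input across the entire data set.
    overall = {v: (mean(r[v] for r in rows), pstdev([r[v] for r in rows])) for v in inputs}
    profile = {}
    for seg in sorted({r[segment_var] for r in rows}):
        members = [r for r in rows if r[segment_var] == seg]
        scores = []
        for v in inputs:
            mu, sd = overall[v]
            diff = (mean(r[v] for r in members) - mu) / sd if sd > 0 else 0.0
            scores.append((v, round(diff, 3)))
        # Largest absolute difference first: the inputs that best separate this segment.
        scores.sort(key=lambda item: abs(item[1]), reverse=True)
        profile[seg] = scores
    return profile


rows = [
    {"seg": "A", "age": 20, "income": 50},
    {"seg": "A", "age": 22, "income": 52},
    {"seg": "B", "age": 60, "income": 51},
    {"seg": "B", "age": 58, "income": 49},
]
print(profile_segments(rows, "seg", ["age", "income"]))
```

In this toy data, `age` comes out as the most discriminating input for both segments, with opposite signs.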
SAS Credit Scoring is a new solution in SAS Enterprise Miner 5.1 in SAS 9.1.3. This solution offers the ability to rapidly generate automated credit scoring models that rely mostly on statistical models.
If your site has licensed SAS Credit Scoring, the credit scoring nodes appear on the Credit Scoring tab in your Enterprise Miner session. SAS Enterprise Miner includes the following credit scoring nodes: Interactive Grouping, Scorecard, Reject Inference, and Credit Exchange.
You can use these nodes to build scorecard models that assign score points to customer attributes; to group and select characteristics, automatically or interactively, by using Weights of Evidence and Information Value measures; and to normalize score points to conform with company or industry standards. The business value of statistical models can be assessed by using strategy curves, profit charts, and a reject inference process in order to arrive at models for scoring through-the-door populations. A credit exchange tool provides additional reporting of the credit scoring results and can exchange information with the SAS Credit Risk solution.
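The Weights of Evidence (WOE) and Information Value (IV) measures, and the normalization of score points, follow widely used textbook formulas. The following minimal sketch uses those common definitions: WOE for a bin is ln(share of goods / share of bads), IV sums (share of goods minus share of bads) times WOE, and scores are scaled so that a fixed number of "points to double the odds" (PDO) is added. The base score, base odds, and PDO values are hypothetical examples of a company standard, and the SAS Credit Scoring nodes may adjust these formulas (for example, for empty bins) in ways that this sketch does not.

```python
import math


def woe_iv(bins):
    """bins: list of (label, n_good, n_bad). Returns ({label: WOE}, IV).

    WOE_i = ln(share_good_i / share_bad_i)
    IV    = sum_i (share_good_i - share_bad_i) * WOE_i
    """
    total_good = sum(g for _, g, _ in bins)
    total_bad = sum(b for _, _, b in bins)
    woe, iv = {}, 0.0
    for label, g, b in bins:
        # A bin with no goods or no bads gives an infinite WOE; real tools
        # apply an adjustment, while this sketch simply rejects such bins.
        if g == 0 or b == 0:
            raise ValueError(f"bin {label!r} has zero goods or zero bads")
        pg, pb = g / total_good, b / total_bad
        woe[label] = math.log(pg / pb)
        iv += (pg - pb) * woe[label]
    return woe, iv


def scaled_score(odds, base_score=600, base_odds=50, pdo=20):
    """Map good:bad odds to a score on which every `pdo` points doubles the odds.

    base_score, base_odds, and pdo are hypothetical example standards.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log(odds)


woe, iv = woe_iv([("low", 100, 100), ("high", 300, 100)])
print(woe, round(iv, 4))
print(round(scaled_score(50)), round(scaled_score(100)))
```

With these example standards, odds of 50:1 map to 600 points and odds of 100:1 map to 620 points.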
For more information, see the documentation for SAS Credit Scoring.
Currently, there is no process for converting SAS Enterprise Miner Version 4.x or 3.x projects to SAS Enterprise Miner 5.1.
SAS Enterprise Miner 4.3 and SAS Enterprise Miner 5.1 work directly on SAS tables and views. Neither product integrates additional data preparation or processing, except through custom code that you write in the SAS Code node. SAS Enterprise Miner 5.1 includes a Merge node that enables users to merge input data sets.
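The Merge row of the comparison table says that the node merges data sets by row number or by ID variable. The following minimal sketch shows those two matching rules on lists of dictionaries. Treating an ID merge as an inner join, and letting right-hand values win on shared column names, are assumptions of this sketch, not documented behavior of the Merge node.

```python
def merge_by_row(left, right):
    """One-to-one merge: row i of `left` is combined with row i of `right`."""
    if len(left) != len(right):
        raise ValueError("a row-number merge needs tables of equal length")
    # On shared column names, the value from `right` wins.
    return [{**l, **r} for l, r in zip(left, right)]


def merge_by_id(left, right, id_var):
    """Combine rows that share the same `id_var` value (inner join).

    Assumes `id_var` is unique within `right`.
    """
    index = {r[id_var]: r for r in right}
    return [{**l, **index[l[id_var]]} for l in left if l[id_var] in index]


# Combining predictions from two models (p1, p2) for the same customers.
model1 = [{"id": 1, "p1": 0.2}, {"id": 2, "p1": 0.7}]
model2 = [{"id": 2, "p2": 0.6}, {"id": 1, "p2": 0.3}]
print(merge_by_id(model1, model2, "id"))
```

A merge by ID is the safer choice when the row order of the inputs can differ, as it does in `model2` above.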
SAS Enterprise Miner 4.3 and SAS Enterprise Miner 5.1 can register models to a SAS Metadata Repository. The information registered by using SAS Enterprise Miner 5.1 is more complete.
The online documentation explains many of the details about installing, configuring, and using SAS Enterprise Miner 5.1. This document adds overview information about SAS Enterprise Miner 5.1, and a comparison of its features to the features in SAS Enterprise Miner 4.3.
The following table shows the primary differences between SAS Enterprise Miner 4.3 and SAS Enterprise Miner 5.1.
Configuration | SAS Enterprise Miner 4.3 | SAS Enterprise Miner 5.1 |
Server | SAS 9.1 MVA | SAS 9.1 MVA |
Middleware | N/A | Java (optional). The middleware manages multiple users and lets model training continue while the client is disconnected. |
Client | SAS 9.1 Windows | Java 1.4.1 |
Metadata Server | Optional, but is needed for model registration. | Required |
Project Storage | Client and Server | Server |
SAS Environment | SAS Enterprise Miner 4.3 | SAS Enterprise Miner 5.1 |
Program Editor, Log, Output windows | Yes | Yes |
Viewer for SAS/GRAPH output | Yes | Yes |
SAS/INSIGHT | Yes | No |
SAS DMS-based solutions | Yes | No |
Enterprise Miner Interfaces | SAS Enterprise Miner 4.3 | SAS Enterprise Miner 5.1 |
SAS System GUI | Yes | No |
Java client GUI | No | Yes |
Batch project execution | No | Yes |
Web application results viewer | Yes | Yes |
Java API | No | Yes (experimental) |
SAS Code node interface | Yes | Yes. It provides better support for macro variables, macros, code generation, and results definition. |
SAS Code node-based custom nodes | No | Yes. XML definitions for node properties. |
DMTOOL custom nodes | Yes | No |
SAS Enterprise Miner Procedures Usage | Unsupported | Unsupported |
Metadata | SAS Enterprise Miner 4.3 | SAS Enterprise Miner 5.1 |
Table and Column analytical metadata | Yes | Yes |
Batch interface for creating table and column metadata | No | Yes |
Batch interface for creating target decision profiles | No | Yes |
Sample-based metadata calculation statistics | Yes | No |
Complete data-based metadata calculation statistics | No | Yes |
Configurable data advisory rules | No | Yes. You can set thresholds for missing percentages. |
Extensible column attributes | No | Yes. You can add column attributes to be included in reports. |
Report column attribute | No | Yes. You can include variables that have the report attribute in most reports such as score rankings and score distributions. |
Hidden variables | No | Yes. You can hide rejected variables from the variable usage user interfaces but retain them in the data for score applications. |
Data Model for entire project | No | Yes. It facilitates batch and GUI execution of common projects. |
GUI Functionality | SAS Enterprise Miner 4.3 | SAS Enterprise Miner 5.1 |
XML diagram exchange | No | Yes |
Diagram copy/paste between projects | No | Yes |
Open multiple diagrams | No | Yes |
Open multiple results windows | No | Yes |
Open multiple child windows in node results | No | Yes |
Common property sheet | No | Yes |
Individual property dialogs | Yes | No |
Group processing | Yes (stratified, bagging, and boosting) | No |
Job Execution | SAS Enterprise Miner 4.3 | SAS Enterprise Miner 5.1 |
Stop running diagram | No | Yes |
Run multiple diagrams | No | Yes |
Disconnect while diagram runs | No | Yes. Middleware configuration is required. |
Continue work while diagrams run | No | Yes |
Share projects with multiple users | Yes | Yes |
Batch mode execution | No | Yes |
Scheduling | No | No |
Multi-threaded procedures | Yes | Yes |
Multi-tasking projects | No | Yes |
Reporting | SAS Enterprise Miner 4.3 | SAS Enterprise Miner 5.1 |
HTML Reports | Yes | No |
SAS Publish Packages (SPK) | No | Yes |
Web application for viewing stored models | Yes | Yes |
Interactive Analysis | SAS Enterprise Miner 4.3 | SAS Enterprise Miner 5.1 |
Tree growing and pruning | Yes | Yes |
Neural network model | Yes | No |
Association rule: WHERE clause | Yes | No |
Transformation bin allocation | Yes | No |
Filter outliers selection | Yes | No |
Link analysis | Yes | No |
Interactive grouping | Yes | Yes, if your site licenses SAS Credit Scoring. The SAS Enterprise Miner Interactive Grouping Desktop Application enables you to modify the groupings interactively. |
Decision threshold charts | Yes | No |
Decision Processing | SAS Enterprise Miner 4.3 | SAS Enterprise Miner 5.1 |
Class target profile matrix | Yes | Yes |
Class target loss matrix | Yes | Yes |
Cost values and cost variables | Yes | Yes |
Class target variable number of decisions | Yes | No |
Interval target decisions | Yes | No |
Model Assessment | SAS Enterprise Miner 4.3 | SAS Enterprise Miner 5.1 |
Class probability score rankings: gain, lift, etc. | Yes | Yes |
Class probability score distributions | No | Yes |
Classification tables in node output listings | Sometimes | Always |
Decision tables in node output listings | No | Yes |
Type I and Type II error table in node output listings | No | Yes |
Interval target score rankings | No | Yes |
Interval target score distributions | No | Yes |
Interval target prediction vs. actual target | Yes. It is sample based and available in Model Manager. | Yes, through user-generated plots of exported data. |
Score rankings printed in output listings | No | Yes |
Score distributions printed in output listings | No | Yes |
Post-model decision matrix what-if investigations | Yes | No |
Decision threshold charts | Yes | No |
Kolmogorov-Smirnov (KS), Receiver Operating Characteristic (ROC) index, GINI statistics | No | Yes |
Validation data assessment | Yes | Yes |
Train and Test data assessment | Optional. You must enable it through Model Manager. | Yes |
ROC chart | Yes | Yes |
Scoring | SAS Enterprise Miner 4.3 | SAS Enterprise Miner 5.1 |
SAS score code | Yes | Yes |
C and Java score code | Yes | Yes |
Separation of residual and non-residual score code | No | Yes |
PMML generation | No | Yes |
Node Tools | SAS Enterprise Miner 4.3 | SAS Enterprise Miner 5.1 |
Input Data | Yes | Yes |
Sampling | Yes | Yes |
Partitioning | Yes | Yes |
Time Series | Yes | Yes |
Variable Selection | Yes | Yes |
Clustering | Yes. This node uses the FASTCLUS procedure. | Yes. This node uses the DMVQ procedure, which supports class variables, score code, and PMML. |
SOM | Yes | No |
Link Analysis | Yes | No |
Insight | Yes | No. Graphs can be generated from the Results window of any node. |
Distribution Explorer | Yes | No. Graphs can be generated from the Results window of any node. |
Multiplot | Yes | Yes |
StatExplore | No | Yes. This node computes univariate and bivariate distribution statistics for interval and class variables. Target and segment variables are used as BY variables, correlation terms, or both. |
Merge | No | Yes. This node merges training, test, and validation data sets by row number or by ID variable. It is useful for combining predictions from multiple models or for matching IDs across multiple tables. |
Association | Yes | Yes, but the Results window does not support rule filtering or item scatter plots. This node supports a network display of rules. It generates a transposed data set that has one row per customer and one variable per rule. The transposed data set can be used to cluster customers or to predict customer behavior by rules. |
Path Analysis | No | Yes. This node uses the new PATH procedure that includes a referrer variable for Web log analysis. |
Transform | Yes | Yes, but this node does not support user-specified equations. It supports the creation of dummy variables and interaction terms. |
Drop | No | Yes. This node drops variables from temporary tables for processing efficiency. |
Filter | Yes | Yes, but this node does not support graphical selection of filter ranges. |
Impute | Yes. In SAS Enterprise Miner 4.3, this node is the Replacement node. | Yes |
Principal Components | Yes. It is in the Princomp/Dmneural node. | Yes. This node does not support selecting the number of components in the Results window. |
Regression (linear and logistic) | Yes | Yes |
Dmine Regression | No. Dmine regression is available as an optional output from the Variable Selection node. | Yes. This node uses the DMINE procedure to produce models that directly include the Analysis of Variance (AOV), group, and interaction effects for interval and binary targets. |
Decision Tree | Yes. Use the Tree Desktop Application for interactive training. | Yes. Use the Tree Desktop Application for interactive training. |
Neural Network | Yes | Yes. This node does not support interactive training or advanced user network configuration. |
Rule Induction | No | Yes. This node uses an algorithm for building models by recursively identifying target events. It is useful for modeling rare events. This functionality was formerly included in DMTOOL. |
Autoneural | No | Yes. This node uses an algorithm for automated multilayer perceptron (MLP) network building. It selects the type and number of activation functions from four different architectures. This functionality was formerly included in DMTOOL. |
DMNeural | Yes. This node is part of the Princomp/DMNeural node. | Yes. |
Two Stage Model | Yes | Yes. You can specify the options for the first and second stage models, and the neural network models that have two targets. |
Memory-Based Reasoning | Yes. This node is not recommended for score deployment, because it requires training table availability. | Yes. |
Ensemble | Yes | Yes. The node supports simple averaging and voting methods. It does not support bagging and boosting models. |
Model Comparison | Yes. It is the Assessment node. | Yes. This node computes ROC and KS statistics and automatically selects a model based on the selection criterion that you specify. |
Segment Profile | No | Yes. This node uses a designated segment variable to segment the data set based on input variables that discriminate between a segment and the entire data set. Additionally, the node generates graphics and summary information to profile the generated segments. |
Group Processing | Yes | No |
Subdiagram | Yes | No |
Control Point | Yes | Yes |
SAS Code | Yes | Yes. This node provides extended support through better organized macros and macro variables. It supports building model and model assessment functions, and the creation of report tables and plots. |
Score | Yes | Yes. If score data is defined, the node always scores it to create an output view and table. |
Score Converter | Yes | No. C and Java code are included in the SPK results package. PMML code for decision trees is available on a request basis. |
User Defined Model | Yes | No |
Reporter | Yes | No. Reports in the SPK format can be generated from any node. |
Data Set Attributes | Yes | Yes. The Metadata node in SAS Enterprise Miner 5.1 replaces the Data Set Attributes node in SAS Enterprise Miner 4.3. |
Data Mining Database | Yes | N/A |
Interactive Grouping | Yes. This node supports user-driven grouping of variable levels and bins based on GINI, Information Gain, and Weight of Evidence (WOE) scores. | Yes, if your site licenses SAS Credit Scoring. |
Scorecard | Yes, if your site licenses SAS Credit Scoring. | Yes, if your site licenses SAS Credit Scoring. |
Reject Inference | Yes, if your site licenses SAS Credit Scoring. | Yes, if your site licenses SAS Credit Scoring. |
Credit Exchange | Yes, if your site licenses SAS Credit Scoring. | Yes, if your site licenses SAS Credit Scoring. |
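The Model Assessment section of the table lists the Kolmogorov-Smirnov (KS), ROC index, and Gini statistics as new in 5.1. The following minimal sketch computes them for a binary target from their standard definitions: KS is the largest gap between the cumulative score distributions of events and non-events, the ROC index is the probability that a randomly chosen event outscores a randomly chosen non-event (ties count one half), and Gini = 2 × ROC index − 1. Enterprise Miner may bin scores or handle ties differently, so its reported values can differ slightly.

```python
def ks_statistic(scores, labels):
    """Maximum gap between the cumulative score distributions of events (1) and non-events (0)."""
    events = sum(labels)
    nonevents = len(labels) - events
    if events == 0 or nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    pairs = sorted(zip(scores, labels), reverse=True)
    cum_e = cum_n = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume all observations that share this score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                cum_e += 1
            else:
                cum_n += 1
            i += 1
        ks = max(ks, abs(cum_e / events - cum_n / nonevents))
    return ks


def roc_index(scores, labels):
    """Share of (event, non-event) pairs in which the event scores higher; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    return 2 * roc_index(scores, labels) - 1


scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(ks_statistic(scores, labels), roc_index(scores, labels), gini(scores, labels))
```

For this toy data, KS = 2/3, the ROC index = 8/9, and Gini = 7/9.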
All SAS computations are performed on the server and are unaffected by client/server network performance. All project information, such as functional settings and intermediate data sets, is stored on the SAS server. GUI operations (such as editing a process flow diagram, property sheets, and variables tables) depend on transferring data from the SAS server to the client GUI. These operations are therefore affected by both the server's availability and client/server network performance. SAS Enterprise Miner 5.1 is designed to tolerate network disconnections and cleans up resources, including SAS sessions, accordingly. A reliable network bandwidth of 512 Kbps or greater is recommended for reasonable client/server performance.
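As a rough illustration of why bandwidth matters for GUI operations, the transfer time t for a payload of S bytes over a link of B bits per second is about 8S / B. The 1 MB payload below is a hypothetical example, and the estimate ignores latency and protocol overhead, so real transfers take longer:

$$
t = \frac{8S}{B} = \frac{8 \times 10^{6}\ \text{bits}}{512 \times 10^{3}\ \text{bits/s}} \approx 15.6\ \text{s}
$$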
The SAS Enterprise Miner 5.1 client starts a SAS session for interacting with the server and submitting user-entered SAS code. In the default configuration, SAS Enterprise Miner 5.1 starts a separate SAS session for each parallel branch in the process flow diagram. A user of SAS Enterprise Miner 5.1 can also start several diagrams that run simultaneously. Each new diagram starts a new SAS session.
Note: It is possible for a user on a single-CPU desktop system to start so many SAS sessions that overall system performance is seriously degraded.
Some tasks in SAS Enterprise Miner 5.1 (such as data sorting, variable selection, and regression modeling) have been rebuilt to distribute their work over multiple CPUs on the same system.
SAS Enterprise Miner 4.3 and 5.1 use the same SAS procedures for core modeling and summarization; therefore, you should get the same regression results when you use either release of the software.
SAS Enterprise Miner 4.3 and 5.1 generate SAS code to manage the use of SAS procedures. Because the SAS code that is generated by each of these releases of SAS Enterprise Miner is different, you will get different results in analytical metadata management and derived functions.
SAS Enterprise Miner 4.3 uses the following attributes to maintain analytical metadata for variables:
SAS Enterprise Miner 5.1 contains the following additional attributes:
Use the %EMDS macro to create a data source definition that has additional attributes for variables that are not used by Enterprise Miner but that are included in reports that Enterprise Miner creates.
SAS Enterprise Miner 4.3 creates a metadata sample (which has 2,000 or fewer observations, by default) for data summarization, graphics, and interactive modeling. The metadata sample is downloaded to and persists on the client system. SAS Enterprise Miner 5.1 is a completely server-based system that does not use a metadata sample.
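This document does not say how SAS Enterprise Miner 4.3 draws its metadata sample. As one way to draw a uniform random sample of at most 2,000 observations in a single pass over a table of unknown size, the following sketch uses reservoir sampling (Algorithm R). Treat it as an illustration of the technique, not as the 4.3 implementation.

```python
import random


def reservoir_sample(rows, k=2000, seed=None):
    """Return a uniform random sample of at most k rows, reading `rows` only once (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep row i with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


sample = reservoir_sample(range(10_000), seed=1)
print(len(sample))
```

Tables with fewer than `k` rows are returned whole, which matches the "2,000 or fewer observations" wording above.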
SAS Enterprise Miner 4.3 and 5.1 do not require different server installations. However, a SAS Enterprise Miner 4.3 server requires the SAS/CONNECT spawner, and a SAS Enterprise Miner 5.1 server requires the SAS object spawner.
The SAS Enterprise Miner 5.1 server is responsible for project storage and computations. Having more memory available improves both computational and GUI performance. A minimum of 512 MB of memory is required, and 1 GB or more is recommended for optimal performance.
Note: SAS Enterprise Miner 5.1 was not tested on z/OS operating systems. Currently, SAS Enterprise Miner 5.1 requires the HFS or UNIX environment under z/OS.
SAS Enterprise Miner 5.1 requires Java release 1.4.1. The SAS base language and procedures are not being ported to Java and will continue to run in a threaded SAS System.
The Java middleware is a new feature in SAS Enterprise Miner 5.1, but it is not a required component. The middleware is useful when you want to disconnect the client from a running diagram, or to facilitate multiple users who are sharing SAS servers and a SAS Open Metadata Server.
The Java client is based on Java Swing GUI libraries that are part of Java 1.4.1. Following is a list of performance considerations for the SAS Enterprise Miner 5.1 Java client:
SAS Enterprise Miner 4.3 and SAS Enterprise Miner 5.1 can register models to the Enterprise Miner model repository. The repository is a member of the SAS Metadata Server (SMS). When you register a SAS Enterprise Miner model, details of the model are stored in the SMS. Additional details can be stored on a WebDAV server, if one is available and configured.
The Model Repository Viewer is a Web application that queries and presents models that are found in the Enterprise Miner model repository. This function is useful in archiving models over long periods of time and in distributing models to people who are not users of Enterprise Miner such as business managers and database administrators. The components of the Model Repository Viewer can be used to build custom applications for Web delivery of data mining models.
SAS Enterprise Miner 5.1 supports SAS language-based batch processing through macros. You can combine these macros in one or more SAS jobs that perform complete data mining model building without the SAS Enterprise Miner 5.1 GUI, or you can use the macros and the GUI together in a complementary cycle.
SAS Enterprise Miner 5.1 supports a Java language API. See the SAS Enterprise Miner 5.1 documentation for usage information. The Java API is useful when you are building a Java language application and want to include data mining functionality. For example, if you build a Web portal to neural networks, your Java servlet code can use the SAS Enterprise Miner 5.1 Java API to run the SAS neural network. The Enterprise Miner Java API is experimental in SAS 9.1.