What's New in SAS Enterprise Miner 5.1

Overview

SAS Enterprise Miner 5.1 introduces a new approach to distributing the power of SAS data mining. A three-tiered architecture (a SAS server, optional Java middleware, and a Java client) separates the data mining computational server from the user interface workstations. This architecture lets users configure efficient installations that scale from single-user systems to very large enterprise computing solutions. Powerful servers can be dedicated to computing while end users move from office to home to remote site without losing access to projects or services. Mining processes can be run in parallel and scheduled in a batch environment. Data miners can also distribute Web reports to product managers over the corporate intranet.

Customers should select the configuration that is appropriate for their operating environment. Together, the SAS System CDs and the SAS Ancillary CD contain all of the components that are needed to install any configuration.

SAS Enterprise Miner 5.1 in SAS 9.1.3 contains the following new nodes:


Segment Profile Node

The Segment Profile node helps you assess data sets by using a splitting algorithm to create segmented data. Segmented data is data that is grouped by a segment variable, based on common attributes or values among the specified input variables. Use the Segment Profile node to identify the variables by which you want to segment the data set, and to identify the input variables that you want to use to discriminate between a segment and the entire data set. In addition, the Segment Profile node computes summary statistics for designated report variables.


Credit Scoring Nodes

SAS Credit Scoring is a new solution in SAS Enterprise Miner 5.1 in SAS 9.1.3. This solution offers the ability to rapidly generate automated credit scoring models that rely mostly on statistical models.

If your site has licensed SAS Credit Scoring, the credit scoring nodes appear on the Credit Scoring tab in your Enterprise Miner session. SAS Enterprise Miner includes the following credit scoring nodes:

Interactive Grouping
Scorecard
Reject Inference
Credit Exchange

You can use these nodes to build scorecard models that assign score points to customer attributes; to group and select characteristics, automatically or interactively, by using Weight of Evidence (WOE) and Information Value (IV) measures; and to normalize score points to conform with company or industry standards. The business value of statistical models can be assessed by using strategy curves, profit charts, and a reject inference process to arrive at models for scoring through-the-door populations. A credit exchange tool provides additional reporting of the credit scoring results and can exchange information with the SAS Credit Risk solution.
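
The Weight of Evidence and Information Value measures mentioned above can be illustrated with a short Python sketch. It is not the SAS Credit Scoring implementation: it assumes one common convention, WOE = ln(share of all goods in the group / share of all bads in the group), and the function name woe_iv and the example counts are hypothetical. Real grouping tools also adjust for groups that contain zero goods or zero bads, which this sketch deliberately omits.

```python
import math


def woe_iv(groups):
    """Return (WOE per group, total Information Value) for one grouped characteristic.

    groups maps a group label to a (good_count, bad_count) pair.
    Assumption: WOE = ln(share of all goods / share of all bads); conventions vary.
    No zero-count adjustment is applied: a group with zero bads raises
    ZeroDivisionError and a group with zero goods raises ValueError.
    """
    total_good = sum(good for good, _ in groups.values())
    total_bad = sum(bad for _, bad in groups.values())
    woe = {}
    iv = 0.0
    for label, (good, bad) in groups.items():
        dist_good = good / total_good  # share of all goods that fall in this group
        dist_bad = bad / total_bad     # share of all bads that fall in this group
        woe[label] = math.log(dist_good / dist_bad)
        # Each group's IV contribution (dist_good - dist_bad) * WOE is non-negative.
        iv += (dist_good - dist_bad) * woe[label]
    return woe, iv


if __name__ == "__main__":
    # Hypothetical good/bad counts for a characteristic grouped into three age bins.
    age_groups = {"18-25": (100, 50), "26-40": (300, 50), "41+": (600, 100)}
    woe, iv = woe_iv(age_groups)
    for label, value in woe.items():
        print(f"{label}: WOE = {value:.3f}")
    print(f"Information Value = {iv:.3f}")
```

A negative WOE marks a group where bads are over-represented relative to the whole population; the IV sums these differences into a single measure of how strongly the grouped characteristic discriminates between goods and bads.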

For more information, see the documentation for SAS Credit Scoring.


Conversion from Previous Releases

There is currently no process for converting SAS Enterprise Miner Version 4.x or 3.x projects to SAS Enterprise Miner 5.1.


Data Import and Export

SAS Enterprise Miner 4.3 and SAS Enterprise Miner 5.1 work directly on SAS tables and views. Neither product directly integrates further data preparation or processing except for customized code that you write in the SAS Code node. SAS Enterprise Miner 5.1 includes a Merge node that enables users to merge input data sets.

SAS Enterprise Miner 4.3 and SAS Enterprise Miner 5.1 can register models to a SAS Metadata Repository. The information registered by using SAS Enterprise Miner 5.1 is more complete.


Documentation

The online documentation explains many of the details about installing, configuring, and using SAS Enterprise Miner 5.1. This document adds overview information about SAS Enterprise Miner 5.1, and a comparison of its features to the features in SAS Enterprise Miner 4.3.


SAS Enterprise Miner 4.3 vs. SAS Enterprise Miner 5.1

The following table shows the primary differences between SAS Enterprise Miner 4.3 and SAS Enterprise Miner 5.1.

Configuration SAS Enterprise Miner 4.3 SAS Enterprise Miner 5.1
Server SAS 9.1 MVA SAS 9.1 MVA
Middleware N/A Java. The middleware manages multiple users, and the client can disconnect while model training continues.
Client SAS 9.1 Windows Java 1.4.1
Metadata Server Optional, but is needed for model registration. Required
Project Storage Client and Server Server
SAS Environment SAS Enterprise Miner 4.3 SAS Enterprise Miner 5.1
Program Editor, Log, Output windows Yes Yes
Viewer for SAS/GRAPH output Yes Yes
SAS/INSIGHT Yes No
SAS DMS-based solutions Yes No
Enterprise Miner Interfaces SAS Enterprise Miner 4.3 SAS Enterprise Miner 5.1
SAS System GUI Yes No
Java client GUI No Yes
Batch project execution No Yes
Web application results viewer Yes Yes
Java API No Yes (experimental)
SAS Code node interface Yes Yes. It provides better support for macro variables, macros, code generation, and results definition.
SAS Code node-based custom nodes No Yes. XML definitions for node properties.
DMTOOL custom nodes Yes No
SAS Enterprise Miner Procedures Usage Unsupported Unsupported
Metadata SAS Enterprise Miner 4.3 SAS Enterprise Miner 5.1
Table and Column analytical metadata Yes Yes
Batch interface for creating table and column metadata No Yes
Batch interface for creating target decision profiles No Yes
Sample-based metadata calculation statistics Yes No
Complete data-based metadata calculation statistics No Yes
Configurable data advisory rules No Yes. You can set thresholds for missing percentages.
Extensible column attributes No Yes. You can add additional column attributes to be included in reports.
Report column attribute No Yes. You can include variables that have the report attribute in most reports such as score rankings and score distributions.
Hidden variables No Yes. You can hide rejected variables from the variable usage user interfaces but retain them in the data for score applications.
Data Model for entire project No Yes. It facilitates batch and GUI execution of common projects.
GUI Functionality SAS Enterprise Miner 4.3 SAS Enterprise Miner 5.1
XML diagram exchange No Yes
Diagram copy/paste between projects No Yes
Open multiple diagrams No Yes
Open multiple results windows No Yes
Open multiple child windows in node results No Yes
Common property sheet No Yes
Individual property dialogs Yes No
Group processing Yes (stratified, bagging, and boosting) No
Job Execution SAS Enterprise Miner 4.3 SAS Enterprise Miner 5.1
Stop running diagram No Yes
Run multiple diagrams No Yes
Disconnect while diagram runs No Yes. Middleware configuration is required.
Continue work while diagrams run No Yes
Share projects with multiple users Yes Yes
Batch mode execution No Yes
Scheduling No No
Multi-threaded procedures Yes Yes
Multi-tasking projects No Yes
Reporting SAS Enterprise Miner 4.3 SAS Enterprise Miner 5.1
HTML Reports Yes No
SAS Publish Packages (SPK) No Yes
Web application for viewing stored models Yes Yes
Interactive Analysis SAS Enterprise Miner 4.3 SAS Enterprise Miner 5.1
Tree growing and pruning Yes Yes
Neural network model Yes No
Association rule: WHERE clause Yes No
Transformation bin allocation Yes No
Filter outliers selection Yes No
Link analysis Yes No
Interactive grouping Yes Yes, if your site licenses SAS Credit Scoring. The SAS Enterprise Miner Interactive Grouping Desktop Application enables you to modify the groupings interactively.
Decision threshold charts Yes No
Decision Processing SAS Enterprise Miner 4.3 SAS Enterprise Miner 5.1
Class target profile matrix Yes Yes
Class target loss matrix Yes Yes
Cost values and cost variables Yes Yes
Class target variable number of decisions Yes No
Interval target decisions Yes No
Model Assessment SAS Enterprise Miner 4.3 SAS Enterprise Miner 5.1
Class probability score rankings: gain, lift, etc. Yes Yes
Class probability score distributions No Yes
Classification tables in node output listings Sometimes Always
Decision tables in node output listings No Yes
Type I and Type II error table in node output listings No Yes
Interval target score rankings No Yes
Interval target score distributions No Yes
Interval target prediction vs. actual target Yes. It is sample based and available in Model Manager. Yes. Users can generate plots from exported data.
Score rankings printed in output listings No Yes
Score distributions printed in output listings No Yes
Post-model decision matrix what-if investigations Yes No
Decision threshold charts Yes No
Kolmogorov-Smirnov (KS), Receiver Operating Characteristic (ROC) index, GINI statistics No Yes
Validation data assessment Yes Yes
Train and Test data assessment Optional. You must enable it through Model Manager. Yes
ROC chart Yes Yes
Scoring SAS Enterprise Miner 4.3 SAS Enterprise Miner 5.1
SAS score code Yes Yes
C and Java score code Yes Yes
Separation of residual and non-residual score code No Yes
PMML generation No Yes
Node Tools SAS Enterprise Miner 4.3 SAS Enterprise Miner 5.1
Input Data Yes Yes
Sampling Yes Yes
Partitioning Yes Yes
Time Series Yes Yes
Variable Selection Yes Yes
Clustering Yes. The procedure FASTCLUS is used. Yes. The procedure DMVQ is used to support class variables, score code, and PMML.
SOM Yes No
Link Analysis Yes No
Insight Yes No. Graphs can be generated from the Results window of any node.
Distribution Explorer Yes No. Graphs can be generated from the Results window of any node.
Multiplot Yes Yes
StatExplore No Yes. This node computes univariate and bivariate distribution statistics for interval and class variables. Target and segment variables are used as BY variables and/or correlation terms.
Merge No Yes. This node merges training, test, and validation data sets by row number or by ID variable. It is useful for combining predictions from multiple models or for matching IDs across multiple tables.
Association Yes Yes, but the Results window does not support rule filtering or scatter plots for items. This node supports network display of rules. It generates a transposed data set that has one row per customer and variables for rules. The transposed data set can be used to cluster or predict customer behavior by rules.
Path Analysis No Yes. This node uses the new PATH procedure that includes a referrer variable for Web log analysis.
Transform Yes Yes, but this node does not support user-specified equations. It supports the creation of dummy and interaction terms.
Drop No Yes. This node drops variables from temporary tables for processing efficiency.
Filter Yes Yes, but this node does not support graphical selection of filter ranges.
Impute Yes. This node is the Replacement node. Yes
Principal Components Yes. It is in the Princomp/Dmneural node. Yes. This node does not support selecting the number of components in the Results window.
Regression (linear and logistic) Yes Yes
Dmine Regression No. Dmine regression is available as an optional output from the Variable Selection node. Yes. This node uses the DMINE procedure to produce models that directly include the Analysis of Variance (AOV), group, and interaction effects for interval and binary targets.
Decision Tree Yes. Use the Tree Desktop Application for interactive training. Yes. Use the Tree Desktop Application for interactive training.
Neural Network Yes Yes. This node does not support interactive training or advanced user network configuration.
Rule Induction No Yes. This node uses an algorithm for building models by recursively identifying target events. It is useful for modeling rare events. This functionality was formerly included in DMTOOL.
Autoneural No Yes. This node uses an algorithm for automated multilayer perceptron (MLP) network building. It selects the type and number of activation functions from four different architectures. This functionality was formerly included in DMTOOL.
DMNeural Yes. This node is part of the Princomp/DMNeural node. Yes.
Two Stage Model Yes Yes. You can specify options for the first-stage and second-stage models, and for neural network models that have two targets.
Memory-Based Reasoning Yes. This node is not recommended for score deployment, because scoring requires the training table to be available. Yes.
Ensemble Yes Yes. The node supports simple averaging and voting methods. It does not support bagging and boosting models.
Model Comparison Yes. It is the Assessment node. Yes. This node computes ROC and KS statistics and automatically selects a model based on the selection criterion that you specify.
Segment Profile No Yes. This node uses a designated segment variable to segment the data set based on input variables that discriminate between a segment and the entire data set. Additionally, the node generates graphics and summary information to profile the generated segments.
Group Processing Yes No
Subdiagram Yes No
Control Point Yes Yes
SAS Code Yes Yes. This node provides extended support through better organized macros and macro variables. It supports building model and model assessment functions, and the creation of report tables and plots.
Score Yes Yes. If score data is defined, the node always scores the data to create an output view and table.
Score Converter Yes No. C and Java code are included in the SPK results package. PMML code for decision trees is available on a request basis.
User Defined Model Yes No
Reporter Yes No. Reports in the SPK format can be generated from any node.
Data Set Attributes Yes Yes. The Metadata node in SAS Enterprise Miner 5.1 replaces the Data Set Attributes node in SAS Enterprise Miner 4.3.
Data Mining Database Yes N/A
Interactive Grouping Yes. This node supports user-driven grouping of variable levels and bins based on GINI, Information Gain, and Weight of Evidence (WOE) scores. Yes, if your site licenses SAS Credit Scoring.
Scorecard Yes, if your site licenses SAS Credit Scoring. Yes, if your site licenses SAS Credit Scoring.
Reject Inference Yes, if your site licenses SAS Credit Scoring. Yes, if your site licenses SAS Credit Scoring.
Credit Exchange Yes, if your site licenses SAS Credit Scoring. Yes, if your site licenses SAS Credit Scoring.
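
The Kolmogorov-Smirnov (KS), ROC index, and Gini statistics listed in the Model Assessment rows of the table above are standard measures of how well a model's scores separate events from non-events for a binary target. The following Python sketch shows one way to compute them from scored data. It is illustrative only and is not the SAS Enterprise Miner implementation; the function names and example data are hypothetical, and Gini is computed with the common convention Gini = 2 * ROC index - 1.

```python
def ks_statistic(scores, labels):
    """Two-sample Kolmogorov-Smirnov statistic between event (1) and non-event (0) scores."""
    n_event = sum(1 for y in labels if y == 1)
    n_nonevent = len(labels) - n_event
    pairs = sorted(zip(scores, labels))
    cum_event = cum_nonevent = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all observations tied at this score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                cum_event += 1
            else:
                cum_nonevent += 1
            i += 1
        # Distance between the two empirical cumulative distributions.
        gap = abs(cum_event / n_event - cum_nonevent / n_nonevent)
        ks = max(ks, gap)
    return ks


def roc_index(scores, labels):
    """Area under the ROC curve: P(random event outscores random non-event), ties count 1/2."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    nonevents = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(nonevents))


def gini(scores, labels):
    """Gini coefficient under the common convention Gini = 2 * ROC index - 1."""
    return 2.0 * roc_index(scores, labels) - 1.0


if __name__ == "__main__":
    # Hypothetical posterior probabilities and actual binary outcomes.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    print("KS:", ks_statistic(scores, labels))
    print("ROC index:", roc_index(scores, labels))
    print("Gini:", gini(scores, labels))
```

The pairwise loop in roc_index is quadratic, which is acceptable for a small illustration; a rank-based formula would be preferable for large scored tables.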

Network Performance

All SAS computations are performed on the server and are unaffected by client/server network performance. All project information, such as functional settings and intermediate data sets, is stored on the SAS server. GUI operations (such as editing a process flow diagram, property sheets, and variables tables) depend on transferring data from the SAS server to the client GUI, and are affected by both the server's availability and client/server network performance. SAS Enterprise Miner 5.1 is designed to tolerate network disconnections and cleans up resources, including SAS sessions, accordingly. A reliable network connection with a bandwidth of 512 Kbps or greater is recommended for reasonable client/server performance.


Multi-Tasking

The SAS Enterprise Miner 5.1 client will start a SAS session for interacting with the server and submitting user-entered SAS code. In the default configuration, SAS Enterprise Miner 5.1 starts a separate SAS session for each parallel branch in the process flow diagram. Also, a user of SAS Enterprise Miner 5.1 can start several diagrams that run simultaneously. Each new diagram starts a new SAS session.

Note: It is possible for a user on a single-CPU desktop system to start so many SAS sessions that overall system performance is seriously degraded.


Threading

Some tasks in SAS Enterprise Miner 5.1 (such as data sorting, variable selection, and regression modeling) have been rebuilt to distribute their work over multiple CPUs on the same system.


Models and Computation

SAS Enterprise Miner 4.3 and 5.1 use the same SAS procedures for core modeling and summarization; therefore, you should get the same regression results when you use either release of the software.

SAS Enterprise Miner 4.3 and 5.1 generate SAS code to manage the use of SAS procedures. Because the SAS code that is generated by each of these releases of SAS Enterprise Miner is different, you will get different results in analytical metadata management and derived functions.


Analytical Metadata

SAS Enterprise Miner 4.3 uses the following attributes to maintain analytical metadata for variables:

SAS Enterprise Miner 5.1 contains the following additional attributes:

Use the %EMDS macro to create a data source definition that has additional attributes for variables that are not used by Enterprise Miner but that are included in reports that Enterprise Miner creates.


Metadata Sample

SAS Enterprise Miner 4.3 creates a metadata sample (2,000 observations or fewer, by default) for data summarization, graphics, and interactive modeling. The metadata sample is downloaded to and persists on the client system. SAS Enterprise Miner 5.1 is a completely server-based system that does not use a metadata sample.


SAS Servers

SAS Enterprise Miner 4.3 and 5.1 do not require different server installations. However, a SAS Enterprise Miner 4.3 server requires the SAS/CONNECT spawner, and a SAS Enterprise Miner 5.1 server requires the SAS Object Spawner.

The SAS Enterprise Miner 5.1 server is responsible for project storage and computations. Having more memory available improves both computational and GUI performance. A minimum of 512 MB of memory is required, and 1 GB or more is recommended for optimal performance.

Note: SAS Enterprise Miner 5.1 was not tested on z/OS operating systems. Currently, SAS Enterprise Miner 5.1 requires the HFS or UNIX environment under z/OS. 


Java Requirement

SAS Enterprise Miner 5.1 requires Java release 1.4.1. The SAS base language and procedures are not being ported to Java and will continue to run in a threaded SAS System.


Java Middleware

The Java middleware is a new feature in SAS Enterprise Miner 5.1, but it is not a required component. The middleware is useful when you want to disconnect the client from a running diagram, or to facilitate multiple users who are sharing SAS servers and a SAS Open Metadata Server.


Java Client

The Java client is based on the Java Swing GUI libraries that are part of Java 1.4.1. The following are performance considerations for the SAS Enterprise Miner 5.1 Java client:


Enterprise Miner Model Repository

SAS Enterprise Miner 4.3 and SAS Enterprise Miner 5.1 can register models to the Enterprise Miner model repository. The repository is a member of the SAS Metadata Server (SMS). When you register a SAS Enterprise Miner model, details of the model are stored in the SMS. Additional details can be stored on a WebDAV server if one is available and configured.

The Model Repository Viewer is a Web application that queries and presents models that are found in the Enterprise Miner model repository. This function is useful for archiving models over long periods of time and for distributing models to people who are not users of Enterprise Miner, such as business managers and database administrators. The components of the Model Repository Viewer can be used to build custom applications for Web delivery of data mining models.


Batch Processing

SAS Enterprise Miner 5.1 supports SAS language-based batch processing by using macros. You can use these macros together in one or more SAS jobs to build complete data mining models without using the SAS Enterprise Miner 5.1 GUI, or you can use the macros alongside the GUI in a complementary cycle.


Java API

SAS Enterprise Miner 5.1 supports a Java language API. See the SAS Enterprise Miner 5.1 documentation for usage information. The Java API is useful when you are building a Java application and want to include data mining functionality. For example, if you build a Web portal to neural networks, your Java servlet code can use the SAS Enterprise Miner 5.1 Java API to run the SAS neural network. The Enterprise Miner Java API is experimental in SAS 9.1.