What’s New in SAS Enterprise Miner 7.1

Overview

SAS Enterprise Miner 7.1 provides many improvements and new functions in the areas of administration, user interface, and modeling to enhance the overall data mining experience.

Administration

Installation, configuration, and administration have changed significantly in SAS Enterprise Miner 7.1. Most importantly, SAS Enterprise Miner 7.1 is a component of SAS 9.3 and will not function with any other SAS release.
System architecture changes aim to simplify the single-user experience and to increase the scalability and standards conformity of the multi-user experience. The foremost change concerns the mid-tier technology: the SAS Analytics Platform server has been deprecated. The SAS Analytics Platform service is not used by any SAS 9.3 products or solutions. Existing deployments can disable and remove this service once the new installation is complete.
SAS Enterprise Miner 7.1 can be installed and configured in one of two modes. Both configurations are significantly changed for SAS 9.3:
  • In workstation mode, SAS Foundation 9.3 and SAS Enterprise Miner 7.1 are deployed on a Microsoft Windows system in a single-user configuration. This configuration is intended for SAS Enterprise Miner Desktop, SAS Enterprise Miner Classroom, and SAS Enterprise Miner Workstation licenses. This deployment does not require the configuration step of the SAS Deployment Wizard, and installing users should not select a configuration plan option. The workstation mode configuration does not require the SAS Metadata Server or the SAS Application Server. Installations based on SAS 9.2 and earlier did require those services, but the services can now be removed if they are not required for any other SAS software.
  • In client / server mode, SAS Foundation 9.3 and SAS Enterprise Miner 7.1 Server can be installed on a local or remote system for multi-user access. The SAS Web Infrastructure Platform is installed as the mid-tier server. The SAS Enterprise Miner 7.1 client can be installed on a Microsoft Windows system, or it can be started through Java Web Start by connecting your Internet browser to the SAS mid-tier.

Migration

SAS Enterprise Miner stores data in three potential locations. Data in each location can be migrated to SAS 9.3.
  • Configuration and user information stored in the SAS Metadata Server can be migrated using the SAS Migration Utility and the SAS Deployment Wizard.
  • Data mining project data does not need to be migrated if the SAS Server platform is not changed. If the platform is changed (for example, from Microsoft Windows XP to Microsoft Windows 7), users should use the SAS Enterprise Miner Project Migration Macro, available at http://www.sas.com/apps/demosdownloads/emmigproj_PROD__sysdep.jsp?packageID=000738 on the SAS Web site.
  • Registered models can include storage of the model package file on an industry-standard WebDAV server. A client / server Enterprise Miner 7.1 installation includes the SAS Framework Server, which can be used for model package storage. If Enterprise Miner users change their WebDAV repository, they will need to archive and relocate their model package files manually.

Enterprise Miner User Interface Enhancements

Improved Integration

The main SAS Program Editor, Log, Output, and Graphs windows are integrated into a single tabbed dialog box interface. This change reduces window clutter inside the application.

Project Log Window

A new Project Log window displays the SAS log lines that are generated by the main application. This feature separates the system-generated log lines from the user-generated log lines. The Project Log window is especially useful for providing system information and for performing debugging tasks.

Library Explorer Window

The Library Explorer window now shows the contents of all diagram libraries in Read-Only mode. This change makes it easier for users to find detailed project data. The change also protects against accidental locking or alterations to system files.

Diagram Workspace Log Viewer

Each Diagram Workspace window now includes a log viewer that shows the log lines that were generated by the diagram process. This feature makes it easier to trace diagram activity.

Updated PMML

SAS Enterprise Miner 7.1 is now PMML 4.0 compliant.

System *.DMP File Association

Workstation mode Enterprise Miner 7.1 users can select and activate a data mining project file (*.dmp) from the file system to start Enterprise Miner and load the selected data mining project.

Local Project Model Import

In Enterprise Miner 7.1, the new local project model import feature enables you to move a project report package to a model import node in a diagram, in order to compare a new model to one that was previously packaged but not necessarily registered. In prior releases of Enterprise Miner, you could import only registered models.
You can import model result packages in one of two ways:
  • Drag and drop a model result package from the Enterprise Miner project tree to a process flow diagram, creating a model import node with the correct property values.
  • Place a model import node on a process flow diagram, and then select a property that enables you to choose a model package from the project tree. The model package retains its existing property configurations.

Mining Results Web Service

The Mining Results Web Service communicates with the SAS Metadata Server to get information about Enterprise Miner mining result models.
The Mining Results Web Service supports the following actions:
  • Get list of models
  • Search for models by a partial value of a property
  • Get details of a selected model
  • Get details of list of models
  • Get SPK file if available
  • Register model from SPK file

Rapid Predictive Modeler

SAS Rapid Predictive Modeler is a component of SAS Enterprise Miner that packages standard and best practice predictive model building diagrams for many scenarios within the SAS Enterprise Guide and SAS Add-in for Microsoft Office frameworks. This function has been enhanced with options for integrated scoring and data set output.

Enhanced Enterprise Miner Nodes

LARs Node

The LARs (Least Angle Regression) node for Enterprise Miner 7.1 can now model both interval and binary targets. If the target is binary, a logistic regression based on the linear combination of the selected variables is fitted. The LASSO (Least Absolute Shrinkage and Selection Operator) method for LARs has been augmented to handle binary variables.

Decision Tree Node

The Decision Tree node for Enterprise Miner 7.1 has added two new properties to the Split Search grouping. The new properties determine whether to use PROC ARBOR decision information or PROC ARBOR prior information during tree split searches.
  • Use Decisions indicates whether to use decision information (if present) during the split search. The default value is No.
  • Use Priors indicates whether to use prior information (if present) during the split search. The default value is No.
The Decision Tree node also includes the following enhancements:
  • NODEID information has been integrated into the Tree diagrams in the Decision Tree Results browser.
  • Decision Tree performs sampling before launching interactive training sessions. This significantly improves performance during interactive training.
  • The Interactive Decision Tree application provides a new subtree sequence feature that lets users select a subtree from a Decision Tree Assessment plot and use it as the current model.

Scorecard Node

The Scorecard node for Enterprise Miner 7.1 adds a new property to the Adverse Characteristic grouping on the Scorecard property panel. The new property, Generate Report, is a binary setting that indicates whether the user wants adverse characteristics included in the score code. The Generate Report property identifies adverse characteristics for all exported observations.
If users set Generate Report to Yes, the additional report is included in the Scorecard node Results. The Adverse Characteristics report is displayed as a bar chart. The report also generates three additional adverse_x columns in the scored training table that the Scorecard node exports.
The Scorecard node property panel also has a new Scaling Properties group. Its Reverse Scorecard property is a Boolean property with a default value of No.

IGN Node

The Interactive Grouping (IGN) Node for Enterprise Miner 7.1 includes a new method for performing grouping for input variables. The Constrained Optimal grouping method adds new values to both the Interval Grouping Method and Ordinal Grouping Method groups in the IGN Properties Panel, as well as several new supporting properties under Constrained Optimal Options and Advanced Constrained Options. This functionality extends previous grouping methods by surfacing several new constraints that must be met while determining the grouping definitions. It also provides users with the flexibility to assign constraints to individual variables one at a time.

RPM Node

The Rapid Predictive Modeler (RPM) node has been enhanced to allow users to specify the RPM project name.

New Enterprise Miner 7.1 Nodes

Survival Node

The Enterprise Miner 7.1 Survival node performs survival analysis on mining customer databases when there are time-dependent outcomes. The survival analysis implements additive, discrete time-to-event multinomial logistic regressions that define the hazard and sub-hazard functions. In discrete time-to-event modeling, the event time represents the duration from the inception (start) time until the outcome date (event). The resulting event time is always a positive integer quantity.
The time effect is modeled with cubic splines to allow for flexible shapes of hazard functions. The proportional hazard function is fitted with no time varying covariates.
The Survival node includes functional modules that perform data preparation (censoring; data expansion to one record for each customer per discrete time unit; and sampling to reduce the size of the expanded data set for optimal data mining without information loss), as well as survival modeling, validation, reporting, and scoring.
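The data expansion step can be illustrated with a short Python sketch. This is not the Survival node's implementation. The record layout (id, duration, event) and the function names are hypothetical, chosen only to show how one record per customer becomes one record per customer per discrete time unit, with censored customers contributing no event.

```python
def expand_to_person_period(customers):
    """Expand one record per customer into one record per customer per time unit.

    Each input record is a dict with hypothetical keys:
      id       - customer identifier
      duration - positive integer number of time units observed
      event    - True if the outcome occurred in the last observed time unit,
                 False if the customer was censored
    """
    rows = []
    for c in customers:
        for t in range(1, c["duration"] + 1):
            is_last = (t == c["duration"])
            # The target is 1 only in the time unit where the event occurs.
            rows.append({
                "id": c["id"],
                "t": t,
                "target": 1 if (is_last and c["event"]) else 0,
            })
    return rows


def empirical_hazard(rows):
    """Share of customers at risk in each time unit who have the event."""
    at_risk, events = {}, {}
    for r in rows:
        at_risk[r["t"]] = at_risk.get(r["t"], 0) + 1
        events[r["t"]] = events.get(r["t"], 0) + r["target"]
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}


# Illustrative data only: customer B is censored after two time units.
customers = [
    {"id": "A", "duration": 3, "event": True},
    {"id": "B", "duration": 2, "event": False},
    {"id": "C", "duration": 1, "event": True},
]
rows = expand_to_person_period(customers)
hazard = empirical_hazard(rows)
```

The expanded table grows with the sum of all durations, which is why a sampling step after expansion can be useful on large customer databases.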

Insurance Rate Making Node

The new Ratemaking node uses a fast, highly scalable procedure that builds generalized linear models (GLMs). The node uses common distribution and link functions to build models for claim count (Poisson or negative binomial distribution with a log link function) and severity (gamma distribution with a log link function).
An implementation of the Tweedie distribution to model pure premium is available in the new Ratemaking node. There are several optimization techniques to choose from when using the Tweedie distribution. You can use an extended quasi-likelihood function to estimate the parameters of the model. A full likelihood implementation of the Tweedie distribution is available as well.
The analytical results that the Ratemaking node displays are specific to the insurance industry. For example, relativity plots for all log-link models are displayed for all input variables. Actual versus predicted count plots are available for count models such as the Poisson count model or a zero-inflated Poisson count model.
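The idea behind a relativity under a log link can be sketched in Python. With a log link, the predicted mean is the exponential of the linear predictor, so each coefficient acts as a multiplicative factor on the base rate. The coefficient values and level names below are hypothetical, and the sketch does not reproduce how the Ratemaking node fits or plots its models.

```python
import math


def relativities(coefficients):
    """Convert log-link GLM coefficients into multiplicative relativities."""
    return {level: math.exp(beta) for level, beta in coefficients.items()}


def predicted_mean(intercept, coefficients, active_levels, exposure=1.0):
    """Predicted mean under a log link: exposure * exp(linear predictor)."""
    linear_predictor = intercept + sum(coefficients[lv] for lv in active_levels)
    return exposure * math.exp(linear_predictor)


# Hypothetical fitted values for a claim count model (illustration only).
intercept = -2.0
coefficients = {"urban": 0.2, "young_driver": 0.5}

rel = relativities(coefficients)
base_rate = math.exp(intercept)
prediction = predicted_mean(intercept, coefficients, ["urban", "young_driver"])
```

In this sketch, the prediction equals the base rate multiplied by the relativity of each active level, which is the quantity a relativity plot compares across the levels of an input variable.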

Experimental Enterprise Miner 7.1 Nodes

SVM Node

A support vector machine (SVM) is a supervised machine learning method that is used to perform classification and regression analysis. The SVM uses a hyperplane or a set of hyperplanes to separate points that are mapped into a higher-dimensional space. The data points that are used to construct the hyperplanes are called support vectors.
The Enterprise Miner 7.1 SVM node uses PROC SVM and PROC SVMSCORE. The SVM node supports binary classification problems and offers polynomial, radial basis function, and sigmoid nonlinear kernels. The SVM node does not support multiclass problems or support vector regression.
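As an illustration of the three nonlinear kernels, the following Python sketch uses their common textbook forms. The parameter names (degree, scale, offset, gamma) and default values are assumptions made for this example. They are not PROC SVM options, and the procedure's own parameterization might differ.

```python
import math


def dot(x, y):
    """Inner product of two equal-length numeric sequences."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, scale=1.0, offset=1.0):
    """Textbook polynomial kernel: (scale * <x, y> + offset) ** degree."""
    return (scale * dot(x, y) + offset) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Textbook radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, scale=1.0, offset=0.0):
    """Textbook sigmoid kernel: tanh(scale * <x, y> + offset)."""
    return math.tanh(scale * dot(x, y) + offset)


# Illustrative points only.
u = [1.0, 2.0]
v = [3.0, 4.0]
k_poly = polynomial_kernel(u, v)
k_rbf = rbf_kernel(u, v)
k_sig = sigmoid_kernel(u, v)
```

Each function returns a similarity value for a pair of points. A kernel lets the SVM work with inner products in the higher-dimensional space without computing the mapping explicitly.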

Time Series Data Preparation Node

The new Time Series Data Preparation node in Enterprise Miner enables users to manipulate transaction and time series data to facilitate time series data mining. The new node provides several types of time series data manipulation tools, including time interval definitions, data transformations and transpositions, data differencing, and missing value assignments.
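Two of these manipulations, accumulating transactions into a time interval and differencing, can be sketched in Python. The function names and the monthly interval are assumptions made for this example, and the sketch does not reflect how the node itself is implemented. Months with no transactions are simply absent from the result, so missing value assignment is not covered here.

```python
def accumulate_by_month(transactions):
    """Sum transaction amounts into monthly totals.

    transactions is a list of (date_string, amount) pairs, where
    date_string uses the ISO form YYYY-MM-DD.
    """
    totals = {}
    for date_string, amount in transactions:
        month = date_string[:7]  # keep the YYYY-MM part
        totals[month] = totals.get(month, 0) + amount
    return [(month, totals[month]) for month in sorted(totals)]


def difference(values, lag=1):
    """Return values[i] - values[i - lag] for every i where both exist."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# Illustrative transactions only.
transactions = [
    ("2011-01-05", 10),
    ("2011-01-20", 5),
    ("2011-02-03", 7),
    ("2011-03-15", 12),
]
monthly = accumulate_by_month(transactions)
changes = difference([total for _, total in monthly])
```

Here the four transactions become three monthly totals, and differencing turns those totals into month-over-month changes.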

Time Series Similarity Node

The new Time Series Similarity node computes similarity measures for time-stamped data with respect to time using a dynamic time warping method. The tool does so by accumulating the data into a time series format, and then it computes similarity measures for sequentially ordered numeric data by respecting the ordering of the data.
The Time Series Similarity node also provides controls that enable modelers to specify parameters such as similarity measure, sequence sliding, normalization, interval, accumulation, similarity matrix, hierarchical clustering, and expanded and compressed sliding sequence ranges.
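Dynamic time warping can be sketched in a few lines of Python. This is the classic dynamic programming formulation with an absolute-difference cost and no sliding, normalization, or range constraints. It is offered as a minimal illustration of the technique, not as the node's algorithm, and the function name is an assumption.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the minimum accumulated cost of aligning the first i
    values of a with the first j values of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A series and a stretched copy of it align perfectly.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# A series and a shifted copy do not.
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

Because the alignment can repeat values, two series with the same shape but different speeds receive a small distance, while the ordering of the data is always respected.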

Time Series Exponential Smoothing Node

The Time Series Exponential Smoothing node generates forecasts by using exponential smoothing models that have optimized smoothing weights for time series data.
The Time Series Exponential Smoothing node offers forecasting models that include single exponential smoothing, double exponential smoothing, linear exponential smoothing, damped trend exponential smoothing, additive seasonal exponential smoothing, multiplicative seasonal exponential smoothing, the Winters multiplicative method, and the Winters additive method.
The Time Series Exponential Smoothing node also provides modelers with the ability to detect and replace outliers, to export some distance matrices, and to extend input time series to future values.
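Single exponential smoothing, the simplest of these models, can be sketched in Python as follows. The grid search over candidate weights is a stand-in used only to illustrate the idea of an optimized smoothing weight. It is an assumption for this example and not the optimization method that the node uses, and the function names are hypothetical.

```python
def single_exponential_smoothing(series, alpha):
    """Return (one-step-ahead fitted values, next-period forecast).

    The level starts at the first observation and is updated as
    level = alpha * y + (1 - alpha) * level.
    """
    level = series[0]
    fitted = []
    for y in series:
        fitted.append(level)  # forecast made before observing y
        level = alpha * y + (1 - alpha) * level
    return fitted, level


def best_weight(series, candidates):
    """Pick the candidate weight with the smallest sum of squared errors."""
    def sse(alpha):
        fitted, _ = single_exponential_smoothing(series, alpha)
        return sum((y - f) ** 2 for y, f in zip(series, fitted))
    return min(candidates, key=sse)


# Illustrative series only.
fitted, forecast = single_exponential_smoothing([1, 2, 3], alpha=0.5)
weight = best_weight([1, 2, 3, 4, 5], [0.1, 0.5, 0.9])
```

A larger weight makes the forecast follow recent observations more closely, which is why the steadily rising series in this sketch favors the largest candidate weight.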