What’s New in SAS Enterprise Miner 7.1
Overview
SAS Enterprise Miner
7.1 provides many improvements and new functions in the areas of administration,
user interface, and modeling to enhance the overall data mining experience.
Administration
Installation, configuration,
and administration have been significantly changed in SAS Enterprise
Miner 7.1. The most important change concerns the required version of
SAS: SAS Enterprise Miner 7.1 is a component of SAS 9.3 and will not
function with any other SAS release.
System architecture
changes aim to simplify the single user experience as well as to increase
the scalability and conformity to standards of the multi-user experience.
The foremost change concerns the mid-tier technology: the SAS Analytics
Platform server has been deprecated. The SAS Analytics Platform service
is not used for any SAS 9.3 products or solutions. Existing deployments
can disable and remove this service once the new installation is
complete.
SAS Enterprise Miner
7.1 can be installed and configured in one of two modes. Both configurations
are significantly changed for SAS 9.3:
-
In
workstation mode,
SAS Foundation 9.3 and SAS Enterprise Miner 7.1 are deployed on a
Microsoft Windows system in a single-user configuration. This configuration
is intended for SAS Enterprise Miner Desktop, SAS Enterprise Miner
Classroom, and SAS Enterprise Miner Workstation licenses. This deployment
does not require the configuration step of the SAS Deployment Wizard,
and installing users should not select a configuration plan option.
The workstation mode configuration does not require the SAS Metadata
Server or the SAS Application Server. Installations based on SAS
9.2 and earlier did require those services, but they can now be removed
if they are not required for any other SAS software.
-
In
client /
server mode, SAS Foundation 9.3 and SAS Enterprise
Miner 7.1 Server can be installed on a local or remote system for
multi-user access. The SAS Web Infrastructure Platform is installed
as the mid-tier server. The SAS Enterprise Miner 7.1 client can be installed
on a Microsoft Windows system, or can be started through Java Web
Start by connecting your Internet browser to the SAS mid-tier.
Migration
SAS Enterprise Miner
stores data in three potential locations. Data in each location can
be migrated to SAS 9.3.
-
Configuration and user information
stored in the SAS Metadata Server can be migrated using the SAS Migration
Utility and the SAS Deployment Wizard.
-
Data Mining project data does not
need to be migrated if the SAS Server platform is not changed. If
the platform is changed (for example, from Microsoft Windows XP to
Microsoft Windows 7), users should use the SAS Enterprise
Miner Project Migration Macro available at
http://www.sas.com/apps/demosdownloads/emmigproj_PROD__sysdep.jsp?packageID=000738
on the SAS Web site.
-
Model registration can include
storage of the model package file on an industry-standard WebDAV server.
A client / server Enterprise Miner 7.1 installation includes the
SAS Framework Server, which can be used for model package storage.
If Enterprise Miner users change their WebDAV repository, they will
need to archive and relocate their model package files manually.
Enterprise Miner User Interface Enhancements
Improved Integration
The main SAS Program
Editor, Log, Output, and Graphs windows are integrated into a single
tabbed dialog box interface. This change reduces window clutter
inside the application.
Project Log Window
A new Project Log window displays SAS log lines
that are generated by the main application. This feature separates
the system-generated log lines from the user-generated log lines.
The Project Log window is especially
useful for providing system information and for performing debugging
tasks.
Library Explorer Window
The
Library
Explorer window now shows the contents of all diagram
libraries in Read-Only mode. This change makes it easier for users
to find detailed project data. The change also protects against accidental
locking or alterations to system files.
Diagram Workspace Log Viewer
Each Diagram Workspace
window now includes a log viewer that shows the log lines that were
generated by the diagram process. This feature makes it easier to
trace diagram activity.
Updated PMML
SAS Enterprise Miner
7.1 is now PMML 4.0 compliant.
System *.DMP File Association
Workstation mode Enterprise
Miner 7.1 users can select and activate a data mining project file
(*.dmp) from the file system to start Enterprise Miner and load the
selected data mining project.
Local Project Model Import
In Enterprise Miner
7.1, the new local project model import feature enables you to move
a project report package to a model import node in a diagram, in order
to compare a new model to one that was previously packaged but not
necessarily registered. In prior releases of Enterprise Miner, you
could import only registered models.
You can import model
result packages in one of two ways:
-
Drag and drop a model result package
from the Enterprise Miner project tree to a process flow diagram,
creating a model import node with the correct property values.
-
Place a model import node on a
process flow diagram, and then select a property that enables you
to choose a model package from the project tree. The model package
retains its existing property configurations.
Mining Results Web Service
The Mining Results Web
Service communicates with the SAS Metadata Server to get information
about Enterprise Miner mining result models.
The Mining Results Web
Service supports the following actions:
-
Search for models by a partial value
of a property
-
Get details of a selected model
-
Get details of a list of models
-
Get the SPK file, if available
-
Register a model from an SPK file
Rapid Predictive Modeler
SAS Rapid Predictive
Modeler is a component of SAS Enterprise Miner that packages standard
and best practice predictive model building diagrams for many scenarios
within the SAS Enterprise Guide and SAS Add-in for Microsoft Office
frameworks. This function has been enhanced with options for integrated
scoring and data set output.
Enhanced Enterprise Miner Nodes
LARs Node
The LARs (Least Angle
Regression) node for Enterprise Miner 7.1 now can model both interval
and binary targets. If the target is binary, a logistic regression
based on the linear combination of the selected variables is fitted.
The LASSO (Least Absolute Shrinkage and Selection Operator) method
for LARs has been augmented to handle binary variables.
Decision Tree Node
The Decision Tree node
for Enterprise Miner 7.1 has added two new properties to the Split
Search grouping. The new properties determine whether to use PROC
ARBOR decision information or PROC ARBOR prior information
during tree split searches.
-
Use Decisions indicates
whether to use decision information (if present) during the split
search. The default value is No.
-
Use Priors indicates
whether to use prior information (if present) during the split search.
The default value is No.
-
NODEID information has been integrated
into the Tree diagrams in the Decision Tree Results browser.
-
The Decision Tree node performs sampling
before launching interactive training sessions. This significantly
improves performance during interactive training.
-
The Interactive Decision Tree application
provides a new subtree sequence feature that lets users select a subtree
from a Decision Tree Assessment plot and use it as the current model.
Scorecard Node
The Scorecard node for
Enterprise Miner 7.1 adds a new property to the Adverse Characteristic
grouping on the Scorecard property panel. The new property,
Generate
Report, is a binary setting that indicates whether the
user wants adverse characteristics included in the score code. The
Generate
Report property identifies adverse characteristics for
all exported observations.
If users set Generate Report to Yes, the additional report is included in the
Scorecard node Results. The Adverse Characteristics report is
a bar chart. The report also generates three additional adverse_x columns
in the scored training table that the Scorecard node exports.
The Scorecard node Properties
also has a new
Scaling Properties group.
The
Reverse Scorecard property is a simple
Boolean property with a default of No.
IGN Node
The Interactive Grouping
(IGN) Node for Enterprise Miner 7.1 includes a new method for performing
grouping for input variables. The
Constrained Optimal grouping
method adds new values to both the
Interval Grouping Method and
Ordinal
Grouping Method groups in the IGN Properties Panel, as
well as several new supporting properties under
Constrained
Optimal Options and
Advanced Constrained
Options. This functionality extends previous grouping
methods by surfacing several new constraints that must be met while
determining the grouping definitions. It also provides users with
the flexibility to assign constraints to individual variables one
at a time.
RPM Node
The Rapid Predictive
Modeler (RPM) node has been enhanced so that users can specify the
RPM project name.
New Enterprise Miner 7.1 Nodes
Survival Node
The Enterprise Miner
7.1 Survival node performs survival analysis on mining customer databases
when there are time-dependent outcomes. The data mining survival
analysis is designed to implement discrete time
to event multinomial logistic regressions that
are additive and define the hazard and sub-hazard functions. In discrete
time to event modeling, the event time represents the duration from
the inception (start) time until the outcome date (event). The resulting
event time is always a positive integer quantity.
The time effect is modeled
with cubic splines to allow for flexible shapes of hazard functions.
The proportional hazard function is fitted with no time varying covariates.
The Survival node includes
functional modules that perform data preparation (censoring; data
expansion, which expands the data to one record for each customer
per discrete time unit; and sampling, which reduces the expanded data
set size for optimal data mining without information loss), as well
as survival modeling, validation, reporting, and scoring.
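The data expansion step can be illustrated with a minimal Python sketch. It is not the Survival node's implementation; the Customer type, its field names, and the expand function are hypothetical names chosen for illustration. The sketch assumes that each customer is observed for a positive integer number of time units and that the target is 1 only in the final unit of a customer whose event occurred (a censored customer never receives a 1).

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical record layout, assumed for this illustration only.
    customer_id: str
    duration: int  # observed discrete time units (positive integer)
    event: bool    # True if the outcome occurred at `duration`; False if censored


def expand(customer):
    """Return one (customer_id, time_unit, target) record per discrete time unit."""
    rows = []
    for t in range(1, customer.duration + 1):
        # The target is 1 only in the time unit in which the event occurred.
        target = 1 if (customer.event and t == customer.duration) else 0
        rows.append((customer.customer_id, t, target))
    return rows


if __name__ == "__main__":
    for row in expand(Customer("A", 3, True)):
        print(row)
```

Because every customer contributes one row per time unit, the expanded table grows quickly, which is why a sampling step follows the expansion.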
Insurance Rate Making Node
The new Ratemaking node
uses a fast, highly scalable procedure that builds generalized linear
models (GLMs). The node uses common distributions and link functions
to build models for claim count (Poisson or negative binomial distribution
with a log link function) and severity (gamma distribution with a
log link function).
An implementation of
the Tweedie distribution to model pure premium is available in the
new Ratemaking node. There are several optimization techniques to
choose from when using the Tweedie distribution. You can use an extended
quasi-likelihood function to estimate the parameters of the model.
A full likelihood implementation of the Tweedie distribution is available
as well.
The analytical results
that the Ratemaking node displays are specific to the insurance industry.
For example, relativity plots for all log-link models are displayed
for all input variables. Actual versus predicted count plots are available
for count models such as the Poisson count model or a zero-inflated
Poisson count model.
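As background for the relativity plots, the following Python sketch shows how relativities are commonly derived from a log-link model: because the predicted mean is the exponential of the linear predictor, each coefficient acts as a multiplicative factor relative to the base level. The function names and the coefficient values are hypothetical, and the exact quantity that the Ratemaking node plots is assumed here rather than taken from the node's documentation.

```python
import math


def relativities(coefficients):
    """Map each level's log-link coefficient to a multiplicative relativity."""
    return {level: math.exp(beta) for level, beta in coefficients.items()}


def predicted_mean(intercept, coefficients, active_levels):
    """Predicted mean under a log link: exp(intercept + sum of active coefficients)."""
    linear_predictor = intercept + sum(coefficients[level] for level in active_levels)
    return math.exp(linear_predictor)


if __name__ == "__main__":
    # Hypothetical coefficients for a territory input; "rural" is the base level.
    territory = {"rural": 0.0, "urban": math.log(1.5)}
    print(relativities(territory))
    print(predicted_mean(math.log(0.1), territory, ["urban"]))
```

A base level has a coefficient of 0 and therefore a relativity of 1, so every other level can be read as a multiple of the base rate.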
Experimental Enterprise Miner 7.1 Nodes
SVM Node
A support vector machine
(SVM) is a supervised machine learning method that is used to perform
classification and regression analysis. The SVM uses a hyperplane
or a set of hyperplanes to separate points mapped on a higher dimensional
space. The collections of data points that are used to construct the
hyperplanes are called support vectors.
The Enterprise Miner
7.1 SVM node uses PROC SVM and PROC SVMSCORE. The SVM node supports
binary classification problems and provides polynomial, radial basis
function, and sigmoid nonlinear kernels. The SVM node does not support
multiclass problems or support vector regression.
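For readers unfamiliar with the three nonlinear kernels, the following Python sketch gives their common textbook forms. The parameter names (gamma, coef0, degree) and their default values are assumptions that follow a widely used convention; they are not taken from PROC SVM, whose parameterization might differ.

```python
import math


def dot(x, y):
    """Inner product of two equal-length numeric sequences."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * dot(x, y) + coef0)


if __name__ == "__main__":
    x, y = [1.0, 2.0], [3.0, 4.0]
    print(polynomial_kernel(x, y, degree=2))
    print(rbf_kernel(x, y, gamma=0.5))
    print(sigmoid_kernel(x, y, gamma=0.1))
```

Each kernel replaces the plain inner product, which is what lets a separating hyperplane in the mapped space correspond to a nonlinear boundary in the original inputs.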
Time Series Data Preparation Node
The new Time Series
Data Preparation node in Enterprise Miner enables users to manipulate
transaction and time series data to facilitate time series data mining.
The new node provides several types of time series data manipulation
tools, including time interval definitions, data transformations and
transpositions, data differencing, and missing value assignments.
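Two of these manipulations, missing value assignment and differencing, can be sketched in a few lines of Python. The function names are hypothetical and the sketch only illustrates the operations themselves, not how the node performs them. Missing values are assumed to be represented as None.

```python
def fill_missing(series, value):
    """Replace missing observations (None) with a constant value."""
    return [value if x is None else x for x in series]


def difference(series, lag=1):
    """Return the lag-k differences series[i] - series[i - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


if __name__ == "__main__":
    raw = [1, None, 9, 16]
    filled = fill_missing(raw, 4)
    print(filled)
    print(difference(filled))
    print(difference(filled, lag=2))
```

Differencing shortens the series by the lag, so downstream steps should expect fewer observations than were supplied.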
Time Series Similarity Node
The new Time Series
Similarity node computes similarity measures for time-stamped data
with respect to time using a dynamic time warping method. The tool
does so by accumulating the data into a time series format, and then
it computes similarity measures for sequentially ordered numeric data
by respecting the ordering of the data.
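The idea behind dynamic time warping can be shown with a minimal Python sketch of the classic dynamic programming recurrence. It assumes an absolute-difference cost between points and no constraint on the warping path; the function name is hypothetical, and the similarity measures and sliding options that the node offers are not reproduced here.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
    print(dtw_distance([0, 0], [1, 1]))
```

Because one point can align with several consecutive points in the other sequence, two series that differ only in local timing can still have a distance of zero, while the ordering of the data is always preserved.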
The Time Series Similarity
node also provides controls that enable modelers to specify parameters
such as similarity measure, sequence sliding, normalization, interval,
accumulation, similarity matrix, hierarchical clustering, as well
as expanded and compressed sliding sequence ranges.
Time Series Exponential Smoothing Node
The Time Series Exponential
Smoothing node generates forecasts by using exponential smoothing
models that have optimized smoothing weights for time series data.
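As an illustration of a smoothing weight and of what optimizing it can mean, the following Python sketch implements single exponential smoothing and picks the weight that minimizes the sum of squared one-step-ahead errors over a list of candidates. The function names, the initialization of the level with the first observation, and the grid search are assumptions made for this sketch; the node's own optimization method is not described here.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation (level starts at series[0])."""
    level = series[0]
    levels = [level]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
        levels.append(level)
    return levels


def sum_squared_one_step_errors(series, alpha):
    """Sum of squared errors when levels[t - 1] is used to forecast series[t]."""
    levels = simple_exponential_smoothing(series, alpha)
    return sum((series[t] - levels[t - 1]) ** 2 for t in range(1, len(series)))


def best_alpha(series, candidates):
    """Pick the candidate weight with the smallest one-step-ahead squared error."""
    return min(candidates, key=lambda a: sum_squared_one_step_errors(series, a))


if __name__ == "__main__":
    data = [10, 20, 30]
    print(simple_exponential_smoothing(data, 0.5))
    print(best_alpha([1, 2, 3, 4, 5], [0.1, 0.5, 1.0]))
```

The last smoothed level serves as the forecast for the next period; for a steadily trending series the search favors a large weight, which is one reason the trend and seasonal variants exist.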
The Time Series Exponential
Smoothing node offers forecasting models that include single exponential
smoothing, double exponential smoothing, linear exponential smoothing,
damped trend exponential smoothing, additive seasonal exponential
smoothing, multiplicative seasonal exponential smoothing, the Winters
multiplicative method, and the Winters additive method.
The Time Series Exponential
Smoothing node also provides modelers with the ability to detect and
replace outliers, to export some distance matrices, and to extend
input time series to future values.
Copyright © SAS Institute Inc. All rights reserved.