What’s New in SAS LASR Analytic Server

Overview

SAS LASR Analytic Server 2.3 includes the following changes:
  • Analytic features for the IMSTAT procedure
  • The RECOMMEND procedure and support for recommender applications
  • Support for SAS In-Memory Statistics for Hadoop
  • Support for in-memory text analytics
  • Enhancements to data and server management for the IMSTAT procedure
  • Documentation enhancements

Analytic Features for the IMSTAT Procedure

The IMSTAT procedure is enhanced with numerous statements that enable in-memory analytics. These statements are included in IMSTAT Procedure (Analytics). They are licensed separately from the data and server management statements.
The GLM, LOGISTIC, and GENMODEL statements are available for performing a variety of modeling techniques. Each of these statements has the following features:
  • You can specify the CODE option so that the server generates and saves SAS scoring code.
  • You can assign a role variable to divide the data into training and validation sets. Alternatively, you can request that the server divide the data at random into these sets by specifying the proportion for validation data. See the ROLEVAR=, SEED=, and VALIDATE= options in these statements.
  • You can apply group-by filtering by passing the name of a temporary table that is generated by the GROUPBY statement, and you can define rules for extracting groups from the group-by set. This enables you to restrict model fitting to groups of interest, such as groups in which the average of some measure exceeds a particular value. It also enables you to fit models across groups in stages to prevent overwhelming the SAS session.
  • You can specify the INFORMATIVE option to account for missing values in the data with an informative missing algorithm. The option adds dummy effects to the model and replaces missing values with the effect mean. The coefficients for the dummy effects estimate the difference between the response predicted by the missing value grouping and the predicted response if the effect is evaluated at its mean. This enables you to use all the data in estimating and scoring a model, without additional imputation steps.
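The idea behind the INFORMATIVE option can be illustrated outside of SAS. The following Python sketch is not IMSTAT syntax and is not the server's implementation. It only shows, for a single numeric effect, how a missing value can be replaced with the effect mean while a dummy effect records where the replacement happened. The function name informative_missing and the sample data are hypothetical.

```python
from statistics import mean

def informative_missing(values):
    """Return (imputed, missing_flag) for one numeric effect.

    Missing entries (None) are replaced by the mean of the observed
    entries. missing_flag is the dummy effect: 1 where a value was
    missing, 0 otherwise.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("effect has no observed values")
    effect_mean = mean(observed)
    imputed = [effect_mean if v is None else v for v in values]
    missing_flag = [1 if v is None else 0 for v in values]
    return imputed, missing_flag

# Hypothetical effect with two missing observations.
x = [2.0, None, 4.0, 6.0, None]
x_imputed, x_missing = informative_missing(x)
```

A model that includes both x_imputed and x_missing as effects can use every row. The coefficient for x_missing then plays the role that is described above for the dummy effect.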
The RANDOMWOODS statement generates a collection of decision trees. Each tree is built from a bootstrap sample of the data and is based on a random selection of variables. In general, such a collection can be used for both classification and regression. The current implementation is for classification problems only. You can also specify a CODE option to generate and save SAS scoring code.
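The two sources of randomness that are described for each tree can be sketched as follows. This Python example is an illustration only and does not reflect how the server draws its samples. The function plan_trees, its parameters, and the variable names are hypothetical. Rows are drawn with replacement (a bootstrap sample), and variables are drawn without replacement.

```python
import random

def plan_trees(n_rows, variables, n_trees, vars_per_tree, seed=12345):
    """Return one (row_indexes, chosen_variables) pair per tree."""
    rng = random.Random(seed)  # a fixed seed makes the plan repeatable
    plans = []
    for _ in range(n_trees):
        # Bootstrap sample: n_rows draws with replacement.
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        # Random selection of variables: no repeats within a tree.
        chosen = rng.sample(variables, vars_per_tree)
        plans.append((rows, chosen))
    return plans

plans = plan_trees(
    n_rows=10,
    variables=["age", "income", "region", "tenure"],
    n_trees=3,
    vars_per_tree=2,
)
```

Each entry in plans describes the data that one tree would be trained on. Because the trees see different rows and different variables, their predictions can be combined by voting.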
The ASSESS statement is used to assess the quality of one or more statistical models. For a set of classification models, you can compute model lift, receiver-operating characteristic (ROC), and concordance statistics. You can also apply the ASSESS statement to regression-type models.
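As a rough illustration of two of these measures, the following Python sketch computes a concordance statistic and the lift for the top-scored fraction of a binary classification result. It is not ASSESS statement syntax, and the server's binning and tie handling might differ. The functions concordance and lift_at and the sample data are hypothetical. In this sketch, tied pairs count as half concordant.

```python
def concordance(y_true, scores):
    """Share of (event, nonevent) pairs in which the event scores higher."""
    events = [s for y, s in zip(y_true, scores) if y == 1]
    nonevents = [s for y, s in zip(y_true, scores) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one nonevent")
    concordant = 0
    tied = 0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1
            elif e == n:
                tied += 1
    return (concordant + 0.5 * tied) / (len(events) * len(nonevents))

def lift_at(y_true, scores, fraction):
    """Event rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    if base_rate == 0:
        raise ValueError("no events in the data")
    return top_rate / base_rate

# Hypothetical outcomes (1 = event) and predicted event probabilities.
y = [1, 0, 1, 0, 0, 1, 0, 0]
p = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
c_stat = concordance(y, p)
top_quarter_lift = lift_at(y, p, 0.25)
```

A concordance value near 1 means that events are almost always scored above nonevents. A lift greater than 1 means that the top-scored rows contain events at a higher rate than the data as a whole.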

RECOMMEND Procedure

The RECOMMEND procedure is a new procedure that enables you to develop a recommender system. A common goal for a recommender system is to make personalized recommendations to individuals who lack the capacity, experience, or resources to select from a potentially overwhelming list of choices.
The procedure supports building content-based recommender systems and collaborative filtering systems.
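To make the term collaborative filtering concrete, the following Python sketch scores the items that a user has not rated by using the ratings of similar users. It is a generic, user-based example with cosine similarity, chosen for illustration. It is not RECOMMEND procedure syntax and does not describe the methods that the procedure implements. The ratings data and the functions cosine and recommend are hypothetical.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1 to 5 scale}.
ratings = {
    "ann": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 3, "D": 5},
    "cat": {"B": 2, "C": 5, "D": 1},
}

def cosine(u, v):
    """Cosine similarity of two users over the items that both rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def recommend(user, ratings):
    """Return (predicted_rating, item) pairs for unrated items, best first."""
    totals = {}
    weights = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        if sim <= 0:
            continue
        for item, rating in their_ratings.items():
            if item in ratings[user]:
                continue  # the user already rated this item
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    return sorted(((totals[i] / weights[i], i) for i in totals), reverse=True)

suggestions = recommend("ann", ratings)
```

A content-based system would instead compare item attributes with a profile of what the user liked, without using other users' ratings.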

SAS In-Memory Statistics for Hadoop

SAS LASR Analytic Server and the analytic statements for the IMSTAT procedure provide the core features that are available with SAS In-Memory Statistics for Hadoop. SAS/ACCESS Interface for Hadoop is included and enables you to access your data that is stored in HDFS in a variety of formats.
The bundle includes other products, such as SAS Studio and SAS/STAT. This document covers the features available in the server and the IMSTAT procedure.

Support for Text Analytics

SAS LASR Analytic Server can work on unstructured text, such as social media feeds, news articles, and arbitrary collections of documents. The server uses text analytics to turn the unstructured text into numeric data, such as counts and weights, and into terms and topics with their relationships. The results are suitable for visualization and exploration. For more information, see the TEXTPARSE Statement and Text Analytics in SAS LASR Analytic Server.
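The step from unstructured text to counts and weights can be sketched in a few lines of Python. This example is not the TEXTPARSE statement, and the parsing and weighting that the server applies are not described here. The simple tokenizer, the inverse document frequency weight, and the sample documents are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical document collection.
docs = [
    "The server parses text.",
    "Text becomes terms and counts.",
    "Terms get weights.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())

# Term counts per document.
term_counts = [Counter(tokenize(doc)) for doc in docs]

# Number of documents in which each term appears.
doc_freq = Counter(term for counts in term_counts for term in counts)

# Inverse document frequency: rarer terms receive larger weights.
n_docs = len(docs)
term_weight = {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Weighted term-by-document values (count times weight).
weighted = [
    {term: count * term_weight[term] for term, count in counts.items()}
    for counts in term_counts
]
```

The resulting numeric values are the kind of input from which topics and term relationships can then be derived.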

Enhancements to Data and Server Management for the IMSTAT Procedure

The statements that were introduced in previous releases are included in IMSTAT Procedure (Data and Server Management). Some of the enhancements are as follows:
  • A BATCHMODE option is added to the procedure. When the option is enabled and an error occurs, the procedure terminates and sets the SYSERR macro variable.
  • The SCORE statement is enhanced as follows:
    • You can specify the names of additional in-memory tables that contain information about key-value pairs for DATA step hash objects. For more information, see the HASHDATA option.
    • The DSRETAIN option is added. When the option is specified, scoring code behaves like the DATA step with respect to retention of output symbols.
  • The STORE statement is enhanced to support complex expressions. For example, you can build a string for a WHERE clause from the contents of IMSTAT procedure result tables.
  • The FETCH statement is enhanced as follows:
    • You can specify formats for the variables to retrieve.
    • You can specify an ORDERBY= option to retrieve sorted rows from the server.
    • You can specify format instructions for ORDERBY= variables and you can choose whether each variable is sorted by unformatted or formatted values. You can also specify whether the sort order is ascending or descending.
  • The TABLEINFO statement is enhanced with the PARTVARS option. This option displays the names of the partition variables and order-by variables for tables.

Documentation Enhancements

Information about working with HDFS has been added. See Common HDFS Commands.