What’s New in SAS/STAT 14.1 High-Performance Procedures

Highlights of Enhancements in SAS/STAT 13.2 High-Performance Procedures

Some users might be unfamiliar with updates that were made in the previous release. Following are highlights of enhancements for the SAS/STAT 13.2 high-performance procedures:

  • The high-performance analytical infrastructure was enhanced in the following ways:

    • BY-group processing is supported for data that are predistributed in databases, Hadoop, SAS LASR Analytic Servers, and SAP HANA.

    • You no longer need to specify the GRIDDATASERVER= option in the PERFORMANCE statement or set the GRIDDATASERVER environment variable when you run high-performance analytical procedures alongside databases.

    • You no longer need to specify the GRIDMODE= option in the PERFORMANCE statement or set the GRIDMODE environment variable.

    • The mode in which each data set in a single procedure step is accessed is determined automatically.

    • The high-performance analytical procedures can read data from and write data to SAP HANA asymmetrically in parallel.

  • Two new procedures were added:

    • The HPPLS procedure fits models by using any of several linear predictive methods, including partial least squares (PLS), to optimally address one or both of these two goals: explaining response variation and explaining predictor variation.

    • The HPQUANTSELECT procedure performs high-performance quantile regression analysis. PROC HPQUANTSELECT not only fits quantile regression models but also offers extensive capabilities for quantile regression model selection, and it supports statistical inferences with or without the assumption of independently and identically distributed (iid) errors.

  • Two procedures were enhanced:

    • The HPLOGISTIC procedure was enhanced to enable you to divide the observations in the input data set into disjoint subsets for model training, validation, and testing; control the classification of events and nonevents; use the validation data set during the selection process; supply your own starting values for the optimization; create data for receiver operating characteristic (ROC) curves; specify population prevalences that are used to adjust displayed statistics; output the posterior probabilities; and output the partition to which each observation is assigned. In addition, the AIC, BIC, and AICC criteria were added to the SELECT= option in the SELECTION statement.

    • The PARTITION statement was added to the HPGENSELECT procedure. This statement enables you to specify how observations in the input data set are to be logically partitioned into disjoint subsets for model training, validation, and testing. Models are fit and selected based on the training data. You can use the validation and test sets to assess how the selected model generalizes to data that played no role in selecting the model.
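As a minimal sketch of the partitioning described above, the following step asks PROC HPGENSELECT to hold out random subsets of the input data for validation and testing while it performs forward selection on the remaining training observations. The data set Work.Claims, its variables, the Poisson response, and the holdout fractions are all hypothetical and chosen only for illustration. The sketch assumes the FRACTION form of the PARTITION statement; consult the PROC HPGENSELECT syntax documentation for the full set of options.

```sas
/* Hypothetical example data: a Poisson count response and three candidate predictors */
data work.claims;
   call streaminit(20141);
   do id = 1 to 1000;
      x1 = rand('normal');
      x2 = rand('normal');
      x3 = rand('normal');   /* noise variable, unrelated to the response */
      y  = rand('poisson', exp(0.5 + 0.4*x1 - 0.3*x2));
      output;
   end;
run;

/* Forward selection on the training observations; about 30% of the        */
/* observations are held out for validation and about 20% for testing.     */
/* The fractions are arbitrary illustrative choices.                       */
proc hpgenselect data=work.claims;
   model y = x1 x2 x3 / distribution=poisson;
   selection method=forward;
   partition fraction(validate=0.3 test=0.2);
run;
```

Because the FRACTION form assigns observations to roles at random, the subsets differ from run to run unless you fix the assignment. If the input data set already contains a variable that records the intended role of each observation, it can be more convenient to partition on that variable instead.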