The HPPLS Procedure

PROC HPPLS Features

The main features of the HPPLS procedure are as follows:

  • supports GLM and reference parameterization for classification effects

  • permits any degree of interaction effects that involve classification and continuous variables

  • supports partitioning of data into training and testing roles

  • supports test set validation to choose the number of extracted factors, where the model is fit to only part of the available data (the training set) and the fit is evaluated over the other part of the data (the test set)

  • produces an output data set that contains predicted values and other observationwise statistics
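The difference between the two parameterizations of classification effects can be illustrated outside the procedure. The following is a minimal sketch in plain Python, not the procedure's own code: the helper names `glm_columns` and `reference_columns` are hypothetical, and the choice of the last sorted level as the reference level is an assumption made for illustration. GLM coding produces one indicator column per level, whereas reference coding drops the column for the reference level, so that level is represented by a row of zeros.

```python
def glm_columns(values):
    """GLM-style coding: one 0/1 indicator column per level (k columns for k levels)."""
    levels = sorted(set(values))
    return levels, [[1 if v == lev else 0 for lev in levels] for v in values]


def reference_columns(values, ref=None):
    """Reference-style coding: k-1 indicator columns; the reference level is all zeros."""
    levels = sorted(set(values))
    # Assumption for this sketch: use the last sorted level as the reference level.
    ref = levels[-1] if ref is None else ref
    kept = [lev for lev in levels if lev != ref]
    return kept, [[1 if v == lev else 0 for lev in kept] for v in values]


# Hypothetical classification variable with three levels.
groups = ["a", "b", "c", "a"]
glm_levels, glm_design = glm_columns(groups)
ref_levels, ref_design = reference_columns(groups)

for g, glm_row, ref_row in zip(groups, glm_design, ref_design):
    print(g, glm_row, ref_row)
```

The GLM columns always sum to 1 in every row, so together with an intercept they are linearly dependent; the reference columns avoid that dependency at the cost of singling out one level.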

The HPPLS procedure implements the following techniques:

  • principal components regression, which extracts factors to explain as much predictor sample variation as possible

  • reduced rank regression, which extracts factors to explain as much response variation as possible. This technique, also known as (maximum) redundancy analysis, differs from multivariate linear regression only when there are multiple responses.

  • partial least squares regression, which balances the two objectives of explaining response variation and explaining predictor variation. Two formulations of partial least squares are available: the original predictive method of Wold (1966) and the SIMPLS method of De Jong (1993), a straightforward implementation of a statistically inspired modification of partial least squares.
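To make the ideas of factor extraction and test set validation concrete, the following is a minimal sketch in plain Python for a single response. It is not the procedure's implementation. It assumes a NIPALS-style extraction with deflation, centers the variables but does not scale them, and simply picks the number of factors with the smallest sum of squared prediction errors on the test set. The function names, the data, and the split are all hypothetical.

```python
def fit_pls1(X, y, n_factors):
    """Extract PLS factors for one response by successive deflation of centered data."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_means[j] for j in range(p)] for row in X]  # predictor residuals
    f = [v - y_mean for v in y]                                  # response residuals
    factors = []
    for _ in range(n_factors):
        # Weight vector: direction of maximal covariance between predictors and response.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:          # nothing left to explain; stop early
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove the part of X and y explained by this factor.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, load, q))
    return x_means, y_mean, factors


def predict(model, x, k):
    """Predict the response for one observation by using the first k factors."""
    x_means, y_mean, factors = model
    e = [v - m for v, m in zip(x, x_means)]
    yhat = y_mean
    for w, load, q in factors[:k]:
        t = sum(a * b for a, b in zip(e, w))
        yhat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return yhat


# Hypothetical data: the response is an exact linear function of two predictors.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4], [7, 9], [8, 6]]
y = [2 * a - b + 1 for a, b in X]

# Training set: first six observations; test set: last two.
train_X, test_X = X[:6], X[6:]
train_y, test_y = y[:6], y[6:]

model = fit_pls1(train_X, train_y, n_factors=2)

# Test-set sum of squared prediction errors for 0, 1, and 2 factors.
press = [
    sum((yt - predict(model, xt, k)) ** 2 for xt, yt in zip(test_X, test_y))
    for k in range(3)
]
best_k = min(range(3), key=lambda k: press[k])
print(press, best_k)
```

Because the model is fit only to the training observations, the test-set errors give an honest comparison across numbers of factors. In this sketch the zero-factor model predicts the training mean, and the error falls as factors are added.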

Because the HPPLS procedure is a high-performance analytical procedure, it also does the following:

  • enables you to run in distributed mode on a cluster of machines that distribute the data and the computations, provided that you license SAS High-Performance Statistics

  • enables you to run in single-machine mode on the server where SAS is installed

  • exploits all the available cores and concurrent threads, regardless of execution mode

For more information, see the section "Processing Modes" in Chapter 3, "Shared Concepts and Topics."