Regression Analysis

The SAS/STAT regression analysis procedures include the following:

NLIN Procedure


The NLIN procedure fits nonlinear regression models and estimates the parameters by nonlinear least squares or weighted nonlinear least squares. You specify the model with programming statements. This gives you great flexibility in modeling the relationship between the response variable and independent (regressor) variables. It does, however, require additional coding compared to model specifications in linear modeling procedures such as the REG, GLM, and MIXED procedures. The following are highlights of the NLIN procedure's features:

  • provides a high-quality automatic differentiator so that you do not need to specify first and second derivatives. You can, however, specify the derivatives if you wish.
  • solves the nonlinear least squares problem by one of the following four algorithms (methods):
    • steepest-descent or gradient method
    • Newton method
    • modified Gauss-Newton method
    • Marquardt method
  • enables you to confine the estimation procedure to a certain range of values of the parameters by imposing bounds on the estimates
  • computes Hougaard's measure of skewness
  • provides bootstrap estimates of confidence intervals for parameters and the covariance/correlation matrices of the parameter estimates
  • performs weighted estimation
  • creates an output data set that contains statistics that are calculated for each observation
  • creates a data set that contains the parameter estimates at each iteration
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
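For example, a minimal sketch of a PROC NLIN step that fits a two-parameter exponential model by the Marquardt method (the data set growth, the variables x and y, and the starting values are hypothetical):

    proc nlin data=growth method=marquardt;
       parms b0=1 b1=0.1;        /* starting values for the parameters */
       model y = b0*exp(b1*x);   /* model specified with programming statements */
       bounds b1 > 0;            /* confine b1 to positive values */
    run;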
For further details, see the NLIN Procedure.

ORTHOREG Procedure


The ORTHOREG procedure fits general linear models by the method of least squares. Other SAS/STAT software procedures, such as the GLM and REG procedures, fit the same types of models, but PROC ORTHOREG can produce more accurate estimates than other regression procedures when your data are ill-conditioned. The following are highlights of the ORTHOREG procedure's features:

  • uses Gentleman-Givens transformations to update and compute the upper triangular matrix R of the QR decomposition of the data matrix
  • enables you to construct special collections of columns for design matrices
  • produces a display of the fitted model and provides options for changing and enhancing the displays
  • enables you to perform F tests for model effects that test Type I, Type II, or Type III hypotheses
  • enables you to obtain custom hypothesis tests
  • computes and compares least squares means (LS-means) of fixed effects
  • provides a general mechanism for performing a partitioned analysis of the LS-means for an interaction
  • enables you to save the context and results of the statistical analysis in an item store, which can be processed by the PLM procedure
  • performs weighted estimation
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates SAS data sets for analysis of variance, fit statistics, table of class variables, and parameter estimates
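As a sketch, the following step fits a model with a classification effect and a polynomial term, compares LS-means, and saves the fit to an item store for PROC PLM (the data set trial and its variables are hypothetical):

    proc orthoreg data=trial;
       class treatment;
       model y = treatment x x*x;   /* GLM-style effects, including a quadratic term */
       lsmeans treatment / diff;    /* compare LS-means across treatment levels */
       store work.orthofit;         /* item store for postprocessing with PROC PLM */
    run;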
For further details, see the ORTHOREG Procedure.

PLM Procedure


The PLM procedure performs postfitting statistical analyses for the contents of a SAS item store that was previously created with the STORE statement in some other SAS/STAT procedure. An item store is a special SAS-defined binary file format used to store and restore information with a hierarchical structure. The following are highlights of the PLM procedure's features:

  • performs custom hypothesis tests
  • computes confidence intervals
  • produces prediction plots
  • scores a new data set
  • enables you to filter the results
  • offers the most advanced postprocessing techniques available in SAS/STAT, including the following:
    • step-down multiplicity adjustments for p-values
    • F tests with order restrictions
    • analysis of means (ANOM)
    • sampling-based linear inference based on Bayes posterior estimates
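For instance, a minimal sketch that restores the hypothetical item store work.orthofit created above, scores a new data set, and requests a default prediction plot (ODS Graphics must be enabled; the data set newdata is hypothetical):

    proc plm restore=work.orthofit;
       score data=newdata out=scored predicted lclm uclm;  /* score a new data set */
       effectplot;                                         /* default prediction plot */
    run;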
For further details, see the PLM Procedure.

PLS Procedure


The PLS procedure fits models by using any one of a number of linear predictive methods including partial least squares (PLS). Ordinary least squares regression, as implemented in SAS/STAT procedures such as PROC GLM and PROC REG, has the single goal of minimizing sample response prediction error, seeking linear functions of the predictors that explain as much variation in each response as possible. The techniques implemented in the PLS procedure have the additional goal of accounting for variation in the predictors, under the assumption that directions in the predictor space that are well sampled should provide better prediction for new observations when the predictors are highly correlated. All of the techniques implemented in the PLS procedure work by extracting successive linear combinations of the predictors, called factors (also called components, latent vectors, or latent variables), which optimally address one or both of these two goals—explaining response variation and explaining predictor variation. In particular, the method of partial least squares balances the two objectives, seeking factors that explain both response and predictor variation. The following are highlights of the PLS procedure's features:

  • implements the following techniques:
    • principal components regression, which extracts factors to explain as much predictor sample variation as possible
    • reduced rank regression, which extracts factors to explain as much response variation as possible. This technique, also known as (maximum) redundancy analysis, differs from multivariate linear regression only when there are multiple responses.
    • partial least squares regression, which balances the two objectives of explaining response variation and explaining predictor variation. Two different formulations for partial least squares are available: the original predictive method of Wold (1966) and the SIMPLS method of de Jong (1993).
  • enables you to choose the number of extracted factors by cross validation
  • enables you to use the general linear modeling approach of the GLM procedure to specify a model for your design, allowing for general polynomial effects as well as classification or ANOVA effects
  • enables you to save the fitted model in a data set and apply it to new data by using the SCORE procedure
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates an output data set to receive quantities that can be computed for every input observation, such as extracted factors and predicted values
  • automatically creates graphs by using ODS Graphics
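For example, a minimal sketch that extracts partial least squares factors with leave-one-out cross validation and writes predicted values for every observation (the data set spectra, the response y, and the predictors x1-x100 are hypothetical):

    proc pls data=spectra method=pls cv=one;   /* leave-one-out cross validation */
       model y = x1-x100;
       output out=outpls predicted=yhat;       /* per-observation predicted values */
    run;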
For further details, see the PLS Procedure.

REG Procedure


The REG procedure is a general purpose procedure for ordinary least squares regression. The following are highlights of the REG procedure's features:

  • supports multiple MODEL statements
  • provides nine model-selection methods
  • allows interactive changes both in the model and the data used to fit the model
  • supports linear equality restrictions on parameters
  • provides tests of linear hypotheses and multivariate hypotheses
  • provides collinearity diagnostics
  • computes predicted values, residuals, studentized residuals, confidence limits, and influence statistics
  • allows correlation or crossproduct input
  • saves requested statistics to SAS data sets
  • enables you to save the fitted model to an item store, which can be processed by the PLM procedure
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • performs weighted estimation
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
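For example, a minimal sketch that runs stepwise model selection with collinearity diagnostics and saves per-observation statistics (the data set fitness and its variables are hypothetical):

    proc reg data=fitness outest=est;
       model oxygen = age weight runtime / selection=stepwise vif;
       output out=diag p=pred r=resid rstudent=rstud;   /* predicted values, residuals,
                                                           and studentized residuals */
    run;
    quit;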
For further details, see the REG Procedure.

RSREG Procedure


The RSREG procedure uses the method of least squares to fit quadratic response surface regression models. Response surface models are a kind of general linear model in which attention focuses on the characteristics of the fitted response function and, in particular, on where the optimum estimated response values occur. The following are highlights of the RSREG procedure's features:

  • performs a lack-of-fit test
  • enables you to test for the significance of individual factors
  • enables you to analyze the canonical structure of the estimated response surface
  • computes the ridge of optimum response
  • predicts new values of the response
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • performs weighted estimation
  • creates a SAS data set that contains statistics for each observation in the input data set
  • creates a SAS data set that corresponds to any table
  • automatically produces graphs by using ODS Graphics
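A minimal sketch that fits a quadratic surface, requests a lack-of-fit test, and searches along the ridge of maximum estimated response (the data set experiment and its variables are hypothetical):

    proc rsreg data=experiment;
       model yield = time temp / lackfit;   /* quadratic surface with lack-of-fit test */
       ridge max;                           /* ridge of maximum estimated response */
    run;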
For further details, see the RSREG Procedure.

TRANSREG Procedure


The TRANSREG (transformation regression) procedure fits linear models, optionally with smooth, spline, Box-Cox, and other nonlinear transformations of the variables. The following are highlights of the TRANSREG procedure's features:

  • enables you to fit linear models, including the following:
    • ordinary regression and ANOVA
    • metric and nonmetric conjoint analysis (Green and Wind 1975; de Leeuw, Young, and Takane 1976)
    • linear models with Box-Cox (1964) transformations of the dependent variables
    • regression with a smooth (Reinsch 1967), spline (de Boor 1978; van Rijckevorsel 1982), monotone spline (Winsberg and Ramsay 1980), or penalized B-spline (Eilers and Marx 1996) fit function
    • metric and nonmetric vector and ideal point preference mapping (Carroll 1972)
    • simple, multiple, and multivariate regression with variable transformations (Young, de Leeuw, and Takane 1976; Winsberg and Ramsay 1980; Breiman and Friedman 1985)
    • redundancy analysis (Stewart and Love 1968) with variable transformations (Israels 1984)
    • canonical correlation analysis with variable transformations (van der Burg and de Leeuw 1983)
    • response surface regression (Myers 1976; Khuri and Cornell 1987) with variable transformations
  • enables you to use a data set that can contain variables measured on nominal, ordinal, interval, and ratio scales; you can specify any mix of these variable types for the dependent and independent variables
  • enables you to transform nominal variables by scoring the categories to minimize squared error (Fisher 1938) or to treat nominal variables as classification variables
  • enables you to transform ordinal variables by monotonically scoring the ordered categories so that order is weakly preserved (adjacent categories can be merged) and squared error is minimized. Ties can be optimally untied or left tied (Kruskal 1964). Ordinal variables can also be transformed to ranks.
  • enables you to transform interval and ratio scale of measurement variables linearly or nonlinearly with spline (de Boor 1978; van Rijckevorsel 1982), monotone spline (Winsberg and Ramsay 1980), penalized B-spline (Eilers and Marx 1996), smooth (Reinsch 1967), or Box-Cox (Box and Cox 1964) transformations. In addition, logarithmic, exponential, power, logit, and inverse trigonometric sine transformations are available.
  • fits a curve through a scatter plot or fits multiple curves, one for each level of a classification variable
  • enables you to constrain the functions to be parallel or monotone or have the same intercept
  • enables you to code experimental designs and classification variables prior to their use in other analyses
  • performs weighted estimation
  • generates output data sets including the following:
    • ANOVA results
    • regression tables
    • conjoint analysis part-worth utilities
    • coefficients
    • marginal means
    • original and transformed variables, predicted values, residuals, scores, and more
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • automatically creates graphs by using ODS Graphics
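For example, a minimal sketch that fits a Box-Cox transformation of the dependent variable against a spline in x and a classification variable (the data set mydata and the variables y, x, and group are hypothetical):

    proc transreg data=mydata;
       model boxcox(y) = spline(x) class(group);   /* Box-Cox response, spline fit,
                                                      and classification effects */
    run;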
For further details, see the TRANSREG Procedure.