Introduction to Regression Procedures


Introduction

In a linear regression model, the mean of a response variable $\mathbf{Y}$ is a function of parameters and covariates in a statistical model. The many forms of regression models have their origin in the characteristics of the response variable (discrete or continuous, normally or nonnormally distributed), assumptions about the form of the model (linear, nonlinear, or generalized linear), assumptions about the data-generating mechanism (survey, observational, or experimental data), and estimation principles. Some models contain classification (or CLASS) variables that enter the model not through their values but through their levels. For an introduction to linear regression models, see Chapter 3: Introduction to Statistical Modeling with SAS/STAT Software. For information that is common to many of the regression procedures, see Chapter 19: Shared Concepts and Topics. The following procedures, listed in alphabetical order, perform at least one type of regression analysis.

ADAPTIVEREG

fits multivariate adaptive regression spline models. This is a nonparametric regression technique that combines both regression splines and model selection methods. PROC ADAPTIVEREG produces parsimonious models that do not overfit the data and thus have good predictive power. PROC ADAPTIVEREG supports CLASS variables. For more information, see Chapter 25: The ADAPTIVEREG Procedure.

CATMOD

analyzes data that can be represented by a contingency table. PROC CATMOD fits linear models to functions of response frequencies, and it can be used for linear and logistic regression. PROC CATMOD supports CLASS variables. For more information, see Chapter 8: Introduction to Categorical Data Analysis Procedures, and Chapter 32: The CATMOD Procedure.

GAM

fits generalized additive models. Generalized additive models are nonparametric in that the usual assumption of linear predictors is relaxed. Generalized additive models consist of additive, smooth functions of the regression variables. PROC GAM can fit additive models to nonnormal data. PROC GAM supports CLASS variables. For more information, see Chapter 41: The GAM Procedure.

GENMOD

fits generalized linear models. PROC GENMOD is especially suited for responses that have discrete outcomes, and it performs logistic regression and Poisson regression in addition to fitting generalized estimating equations for repeated measures data. PROC GENMOD supports CLASS variables and provides Bayesian analysis capabilities. For more information, see Chapter 8: Introduction to Categorical Data Analysis Procedures, and Chapter 44: The GENMOD Procedure.

GLIMMIX

uses likelihood-based methods to fit generalized linear mixed models. PROC GLIMMIX can perform simple, multiple, polynomial, and weighted regression, in addition to many other analyses. PROC GLIMMIX can fit linear mixed models, which have random effects, and models that do not have random effects. PROC GLIMMIX supports CLASS variables. For more information, see Chapter 45: The GLIMMIX Procedure.

GLM

uses the method of least squares to fit general linear models. PROC GLM can perform simple, multiple, polynomial, and weighted regression in addition to many other analyses. PROC GLM has many of the same input/output capabilities as PROC REG, but it does not provide as many diagnostic tools or allow interactive changes in the model or data. PROC GLM supports CLASS variables. For more information, see Chapter 5: Introduction to Analysis of Variance Procedures, and Chapter 46: The GLM Procedure.

GLMSELECT

performs variable selection in the framework of general linear models. PROC GLMSELECT supports CLASS variables (like PROC GLM) and model selection (like PROC REG). A variety of model selection methods are available, including forward, backward, stepwise, LASSO, and least angle regression. PROC GLMSELECT provides a variety of selection and stopping criteria. For more information, see Chapter 49: The GLMSELECT Procedure.

LIFEREG

fits parametric models to failure-time data that might be right-censored. These types of models are commonly used in survival analysis. PROC LIFEREG supports CLASS variables and provides Bayesian analysis capabilities. For more information, see Chapter 13: Introduction to Survival Analysis Procedures, and Chapter 69: The LIFEREG Procedure.

LOESS

uses a local regression method to fit nonparametric models. PROC LOESS is suitable for modeling regression surfaces in which the underlying parametric form is unknown and for which robustness in the presence of outliers is required. For more information, see Chapter 71: The LOESS Procedure.

LOGISTIC

fits logistic models for binomial and ordinal outcomes. PROC LOGISTIC provides a wide variety of model selection methods and computes numerous regression diagnostics. PROC LOGISTIC supports CLASS variables. For more information, see Chapter 8: Introduction to Categorical Data Analysis Procedures, and Chapter 72: The LOGISTIC Procedure.

MIXED

uses likelihood-based techniques to fit linear mixed models. PROC MIXED can perform simple, multiple, polynomial, and weighted regression, in addition to many other analyses. PROC MIXED can fit linear mixed models, which have random effects, and models that do not have random effects. PROC MIXED supports CLASS variables. For more information, see Chapter 77: The MIXED Procedure.

NLIN

uses the method of nonlinear least squares to fit general nonlinear regression models. Several different iterative methods are available. For more information, see Chapter 81: The NLIN Procedure.

NLMIXED

uses the method of maximum likelihood to fit general nonlinear mixed regression models. PROC NLMIXED enables you to specify a custom objective function for parameter estimation and to fit models with or without random effects. For more information, see Chapter 82: The NLMIXED Procedure.

ORTHOREG

uses the Gentleman-Givens computational method to perform regression. For ill-conditioned data, PROC ORTHOREG can produce more accurate parameter estimates than procedures such as PROC GLM and PROC REG. PROC ORTHOREG supports CLASS variables. For more information, see Chapter 84: The ORTHOREG Procedure.

PHREG

fits Cox proportional hazards regression models to survival data. PROC PHREG supports CLASS variables and provides Bayesian analysis capabilities. For more information, see Chapter 13: Introduction to Survival Analysis Procedures, and Chapter 85: The PHREG Procedure.

PLS

performs partial least squares regression, principal component regression, and reduced rank regression, along with cross validation for the number of components. PROC PLS supports CLASS variables. For more information, see Chapter 88: The PLS Procedure.

PROBIT

performs probit regression in addition to logistic regression and ordinal logistic regression. PROC PROBIT is useful when the dependent variable is either dichotomous or polychotomous and the independent variables are continuous. PROC PROBIT supports CLASS variables. For more information, see Chapter 93: The PROBIT Procedure.

QUANTREG

uses quantile regression to model the effects of covariates on the conditional quantiles of a response variable. PROC QUANTREG supports CLASS variables. For more information, see Chapter 95: The QUANTREG Procedure.

QUANTSELECT

provides variable selection for quantile regression models. Selection methods include forward, backward, stepwise, and LASSO. The procedure provides a variety of selection and stopping criteria. PROC QUANTSELECT supports CLASS variables. For more information, see Chapter 96: The QUANTSELECT Procedure.

REG

performs linear regression with many diagnostic capabilities. PROC REG produces fit, residual, and diagnostic plots; heat maps; and many other types of graphs. PROC REG enables you to select models by using any one of nine methods, and you can interactively change both the regression model and the data that are used to fit the model. For more information, see Chapter 97: The REG Procedure.

ROBUSTREG

uses Huber M estimation and high breakdown value estimation to perform robust regression. PROC ROBUSTREG is suitable for detecting outliers and providing resistant (stable) results in the presence of outliers. PROC ROBUSTREG supports CLASS variables. For more information, see Chapter 98: The ROBUSTREG Procedure.

RSREG

builds quadratic response-surface regression models. PROC RSREG analyzes the fitted response surface to determine the factor levels of optimum response and performs a ridge analysis to search for the region of optimum response. For more information, see Chapter 99: The RSREG Procedure.

SURVEYLOGISTIC

uses the method of maximum likelihood to fit logistic models for binary and ordinal outcomes to survey data. PROC SURVEYLOGISTIC supports CLASS variables. For more information, see Chapter 14: Introduction to Survey Sampling and Analysis Procedures, and Chapter 111: The SURVEYLOGISTIC Procedure.

SURVEYPHREG

fits proportional hazards models for survey data by maximizing a partial pseudo-likelihood function that incorporates the sampling weights. PROC SURVEYPHREG provides design-based variance estimates, confidence intervals, and tests for the estimated proportional hazards regression coefficients. PROC SURVEYPHREG supports CLASS variables. For more information, see Chapter 13: Introduction to Survival Analysis Procedures, Chapter 14: Introduction to Survey Sampling and Analysis Procedures, and Chapter 113: The SURVEYPHREG Procedure.

SURVEYREG

uses elementwise regression to fit linear regression models to survey data by generalized least squares. PROC SURVEYREG supports CLASS variables. For more information, see Chapter 14: Introduction to Survey Sampling and Analysis Procedures, and Chapter 114: The SURVEYREG Procedure.

TPSPLINE

uses penalized least squares to fit nonparametric regression models. PROC TPSPLINE makes no assumptions of a parametric form for the model. For more information, see Chapter 116: The TPSPLINE Procedure.

TRANSREG

fits univariate and multivariate linear models, optionally with spline, Box-Cox, and other nonlinear transformations. Models include regression and ANOVA, conjoint analysis, preference mapping, redundancy analysis, canonical correlation, and penalized B-spline regression. PROC TRANSREG supports CLASS variables. For more information, see Chapter 117: The TRANSREG Procedure.
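
Most of the procedures listed above support CLASS variables, which enter the model through their levels rather than their values. The following is a minimal sketch in Python (not SAS syntax) of that idea, assuming a coding with one 0/1 indicator column per level. The function name `indicator_columns` and the sample data are hypothetical, and individual procedures can use other parameterizations.

```python
def indicator_columns(values):
    """Expand a classification variable into one 0/1 column per level.

    Levels are the sorted distinct values. The value of a level is never
    used in arithmetic; only its identity decides which column gets a 1.
    """
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical data: a three-level treatment variable.
treatment = ["B", "A", "C", "A", "B"]
levels, design = indicator_columns(treatment)

# Print each observation next to its row of indicator columns.
for value, row in zip(treatment, design):
    print(value, row)
```

Because only the identity of each level matters, relabeling the levels (for example, 1, 2, 3 in place of A, B, C) changes the column labels but not the 0/1 pattern, as long as the new labels sort in the same order.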

Several SAS/ETS procedures also perform regression. The following procedures are documented in the SAS/ETS User's Guide:

ARIMA

uses autoregressive moving-average errors to perform multiple regression analysis. For more information, see Chapter 8: The ARIMA Procedure in SAS/ETS 14.1 User's Guide.

AUTOREG

implements regression models that use time series data in which the errors are autocorrelated. For more information, see Chapter 9: The AUTOREG Procedure in SAS/ETS 14.1 User's Guide.

COUNTREG

analyzes regression models in which the dependent variable takes nonnegative integer or count values. For more information, see Chapter 12: The COUNTREG Procedure in SAS/ETS 14.1 User's Guide.

MDC

fits conditional logit, mixed logit, heteroscedastic extreme value, nested logit, and multinomial probit models to discrete choice data. For more information, see Chapter 25: The MDC Procedure in SAS/ETS 14.1 User's Guide.

MODEL

handles nonlinear simultaneous systems of equations, such as econometric models. For more information, see Chapter 26: The MODEL Procedure in SAS/ETS 14.1 User's Guide.

PANEL

analyzes a class of linear econometric models that commonly arise when time series and cross-sectional data are combined. For more information, see Chapter 27: The PANEL Procedure in SAS/ETS 14.1 User's Guide.

PDLREG

fits polynomial distributed lag regression models. For more information, see Chapter 28: The PDLREG Procedure in SAS/ETS 14.1 User's Guide.

QLIM

analyzes limited dependent variable models in which dependent variables take discrete values or are observed only in a limited range of values. For more information, see Chapter 29: The QLIM Procedure in SAS/ETS 14.1 User's Guide.

SYSLIN

handles linear simultaneous systems of equations, such as econometric models. For more information, see Chapter 36: The SYSLIN Procedure in SAS/ETS 14.1 User's Guide.

VARMAX

performs multiple regression analysis for multivariate time series dependent variables by using current and past vectors of dependent and independent variables as predictors, with vector autoregressive moving-average errors, and with modeling of time-varying heteroscedasticity. For more information, see Chapter 42: The VARMAX Procedure in SAS/ETS 14.1 User's Guide.
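
Several of the SAS/ETS procedures above, such as PROC ARIMA and PROC AUTOREG, handle regression in which the errors are autocorrelated. The sketch below, in Python rather than SAS syntax, fits a straight line by ordinary least squares and computes the Durbin-Watson statistic of the residuals, a common diagnostic for first-order autocorrelation. The function names and the data are hypothetical, and the data are chosen only so that the residuals move in runs. It illustrates the symptom and is not a description of how any of these procedures is implemented.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical time-ordered data (not taken from any SAS example).
time = [1, 2, 3, 4, 5, 6, 7, 8]
response = [3, 3, 3, 3, 4, 6, 8, 10]

intercept, slope = fit_line(time, response)
residuals = [yi - (intercept + slope * ti) for ti, yi in zip(time, response)]
dw = durbin_watson(residuals)

print("intercept:", intercept, "slope:", slope)
print("Durbin-Watson:", dw)
```

A Durbin-Watson value near 2 is consistent with uncorrelated residuals. Values well below 2, as in this made-up series, point to positive first-order autocorrelation (or to a misspecified mean function), which is the situation that models with autocorrelated errors are meant to address.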