Introduction to Regression Procedures

Linear Regression: The REG Procedure

PROC REG is a general-purpose procedure for linear regression that does the following:

  • handles simple and multiple regression models

  • provides nine model selection methods

  • allows interactive changes both in the model and in the data that are used to fit the model

  • allows linear equality restrictions on parameters

  • tests linear hypotheses and multivariate hypotheses

  • produces collinearity diagnostics, influence diagnostics, and partial regression leverage plots

  • saves estimates, predicted values, residuals, confidence limits, and other diagnostic statistics in output SAS data sets

  • generates plots of fit, of data, and of various statistics

  • uses data, correlations, or crossproducts for input
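Several of the capabilities listed above can be combined in one PROC REG step. The following is a minimal sketch, not a recommended analysis: it assumes the Sashelp.Class sample data set is available, and the model label (full), the test label (age_zero), the output data set name (Work.Reg_Out), and the output variable names are hypothetical choices made only for illustration.

```sas
proc reg data=sashelp.class;
   /* Multiple regression with collinearity diagnostics (VIF, COLLIN), */
   /* influence diagnostics (INFLUENCE), and partial regression        */
   /* leverage plots (PARTIAL)                                         */
   full: model weight = height age / vif collin influence partial;

   /* Test the linear hypothesis that the Age coefficient is zero */
   age_zero: test age = 0;

   /* Save predicted values, residuals, and confidence limits for the */
   /* mean response in an output data set                             */
   output out=work.reg_out p=pred r=resid lclm=lower uclm=upper;
run;
quit;
```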

Regression with the REG and GLM Procedures

The REG and GLM procedures are closely related; they make the same assumptions about the basic model and use the same estimation principles. Both procedures estimate parameters by ordinary or weighted least squares and assume homoscedastic, uncorrelated model errors with zero mean. An assumption of normality of the model errors is not necessary for parameter estimation, but it is implied in confirmatory inference based on the parameter estimates: that is, in the computation of tests, p-values, and confidence and prediction intervals.
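As a rough illustration of the shared estimation principle, the same weighted least squares fit can be requested from either procedure by using a WEIGHT statement. This sketch assumes the Sashelp.Class sample data set; the weight variable w and the data set Work.Class_W are hypothetical and are constructed here only so that the example runs, not because these weights are appropriate for the data.

```sas
/* Hypothetical weights, invented only for illustration */
data work.class_w;
   set sashelp.class;
   w = 1 / age;
run;

/* Weighted least squares in PROC REG */
proc reg data=work.class_w;
   model height = age;
   weight w;
run;
quit;

/* The same weighted fit in PROC GLM */
proc glm data=work.class_w;
   model height = age;
   weight w;
run;
quit;
```

Omitting the WEIGHT statement from either step gives the ordinary least squares fit.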

PROC GLM provides a CLASS statement for the levelization of classification variables. For details about how classification variables are parameterized in statistical models, see the section Parameterization of Model Effects in Chapter 19: Shared Concepts and Topics. In most cases, you should directly use PROC GLM, PROC GLMSELECT, or some other procedure when you fit models that have classification variables. However, you can fit models that have classification variables in PROC REG by first coding the classification variables by using PROC GLMSELECT, PROC TRANSREG, PROC GLMMOD, or some other method, and then including the coded variables in the MODEL statement in PROC REG.
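The two approaches can be sketched side by side. This example assumes the Sashelp.Class sample data set, in which Sex takes the values 'F' and 'M'. For simplicity the coding is done by hand in a DATA step (one instance of "some other method") rather than with PROC GLMSELECT, PROC TRANSREG, or PROC GLMMOD; the data set Work.Class_Coded and the indicator variable sex_f are hypothetical names.

```sas
/* Direct approach: PROC GLM levelizes Sex through the CLASS statement */
proc glm data=sashelp.class;
   class sex;
   model weight = sex height / solution;
run;
quit;

/* PROC REG approach: code the classification variable first */
data work.class_coded;
   set sashelp.class;
   sex_f = (sex = 'F');   /* 1 for females, 0 for males */
run;

/* Then include the coded variable in the MODEL statement */
proc reg data=work.class_coded;
   model weight = sex_f height;
run;
quit;
```

Because PROC GLM by default sets the estimate for the last level of a classification variable ('M' here) to zero, the sex_f coefficient from PROC REG should correspond to the Sex F estimate that the SOLUTION option displays in PROC GLM.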

Most of the statistics based on predicted and residual values that are available in PROC REG are also available in PROC GLM. However, PROC REG provides more diagnostic information. In addition, PROC GLM allows only one model and does not provide model selection.
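For instance, a single PROC REG step can contain several MODEL statements, and any of them can request a selection method. The sketch below assumes the Sashelp.Class sample data set; the labels m1, m2, and sel are hypothetical, and stepwise selection is used only as one example of the available methods.

```sas
proc reg data=sashelp.class;
   /* Two competing models fit in the same step */
   m1: model weight = height;
   m2: model weight = height age;

   /* A third model in which stepwise selection chooses the regressors */
   sel: model weight = height age / selection=stepwise;
run;
quit;
```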

Both PROC REG and PROC GLM are interactive, in that they do not stop after processing a RUN statement. Both procedures accept statements until a QUIT statement is submitted. For more information about interactive procedures, see the section Interactive Features in the CATMOD, GLM, and REG Procedures.
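An interactive PROC REG session might look like the following sketch, which assumes the Sashelp.Class sample data set. The VAR statement names a variable that is not in the initial model so that it can be added later, and the procedure stays active across RUN statements until QUIT is submitted.

```sas
proc reg data=sashelp.class;
   var age;                 /* make Age available for later ADD statements */
   model weight = height;
run;

   /* The procedure is still active: add a regressor and refit */
   add age;
   print;
run;

   /* Remove the regressor again and refit */
   delete age;
   print;
run;

quit;                       /* end the procedure */
```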