Tutorial: A Module for Linear Regression


Overview of Linear Regression

You can use SAS/IML software to solve mathematical problems or implement new statistical techniques and algorithms. Formulas and matrix equations are easily translated into the SAS/IML language. For example, if $\bX $ is a data matrix and $\bY $ is a vector of observed responses, then you might be interested in the solution, $\bm {b}$, to the matrix equation $\bX \bm {b}=\bY $. In statistics, the data matrices that arise often have more rows than columns, so the linear system is overdetermined and usually has no exact solution. Instead, the statistician often solves a related system, the normal equations: $\bX ^{\prime }\bX \bm {b} = \bX ^{\prime }\bY $. The solution of the normal equations is the least squares estimate, which minimizes the sum of squared differences between $\bX \bm {b}$ and $\bY $. When $\bX ^{\prime }\bX $ is nonsingular, the following mathematical formula expresses the solution vector in terms of the data matrix and the observed responses:

\[ \bm {b} = (\bX ^{\prime }\bX )^{-1} \bX ^{\prime }\bY \]
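As a brief sketch of where this formula comes from (a standard least squares argument, not something specific to SAS/IML), consider the sum of squared residuals, written here as $S(\bm {b})$, a symbol introduced only for this illustration:

\[ S(\bm {b}) = (\bY - \bX \bm {b})^{\prime }(\bY - \bX \bm {b}) = \bY ^{\prime }\bY - 2\bm {b}^{\prime }\bX ^{\prime }\bY + \bm {b}^{\prime }\bX ^{\prime }\bX \bm {b} \]

Setting the gradient with respect to $\bm {b}$ to zero gives

\[ -2\bX ^{\prime }\bY + 2\bX ^{\prime }\bX \bm {b} = \bm {0} \quad \Longrightarrow \quad \bX ^{\prime }\bX \bm {b} = \bX ^{\prime }\bY \]

Multiplying both sides by $(\bX ^{\prime }\bX )^{-1}$, when that inverse exists, recovers the formula for $\bm {b}$ shown above.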

This mathematical formula can be translated into the following SAS/IML statement:

b = inv(X`*X) * X`*Y;      /* least squares estimates */

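This assignment statement uses a built-in function (INV) and matrix operators (transpose and matrix multiplication). It is mathematically equivalent to the following alternative statement, which is more efficient because the SOLVE function solves the linear system directly rather than forming the inverse matrix explicitly: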

b = solve(X`*X, X`*Y);    /* more efficient computation */
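To see the two statements side by side, the following is a minimal sketch of a complete PROC IML step. The data values and the names b_inv, b_solve, and maxDiff are invented for illustration; the first column of X is a column of ones so that the model includes an intercept.

```sas
proc iml;
   /* hypothetical data: 4 observations, an intercept column and one regressor */
   X = {1 1,
        1 2,
        1 3,
        1 4};
   Y = {2, 3, 5, 6};

   b_inv   = inv(X`*X) * X`*Y;      /* estimates via the explicit inverse   */
   b_solve = solve(X`*X, X`*Y);     /* estimates via solving the system     */

   /* largest absolute difference between the two sets of estimates */
   maxDiff = max(abs(b_inv - b_solve));

   print b_inv b_solve maxDiff;     /* both estimates are approximately {0.5, 1.4} */
quit;
```

For these data, both approaches should give an intercept near 0.5 and a slope near 1.4, and maxDiff should be zero or on the order of floating-point rounding error.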

If a statistical method has not been implemented directly in a SAS procedure, you can program it by using the SAS/IML language. The most commonly used mathematical and matrix operations are built directly into the language, so programs that require many statements in other languages require only a few SAS/IML statements.