Tutorial: A Module for Linear Regression


Overview of Linear Regression

You can use SAS/IML software to solve mathematical problems or to implement new statistical techniques and algorithms. Formulas and matrix equations translate easily into the SAS/IML language. For example, if $\bm {X}$ is a data matrix and $\bm {Y}$ is a vector of observed responses, then you might be interested in the solution, $\bm {b}$, to the matrix equation $\bm {X}\bm {b}=\bm {Y}$. In statistics, data matrices often have more rows than columns, so the linear system is overdetermined and usually has no exact solution. Instead, the statistician often solves a related system, the normal equations: $\bm {X}^{\prime }\bm {X} \bm {b} = \bm {X}^{\prime }\bm {Y}$. The solution of this system is the least squares estimate. When $\bm {X}^{\prime }\bm {X}$ is nonsingular, the following mathematical formula expresses the solution vector in terms of the data matrix and the observed responses:

\[  \bm {b} = (\bm {X}^{\prime }\bm {X})^{-1} \bm {X}^{\prime }\bm {Y}  \]
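
As a brief sketch of where the related equation comes from (a standard least squares argument, stated here under the assumption that $\bm {X}^{\prime }\bm {X}$ is nonsingular), consider the sum of squared residuals as a function of $\bm {b}$:

\[  S(\bm {b}) = (\bm {Y} - \bm {X}\bm {b})^{\prime }(\bm {Y} - \bm {X}\bm {b})  \]

Setting the gradient with respect to $\bm {b}$ to zero gives

\[  -2\bm {X}^{\prime }(\bm {Y} - \bm {X}\bm {b}) = \bm {0} \quad \Longrightarrow \quad \bm {X}^{\prime }\bm {X} \bm {b} = \bm {X}^{\prime }\bm {Y}  \]

Multiplying both sides on the left by $(\bm {X}^{\prime }\bm {X})^{-1}$ then yields the formula for $\bm {b}$ shown above.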

This mathematical formula can be translated into the following SAS/IML statement:

b = inv(X`*X) * X`*Y;      /* least squares estimates */

This assignment statement uses a built-in function (INV) and matrix operators (transpose and matrix multiplication). It is mathematically equivalent to, but less efficient than, the following alternative statement, which uses the SOLVE function to solve the linear system without explicitly forming the inverse:

b = solve(X`*X, X`*Y);    /* more efficient computation */
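
The two statements can be compared in a short, self-contained program. The following is a minimal sketch rather than a definitive implementation: the data values and the names b1, b2, and maxDiff are hypothetical and chosen only for illustration.

```sas
proc iml;
/* hypothetical data: a column of ones (intercept) and one regressor */
X = {1 1,
     1 2,
     1 3,
     1 4,
     1 5};
Y = {1, 5, 9, 23, 36};             /* hypothetical observed responses */

b1 = inv(X`*X) * X`*Y;             /* estimates from the explicit inverse */
b2 = solve(X`*X, X`*Y);            /* estimates from solving the linear system */

maxDiff = max(abs(b1 - b2));       /* largest absolute difference between them */
print b1 b2 maxDiff;
quit;
```

For these illustrative values, both approaches should give an intercept of about -11.6 and a slope of about 8.8, and any difference between them should be on the order of rounding error.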

If a statistical method has not been implemented directly in a SAS procedure, you can program it by using the SAS/IML language. The most commonly used mathematical and matrix operations are built directly into the language, so programs that require many statements in other languages often require only a few SAS/IML statements.