Introduction to Statistical Modeling with SAS/STAT Software: Important Linear Algebra Concepts

A matrix $\mathbf{A}$ is a rectangular array of numbers. The order of a matrix with $n$ rows and $k$ columns is $(n \times k)$. The element in row $i$, column $j$ of $\mathbf{A}$ is denoted as $a_{ij}$, and the notation $[a_{ij}]$ is sometimes used to refer to the two-dimensional row-column array

$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nk} \end{bmatrix} = [a_{ij}]$

A vector is a one-dimensional array of numbers. A column vector has a single column ($k = 1$). A row vector has a single row ($n = 1$). A scalar is a matrix of order $(1 \times 1)$, that is, a single number. A square matrix has the same row and column order, $n = k$. A diagonal matrix is a square matrix where all off-diagonal elements are zero, $a_{ij} = 0$ if $i \neq j$. The identity matrix $\mathbf{I}$ is a diagonal matrix with $a_{ii} = 1$ for all $i$. The unit vector $\mathbf{1}$ is a vector where all elements are $1$. The unit matrix $\mathbf{J}$ is a matrix of all $1$s. Similarly, the elements of the null vector and the null matrix are all $0$.

Basic matrix operations are as follows:

Addition

If $\mathbf{A}$ and $\mathbf{B}$ are of the same order, then $\mathbf{A} + \mathbf{B}$ is the matrix of elementwise sums,

$\mathbf{A} + \mathbf{B} = [a_{ij} + b_{ij}]$

Subtraction

If $\mathbf{A}$ and $\mathbf{B}$ are of the same order, then $\mathbf{A} - \mathbf{B}$ is the matrix of elementwise differences,

$\mathbf{A} - \mathbf{B} = [a_{ij} - b_{ij}]$

Dot product

The dot product of two $n$-vectors $\mathbf{a}$ and $\mathbf{b}$ is the sum of their elementwise products,

$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i$

The dot product is also known as the inner product of $\mathbf{a}$ and $\mathbf{b}$. Two vectors are said to be orthogonal if their dot product is zero.

Multiplication

Matrices $\mathbf{A}$ and $\mathbf{B}$ are said to be conformable for $\mathbf{A}\mathbf{B}$ multiplication if the number of columns in $\mathbf{A}$ equals the number of rows in $\mathbf{B}$. Suppose that $\mathbf{A}$ is of order $(n \times k)$ and that $\mathbf{B}$ is of order $(k \times p)$. The product $\mathbf{A}\mathbf{B}$ is then defined as the $(n \times p)$ matrix of the dot products of the $i$th row of $\mathbf{A}$ and the $j$th column of $\mathbf{B}$,

$\mathbf{A}\mathbf{B} = \left[ \sum_{l=1}^{k} a_{il} b_{lj} \right]_{n \times p}$

Transposition

The transpose of the $(n \times k)$ matrix $\mathbf{A}$ is denoted as $\mathbf{A}'$ or $\mathbf{A}^{T}$ or $\mathbf{A}^{\top}$ and is obtained by interchanging the rows and columns,

$\mathbf{A}' = [a_{ji}]_{k \times n}$

A symmetric matrix is equal to its transpose, $\mathbf{A} = \mathbf{A}'$. The inner product of two $(n \times 1)$ column vectors $\mathbf{a}$ and $\mathbf{b}$ is $\mathbf{a} \cdot \mathbf{b} = \mathbf{a}'\mathbf{b}$.
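The basic operations above can be tried out numerically. The following is a minimal sketch that assumes NumPy is available; the matrices and vectors are made-up illustration values, not data from this chapter.

```python
import numpy as np

# Two matrices of the same order (3 x 2), so sums and differences are defined
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])

S = A + B   # matrix of elementwise sums
D = A - B   # matrix of elementwise differences

# Dot (inner) product of two 3-vectors: 1*4 + 2*(-5) + 3*2 = 0
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -5.0, 2.0])
dot_ab = a @ b
orthogonal = bool(np.isclose(dot_ab, 0.0))

# A' is (2 x 3) and B is (3 x 2), so A'B is conformable and has order (2 x 2)
P = A.T @ B

# A'A is symmetric: it equals its own transpose
AtA = A.T @ A
is_symmetric = bool(np.array_equal(AtA, AtA.T))
```

Note that `A @ B` would raise an error here, because the number of columns of `A` (2) does not equal the number of rows of `B` (3).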

Matrix Inversion

Regular Inverses

The right inverse of a matrix $\text{[math]}$ is the matrix that yields the identity when $\text{[math]}$ is postmultiplied by it. Similarly, the left inverse of $\text{[math]}$ yields the identity if $\text{[math]}$ is premultiplied by it. $\text{[math]}$ is said to be invertible and $\text{[math]}$ is said to be the inverse of $\text{[math]}$ , if $\text{[math]}$ is its right and left inverse, $\text{[math]}$ . This requires $\text{[math]}$ to be square and nonsingular. The inverse of a matrix $\text{[math]}$ is commonly denoted as $\text{[math]}$ . The following results are useful in manipulating inverse matrices (assuming both $\text{[math]}$ and $\text{[math]}$ are invertible):

	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$

If $\text{[math]}$ is a diagonal matrix with nonzero entries on the diagonal—that is, $\text{[math]}$ —then $\text{[math]}$ . If $\text{[math]}$ is a block-diagonal matrix whose blocks are invertible, then

$\text{[math]}$

In statistical applications the following two results are particularly important, because they can significantly reduce the computational burden in working with inverse matrices.

Partitioned Matrix

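Suppose $\mathbf{A}$ is a nonsingular matrix that is partitioned as

$\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}$

Then, provided that all the inverses exist, the inverse of $\mathbf{A}$ is given by

$\mathbf{A}^{-1} = \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{bmatrix}$

where $\mathbf{B}_{11} = (\mathbf{A}_{11} - \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21})^{-1}$, $\mathbf{B}_{12} = -\mathbf{B}_{11}\mathbf{A}_{12}\mathbf{A}_{22}^{-1}$, $\mathbf{B}_{21} = -\mathbf{A}_{22}^{-1}\mathbf{A}_{21}\mathbf{B}_{11}$, and $\mathbf{B}_{22} = (\mathbf{A}_{22} - \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12})^{-1}$.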
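A partitioned inverse can be checked numerically. The sketch below assumes NumPy and assumes the standard Schur-complement form of the block inverse; the block names `A11`, `B11`, and so on are illustrative, and the test matrix is randomly generated rather than taken from any model.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5.0 * np.eye(5)   # symmetric positive definite, so every block inverse below exists

# Partition A into a (2 x 2) upper-left block and a (3 x 3) lower-right block
A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]

inv = np.linalg.inv
B11 = inv(A11 - A12 @ inv(A22) @ A21)   # inverse of the Schur complement of A22
B12 = -B11 @ A12 @ inv(A22)
B21 = -inv(A22) @ A21 @ B11
B22 = inv(A22 - A21 @ inv(A11) @ A12)   # inverse of the Schur complement of A11

A_inv_blocks = np.block([[B11, B12], [B21, B22]])
max_abs_diff = float(np.max(np.abs(A_inv_blocks - inv(A))))
```

Only inverses of the smaller blocks and their Schur complements are needed, which is where the computational saving comes from when one block is easy to invert.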

Patterned Sum

Suppose $\mathbf{A}$ is $(n \times n)$ nonsingular, $\mathbf{G}$ is $(k \times k)$ nonsingular, and $\mathbf{B}$ and $\mathbf{C}$ are $(n \times k)$ and $(k \times n)$ matrices, respectively. Then the inverse of $\mathbf{A} + \mathbf{B}\mathbf{G}\mathbf{C}$ is given by

$(\mathbf{A} + \mathbf{B}\mathbf{G}\mathbf{C})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}\left(\mathbf{G}^{-1} + \mathbf{C}\mathbf{A}^{-1}\mathbf{B}\right)^{-1}\mathbf{C}\mathbf{A}^{-1}$

This formula is particularly useful if $k \ll n$ and $\mathbf{A}$ has a simple form that is easy to invert. This case arises, for example, in mixed models where $\mathbf{A}$ might be a diagonal or block-diagonal matrix, and $\mathbf{B} = \mathbf{C}'$.

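Another situation where this formula plays a critical role is in the computation of regression diagnostics, such as in determining the effect of removing an observation from the analysis. Suppose that $\mathbf{A} = \mathbf{X}'\mathbf{X}$ represents the crossproduct matrix in the linear model $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$. If $\mathbf{x}_i'$ is the $i$th row of the $\mathbf{X}$ matrix, then $(\mathbf{X}'\mathbf{X} - \mathbf{x}_i\mathbf{x}_i')$ is the crossproduct matrix in the same model with the $i$th observation removed. Identifying $\mathbf{B} = -\mathbf{x}_i$, $\mathbf{C} = \mathbf{x}_i'$, and $\mathbf{G} = \mathbf{I}$ in the preceding inversion formula, you can obtain the expression for the inverse of the crossproduct matrix:

$(\mathbf{X}'\mathbf{X} - \mathbf{x}_i\mathbf{x}_i')^{-1} = (\mathbf{X}'\mathbf{X})^{-1} + \dfrac{(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_i\mathbf{x}_i'(\mathbf{X}'\mathbf{X})^{-1}}{1 - \mathbf{x}_i'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_i}$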

This expression for the inverse of the reduced data crossproduct matrix enables you to compute "leave-one-out" deletion diagnostics in linear models without refitting the model.
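The leave-one-out update can be compared against a direct refit. This is a minimal sketch assuming NumPy; the design matrix is simulated, and the variable names (`XtX_inv`, `h_i`, `updated`) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
X = rng.normal(size=(n, k))          # simulated full-rank design matrix

XtX_inv = np.linalg.inv(X.T @ X)     # computed once from the full data

i = 2                                # observation to delete (0-based index)
x_i = X[i]                           # i-th row of X as a 1-D vector
h_i = x_i @ XtX_inv @ x_i            # scalar x_i' (X'X)^{-1} x_i, the leverage

# Rank-one update: no new matrix inversion is required
v = XtX_inv @ x_i
updated = XtX_inv + np.outer(v, v) / (1.0 - h_i)

# Direct computation after physically removing row i, for comparison
X_minus_i = np.delete(X, i, axis=0)
direct = np.linalg.inv(X_minus_i.T @ X_minus_i)

max_abs_diff = float(np.max(np.abs(updated - direct)))
```

The update requires only matrix-vector products and one scalar division per deleted observation, so it can be repeated for every `i` while reusing the same `XtX_inv`.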

Generalized Inverse Matrices

If $\mathbf{A}$ is rectangular (not square) or singular, then it is not invertible and the matrix $\mathbf{A}^{-1}$ does not exist. Suppose you want to find a solution to simultaneous linear equations of the form

$\mathbf{A}\mathbf{b} = \mathbf{c}$

If $\mathbf{A}$ is square and nonsingular, then the unique solution is $\mathbf{b} = \mathbf{A}^{-1}\mathbf{c}$. In statistical applications, the case where $\mathbf{A}$ is $(n \times k)$ rectangular is less important than the case where $\mathbf{A}$ is a $(k \times k)$ square matrix of rank less than $k$. For example, the normal equations in ordinary least squares (OLS) estimation in the model $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$ are

$(\mathbf{X}'\mathbf{X})\boldsymbol{\beta} = \mathbf{X}'\mathbf{Y}$

A generalized inverse matrix is a matrix $\mathbf{A}^{-}$ such that $\mathbf{A}^{-}\mathbf{c}$ is a solution to the linear system. In the OLS example, a solution can be found as $\widehat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{Y}$, where $(\mathbf{X}'\mathbf{X})^{-}$ is a generalized inverse of $\mathbf{X}'\mathbf{X}$.

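The following four conditions are often associated with generalized inverses. For the square or rectangular matrix $\mathbf{A}$ there exist matrices $\mathbf{G}$ that satisfy

$\begin{array}{ll} \text{(i)} & \mathbf{A}\mathbf{G}\mathbf{A} = \mathbf{A} \\ \text{(ii)} & \mathbf{G}\mathbf{A}\mathbf{G} = \mathbf{G} \\ \text{(iii)} & (\mathbf{A}\mathbf{G})' = \mathbf{A}\mathbf{G} \\ \text{(iv)} & (\mathbf{G}\mathbf{A})' = \mathbf{G}\mathbf{A} \end{array}$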

The matrix $\text{[math]}$ that satisfies all four conditions is unique and is called the Moore-Penrose inverse, after the first published work on generalized inverses by Moore (1920) and the subsequent definition by Penrose (1955). Only the first condition is required, however, to provide a solution to the linear system above.

Pringle and Rayner (1971) introduced a numbering system to distinguish between different types of generalized inverses. A matrix that satisfies only condition (i) is a $g_1$-inverse. The $g_2$-inverse satisfies conditions (i) and (ii). It is also called a reflexive generalized inverse. Matrices satisfying conditions (i)–(iii) or conditions (i), (ii), and (iv) are $g_3$-inverses. Note that a matrix that satisfies the first three conditions is a right generalized inverse, and a matrix that satisfies conditions (i), (ii), and (iv) is a left generalized inverse. For example, if $\mathbf{A}$ is $(n \times k)$ of rank $k$, then $(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'$ is a left generalized inverse of $\mathbf{A}$. The notation $g_4$-inverse for the Moore-Penrose inverse, satisfying conditions (i)–(iv), is often used by extension, but note that Pringle and Rayner (1971) do not use it; rather, they call such a matrix "the" generalized inverse.
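The four conditions can be tested directly for the left generalized inverse in the example above. The sketch assumes NumPy and assumes the conditions are numbered in the usual Penrose order: (i) AGA = A, (ii) GAG = G, (iii) AG symmetric, (iv) GA symmetric. The matrix is a made-up full-column-rank example.

```python
import numpy as np

# A (3 x 2) matrix of rank 2, so A'A is nonsingular
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Candidate generalized inverse: (A'A)^{-1} A'
G = np.linalg.inv(A.T @ A) @ A.T

cond_i = bool(np.allclose(A @ G @ A, A))        # AGA = A
cond_ii = bool(np.allclose(G @ A @ G, G))       # GAG = G
cond_iii = bool(np.allclose((A @ G).T, A @ G))  # AG is symmetric
cond_iv = bool(np.allclose((G @ A).T, G @ A))   # GA is symmetric (here GA = I)

# For comparison, NumPy's Moore-Penrose inverse
same_as_pinv = bool(np.allclose(G, np.linalg.pinv(A)))
```

For a full-column-rank matrix this particular `G` happens to satisfy all four conditions, so it coincides with the Moore-Penrose inverse; a left generalized inverse in general is only required to satisfy (i), (ii), and (iv).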

If the $(n \times k)$ matrix $\mathbf{X}$ is rank-deficient, that is, $\mathrm{rank}(\mathbf{X}) < \min\{n, k\}$, then the system of equations

$(\mathbf{X}'\mathbf{X})\boldsymbol{\beta} = \mathbf{X}'\mathbf{Y}$

does not have a unique solution. A particular solution depends on the choice of the generalized inverse. However, some aspects of the statistical inference are invariant to the choice of the generalized inverse. If $(\mathbf{X}'\mathbf{X})^{-}$ is a generalized inverse of $\mathbf{X}'\mathbf{X}$, then $\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'$ is invariant to the choice of $(\mathbf{X}'\mathbf{X})^{-}$. This result comes into play, for example, when you are computing predictions in an OLS model with a rank-deficient $\mathbf{X}$ matrix, since it implies that the predicted values

$\widehat{\mathbf{Y}} = \mathbf{X}\widehat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{Y}$

are invariant to the choice of $(\mathbf{X}'\mathbf{X})^{-}$.
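The invariance of the predicted values can be seen in a small rank-deficient example. This sketch assumes NumPy; the design (an intercept plus two group indicators that sum to the intercept) and the response values are made up. The second generalized inverse, `G2`, is built by inverting a nonsingular leading block and padding with zeros, which is one possible construction among many.

```python
import numpy as np

# Intercept plus two indicator columns; column 0 = column 1 + column 2, so rank is 2
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])
y = np.array([2.0, 4.0, 1.0, 3.0, 8.0])

XtX = X.T @ X

# Generalized inverse 1: Moore-Penrose inverse of X'X
G1 = np.linalg.pinv(XtX)

# Generalized inverse 2: invert the nonsingular leading (2 x 2) block, zero elsewhere
G2 = np.zeros((3, 3))
G2[:2, :2] = np.linalg.inv(XtX[:2, :2])

beta1 = G1 @ X.T @ y      # one solution of the normal equations
beta2 = G2 @ X.T @ y      # a different solution

yhat1 = X @ beta1
yhat2 = X @ beta2
```

The two coefficient vectors differ, but `yhat1` and `yhat2` agree: both reproduce the group means of `y`.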

Matrix Differentiation

Taking the derivative of expressions involving matrices is a frequent task in statistical estimation. Objective functions that are to be minimized or maximized are usually written in terms of model matrices and/or vectors whose elements depend on the unknowns of the estimation problem. Suppose that $\text{[math]}$ and $\text{[math]}$ are real matrices whose elements depend on the scalar quantities $\text{[math]}$ and $\text{[math]}$ —that is, $\text{[math]}$ , and similarly for $\text{[math]}$ .

The following are useful results in finding the derivative of elements of a matrix and of functions involving a matrix. For more in-depth discussion of matrix differentiation and matrix calculus, see, for example, Magnus and Neudecker (1999) and Harville (1997).

The derivative of $\text{[math]}$ with respect to $\text{[math]}$ is denoted $\text{[math]}$ and is the matrix of the first derivatives of the elements of $\text{[math]}$ :

$\text{[math]}$

Similarly, the second derivative of $\text{[math]}$ with respect to $\text{[math]}$ and $\text{[math]}$ is the matrix of the second derivatives

$\text{[math]}$

The following are some basic results involving sums, products, and traces of matrices:

	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$

The next set of results is useful in finding the derivative of elements of $\text{[math]}$ and of functions of $\text{[math]}$ , if $\text{[math]}$ is a nonsingular matrix:

	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
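Derivative results for a nonsingular matrix can be sanity-checked with finite differences. The sketch below assumes NumPy and uses two standard identities of the kind tabulated here: the derivative of the inverse, $-\mathbf{A}^{-1}(\partial\mathbf{A}/\partial\theta)\mathbf{A}^{-1}$, and the derivative of the log determinant, $\mathrm{tr}(\mathbf{A}^{-1}\,\partial\mathbf{A}/\partial\theta)$. The matrix function `A_of` is a made-up example with a single scalar parameter.

```python
import numpy as np

def A_of(theta):
    """Hypothetical (2 x 2) matrix whose elements depend on a scalar theta."""
    return np.array([[2.0 + theta, theta ** 2],
                     [np.sin(theta), 3.0]])

def dA_of(theta):
    """Elementwise first derivative of A_of with respect to theta."""
    return np.array([[1.0, 2.0 * theta],
                     [np.cos(theta), 0.0]])

theta, h = 0.4, 1e-6
A_inv = np.linalg.inv(A_of(theta))

# Derivative of the inverse: analytic identity versus central difference
d_inv_analytic = -A_inv @ dA_of(theta) @ A_inv
d_inv_numeric = (np.linalg.inv(A_of(theta + h)) - np.linalg.inv(A_of(theta - h))) / (2.0 * h)

# Derivative of log|A|: analytic identity versus central difference
d_logdet_analytic = np.trace(A_inv @ dA_of(theta))
logdet = lambda t: np.linalg.slogdet(A_of(t))[1]   # log of the absolute determinant
d_logdet_numeric = (logdet(theta + h) - logdet(theta - h)) / (2.0 * h)
```

A check of this kind is a cheap way to catch sign or ordering mistakes before coding analytic gradients into an optimizer.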

Now suppose that $\text{[math]}$ and $\text{[math]}$ are column vectors that depend on $\text{[math]}$ and/or $\text{[math]}$ and that $\text{[math]}$ is a vector of constants. The following results are useful for manipulating derivatives of linear and quadratic forms:

	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$

Matrix Decompositions

To decompose a matrix is to express it as a function, typically a product, of other matrices that have particular properties such as orthogonality, diagonality, or triangularity. For example, the Cholesky decomposition of a symmetric positive definite matrix $\text{[math]}$ is $\text{[math]}$ , where $\text{[math]}$ is a lower-triangular matrix. The spectral decomposition of a symmetric matrix is $\text{[math]}$ , where $\text{[math]}$ is a diagonal matrix and $\text{[math]}$ is an orthogonal matrix.

Matrix decompositions play an important role in statistical theory as well as in statistical computations. Calculations in terms of decompositions can have greater numerical stability. Decompositions are often necessary to extract information about matrices, such as matrix rank, eigenvalues, or eigenvectors. Decompositions are also used to form special transformations of matrices, such as to form a "square-root" matrix. This section briefly mentions several decompositions that are particularly prevalent and important.

LDU, LU, and Cholesky Decomposition

Every square matrix $\text{[math]}$ , whether it is positive definite or not, can be expressed in the form $\text{[math]}$ , where $\text{[math]}$ is a unit lower-triangular matrix, $\text{[math]}$ is a diagonal matrix, and $\text{[math]}$ is a unit upper-triangular matrix. (The diagonal elements of a unit triangular matrix are 1.) Because of the arrangement of the matrices, the decomposition is called the LDU decomposition. Since you can absorb the diagonal matrix into the triangular matrices, the decomposition

$\text{[math]}$

is also referred to as the LU decomposition of $\text{[math]}$ .

If the matrix $\mathbf{A}$ is positive definite, then the diagonal elements of $\mathbf{D}$ are positive and the LDU decomposition is unique. More specifically, for a symmetric, positive definite matrix there is a unique decomposition $\mathbf{A} = \mathbf{U}'\mathbf{D}\mathbf{U}$, where $\mathbf{U}$ is unit upper-triangular and $\mathbf{D}$ is diagonal with positive elements. Absorbing the square root of $\mathbf{D}$ into $\mathbf{U}$, $\mathbf{C} = \mathbf{D}^{1/2}\mathbf{U}$, gives the Cholesky decomposition of a positive definite matrix:

$\mathbf{A} = \mathbf{U}'\mathbf{D}^{1/2}\mathbf{D}^{1/2}\mathbf{U} = \mathbf{C}'\mathbf{C}$

where $\mathbf{C}$ is upper triangular.

If $\text{[math]}$ is $\text{[math]}$ symmetric nonnegative definite of rank $\text{[math]}$ , then we can extend the Cholesky decomposition as follows. Let $\text{[math]}$ denote the lower-triangular matrix such that

$\text{[math]}$

Then $\text{[math]}$ .
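The relationship between the Cholesky factor and the unit-triangular form for a positive definite matrix can be illustrated as follows. The sketch assumes NumPy, whose `np.linalg.cholesky` returns the lower-triangular factor; the names `C`, `U`, and `d` are illustrative and denote the upper-triangular factor, the unit upper-triangular factor, and the diagonal elements, respectively. The matrix is a made-up example.

```python
import numpy as np

# A symmetric positive definite matrix (leading principal minors 4, 16, 64)
A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])

L = np.linalg.cholesky(A)   # lower triangular, A = L L'
C = L.T                     # upper triangular, A = C'C

# Recover the unit upper-triangular U and diagonal D from C = D^{1/2} U
c_diag = np.diag(C)             # square roots of the diagonal elements of D
d = c_diag ** 2                 # diagonal elements of D (all positive)
U = C / c_diag[:, None]         # divide row i of C by C[i, i]

A_from_C = C.T @ C
A_from_UDU = U.T @ np.diag(d) @ U
```

Both reconstructions reproduce `A`, and `U` has ones on its diagonal, which is the sense in which the Cholesky factor absorbs the square root of the diagonal matrix.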

Spectral Decomposition

Suppose that $\text{[math]}$ is an $\text{[math]}$ symmetric matrix. Then there exists an orthogonal matrix $\text{[math]}$ and a diagonal matrix $\text{[math]}$ such that $\text{[math]}$ . Of particular importance is the case where the orthogonal matrix is also orthonormal—that is, its column vectors have unit norm. Denote this orthonormal matrix as $\text{[math]}$ . Then the corresponding diagonal matrix— $\text{[math]}$ , say—contains the eigenvalues of $\text{[math]}$ . The spectral decomposition of $\text{[math]}$ can be written as

$\text{[math]}$

where $\text{[math]}$ denotes the $\text{[math]}$ th column vector of $\text{[math]}$ . The right-side expression decomposes $\text{[math]}$ into a sum of rank-1 matrices, and the weight of each contribution is equal to the eigenvalue associated with the $\text{[math]}$ th eigenvector. The sum furthermore emphasizes that the rank of $\text{[math]}$ is equal to the number of nonzero eigenvalues.

Harville (1997, p. 538) refers to the spectral decomposition of $\text{[math]}$ as the decomposition that takes the previous sum one step further and accumulates contributions associated with the distinct eigenvalues. If $\text{[math]}$ are the distinct eigenvalues and $\text{[math]}$ , where the sum is taken over the set of columns for which $\text{[math]}$ , then

$\text{[math]}$

You can employ the spectral decomposition of a nonnegative definite symmetric matrix to form a "square-root" matrix of $\text{[math]}$ . Suppose that $\text{[math]}$ is the diagonal matrix containing the square roots of the $\text{[math]}$ . Then $\text{[math]}$ is a square-root matrix of $\text{[math]}$ in the sense that $\text{[math]}$ , because

$\text{[math]}$

Generating the Moore-Penrose inverse of a matrix based on the spectral decomposition is also simple. Denote as $\text{[math]}$ the diagonal matrix with typical element

$\text{[math]}$

Then the matrix $\text{[math]}$ is the Moore-Penrose ( $\text{[math]}$ -generalized) inverse of $\text{[math]}$ .
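The three uses of the spectral decomposition described in this section (the rank-1 sum, the square-root matrix, and the Moore-Penrose inverse) can be sketched together. The code assumes NumPy and its `np.linalg.eigh` routine for symmetric matrices; the names `lam` (eigenvalues), `P` (orthonormal eigenvectors), and the tolerance `tol` used to decide which eigenvalues count as zero are assumptions of this sketch. The matrix is a made-up nonnegative definite example of rank 2.

```python
import numpy as np

# Symmetric nonnegative definite (3 x 3) matrix of rank 2 (third row = row 1 + row 2)
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

lam, P = np.linalg.eigh(A)          # eigenvalues (ascending) and orthonormal eigenvectors
tol = 1e-10                         # assumed cutoff for treating an eigenvalue as zero

# Weighted sum of rank-1 matrices p_i p_i'
rank1_sum = sum(l * np.outer(p, p) for l, p in zip(lam, P.T))

# The number of nonzero eigenvalues gives the rank
n_nonzero = int(np.sum(np.abs(lam) > tol))

# Square-root matrix: P diag(sqrt(lam)) P', clipping tiny negative round-off to zero
lam_clipped = np.where(lam > tol, lam, 0.0)
S = P @ np.diag(np.sqrt(lam_clipped)) @ P.T

# Moore-Penrose inverse: reciprocal of nonzero eigenvalues, zero otherwise
delta = np.array([1.0 / l if abs(l) > tol else 0.0 for l in lam])
A_mp = P @ np.diag(delta) @ P.T
```

In floating-point arithmetic the zero eigenvalue is returned as a tiny nonzero number, which is why an explicit tolerance is needed before taking square roots or reciprocals.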

Singular-Value Decomposition

The singular-value decomposition is related to the spectral decomposition of a matrix, but it is more general. The singular-value decomposition can be applied to any matrix. Let $\text{[math]}$ be an $\text{[math]}$ matrix of rank $\text{[math]}$ . Then there exist orthogonal matrices $\text{[math]}$ and $\text{[math]}$ of order $\text{[math]}$ and $\text{[math]}$ , respectively, and a diagonal matrix $\text{[math]}$ such that

$\text{[math]}$

where $\text{[math]}$ is a diagonal matrix of order $\text{[math]}$ . The diagonal elements of $\text{[math]}$ are strictly positive. As with the spectral decomposition, this result can be written as a decomposition of $\text{[math]}$ into a weighted sum of rank-1 matrices

$\text{[math]}$

The scalars $\text{[math]}$ are called the singular values of the matrix $\text{[math]}$ . They are the positive square roots of the nonzero eigenvalues of the matrix $\text{[math]}$ . If the singular-value decomposition is applied to a symmetric, nonnegative definite matrix $\text{[math]}$ , then the singular values $\text{[math]}$ are the nonzero eigenvalues of $\text{[math]}$ and the singular-value decomposition is the same as the spectral decomposition.

As with the spectral decomposition, you can use the results of the singular-value decomposition to generate the Moore-Penrose inverse of a matrix. If $\text{[math]}$ is $\text{[math]}$ with singular-value decomposition $\text{[math]}$ , and if $\text{[math]}$ is a diagonal matrix with typical element

$\text{[math]}$

then $\text{[math]}$ is the $\text{[math]}$ -generalized inverse of $\text{[math]}$ .
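The singular-value route to the Moore-Penrose inverse can be sketched for a rectangular, rank-deficient matrix. The code assumes NumPy; `np.linalg.svd` returns the transposed right factor, named `Qt` here, and with `full_matrices=False` it returns the reduced form in which the middle factor is a square diagonal matrix. The tolerance `tol` and all variable names are assumptions of this sketch, and the matrix is made up.

```python
import numpy as np

# A (3 x 2) matrix of rank 1 (second column is twice the first)
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

P, d, Qt = np.linalg.svd(A, full_matrices=False)   # A = P diag(d) Qt
tol = 1e-10                                        # assumed cutoff for a zero singular value
k = int(np.sum(d > tol))                           # number of positive singular values = rank

# Weighted sum of k rank-1 matrices d_i p_i q_i'
rank1_sum = sum(d[i] * np.outer(P[:, i], Qt[i]) for i in range(k))

# Squared singular values are the nonzero eigenvalues of A'A
eig_AtA = np.linalg.eigvalsh(A.T @ A)

# Moore-Penrose inverse: reciprocal of positive singular values, zero otherwise
delta = np.array([1.0 / di if di > tol else 0.0 for di in d])
A_mp = Qt.T @ np.diag(delta) @ P.T
```

The resulting `A_mp` has order (2 x 3) and satisfies all four generalized-inverse conditions, even though `A` is neither square nor of full rank.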