# The MIXED Procedure

## Residuals and Influence Diagnostics
Consider a residual vector of the form $\hat{\mathbf{r}} = \mathbf{P}\mathbf{Y}$, where $\mathbf{P}$ is a projection matrix, possibly an oblique projector. A typical element $r_i$ with variance $v_i$ and estimated variance $\hat{v}_i$ is said to be standardized as

$$\frac{r_i}{\sqrt{\mathrm{Var}[r_i]}} = \frac{r_i}{\sqrt{v_i}}$$

and studentized as

$$\frac{r_i}{\sqrt{\hat{v}_i}}$$
External studentization uses an estimate of $\mathrm{Var}[r_i]$ that does not involve the $i$th observation. Externally studentized residuals are often preferred over internally studentized residuals because they have well-known distributional properties in standard linear models for independent data.

Residuals that are scaled by the estimated variance of the response, that is, $r_i/\sqrt{\widehat{\mathrm{Var}}[Y_i]}$, are referred to as Pearson-type residuals.
The marginal and conditional means in the linear mixed model are $\mathrm{E}[\mathbf{Y}] = \mathbf{X}\boldsymbol{\beta}$ and $\mathrm{E}[\mathbf{Y}|\boldsymbol{\gamma}] = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma}$, respectively. Accordingly, the vector $\hat{\mathbf{r}}_m$ of marginal residuals is defined as

$$\hat{\mathbf{r}}_m = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}$$

and the vector $\hat{\mathbf{r}}_c$ of conditional residuals is

$$\hat{\mathbf{r}}_c = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}} - \mathbf{Z}\hat{\boldsymbol{\gamma}} = \hat{\mathbf{r}}_m - \mathbf{Z}\hat{\boldsymbol{\gamma}}$$

Following Gregoire, Schabenberger, and Barrett (1995), let $\mathbf{Q} = \mathbf{X}(\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-}\mathbf{X}'$ and $\mathbf{K} = \mathbf{I} - \mathbf{Z}\hat{\mathbf{G}}\mathbf{Z}'\hat{\mathbf{V}}^{-1}$. Then

$$\widehat{\mathrm{Var}}[\hat{\mathbf{r}}_m] = \hat{\mathbf{V}} - \mathbf{Q}$$

$$\widehat{\mathrm{Var}}[\hat{\mathbf{r}}_c] = \mathbf{K}\left(\hat{\mathbf{V}} - \mathbf{Q}\right)\mathbf{K}'$$
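The residual definitions above can be illustrated numerically. The following is a rough Python/NumPy sketch, not PROC MIXED itself: the data, the design matrices, and the plugin values for $\mathbf{G}$ and $\mathbf{R}$ are all made up for illustration, and the covariance parameters are treated as known.

```python
# Sketch: marginal and conditional residuals of a linear mixed model and
# their estimated variance matrices, with G and R treated as known plugins.
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 12, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed-effects design
Z = np.kron(np.eye(q), np.ones((n // q, 1)))            # 3 groups of 4 observations
G = 0.5 * np.eye(q)                                     # Var of random effects
R = 1.0 * np.eye(n)                                     # residual covariance
V = Z @ G @ Z.T + R                                     # marginal Var[Y]

y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(scale=0.7, size=q) + rng.normal(size=n)

Vinv = np.linalg.inv(V)
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)  # GLS estimate of beta
gamma = G @ Z.T @ Vinv @ (y - X @ beta)                 # EBLUP of random effects

r_m = y - X @ beta                                      # marginal residuals
r_c = r_m - Z @ gamma                                   # conditional residuals

Q = X @ np.linalg.inv(X.T @ Vinv @ X) @ X.T             # Q as in the text
K = np.eye(n) - Z @ G @ Z.T @ Vinv
var_rm = V - Q                                          # Var-hat of r_m
var_rc = K @ var_rm @ K.T                               # Var-hat of r_c

# internally studentized marginal residuals
student_m = r_m / np.sqrt(np.diag(var_rm))
```

The marginal residuals are orthogonal to the fixed-effects design in the $\mathbf{V}^{-1}$ inner product, which is a quick sanity check on the GLS fit.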
For an individual observation the raw, studentized, and Pearson-type residuals computed by the MIXED procedure are given in Table 56.21.
| Type of Residual | Marginal | Conditional |
|---|---|---|
| Raw | $\hat{r}_{mi} = y_i - \mathbf{x}_i'\hat{\boldsymbol{\beta}}$ | $\hat{r}_{ci} = \hat{r}_{mi} - \mathbf{z}_i'\hat{\boldsymbol{\gamma}}$ |
| Studentized | $\hat{r}_{mi}^{\textrm{student}} = \hat{r}_{mi}\big/\sqrt{\widehat{\mathrm{Var}}[\hat{r}_{mi}]}$ | $\hat{r}_{ci}^{\textrm{student}} = \hat{r}_{ci}\big/\sqrt{\widehat{\mathrm{Var}}[\hat{r}_{ci}]}$ |
| Pearson | $\hat{r}_{mi}^{\textrm{pearson}} = \hat{r}_{mi}\big/\sqrt{\widehat{\mathrm{Var}}[Y_i]}$ | $\hat{r}_{ci}^{\textrm{pearson}} = \hat{r}_{ci}\big/\sqrt{\widehat{\mathrm{Var}}[Y_i|\boldsymbol{\gamma}]}$ |
When the OUTPM= option is specified in addition to the RESIDUAL option in the MODEL statement, $\hat{r}_{mi}$, $\hat{r}_{mi}^{\textrm{student}}$, and $\hat{r}_{mi}^{\textrm{pearson}}$ are added to the data set as variables Resid, StudentResid, and PearsonResid, respectively. When the OUTP= option is specified, $\hat{r}_{ci}$, $\hat{r}_{ci}^{\textrm{student}}$, and $\hat{r}_{ci}^{\textrm{pearson}}$ are added to the data set. Raw residuals are part of the OUTPM= and OUTP= data sets without the RESIDUAL option.
For correlated data, a set of scaled quantities can be defined through the Cholesky decomposition of the variance-covariance matrix. Because fitted residuals in linear models are rank-deficient, it is customary to draw on the variance-covariance matrix of the data. If $\mathrm{Var}[\mathbf{Y}] = \mathbf{V}$ and $\mathbf{C}'\mathbf{C} = \mathbf{V}$, then $\mathbf{C}'^{-1}\mathbf{Y}$ has uniform dispersion and its elements are uncorrelated.

Scaled residuals in a mixed model are meaningful for quantities based on the marginal distribution of the data. Let $\hat{\mathbf{C}}$ denote the Cholesky root of $\hat{\mathbf{V}}$, so that $\hat{\mathbf{C}}'\hat{\mathbf{C}} = \hat{\mathbf{V}}$, and define

$$\mathbf{Y}_c = \hat{\mathbf{C}}'^{-1}\mathbf{Y}$$

$$\hat{\mathbf{r}}_{m(c)} = \hat{\mathbf{C}}'^{-1}\hat{\mathbf{r}}_m$$
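The Cholesky scaling can be checked numerically. A minimal sketch, assuming an arbitrary positive definite matrix plays the role of $\hat{\mathbf{V}}$:

```python
# Sketch: scaling correlated data by the inverse Cholesky root of V so that
# the scaled response has unit dispersion and uncorrelated elements.
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)            # a positive definite "estimated" Var[Y]

C = np.linalg.cholesky(V).T            # upper-triangular root, so V = C'C
y = rng.multivariate_normal(np.zeros(n), V)

y_scaled = np.linalg.solve(C.T, y)     # C'^{-1} y (the ScaledDep analogue)

# Covariance of the scaled data is the identity matrix:
Ct_inv = np.linalg.inv(C.T)
V_scaled = Ct_inv @ V @ Ct_inv.T
```

Note that `np.linalg.cholesky` returns the lower-triangular factor, so the transpose is taken to match the $\mathbf{V} = \mathbf{C}'\mathbf{C}$ convention used in the text.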
By analogy with other scalings, the inverse Cholesky decomposition can also be applied to the residual vector, $\hat{\mathbf{C}}'^{-1}\hat{\mathbf{r}}_m$, although $\hat{\mathbf{V}}$ is not the variance-covariance matrix of $\hat{\mathbf{r}}_m$.

Diagnosing whether the covariance structure of the model has been specified correctly can be difficult based on $\hat{\mathbf{r}}_{m(c)}$, since the inverse Cholesky transformation affects the expected value of the scaled residuals. You can instead draw on $\mathbf{Y}_c$ as a vector of (approximately) uncorrelated data with constant mean.
When the OUTPM= option in the MODEL statement is specified in addition to the VCIRY option, $\mathbf{Y}_c$ is added to the data set as variable ScaledDep and $\hat{\mathbf{r}}_{m(c)}$ is added as ScaledResid.
The general idea of quantifying the influence of one or more observations relies on computing parameter estimates based on all data points, removing the cases in question from the data, refitting the model, and computing statistics based on the change between full-data and reduced-data estimation. Influence statistics can be coarsely grouped by the aspect of estimation that is their primary target:
- overall measures compare changes in objective functions: (restricted) likelihood distance (Cook and Weisberg 1982, Ch. 5.2)
- influence on parameter estimates: Cook's $D$ (Cook 1977, 1979), MDFFITS (Belsley, Kuh, and Welsch 1980, p. 32)
- influence on precision of estimates: CovRatio and CovTrace
- influence on fitted and predicted values: PRESS residual, PRESS statistic (Allen 1974), DFFITS (Belsley, Kuh, and Welsch 1980, p. 15)
- outlier properties: internally and externally studentized residuals, leverage
For linear models for uncorrelated data, it is not necessary to refit the model after removing a data point in order to measure the impact of an observation on the model. The change in fixed effect estimates, residuals, residual sums of squares, and the variance-covariance matrix of the fixed effects can be computed based on the fit to the full data alone. By contrast, in mixed models several important complications arise. Data points can affect not only the fixed effects but also the covariance parameter estimates on which the fixed-effects estimates depend. Furthermore, closed-form expressions for computing the change in important model quantities might not be available.
This section provides background material for the various influence diagnostics available with the MIXED procedure. See the section Mixed Models Theory for relevant expressions and definitions. The parameter vector $\boldsymbol{\theta}$ denotes all unknown parameters in the $\mathbf{G}$ and $\mathbf{R}$ matrices.
The observations whose influence is being ascertained are represented by the set $U$ and referred to simply as "the observations in $U$." The estimate of a parameter vector, such as $\boldsymbol{\beta}$, obtained from all observations except those in the set $U$ is denoted $\hat{\boldsymbol{\beta}}_{(U)}$. In case of a matrix $\mathbf{A}$, the notation $\mathbf{A}_{(U)}$ represents the matrix with the rows in $U$ removed; these rows are collected in $\mathbf{A}_U$. If $\mathbf{A}$ is symmetric, then notation $\mathbf{A}_{(U)}$ implies removal of rows and columns. The vector $\mathbf{Y}_U$ comprises the responses of the data points being removed, and $\mathbf{V}_{(U)}$ is the variance-covariance matrix of the remaining observations. When $k = 1$, lowercase notation emphasizes that single points are removed, such as $\hat{\boldsymbol{\beta}}_{(u)}$.
An important component of influence diagnostics in the mixed model is the estimated variance-covariance matrix $\hat{\mathbf{V}}$. To make the dependence on the vector of covariance parameters explicit, write it as $\mathbf{V}(\hat{\boldsymbol{\theta}})$. If one parameter, $\sigma^2$, is profiled or factored out of $\mathbf{V}$, the remaining parameters are denoted as $\boldsymbol{\theta}^*$. Notice that in a model where $\mathbf{G}$ is diagonal and $\mathbf{R} = \sigma^2\mathbf{I}$, the parameter vector $\boldsymbol{\theta}^*$ contains the ratios of each variance component and $\sigma^2$ (see Wolfinger, Tobias, and Sall 1994). When ITER=0, two scenarios are distinguished:

- If the residual variance is not profiled, either because the model does not contain a residual variance or because it is part of the Newton-Raphson iterations, then $\hat{\boldsymbol{\theta}}_{(U)} \equiv \hat{\boldsymbol{\theta}}$.
- If the residual variance is profiled, then $\hat{\boldsymbol{\theta}}^*_{(U)} \equiv \hat{\boldsymbol{\theta}}^*$ and $\hat{\sigma}^2_{(U)} \ne \hat{\sigma}^2$. Influence statistics such as Cook's $D$ and internally studentized residuals are based on $\mathbf{V}(\hat{\boldsymbol{\theta}}) = \hat{\sigma}^2\mathbf{V}(\hat{\boldsymbol{\theta}}^*)$, whereas externally studentized residuals and the DFFITS statistic are based on $\mathbf{V}(\hat{\boldsymbol{\theta}}_{(U)}) = \hat{\sigma}^2_{(U)}\mathbf{V}(\hat{\boldsymbol{\theta}}^*)$. In a random components model with uncorrelated errors, for example, the computation of $\mathbf{V}(\hat{\boldsymbol{\theta}}_{(U)})$ involves scaling of $\hat{\mathbf{G}}$ and $\hat{\mathbf{R}}$ by the full-data estimate $\hat{\sigma}^2$ and multiplying the result with the reduced-data estimate $\hat{\sigma}^2_{(U)}$.
Certain statistics, such as MDFFITS, CovRatio, and CovTrace, require an estimate of the variance of the fixed effects that is based on the reduced number of observations. For example, $\widehat{\mathrm{Var}}[\hat{\boldsymbol{\beta}}_{(U)}] = \left(\mathbf{X}'\mathbf{V}(\hat{\boldsymbol{\theta}}_{(U)})^{-1}\mathbf{X}\right)^{-}$ is evaluated at the reduced-data parameter estimates but computed for the entire data set. The matrix $\widetilde{\mathrm{Var}}[\hat{\boldsymbol{\beta}}_{(U)}] = \left(\mathbf{X}_{(U)}'\mathbf{V}_{(U)}(\hat{\boldsymbol{\theta}}_{(U)})^{-1}\mathbf{X}_{(U)}\right)^{-}$, on the other hand, has rows and columns corresponding to the points in $U$ removed. The resulting matrix is evaluated at the delete-case estimates.

When influence analysis is iterative, the entire vector $\boldsymbol{\theta}$ is updated, whether the residual variance is profiled or not. The matrices to be distinguished here are $\mathbf{V}(\hat{\boldsymbol{\theta}})$, $\mathbf{V}(\hat{\boldsymbol{\theta}}_{(U)})$, and $\mathbf{V}_{(U)}(\hat{\boldsymbol{\theta}}_{(U)})$, with unambiguous notation.
An unconditional predicted value is $\hat{y}_i = \mathbf{x}_i'\hat{\boldsymbol{\beta}}$, where the vector $\mathbf{x}_i'$ is the $i$th row of $\mathbf{X}$. The (raw) residual is given as $\hat{\epsilon}_i = y_i - \hat{y}_i$, and the PRESS residual is

$$\hat{\epsilon}_{i(U)} = y_i - \mathbf{x}_i'\hat{\boldsymbol{\beta}}_{(U)}$$

The PRESS statistic is the sum of the squared PRESS residuals,

$$\mathrm{PRESS} = \sum_{i \in U}\hat{\epsilon}_{i(U)}^2$$

where the sum is over the observations in $U$.

If EFFECT=, SIZE=, or KEEP= is not specified, PROC MIXED computes the PRESS residual for each observation selected through SELECT= (or all observations if SELECT= is not given). If EFFECT=, SIZE=, or KEEP= is specified, the procedure computes the PRESS statistic.
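The PRESS computation can be sketched with a brute-force delete-one loop. This is not PROC MIXED's implementation: the covariance parameters are held fixed (the ITER=0 idea), the compound-symmetric $\mathbf{V}$ and all data are invented for illustration, and only $\boldsymbol{\beta}$ is re-estimated by generalized least squares after each deletion.

```python
# Sketch: PRESS residuals and PRESS statistic with theta held fixed; beta is
# re-estimated by GLS without observation i, then the deleted point is predicted.
import numpy as np

rng = np.random.default_rng(3)
n = 10
X = np.column_stack([np.ones(n), rng.normal(size=n)])
V = 0.3 * np.ones((n, n)) + np.eye(n)          # compound-symmetric Var[Y]
y = X @ np.array([0.5, 1.5]) + rng.multivariate_normal(np.zeros(n), V)

def gls(Xm, Vm, ym):
    """Generalized least squares estimate of beta."""
    Vi = np.linalg.inv(Vm)
    return np.linalg.solve(Xm.T @ Vi @ Xm, Xm.T @ Vi @ ym)

press_resid = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = gls(X[keep], V[np.ix_(keep, keep)], y[keep])   # delete-case estimate
    press_resid[i] = y[i] - X[i] @ beta_i                   # PRESS residual

press = float(np.sum(press_resid**2))                       # PRESS statistic
```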
For the general mixed model, leverage can be defined through the projection matrix that results from a transformation of the model with the inverse of the Cholesky decomposition of $\mathbf{V}$, or through an oblique projector. The MIXED procedure follows the latter path in the computation of influence diagnostics. The leverage value reported for the $i$th observation is the $i$th diagonal entry of the matrix

$$\mathbf{H} = \mathbf{X}\left(\mathbf{X}'\mathbf{V}(\hat{\boldsymbol{\theta}})^{-1}\mathbf{X}\right)^{-}\mathbf{X}'\mathbf{V}(\hat{\boldsymbol{\theta}})^{-1}$$

which is the weight of the observation in contributing to its own predicted value, $\hat{y}_i$.

While $\mathbf{H}$ is idempotent, it is generally not symmetric and thus not a projection matrix in the narrow sense.
The properties of these leverages are generalizations of the properties in models with diagonal variance-covariance matrices. For example, $\mathbf{H}\mathbf{1} = \mathbf{1}$, and in a model with intercept and $\mathbf{V} = \sigma^2\mathbf{I}$, the leverage values

$$h_{ii} = \mathbf{x}_i'\left(\mathbf{X}'\mathbf{X}\right)^{-}\mathbf{x}_i$$

are $1/n \le h_{ii} \le 1$ and $\sum_{i=1}^n h_{ii} = \mathrm{rank}(\mathbf{X})$. The lower bound for $h_{ii}$ is achieved in an intercept-only model, and the upper bound is achieved in a saturated model. The trace of $\mathbf{H}$ equals the rank of $\mathbf{X}$.
If $v^{ij}$ denotes the element in row $i$, column $j$ of $\mathbf{V}^{-1}$, then for a model containing only an intercept the diagonal elements of $\mathbf{H}$ are

$$h_{ii} = \frac{\sum_{j=1}^n v^{ij}}{\sum_{i=1}^n\sum_{j=1}^n v^{ij}}$$

Because $\sum_{j=1}^n v^{ij}$ is a sum of elements in the $i$th row of the inverse variance-covariance matrix, $h_{ii}$ can be negative, even if the correlations among data points are nonnegative. In case of a saturated model with $\mathbf{X} = \mathbf{I}$, $h_{ii} = 1$.
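The stated properties of the oblique leverage matrix are easy to verify numerically. A short sketch with an arbitrary positive definite $\mathbf{V}$ (not any particular mixed-model structure):

```python
# Sketch: the oblique leverage matrix H = X (X'V^{-1}X)^- X'V^{-1}, its
# idempotency and trace, and the intercept-only row-sum formula.
import numpy as np

rng = np.random.default_rng(4)
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)          # arbitrary positive definite Var[Y]
Vinv = np.linalg.inv(V)

H = X @ np.linalg.inv(X.T @ Vinv @ X) @ X.T @ Vinv
# H is idempotent, trace(H) = rank(X) = 2, and H 1 = 1 (intercept in X),
# but H is generally not symmetric.

# Intercept-only model: h_ii = (row sum of V^{-1}) / (total sum of V^{-1})
X0 = np.ones((n, 1))
H0 = X0 @ np.linalg.inv(X0.T @ Vinv @ X0) @ X0.T @ Vinv
h0 = Vinv.sum(axis=1) / Vinv.sum()
```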
See the section Residual Diagnostics for the distinction between standardization, studentization, and scaling of residuals. Internally studentized marginal and conditional residuals are computed with the RESIDUAL option of the MODEL statement. The INFLUENCE option computes internally and externally studentized marginal residuals.
The computation of internally studentized residuals relies on the diagonal entries of $\mathbf{V}(\hat{\boldsymbol{\theta}}) - \mathbf{Q}(\hat{\boldsymbol{\theta}})$, where $\mathbf{Q}(\hat{\boldsymbol{\theta}}) = \mathbf{X}\left(\mathbf{X}'\mathbf{V}(\hat{\boldsymbol{\theta}})^{-1}\mathbf{X}\right)^{-}\mathbf{X}'$. Externally studentized residuals require iterative influence analysis or a profiled residual variance. In the former case the studentization is based on $\mathbf{V}(\hat{\boldsymbol{\theta}}_{(U)})$; in the latter case it is based on $\hat{\sigma}^2_{(U)}\mathbf{V}(\hat{\boldsymbol{\theta}}^*)$.
Cook's $D$ statistic is an invariant norm that measures the influence of observations in $U$ on a vector of parameter estimates (Cook 1977). In case of the fixed-effects coefficients, let

$$\boldsymbol{\delta}_{(U)} = \hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{(U)}$$

Then the MIXED procedure computes

$$D(\boldsymbol{\beta}) = \frac{\boldsymbol{\delta}_{(U)}'\,\widehat{\mathrm{Var}}[\hat{\boldsymbol{\beta}}]^{-}\,\boldsymbol{\delta}_{(U)}}{\mathrm{rank}(\mathbf{X})}$$

where $\widehat{\mathrm{Var}}[\hat{\boldsymbol{\beta}}]^{-}$ is the matrix that results from sweeping $\left(\mathbf{X}'\mathbf{V}(\hat{\boldsymbol{\theta}})^{-1}\mathbf{X}\right)^{-}$.
If $\mathbf{V}$ is known, Cook's $D$ can be calibrated according to a chi-square distribution with degrees of freedom equal to the rank of $\mathbf{X}$ (Christensen, Pearson, and Johnson 1992). For estimated $\mathbf{V}$ the calibration can be carried out according to an $F$ distribution. To interpret $D(\boldsymbol{\beta})$ on a familiar scale, Cook (1979) and Cook and Weisberg (1982, p. 116) refer to the 50th percentile of the reference distribution. If $D(\boldsymbol{\beta})$ is equal to that percentile, then removing the points in $U$ moves the fixed-effects coefficient vector from the center of the confidence region to the 50% confidence ellipsoid (Myers 1990, p. 262).
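As a numerical illustration of $D(\boldsymbol{\beta})$, the following sketch computes delete-one Cook's $D$ by brute-force refitting rather than by sweeping, with the covariance parameters held fixed and all data synthetic:

```python
# Sketch: Cook's D for the fixed effects, computed by case deletion with
# theta held fixed: D_i = delta' Var[beta]^{-1} delta / rank(X).
import numpy as np

rng = np.random.default_rng(5)
n = 12
X = np.column_stack([np.ones(n), rng.normal(size=n)])
V = 0.4 * np.ones((n, n)) + np.eye(n)
y = X @ np.array([1.0, -0.5]) + rng.multivariate_normal(np.zeros(n), V)

Vinv = np.linalg.inv(V)
XtVX = X.T @ Vinv @ X                         # inverse of Var-hat[beta]
beta = np.linalg.solve(XtVX, X.T @ Vinv @ y)
rank_X = np.linalg.matrix_rank(X)

cooks_d = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    Vi = np.linalg.inv(V[np.ix_(keep, keep)])
    b_i = np.linalg.solve(X[keep].T @ Vi @ X[keep], X[keep].T @ Vi @ y[keep])
    delta = beta - b_i
    cooks_d[i] = delta @ XtVX @ delta / rank_X
```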
In the case of iterative influence analysis, the MIXED procedure also computes a $D$-type statistic for the covariance parameters. If $\hat{\boldsymbol{\Gamma}}$ is the asymptotic variance-covariance matrix of $\hat{\boldsymbol{\theta}}$, then MIXED computes

$$D(\boldsymbol{\theta}) = \left(\hat{\boldsymbol{\theta}} - \hat{\boldsymbol{\theta}}_{(U)}\right)'\,\hat{\boldsymbol{\Gamma}}^{-1}\left(\hat{\boldsymbol{\theta}} - \hat{\boldsymbol{\theta}}_{(U)}\right)$$
A DFFIT measures the change in predicted values due to removal of data points. If this change is standardized by the externally estimated standard error of the predicted value in the full data, the DFFITS statistic of Belsley, Kuh, and Welsch (1980, p. 15) results:

$$\mathrm{DFFITS}_i = \frac{\hat{y}_i - \hat{y}_{i(u)}}{\mathrm{ese}(\hat{y}_i)}$$

The MIXED procedure computes DFFITS when the EFFECT= or SIZE= modifier of the INFLUENCE option is not in effect. In general, an external estimate of the standard error is used. When ITER > 0, the estimate is

$$\mathrm{ese}(\hat{y}_i) = \sqrt{\mathbf{x}_i'\left(\mathbf{X}_{(u)}'\mathbf{V}_{(u)}(\hat{\boldsymbol{\theta}}_{(u)})^{-1}\mathbf{X}_{(u)}\right)^{-}\mathbf{x}_i}$$

When ITER=0 and $\hat{\sigma}^2$ is profiled, then

$$\mathrm{ese}(\hat{y}_i) = \sqrt{\mathbf{x}_i'\left(\mathbf{X}'\mathbf{V}(\hat{\boldsymbol{\theta}}^*)^{-1}\mathbf{X}\right)^{-}\mathbf{x}_i\,\hat{\sigma}^2_{(u)}}$$
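A delete-one DFFITS sketch, again with the covariance parameters held fixed and synthetic data; the standard error of the predicted value is taken from the delete-case fit, which makes it an external estimate:

```python
# Sketch: DFFITS_i = (yhat_i - yhat_{i(u)}) / ese(yhat_i), with theta fixed
# and the standard error based entirely on the reduced data.
import numpy as np

rng = np.random.default_rng(6)
n = 10
X = np.column_stack([np.ones(n), rng.normal(size=n)])
V = 0.2 * np.ones((n, n)) + np.eye(n)
y = X @ np.array([2.0, 1.0]) + rng.multivariate_normal(np.zeros(n), V)

Vinv = np.linalg.inv(V)
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

dffits = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    Xk, yk = X[keep], y[keep]
    Vk_inv = np.linalg.inv(V[np.ix_(keep, keep)])
    XtVXk = Xk.T @ Vk_inv @ Xk
    b_i = np.linalg.solve(XtVXk, Xk.T @ Vk_inv @ yk)   # delete-case beta
    ese = np.sqrt(X[i] @ np.linalg.solve(XtVXk, X[i])) # external std. error
    dffits[i] = (X[i] @ beta - X[i] @ b_i) / ese       # change in prediction
```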
When the EFFECT=, SIZE=, or KEEP= modifier is specified, the MIXED procedure computes a multivariate version suitable for the deletion of multiple data points. The statistic, termed MDFFITS after the MDFFIT statistic of Belsley, Kuh, and Welsch (1980, p. 32), is closely related to Cook's $D$. Consider the case $k > 1$, so that

$$\boldsymbol{\delta}_{(U)} = \hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{(U)}$$

and let $\widetilde{\mathrm{Var}}[\hat{\boldsymbol{\beta}}_{(U)}]$ be an estimate of $\mathrm{Var}[\hat{\boldsymbol{\beta}}_{(U)}]$ that does not use the observations in $U$. The MDFFITS statistic is then computed as

$$\mathrm{MDFFITS}(\boldsymbol{\beta}) = \frac{\boldsymbol{\delta}_{(U)}'\,\widetilde{\mathrm{Var}}[\hat{\boldsymbol{\beta}}_{(U)}]^{-}\,\boldsymbol{\delta}_{(U)}}{\mathrm{rank}(\mathbf{X})}$$

If ITER=0 and $\hat{\sigma}^2$ is profiled, then $\widetilde{\mathrm{Var}}[\hat{\boldsymbol{\beta}}_{(U)}]^{-}$ is obtained by sweeping

$$\hat{\sigma}^2_{(U)}\left(\mathbf{X}_{(U)}'\mathbf{V}_{(U)}(\hat{\boldsymbol{\theta}}^*)^{-1}\mathbf{X}_{(U)}\right)^{-}$$

The underlying idea is that if $\boldsymbol{\theta}^*$ were known, then

$$\sigma^2\left(\mathbf{X}_{(U)}'\mathbf{V}_{(U)}(\boldsymbol{\theta}^*)^{-1}\mathbf{X}_{(U)}\right)^{-}$$

would be $\mathrm{Var}[\hat{\boldsymbol{\beta}}_{(U)}]$ in a generalized least squares regression with all but the data in $U$.
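The multiple-point case can be sketched directly: delete a set $U$ of two observations, re-estimate $\boldsymbol{\beta}$ from the remaining data, and form the quadratic form in the reduced-data variance (the choice of $U$ and all data below are arbitrary):

```python
# Sketch: MDFFITS for a pair of deleted observations, with Var(beta_(U))
# estimated entirely from the remaining data and theta held fixed.
import numpy as np

rng = np.random.default_rng(7)
n = 12
X = np.column_stack([np.ones(n), rng.normal(size=n)])
V = 0.3 * np.ones((n, n)) + np.eye(n)
y = X @ np.array([0.0, 1.0]) + rng.multivariate_normal(np.zeros(n), V)

Vinv = np.linalg.inv(V)
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
rank_X = np.linalg.matrix_rank(X)

U = [0, 5]                                     # the set of deleted points
keep = np.setdiff1d(np.arange(n), U)
Xk = X[keep]
Vk_inv = np.linalg.inv(V[np.ix_(keep, keep)])
XtVXk = Xk.T @ Vk_inv @ Xk                     # uses only the remaining data
beta_U = np.linalg.solve(XtVXk, Xk.T @ Vk_inv @ y[keep])

delta = beta - beta_U
mdffits = float(delta @ XtVXk @ delta) / rank_X
```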
In the case of iterative influence analysis, $\widetilde{\mathrm{Var}}[\hat{\boldsymbol{\beta}}_{(U)}]$ is evaluated at $\hat{\boldsymbol{\theta}}_{(U)}$. Furthermore, a MDFFITS-type statistic is then computed for the covariance parameters:

$$\mathrm{MDFFITS}(\boldsymbol{\theta}) = \left(\hat{\boldsymbol{\theta}} - \hat{\boldsymbol{\theta}}_{(U)}\right)'\,\hat{\boldsymbol{\Gamma}}_{(U)}^{-1}\left(\hat{\boldsymbol{\theta}} - \hat{\boldsymbol{\theta}}_{(U)}\right)$$
These statistics depend on the availability of an external estimate of $\mathbf{V}$, or at least of $\sigma^2$. Whereas Cook's $D$ and MDFFITS measure the impact of data points on a vector of parameter estimates, the covariance-based statistics measure their impact on the precision of the estimates. Following Christensen, Pearson, and Johnson (1992), the MIXED procedure computes

$$\mathrm{CovTrace}(\boldsymbol{\beta}) = \left|\mathrm{trace}\left(\widehat{\mathrm{Var}}[\hat{\boldsymbol{\beta}}]^{-}\,\widetilde{\mathrm{Var}}[\hat{\boldsymbol{\beta}}_{(U)}]\right) - \mathrm{rank}(\mathbf{X})\right|$$

$$\mathrm{CovRatio}(\boldsymbol{\beta}) = \frac{\mathrm{det}_{ns}\left(\widetilde{\mathrm{Var}}[\hat{\boldsymbol{\beta}}_{(U)}]\right)}{\mathrm{det}_{ns}\left(\widehat{\mathrm{Var}}[\hat{\boldsymbol{\beta}}]\right)}$$

where $\mathrm{det}_{ns}(\mathbf{M})$ denotes the determinant of the nonsingular part of matrix $\mathbf{M}$.
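The fixed-effects versions of these precision statistics can be sketched for a full-rank $\mathbf{X}$, in which case $\mathrm{det}_{ns}$ is the ordinary determinant (data and the deletion set $U$ below are arbitrary, covariance parameters held fixed):

```python
# Sketch: CovTrace and CovRatio for beta, comparing precision with and
# without the observations in U. Removing data inflates Var(beta), so the
# ratio is at least 1 here.
import numpy as np

rng = np.random.default_rng(8)
n = 12
X = np.column_stack([np.ones(n), rng.normal(size=n)])
V = 0.3 * np.ones((n, n)) + np.eye(n)
Vinv = np.linalg.inv(V)

var_full = np.linalg.inv(X.T @ Vinv @ X)       # Var-hat[beta], all data

U = [2, 9]
keep = np.setdiff1d(np.arange(n), U)
Xk = X[keep]
Vk_inv = np.linalg.inv(V[np.ix_(keep, keep)])
var_red = np.linalg.inv(Xk.T @ Vk_inv @ Xk)    # Var-tilde[beta_(U)]

rank_X = np.linalg.matrix_rank(X)
cov_trace = abs(np.trace(np.linalg.solve(var_full, var_red)) - rank_X)
cov_ratio = np.linalg.det(var_red) / np.linalg.det(var_full)
```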
In the case of iterative influence analysis these statistics are also computed for the covariance parameter estimates. If $q$ denotes the rank of $\hat{\boldsymbol{\Gamma}}$, then

$$\mathrm{CovTrace}(\boldsymbol{\theta}) = \left|\mathrm{trace}\left(\hat{\boldsymbol{\Gamma}}^{-1}\hat{\boldsymbol{\Gamma}}_{(U)}\right) - q\right|$$

$$\mathrm{CovRatio}(\boldsymbol{\theta}) = \frac{\mathrm{det}_{ns}\left(\hat{\boldsymbol{\Gamma}}_{(U)}\right)}{\mathrm{det}_{ns}\left(\hat{\boldsymbol{\Gamma}}\right)}$$
The log-likelihood function $l$ and restricted log-likelihood function $l_R$ of the linear mixed model are given in the section Estimating Covariance Parameters in the Mixed Model. Denote as $\boldsymbol{\psi}$ the collection of all parameters, that is, the fixed effects $\boldsymbol{\beta}$ and the covariance parameters $\boldsymbol{\theta}$. Twice the difference between the (restricted) log likelihood evaluated at the full-data estimates $\hat{\boldsymbol{\psi}}$ and at the reduced-data estimates $\hat{\boldsymbol{\psi}}_{(U)}$ is known as the (restricted) likelihood distance:

$$\mathrm{LD}_{(U)} = 2\left\{l(\hat{\boldsymbol{\psi}}) - l(\hat{\boldsymbol{\psi}}_{(U)})\right\}$$

$$\mathrm{RLD}_{(U)} = 2\left\{l_R(\hat{\boldsymbol{\psi}}) - l_R(\hat{\boldsymbol{\psi}}_{(U)})\right\}$$
Cook and Weisberg (1982, Ch. 5.2) refer to these differences as likelihood distances; Beckman, Nachtsheim, and Cook (1987) call the measures likelihood displacements. If the number of elements in $\boldsymbol{\psi}$ that are subject to updating following point removal is $q$, then likelihood displacements can be compared against cutoffs from a chi-square distribution with $q$ degrees of freedom. Notice that this reference distribution does not depend on the number of observations removed from the analysis, but rather on the number of model parameters that are updated. The likelihood displacement gives twice the amount by which the log likelihood of the full data changes if one were to use an estimate based on fewer data points. It is thus a global, summary measure of the influence of the observations in $U$ jointly on all parameters.
Unless METHOD=ML, the MIXED procedure computes the likelihood displacement based on the residual (=restricted) log likelihood, even if METHOD=MIVQUE0 or METHOD=TYPE1, TYPE2, or TYPE3.
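The key subtlety, that the likelihood distance evaluates the *full-data* likelihood at reduced-data estimates, can be demonstrated directly. In this sketch the covariance structure $\mathbf{V}(\boldsymbol{\theta}^*)$ is held fixed and only $\boldsymbol{\beta}$ and the profiled $\sigma^2$ are re-estimated after deletion; the data are synthetic:

```python
# Sketch: likelihood distance LD_(U) = 2{ l(psi_hat) - l(psi_hat_(U)) },
# where both terms evaluate the FULL-data ML log likelihood.
import numpy as np

rng = np.random.default_rng(9)
n = 10
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Vstar = 0.4 * np.ones((n, n)) + np.eye(n)     # V(theta*), sigma^2 factored out
y = X @ np.array([1.0, 1.0]) + rng.multivariate_normal(np.zeros(n), Vstar)

Vi = np.linalg.inv(Vstar)
_, logdetV = np.linalg.slogdet(Vstar)

def profile_fit(Xm, Vm_inv, ym):
    """GLS beta and the profiled ML estimate of sigma^2."""
    b = np.linalg.solve(Xm.T @ Vm_inv @ Xm, Xm.T @ Vm_inv @ ym)
    r = ym - Xm @ b
    return b, float(r @ Vm_inv @ r) / len(ym)

def loglik(b, s2):
    """Full-data ML log likelihood at (b, s2) with V(theta*) fixed."""
    r = y - X @ b
    return -0.5 * (n * np.log(2 * np.pi * s2) + logdetV + (r @ Vi @ r) / s2)

beta, s2 = profile_fit(X, Vi, y)              # full-data estimates
i = 3                                         # delete one observation
keep = np.arange(n) != i
beta_i, s2_i = profile_fit(X[keep], np.linalg.inv(Vstar[np.ix_(keep, keep)]), y[keep])

ld = 2.0 * (loglik(beta, s2) - loglik(beta_i, s2_i))
```

Because the full-data estimates maximize the full-data likelihood, the distance is nonnegative by construction.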
Update formulas that do not require refitting of the model are available for the cases where $\mathbf{V} = \sigma^2\mathbf{I}$, $\mathbf{V}$ is known, or $\mathbf{V}(\boldsymbol{\theta}^*)$ is known. When ITER=0 and these update formulas can be invoked, the MIXED procedure uses the computational devices that are outlined in the following paragraphs. It is then assumed that the variance-covariance matrix of the fixed effects has the form $\left(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}\right)^{-}$. When DDFM=KENWARDROGER, this is not the case; the estimated variance-covariance matrix is then inflated to better represent the uncertainty in the estimated covariance parameters. Influence statistics when DDFM=KENWARDROGER should iteratively update the covariance parameters (ITER > 0). The dependence of $\mathbf{V}$ on $\boldsymbol{\theta}$ is suppressed in the sequel for brevity.
Denote by $\mathbf{U}$ the $(n \times k)$ matrix that is assembled from $k$ columns of the identity matrix. Each column of $\mathbf{U}$ corresponds to the removal of one data point. The point being targeted by the $j$th column of $\mathbf{U}$ corresponds to the row in which a 1 appears. Furthermore, define

$$\boldsymbol{\Omega} = \left(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}\right)^{-}$$

$$\boldsymbol{\Omega}_{(U)} = \left(\mathbf{X}_{(U)}'\mathbf{V}_{(U)}^{-1}\mathbf{X}_{(U)}\right)^{-}$$

$$\mathbf{P} = \mathbf{V}^{-1}\left(\mathbf{I} - \mathbf{X}\boldsymbol{\Omega}\mathbf{X}'\mathbf{V}^{-1}\right)$$
The change in the fixed-effects estimates following removal of the observations in $U$ is

$$\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{(U)} = \boldsymbol{\Omega}\mathbf{X}'\mathbf{V}^{-1}\mathbf{U}\left(\mathbf{U}'\mathbf{P}\mathbf{U}\right)^{-1}\mathbf{U}'\mathbf{V}^{-1}\hat{\mathbf{r}}_m$$

Using results in Cook and Weisberg (1982, A2) you can further compute

$$\boldsymbol{\Omega}_{(U)} = \boldsymbol{\Omega} + \boldsymbol{\Omega}\mathbf{X}'\mathbf{V}^{-1}\mathbf{U}\left(\mathbf{U}'\mathbf{P}\mathbf{U}\right)^{-1}\mathbf{U}'\mathbf{V}^{-1}\mathbf{X}\boldsymbol{\Omega}$$
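These closed-form updates are standard GLS case-deletion identities, and they can be checked against brute-force recomputation. A sketch with a full-rank $\mathbf{X}$ (so that ordinary inverses replace sweeping) and an arbitrary positive definite $\mathbf{V}$:

```python
# Sketch: closed-form deletion updates for beta and Omega, verified against
# recomputation after actually removing the rows and columns in U.
import numpy as np

rng = np.random.default_rng(10)
n = 9
X = np.column_stack([np.ones(n), rng.normal(size=n)])
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)
y = rng.normal(size=n)

Vinv = np.linalg.inv(V)
Omega = np.linalg.inv(X.T @ Vinv @ X)
P = Vinv @ (np.eye(n) - X @ Omega @ X.T @ Vinv)
beta = Omega @ X.T @ Vinv @ y
r_m = y - X @ beta                                 # marginal residuals

U_idx = [1, 6]                                     # delete k = 2 points
Umat = np.eye(n)[:, U_idx]                         # n x k selection matrix

# Closed-form updates (no refitting)
UPU_inv = np.linalg.inv(Umat.T @ P @ Umat)
B = Omega @ X.T @ Vinv @ Umat                      # shared p x k factor
delta_update = B @ UPU_inv @ Umat.T @ Vinv @ r_m   # beta - beta_(U)
Omega_U_update = Omega + B @ UPU_inv @ B.T

# Brute-force recomputation after actually deleting the rows in U
keep = np.setdiff1d(np.arange(n), U_idx)
Vk_inv = np.linalg.inv(V[np.ix_(keep, keep)])
Omega_U = np.linalg.inv(X[keep].T @ Vk_inv @ X[keep])
beta_U = Omega_U @ X[keep].T @ Vk_inv @ y[keep]
```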
If $\mathbf{X}$ is $(n \times p)$ of rank $m < p$, then $\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}$ is deficient in rank, and the MIXED procedure computes the needed quantities in $\boldsymbol{\Omega}$ by sweeping (Goodnight 1979). If the rank of the $(k \times k)$ matrix $\mathbf{U}'\mathbf{P}\mathbf{U}$ is less than $k$, the removal of the observations introduces a new singularity, whether $\mathbf{X}$ is of full rank or not. The solution vectors $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\beta}}_{(U)}$ then do not have the same expected values and should not be compared. When the MIXED procedure encounters this situation, influence diagnostics that depend on the choice of generalized inverse are not computed. The procedure also monitors the singularity criteria when sweeping the rows of $\left(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}\right)$ and of $\left(\mathbf{X}_{(U)}'\mathbf{V}_{(U)}^{-1}\mathbf{X}_{(U)}\right)$. If a new singularity is encountered or a former singularity disappears, no influence statistics are computed.
When $\sigma^2$ is profiled out of the marginal variance-covariance matrix, a closed-form estimate of $\sigma^2$ that is based on only the remaining observations can be computed, provided $\mathbf{V}(\hat{\boldsymbol{\theta}}^*)$ is known. Hurtado (1993, Thm. 5.2) shows that

$$\hat{\sigma}^2_{(U)} = \frac{1}{n^* - k^*}\left(n^*\hat{\sigma}^2 - \hat{\mathbf{e}}_U'\left(\mathbf{U}'\mathbf{P}\mathbf{U}\right)^{-1}\hat{\mathbf{e}}_U\right)$$

and $\hat{\mathbf{e}}_U = \mathbf{U}'\mathbf{V}^{-1}\hat{\mathbf{r}}_m$. In the case of maximum likelihood estimation $n^* = n$, and for REML estimation $n^* = n - \mathrm{rank}(\mathbf{X})$. The constant $k^*$ equals the rank of $\mathbf{U}'\mathbf{P}\mathbf{U}$ for REML estimation and the number of effective observations that are removed if METHOD=ML.
For noniterative methods, the following computational devices are used to compute (restricted) likelihood distances, provided that the residual variance $\sigma^2$ is profiled.

The log-likelihood function evaluated at the full-data and reduced-data estimates can be written as

$$l(\hat{\boldsymbol{\psi}}) = -\frac{n}{2}\log\hat{\sigma}^2 - \frac{1}{2}\log\left|\mathbf{V}(\hat{\boldsymbol{\theta}}^*)\right| - \frac{n}{2}\log(2\pi) - \frac{n}{2}$$

$$l(\hat{\boldsymbol{\psi}}_{(U)}) = -\frac{n}{2}\log\hat{\sigma}^2_{(U)} - \frac{1}{2}\log\left|\mathbf{V}(\hat{\boldsymbol{\theta}}^*)\right| - \frac{n}{2}\log(2\pi) - \frac{1}{2\hat{\sigma}^2_{(U)}}\left(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_{(U)}\right)'\mathbf{V}(\hat{\boldsymbol{\theta}}^*)^{-1}\left(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_{(U)}\right)$$

Notice that $l(\hat{\boldsymbol{\psi}}_{(U)})$ evaluates the log likelihood for $n$ data points at the reduced-data estimates. It is not the log likelihood obtained by fitting the model to the reduced data. The likelihood distance is then

$$\mathrm{LD}_{(U)} = n\log\left\{\frac{\hat{\sigma}^2_{(U)}}{\hat{\sigma}^2}\right\} - n + \frac{1}{\hat{\sigma}^2_{(U)}}\left(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_{(U)}\right)'\mathbf{V}(\hat{\boldsymbol{\theta}}^*)^{-1}\left(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_{(U)}\right)$$
Expressions for $\mathrm{RLD}_{(U)}$ in noniterative influence analysis are derived along the same lines.
Copyright © SAS Institute, Inc. All Rights Reserved.