Assessment of Models Based on Aggregates of Residuals
Lin, Wei, and Ying (2002) present graphical and numerical methods for model assessment based on the cumulative sums of residuals over certain coordinates (such as covariates or linear predictors) or on related aggregates of residuals. Under the assumed model, the distributions of these stochastic processes can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by simulation. Each observed residual pattern can then be compared, both graphically and numerically, with a number of realizations from the null distribution. Such comparisons enable you to assess objectively whether the observed residual pattern reflects anything beyond random fluctuation. These procedures are useful for determining appropriate functional forms of the covariates and of the link function. You use the ASSESS|ASSESSMENT statement to perform this kind of model checking with cumulative sums of residuals, moving sums of residuals, or loess-smoothed residuals. See Example 39.8 and Example 39.9 for examples of model assessment.
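Let the model for the mean be
$$g(\mu_i) = \mathbf{x}_i'\boldsymbol{\beta}$$
where $\mu_i$ is the mean of the response $y_i$ and $\mathbf{x}_i$ is the vector of covariates for the $i$th observation, $i = 1, \ldots, n$. Denote the raw residual resulting from fitting the model as
$$e_i = y_i - \hat{\mu}_i$$
and let $x_{ij}$ be the value of the $j$th covariate in the model for observation $i$. Then to check the functional form of the $j$th covariate, consider the cumulative sum of residuals with respect to $x_{ij}$,
$$W_j(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} I(x_{ij} \le x)\, e_i$$
where $I(\cdot)$ is the indicator function. For any $x$, $W_j(x)$ is the sum, scaled by $1/\sqrt{n}$, of the residuals for the observations whose values of $x_{ij}$ are less than or equal to $x$.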
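The cumulative sum of residuals described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure's implementation: the function name `cumulative_residual_process` and the toy data are hypothetical, and the process is evaluated only at the distinct observed covariate values, which is where the step function changes.

```python
import math

def cumulative_residual_process(x, e, grid=None):
    """Return (grid, W) with W[k] = n**-0.5 * sum of e[i] over x[i] <= grid[k]."""
    n = len(x)
    if grid is None:
        # The process is a step function that jumps only at observed values.
        grid = sorted(set(x))
    scale = 1.0 / math.sqrt(n)
    values = [scale * sum(ei for xi, ei in zip(x, e) if xi <= g) for g in grid]
    return grid, values

# Toy covariate values and raw residuals (hypothetical numbers).
x = [1.0, 2.0, 3.0, 4.0]
e = [0.5, -1.0, 1.5, -1.0]
grid, W = cumulative_residual_process(x, e)
print(list(zip(grid, W)))
```

Because the toy residuals sum to zero, the process returns to zero at the largest covariate value, as it does for any fitted model whose residuals sum to zero.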
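Denote the score, or gradient vector, by
$$\mathbf{U}(\boldsymbol{\beta}) = \sum_{i=1}^{n} h(\mathbf{x}_i'\boldsymbol{\beta})\, \mathbf{x}_i \left[ y_i - \nu(\mathbf{x}_i'\boldsymbol{\beta}) \right]$$
where $\nu(r) = g^{-1}(r)$, $V(\cdot)$ is the variance function, and
$$h(r) = \frac{1}{g'(\nu(r))\, V(\nu(r))}$$
Let $\mathbf{J}$ be the Fisher information matrix
$$\mathbf{J}(\boldsymbol{\beta}) = -\frac{\partial \mathbf{U}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}'}$$
Define
$$\hat{W}_j(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left[ I(x_{ij} \le x) + \boldsymbol{\eta}'(x; \hat{\boldsymbol{\beta}})\, \mathbf{J}^{-1}(\hat{\boldsymbol{\beta}})\, \mathbf{x}_i\, h(\mathbf{x}_i'\hat{\boldsymbol{\beta}}) \right] e_i Z_i$$
where
$$\boldsymbol{\eta}(x; \boldsymbol{\beta}) = -\sum_{i=1}^{n} I(x_{ij} \le x)\, \frac{\partial \nu(\mathbf{x}_i'\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$$
and $Z_1, \ldots, Z_n$ are independent $N(0,1)$ random variables. Then the conditional distribution of $\hat{W}_j(x)$, given $(y_i, \mathbf{x}_i)$, $i = 1, \ldots, n$, under the null hypothesis $H_0$ that the model for the mean is correct, is the same asymptotically as $n \to \infty$ as the unconditional distribution of $W_j(x)$ (Lin, Wei, and Ying; 2002).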
You can approximate realizations from the null hypothesis distribution of $W_j(x)$ by repeatedly generating normal samples $Z_i$, $i = 1, \ldots, n$, while holding $(y_i, \mathbf{x}_i)$, $i = 1, \ldots, n$, at their observed values and computing $\hat{W}_j(x)$ for each sample.
You can assess the functional form of covariate $j$ by plotting a few realizations of $\hat{W}_j(x)$ on the same plot as the observed $W_j(x)$ and visually comparing to see how typical the observed $W_j(x)$ is of the null distribution samples.
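The simulation step can be illustrated for the simplest special case. The sketch below assumes an ordinary least-squares fit with an intercept and a single covariate (identity link, constant variance function), so that the weight function in the score equals 1 and the information matrix reduces to the cross-product matrix of the design rows $(1, x_i)$. All function names and the data are hypothetical, and this is not the procedure's own code.

```python
import math
import random

def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def null_coefficients(x, grid):
    """One row per grid point g: c_i(g) = I(x_i <= g) + eta(g)' J^{-1} (1, x_i)'.

    Assumes identity link and constant variance, so h = 1 and J = X'X
    for design rows (1, x_i).
    """
    n, sx, sxx = len(x), sum(x), sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    j_inv = ((sxx / det, -sx / det), (-sx / det, n / det))
    rows = []
    for g in grid:
        ind = [1.0 if xi <= g else 0.0 for xi in x]
        # eta(g) = -sum_i I(x_i <= g) * (1, x_i)'
        eta = (-sum(ind), -sum(i * xi for i, xi in zip(ind, x)))
        a0 = eta[0] * j_inv[0][0] + eta[1] * j_inv[1][0]
        a1 = eta[0] * j_inv[0][1] + eta[1] * j_inv[1][1]
        rows.append([i + a0 + a1 * xi for i, xi in zip(ind, x)])
    return rows

def simulate_null_paths(x, e, grid, n_paths, rng):
    """Each path multiplies the residuals by fresh N(0,1) draws Z_i."""
    scale = 1.0 / math.sqrt(len(x))
    coef = null_coefficients(x, grid)
    paths = []
    for _ in range(n_paths):
        ez = [ei * rng.gauss(0.0, 1.0) for ei in e]
        paths.append([scale * sum(c * v for c, v in zip(row, ez)) for row in coef])
    return paths

# Hypothetical data: the true mean is quadratic, but a straight line is fit.
x = [i / 2 for i in range(20)]
y = [1.0 + 0.5 * xi + 0.1 * xi * xi for xi in x]
b0, b1 = fit_simple_ols(x, y)
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
grid = sorted(set(x))
paths = simulate_null_paths(x, e, grid, n_paths=5, rng=random.Random(1))
```

Each element of `paths` is one simulated realization evaluated on `grid`; plotting a few of them together with the observed cumulative residual process gives the graphical comparison described above. In this special case every simulated path ends at zero, up to rounding, which mirrors the behavior of the observed process when the model contains an intercept.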
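You can supplement the graphical inspection method with a Kolmogorov-type supremum test. Let $s_j$ be the observed value of $S_j = \sup_x |W_j(x)|$. The $p$-value $\Pr[S_j \ge s_j]$ is approximated by $\Pr[\hat{S}_j \ge s_j]$, where $\hat{S}_j = \sup_x |\hat{W}_j(x)|$. The probability $\Pr[\hat{S}_j \ge s_j]$ is estimated by generating realizations of $\hat{W}_j(\cdot)$ (1,000 is the default number of realizations).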
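The supremum test reduces to counting how many simulated paths have a maximum absolute value at least as large as the observed one. The following sketch assumes that the observed and simulated processes have already been evaluated on a common grid; the function names and the numbers are hypothetical.

```python
def sup_abs(path):
    """Maximum absolute value of a path evaluated on a grid."""
    return max(abs(v) for v in path)

def supremum_p_value(observed_path, simulated_paths):
    """Fraction of simulated paths whose supremum is >= the observed supremum."""
    s_obs = sup_abs(observed_path)
    exceed = sum(1 for path in simulated_paths if sup_abs(path) >= s_obs)
    return exceed / len(simulated_paths)

# Hypothetical observed path and four simulated paths.
observed = [0.2, -0.9, 0.4]
simulated = [
    [0.1, 0.3, -0.2],
    [1.2, -0.5, 0.0],
    [-0.9, 0.1, 0.2],
    [0.4, 0.8, -0.1],
]
p_value = supremum_p_value(observed, simulated)
print(p_value)  # 0.5: two of the four simulated suprema reach 0.9
```

With only four paths the estimate is very coarse; in practice the number of simulated paths would be large, such as the default of 1,000 mentioned above.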
You can check the link function instead of the $j$th covariate by using values of the linear predictor $\mathbf{x}_i'\hat{\boldsymbol{\beta}}$ in place of values of the $j$th covariate $x_{ij}$. The graphical and numerical methods described previously are then sensitive to inadequacies in the link function.
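An alternative aggregate of residuals is the moving sum statistic
$$W_j(x, b) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} I(x - b \le x_{ij} \le x)\, e_i$$
If you specify the keyword WINDOW($b$), then the moving sum statistic with window size $b$ is used instead of the cumulative sum of residuals, with $I(x - b \le x_{ij} \le x)$ replacing $I(x_{ij} \le x)$ in the earlier equation.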
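The moving sum differs from the cumulative sum only in the indicator, which now selects residuals whose covariate value lies in a window of fixed width ending at the evaluation point. A minimal sketch follows; the function name `moving_sum_process` and the data are hypothetical.

```python
import math

def moving_sum_process(x, e, b, grid=None):
    """Return (grid, W) with W[k] = n**-0.5 * sum of e[i] over grid[k]-b <= x[i] <= grid[k]."""
    n = len(x)
    if grid is None:
        grid = sorted(set(x))
    scale = 1.0 / math.sqrt(n)
    values = [
        scale * sum(ei for xi, ei in zip(x, e) if g - b <= xi <= g)
        for g in grid
    ]
    return grid, values

# Toy covariate values and raw residuals (hypothetical numbers).
x = [1.0, 2.0, 3.0, 4.0]
e = [0.5, -1.0, 1.5, -1.0]
grid, W = moving_sum_process(x, e, b=1.0)
print(list(zip(grid, W)))
```

A small window makes the statistic respond to local departures from the model, whereas a window wider than the range of the covariate reproduces the cumulative sum.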
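If you specify the keyword LOESS($f$), loess smoothed residuals are used in the preceding formulas, where $f$ is the fraction of the data to be used at a given point. If $f$ is not specified, $f = 1/3$ is used. For data $(Y_i, X_i)$, $i = 1, \ldots, n$, define $r$ as the nearest integer to $nf$ and $h$ as the $r$th smallest among $|X_i - x|$, $i = 1, \ldots, n$. Let
$$K_i(x) = K\!\left( \frac{X_i - x}{h} \right)$$
where
$$K(t) = \frac{70}{81} \left( 1 - |t|^3 \right)^3 I(-1 \le t \le 1)$$
Define
$$w_i(x) = K_i(x) \left[ S_2(x) - (X_i - x)\, S_1(x) \right]$$
where
$$S_1(x) = \sum_{i=1}^{n} K_i(x)\,(X_i - x), \qquad S_2(x) = \sum_{i=1}^{n} K_i(x)\,(X_i - x)^2$$
Then the loess estimate of $Y$ at $x$ is defined by
$$\hat{Y}(x) = \sum_{i=1}^{n} \frac{w_i(x)}{\sum_{l=1}^{n} w_l(x)}\, Y_i$$
Loess smoothed residuals for checking the functional form of the $j$th covariate are defined by replacing $Y_i$ with $e_i$ and $X_i$ with $x_{ij}$. To implement the graphical and numerical assessment methods, $I(x_{ij} \le x)$ is replaced with $w_i(x) / \sum_{l=1}^{n} w_l(x)$ in the formulas for $W_j(x)$ and $\hat{W}_j(x)$.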
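The loess weights can be sketched as a local linear fit with a tricube kernel. The code below is an illustration under stated assumptions rather than the procedure's implementation: it assumes the tricube kernel with normalizing constant 70/81 (the constant cancels in the normalized weights), takes the fraction `f` as an explicit argument, rounds halves upward when forming the nearest integer, and uses hypothetical function names and data.

```python
def tricube(t):
    """K(t) = (70/81) * (1 - |t|^3)^3 on [-1, 1], zero elsewhere."""
    a = abs(t)
    return (70.0 / 81.0) * (1.0 - a ** 3) ** 3 if a <= 1.0 else 0.0

def loess_weights(X, x, f):
    """Normalized local linear weights w_i(x) / sum_l w_l(x)."""
    n = len(X)
    r = min(n, max(1, int(n * f + 0.5)))       # nearest integer to n*f
    h = sorted(abs(Xi - x) for Xi in X)[r - 1]  # r-th smallest distance
    if h == 0.0:
        raise ValueError("bandwidth is zero; too many ties at x")
    d = [Xi - x for Xi in X]
    k = [tricube(di / h) for di in d]
    s1 = sum(ki * di for ki, di in zip(k, d))
    s2 = sum(ki * di * di for ki, di in zip(k, d))
    w = [ki * (s2 - di * s1) for ki, di in zip(k, d)]
    total = sum(w)
    if total == 0.0:
        raise ValueError("fewer than two distinct points receive positive weight")
    return [wi / total for wi in w]

def loess_estimate(X, Y, x, f):
    """Weighted average of Y using the loess weights at x."""
    return sum(wi * yi for wi, yi in zip(loess_weights(X, x, f), Y))

# A local linear smoother reproduces an exactly linear relationship.
X = [float(i) for i in range(10)]
Y = [2.0 + 3.0 * Xi for Xi in X]
weights = loess_weights(X, 4.5, f=0.5)
fit = loess_estimate(X, Y, 4.5, f=0.5)
```

To obtain loess smoothed residuals, the same weights would be applied to the residuals in place of `Y` and to the covariate of interest in place of `X`.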
You can perform the model checking described earlier for marginal models for dependent responses fit by generalized estimating equations (GEEs). Let $y_{ik}$ denote the $k$th measurement on the $i$th cluster, $i = 1, \ldots, K$, $k = 1, \ldots, n_i$, and let $\mathbf{x}_{ik}$ denote the corresponding vector of covariates. The marginal mean of the response $\mu_{ik} = E(y_{ik})$ is assumed to depend on the covariate vector by
$$g(\mu_{ik}) = \mathbf{x}_{ik}'\boldsymbol{\beta}$$
where $g$ is the link function.
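Define the vector of residuals for the $i$th cluster as
$$\mathbf{e}_i = (e_{i1}, \ldots, e_{in_i})' = (y_{i1} - \hat{\mu}_{i1}, \ldots, y_{in_i} - \hat{\mu}_{in_i})'$$
You use the following extension of $W_j(x)$ defined earlier to check the functional form of the $j$th covariate:
$$W_j(x) = \frac{1}{\sqrt{K}} \sum_{i=1}^{K} \sum_{k=1}^{n_i} I(x_{ikj} \le x)\, e_{ik}$$
where $x_{ikj}$ is the $j$th component of $\mathbf{x}_{ik}$.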
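For clustered data the computation has the same shape as in the independent case, except that residuals are summed over all measurements in all clusters and the scaling uses the number of clusters rather than the number of measurements. The sketch below is illustrative only; the function name and the data layout (one list of covariate values and one list of residuals per cluster) are assumptions.

```python
import math

def clustered_cumulative_residuals(x_clusters, e_clusters, grid):
    """W[k] = K**-0.5 * sum over clusters i and measurements k of e_ik with x_ik <= grid[k]."""
    n_clusters = len(x_clusters)
    scale = 1.0 / math.sqrt(n_clusters)
    values = []
    for g in grid:
        total = 0.0
        for xs, es in zip(x_clusters, e_clusters):
            total += sum(eik for xik, eik in zip(xs, es) if xik <= g)
        values.append(scale * total)
    return values

# Four hypothetical clusters of unequal size.
x_clusters = [[1.0, 3.0], [2.0], [2.0, 4.0], [1.0]]
e_clusters = [[0.5, -1.0], [1.0], [-0.5, 0.5], [-0.5]]
grid = [1.0, 2.0, 3.0, 4.0]
W = clustered_cumulative_residuals(x_clusters, e_clusters, grid)
print(W)
```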
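The null distribution of $W_j(x)$ can be approximated by the conditional distribution of
$$\hat{W}_j(x) = \frac{1}{\sqrt{K}} \sum_{i=1}^{K} \left\{ \sum_{k=1}^{n_i} I(x_{ikj} \le x)\, e_{ik} + \boldsymbol{\eta}'(x, \hat{\boldsymbol{\beta}})\, \mathbf{I}_0^{-1}\, \hat{\mathbf{D}}_i' \hat{\mathbf{V}}_i^{-1} \mathbf{e}_i \right\} Z_i$$
where $\hat{\mathbf{D}}_i$ and $\hat{\mathbf{V}}_i$ are defined as in the section Generalized Estimating Equations with the unknown parameters replaced by their estimated values,
$$\boldsymbol{\eta}(x, \boldsymbol{\beta}) = -\sum_{i=1}^{K} \sum_{k=1}^{n_i} I(x_{ikj} \le x)\, \frac{\partial \nu(\mathbf{x}_{ik}'\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$$
$$\mathbf{I}_0 = \sum_{i=1}^{K} \hat{\mathbf{D}}_i' \hat{\mathbf{V}}_i^{-1} \hat{\mathbf{D}}_i$$
and $Z_i$, $i = 1, \ldots, K$, are independent $N(0,1)$ random variables. You replace $x_{ikj}$ with the linear predictor $\mathbf{x}_{ik}'\hat{\boldsymbol{\beta}}$ in the preceding formulas to check the link function.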