Usage Note 37107: Comparing covariance structures in PROC MIXED

When choosing a covariance structure in PROC MIXED, consider the covariance structures that are meaningful for your data and area of application. For example, when the time points at which measurements are taken are unequally spaced, and/or subjects are measured at different time points, the autoregressive (TYPE=AR(1)) structure is generally not appropriate. Guerin and Stroup (2000) provide more information on the effects of various covariance modeling decisions.
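
For unequally spaced measurements, one commonly used alternative is the spatial power structure, TYPE=SP(POW), which models the correlation as a power function of the actual time separation between two measurements. The following is only a sketch under stated assumptions: the data set MYDATA and the variables Y, TRT, ID, and T (a numeric measurement time) are hypothetical and are not part of the example in this note.

      proc mixed data=mydata;
        class trt id;                 /* T is left out of CLASS so it stays numeric */
        model y=trt t trt*t / ddfm=kr;
        /* correlation depends on the distance between values of T */
        repeated / type=sp(pow)(t) subject=id;
        run;

Whether time should enter the MODEL statement as a continuous or a classification effect is a separate modeling decision; it is treated as continuous here only to keep the sketch short.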

When there are several plausible covariance structures, it is desirable to choose the one that best fits your data. Three approaches for evaluating covariance structures are described below: examining the Fit Statistics table, constructing a likelihood ratio test, and using the COVTEST statement in PROC GLIMMIX. Compare and select a covariance structure before examining the tests of fixed effects.

Examine the Fit Statistics Table

When examining the fit statistics, such as AICC or BIC, in the Fit Statistics table in the PROC MIXED output, smaller statistic values indicate a better fit to your data. Milliken and Johnson (1992) present a clinical trial in which patients were randomly assigned to one of three drugs (coded ax23, bww9, and ctrl in the data below). Heart rates were measured at four time points following the administration of the drug. The following statements read the data and then restructure the data set to have one observation for each time point per patient.

      data heart;
        input drug $ person hr1 hr2 hr3 hr4 @@;
        cards;
      ax23 1  72 86 81 77 bww9  2 85 86 83 80 ctrl  3 69 73 72 74
      ax23 4  78 83 88 81 bww9  5 82 86 80 84 ctrl  6 66 62 67 73
      ax23 7  71 82 81 75 bww9  8 71 78 70 75 ctrl  9 84 90 88 87
      ax23 10 72 83 83 69 bww9 11 83 88 79 81 ctrl 12 80 81 77 72
      ax23 13 66 79 77 66 bww9 14 86 85 76 76 ctrl 15 72 72 69 70
      ax23 16 74 83 84 77 bww9 17 85 82 83 80 ctrl 18 65 62 65 61
      ax23 19 62 73 78 70 bww9 20 79 83 80 81 ctrl 21 75 69 69 68
      ax23 22 69 75 76 70 bww9 23 83 84 78 81 ctrl 24 71 70 65 63
      ;
      
      data heart2; 
        set heart;
        time=1; hr=hr1; output;
        time=2; hr=hr2; output;
        time=3; hr=hr3; output;
        time=4; hr=hr4; output;
        run;
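
The same restructuring can be written more compactly with an array and a DO loop. This is a sketch of an equivalent step, not part of the original example; the output name HEART2B is chosen here only to avoid overwriting HEART2.

      data heart2b;
        set heart;
        array hrs{4} hr1-hr4;        /* the four wide heart-rate columns */
        do time=1 to 4;
          hr=hrs{time};              /* heart rate at this time point */
          output;                    /* one row per patient per time point */
        end;
        run;

      /* Optional check that both versions hold the same values */
      proc compare base=heart2 compare=heart2b;
        run;

With 24 patients and four time points, both data sets should contain 96 observations.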

Many covariance structures are reasonable for these data, including unstructured (TYPE=UN), autoregressive (TYPE=AR(1)), compound symmetric (TYPE=CS), and Toeplitz (TYPE=TOEP). The unstructured covariance matrix is the most flexible because it imposes no pattern on the variances and covariances. By fitting this structure first and then examining the estimated covariance matrix for patterns characteristic of other structures, you may be able to select a simpler structure.

      proc mixed data=heart2;
        class drug person time;
        model hr=drug time time*drug / ddfm=kr;
        repeated time / type=un subject=person r;
        run;

Following are the results of the R option and the Fit Statistics Table.

Estimated R Matrix for person 1
Row	Col1	Col2	Col3	Col4
1	30.5238	28.6548	25.4881	20.0952
2	28.6548	39.2321	29.3095	25.5476
3	25.4881	29.3095	31.2321	26.7262
4	20.0952	25.5476	26.7262	32.6845

Fit Statistics
-2 Res Log Likelihood	477.4
AIC (smaller is better)	497.4
AICC (smaller is better)	500.4
BIC (smaller is better)	509.2
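
Patterns are often easier to judge on the correlation scale. From the R matrix above, the correlation between times 1 and 2 is 28.6548/sqrt(30.5238*39.2321), about 0.83, while the correlation between times 1 and 4 is 20.0952/sqrt(30.5238*32.6845), about 0.64. As a sketch, the unstructured model can be rerun with the RCORR option added to the REPEATED statement, which requests the correlation matrix corresponding to the R matrix:

      proc mixed data=heart2;
        class drug person time;
        model hr=drug time time*drug / ddfm=kr;
        /* R prints the covariance matrix, RCORR the matching correlation matrix */
        repeated time / type=un subject=person r rcorr;
        run;

Roughly equal variances on the diagonal, together with correlations that weaken as the time points move farther apart, are the kind of pattern that points toward structures such as AR(1) or Toeplitz.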

Based on the pattern of the covariances in the R matrix, the compound symmetric, autoregressive, and Toeplitz structures could be considered. Each of these structures is used in the following analyses. The ODS OUTPUT statement in each analysis saves the Fit Statistics and Dimensions tables to data sets.

      proc mixed data=heart2;
        class drug person time;
        model hr=drug time time*drug / ddfm=kr;
        repeated time / type=cs subject=person r;
        ods output FitStatistics=FitCS(rename=(value=CS))
                   Dimensions=ParmCS(rename=(value=NumCS));
        run;
      
      proc mixed data=heart2;
        class drug person time;
        model hr=drug time time*drug / ddfm=kr;
        repeated time / type=ar(1) subject=person r;
        ods output FitStatistics=FitAR1(rename=(value=AR1))
                   Dimensions=ParmAR1(rename=(value=NumAR1));
        run;
      
      proc mixed data=heart2;
        class drug person time;
        model hr=drug time time*drug / ddfm=kr;
        repeated time / type=toep subject=person r;
        ods output FitStatistics=FitToep(rename=(value=Toep))
                   Dimensions=ParmToep(rename=(value=NumToep)); 
        run;
      
      data all;
        merge FitCS FitAR1 FitToep;
        run;
      
      proc print data=all label noobs;
        run;

Data set ALL combines the fit statistics from the three models. The statistics are presented below. The AR(1) model has the smallest AIC, AICC, and BIC values, which suggests that it is the preferable model of the three. The smaller-is-better rule applies even when values are negative. For example, -200 indicates a better fit than -100.

Description	CS	AR1	Toep
-2 Res Log Likelihood	488.8	484.6	481.7
AIC (smaller is better)	492.8	488.6	489.7
AICC (smaller is better)	492.9	488.8	490.2
BIC (smaller is better)	495.2	491.0	494.4
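
The AIC and BIC rows can be reconstructed from the -2 Res Log Likelihood row. The sketch below assumes the formulas that PROC MIXED documents for REML estimation, AIC = -2l + 2d and BIC = -2l + d*log(n), where d is the number of covariance parameters (2 for CS, 2 for AR(1), 4 for Toeplitz) and n is the number of subjects (24 here). The data set CHECK and its variable names are illustrative only.

      data check;
        length model $4;
        input model $ m2rll d;
        n=24;                        /* number of subjects (persons) */
        aic=m2rll + 2*d;             /* penalty of 2 per covariance parameter */
        bic=m2rll + d*log(n);        /* penalty of log(n) per covariance parameter */
        datalines;
      CS   488.8 2
      AR1  484.6 2
      Toep 481.7 4
      ;

      proc print data=check noobs;
        format aic bic 8.1;
        run;

To one decimal place this reproduces the AIC values (492.8, 488.6, 489.7) and the BIC values (495.2, 491.0, 494.4) shown in the table. Because CS and AR(1) have the same number of parameters, their ranking is determined by the -2 Res Log Likelihood alone.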

This approach can be used whether the covariance structures are nested or not. As long as the model specification in the MODEL statement remains the same, different covariance structures for the model can be compared by this method. Comparison by the likelihood ratio test (presented next) requires that one structure be a special case of the other structure.

Construct a Likelihood Ratio Test

Suppose two models have the same MODEL statement, but different covariance structures in the REPEATED statement. If the covariance structure in one model is a special case of the covariance structure in the other model, you can construct a likelihood ratio test to compare the two models.

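The following example compares the AR(1) and Toeplitz models above. Both structures have constant variance across time points. Under the AR(1) structure, the correlation between two measurements is a power function of the number of time points separating them, so all of the correlations are determined by a single parameter. The Toeplitz structure also assumes that the correlation depends only on that separation, but it estimates a separate parameter for each separation rather than imposing the power function. Consequently, the AR(1) structure is a special case of the Toeplitz structure.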
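
To make the nesting concrete: with four equally spaced time points, a Toeplitz structure has one free parameter for each lag (1, 2, and 3), whereas AR(1) forces the correlation at lag k to equal rho**k. The DATA step below sketches the AR(1) pattern for rho=0.8, a value chosen here purely for illustration and not estimated from the heart rate data.

      data ar1_pattern;
        rho=0.8;                     /* illustrative value only */
        do lag=0 to 3;
          corr=rho**lag;             /* AR(1): correlation at lag k is rho to the power k */
          output;
        end;
        run;

      proc print data=ar1_pattern noobs;
        run;

The printed correlations are 1, 0.8, 0.64, and 0.512. A Toeplitz structure could take any admissible values at lags 1 to 3, and it reduces to AR(1) only when they happen to follow this power pattern.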

The DATA step below merges the data sets containing the -2 Res Log Likelihood values and the number of covariance parameters from the two models. The difference between the two -2 Res Log Likelihood values (simpler model minus more general model) is the likelihood ratio statistic, which is approximately chi-square distributed when the simpler structure is adequate. The difference in the number of covariance parameters between the two models is the degrees of freedom for the statistic. The p-value for the likelihood ratio test is computed using the PROBCHI function.

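      data result;
        /* Row 1 of each Fit Statistics table is -2 Res Log Likelihood;      */
        /* row 1 of each Dimensions table is the covariance parameter count. */
        merge FitAR1 FitToep ParmAR1 ParmToep;
        if _n_ = 1 then do;
           ChiAR1Toep=AR1-Toep;                        /* likelihood ratio statistic */
           dfAR1Toep=NumToep-NumAR1;                   /* degrees of freedom */
           pAR1Toep=1-probchi(ChiAR1Toep, dfAR1Toep);  /* upper-tail p-value */
           output;
           stop;
        end;
        run;
      
      title 'Likelihood Ratio Test: AR1 vs Toeplitz';
      proc print data=result label noobs;
        var ChiAR1Toep dfAR1Toep pAR1Toep;
        label ChiAR1Toep="Chi-Square"
              dfAR1Toep="DF"
              pAR1Toep="Pr > ChiSq";
        run;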

The nonsignificant likelihood ratio test indicates that there is no evidence to prefer the more general Toeplitz structure over the simpler AR(1) structure.

Likelihood Ratio Test: AR1 vs Toeplitz

Chi-Square	DF	Pr > ChiSq
2.92525	2	0.23163
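
As a quick check, the p-value can be reproduced directly from the printed statistic. With 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), so both expressions in this sketch should agree with the 0.23163 shown above. The SDF function is used here as an alternative to 1-PROBCHI because it returns the upper-tail probability directly; the variable names are illustrative.

      data _null_;
        chisq=2.92525;
        df=2;
        p_sdf=sdf('chisq', chisq, df);   /* upper-tail probability */
        p_exp=exp(-chisq/2);             /* closed form, valid only when df=2 */
        put p_sdf= 8.5 p_exp= 8.5;       /* writes both values to the log */
        run;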

Use the COVTEST Statement in PROC GLIMMIX

PROC GLIMMIX fits generalized linear mixed models, which include the linear mixed model as a special case. Most models that can be fit by PROC MIXED can also be fit using PROC GLIMMIX. In PROC GLIMMIX, the COVTEST statement enables you to compare covariance structures that are linearly nested. Two covariance matrices are linearly nested if you can specify coefficients in the GENERAL option of the COVTEST statement that reduce the more general matrix to the simpler matrix.

For example, the COVTEST statement can be used to compare unstructured and compound symmetric covariance matrices, because the equal-variance and equal-covariance constraints needed to reduce the unstructured covariance matrix to the compound symmetric matrix are linear. However, the COVTEST statement cannot be used to compare unstructured and AR(1) matrices, or to compare Toeplitz and AR(1) matrices, because the constraints needed to reduce the unstructured and Toeplitz structures to the AR(1) structure are not linear (under AR(1), the correlation is a power function of the separation between time points, which is not a linear constraint).

The COVTEST statement computes a likelihood ratio test to compare the more complex covariance structure specified in the RANDOM statement with the constrained structure specified in the COVTEST statement. A separate SAS note provides more information about using the COVTEST statement.

The following example compares the unstructured and compound symmetric structures in the above model. PROC GLIMMIX does not have a REPEATED statement as in PROC MIXED. Instead, it provides the RANDOM _RESIDUAL_ statement that can be used in place of the REPEATED statement in PROC MIXED. The first three rows of coefficients in the GENERAL option of the COVTEST statement constrain the four variances in the unstructured matrix to be equal. Note that the position of a coefficient in a row corresponds to the position of the parameter in the "Covariance Parameter Estimates" table. The remaining rows constrain all of the covariances to be equal. With these linear constraints, an unstructured covariance matrix becomes a compound symmetric matrix. See the GLIMMIX documentation for details of the COVTEST syntax.

      proc glimmix data=heart2;
        class drug person time;
        model hr=drug time time*drug / ddfm=kr;
        random _residual_ / type=un subject=person ;
        covtest 'CS' general  1 0 -1,
                              1 0 0 0 0 -1,
                              1 0 0 0 0 0 0 0 0 -1,
                              0 1 0 -1,
                              0 1 0 0 -1,
                              0 1 0 0 0 0 -1,
                              0 1 0 0 0 0 0 -1,
                              0 1 0 0 0 0 0 0 -1;
        run;

The results indicate that the compound symmetric structure fits your data adequately compared with the unstructured covariance matrix (p=0.1788).

Tests of Covariance Parameters Based on the Restricted Likelihood
Label	DF	-2 Res Log Like	ChiSq	Pr > ChiSq	Note
CS	8	488.80	11.42	0.1788	DF

DF: P-value based on a chi-square with DF degrees of freedom.
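
The same approach applies to other linearly nested pairs. For instance, a Toeplitz structure reduces to compound symmetry when its three band covariances are equal, which is a linear constraint. The following sketch assumes that the Toeplitz covariance parameters are listed in the order TOEP(2), TOEP(3), TOEP(4), Residual; that ordering is an assumption and should be confirmed in the "Covariance Parameter Estimates" table before the coefficients are trusted.

      proc glimmix data=heart2;
        class drug person time;
        model hr=drug time time*drug / ddfm=kr;
        random _residual_ / type=toep subject=person;
        /* Assumed parameter order: TOEP(2), TOEP(3), TOEP(4), Residual */
        covtest 'CS vs TOEP' general 1 -1,       /* TOEP(2) = TOEP(3) */
                                     1 0 -1;     /* TOEP(2) = TOEP(4) */
        run;

If the fit matches the PROC MIXED results reported earlier, the chi-square statistic should be close to 488.8 - 481.7 = 7.1 on 2 degrees of freedom.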

_____

Guerin, L. and Stroup, W. W. (2000), "A Simulation Study to Evaluate PROC MIXED Analysis of Repeated Measures Data," Proceedings of the 12th Annual Conference on Applied Statistics in Agriculture, Kansas State University, Manhattan, KS.

Milliken, G. A. and Johnson, D. E. (1992), Analysis of Messy Data, Volume 1: Designed Experiments, New York: Chapman and Hall.

Type:	Usage Note
Topic:	Analytics ==> Mixed Models
	SAS Reference ==> Procedures ==> GLIMMIX
	SAS Reference ==> Procedures ==> MIXED

Date Modified:	2010-11-10 12:46:59
Date Created:	2009-09-07 10:10:51