Usage Note 32927: Comparing models or testing model significance in PROC GAM

As discussed and illustrated in a separate usage note, other procedures such as PROC GENMOD allow you to compare two nested models by using the CONTRAST statement to perform a joint test of the parameters of the full model that are deleted (set to zero) in the reduced model. An alternative is to fit both the full and reduced models and construct a test from their likelihood or deviance values. Because PROC GAM does not provide a CONTRAST statement, this second method must be used.

Binomial and Poisson models — no scale parameter

For binomial and Poisson models, which have no scale parameter, a likelihood ratio test statistic can be formed as the difference between the deviances of the reduced and full models. This statistic is approximately chi-square distributed, with degrees of freedom equal to the difference between the two models' degrees of freedom. For example, consider the model in the "Generalized Additive Model with Binary Data" example presented in the GAM procedure documentation:

   proc gam data=kyphosis; 
      model Kyphosis = spline(Age      ,df=3) 
                       spline(StartVert,df=3)  
                       spline(NumVert  ,df=3) / dist=binomial; 
      run;

The deviance of this model, 46.61, is shown in the Iteration Summary and Fit Statistics table. The model has ten degrees of freedom: one for the intercept, one for each of the three linear components, and two for each of the three spline-smoothed predictors.

Determine the degrees of freedom for each smoothed predictor from the Analysis of Deviance and Regression Model Analysis Parameter Estimates tables rather than from the value requested in the DF= option, because the final fitted model can have more or fewer degrees of freedom than requested, and the final values need not be integers. The model degrees of freedom equal the sum of the DF column in the Analysis of Deviance table plus one for each row of the Regression Model Analysis Parameter Estimates table that has a nonmissing standard error.
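As a worked check of this counting rule for the kyphosis model, assuming each spline's nonlinear component receives the two degrees of freedom stated above, the total is:

$$
p \;=\; \underbrace{(2 + 2 + 2)}_{\text{DF column, Analysis of Deviance}} \;+\; \underbrace{4}_{\text{intercept and 3 linear terms with nonmissing SE}} \;=\; 10
$$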

A joint test of all the predictors in the model (that is, an overall test of the model) can be obtained by comparing the above model to a reduced model that contains only an intercept:

   proc gam data=kyphosis; 
      model Kyphosis =  / dist=binomial; 
      run;

The deviance of the intercept-only model is 86.80, and the model has one degree of freedom. The difference in deviances is 86.80 - 46.61 = 40.19, and the difference in model degrees of freedom is 10 - 1 = 9. These statements display the p-value in the SAS log:

   data _null_; 
      p=1-probchi(40.19, 9);  
      put p= pvalue.; 
      run;

The small p-value (p<0.0001) indicates that some association exists between one or more of the smoothed predictors and the response.
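The same calculation can be collected in one DATA step that you edit for any pair of nested binomial or Poisson PROC GAM fits. The following is a sketch: the deviances and model degrees of freedom are typed in by hand from the output described above, and the data set name CHISQ_TEST and its variable names are arbitrary choices for illustration. It uses the SDF function, which returns the upper-tail probability directly and so avoids the loss of precision that 1-PROBCHI can show for extremely small p-values.

```sas
/* Deviance-difference (likelihood ratio) test of two nested GAM fits */
/* with no scale parameter (binomial or Poisson).                     */
/* Values are typed in from each PROC GAM run's output.               */
data chisq_test;
   dev_reduced = 86.80;  df_reduced = 1;     /* intercept-only model */
   dev_full    = 46.61;  df_full    = 10;    /* three-spline model   */

   dev_diff = dev_reduced - dev_full;        /* chi-square statistic */
   df_diff  = df_full - df_reduced;          /* its degrees of freedom */
   p        = sdf('CHISQ', dev_diff, df_diff);  /* upper-tail p-value */
   format p pvalue6.4;
run;

proc print data=chisq_test noobs;
   var dev_diff df_diff p;
run;
```

To test a different pair of models, replace the four input values with the deviances and model degrees of freedom of those fits.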

Continuous distributions with scale parameter

For continuous distributions that have a scale parameter, such as the normal distribution, an F statistic can be formed instead. Consider the model for diabetes in the Getting Started section of the GAM procedure documentation:

   proc gam data=diabetes;
       model logCP = spline(Age) spline(BaseDeficit);
   run;

The deviance for this model is 0.4181. The model has 9 degrees of freedom: one for the intercept, one for each of the two linear components, and three for each of the two spline-smoothed predictors.

Suppose you want to compare this model to a reduced model that contains only Age. Because the two models differ only by the presence of BaseDeficit, the comparison is a test of the full effect of this predictor. The F statistic is the difference between the deviances of the reduced and full models, divided by the product of the scale parameter estimate and the difference in the models' degrees of freedom. The scale parameter is estimated as the full model's deviance divided by its residual degrees of freedom, which equal the number of observations minus the model degrees of freedom. For the full model, the residual degrees of freedom are 43 - 9 = 34, so the scale parameter estimate is 0.4181/34. The resulting F statistic has numerator degrees of freedom equal to the difference in the models' degrees of freedom and denominator degrees of freedom equal to the full model's residual degrees of freedom.
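Written as a formula, with D_F and D_R the deviances of the full and reduced models, p_F and p_R their model degrees of freedom, and n the number of observations (symbols introduced here only for convenience):

$$
F \;=\; \frac{D_R - D_F}{(p_F - p_R)\,\hat{\phi}},
\qquad
\hat{\phi} \;=\; \frac{D_F}{n - p_F},
\qquad
F \;\text{approximately}\;\sim\; F_{\,p_F - p_R,\; n - p_F}
$$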

The following statements fit the reduced model, which drops the BaseDeficit predictor.

   proc gam data=diabetes;
       model logCP = spline(Age);
   run;

The reduced model has a deviance of 0.6178 and 5 model degrees of freedom. The F statistic is then

   F = (0.6178 - 0.4181) / ((9-5) * 0.4181/34) = 4.0599

with 9 - 5 = 4 and 34 degrees of freedom. These statements display the p-value in the SAS log:

   data _null_; 
      p=1-probf(4.0599, 4, 34);  
      put p= pvalue.; 
      run;

The small p-value (p=0.0085) indicates that BaseDeficit is significantly associated with the response, and therefore that the reduced model fits significantly worse than the full model.
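The hand calculation above can also be collected in one DATA step that you edit for other model pairs. This is a sketch: the deviances, model degrees of freedom, and the 43 observations are typed in from the output described above, and the data set name F_TEST and its variable names are arbitrary. It uses the SDF function to obtain the upper-tail probability directly instead of computing 1-PROBF.

```sas
/* F test comparing two nested GAM fits for a distribution with an    */
/* estimated scale parameter (the normal-theory diabetes models).      */
data f_test;
   n           = 43;                        /* observations used       */
   dev_full    = 0.4181;  df_full    = 9;   /* spline(Age) spline(BaseDeficit) */
   dev_reduced = 0.6178;  df_reduced = 5;   /* spline(Age) only        */

   df_resid = n - df_full;                  /* full-model residual DF  */
   scale    = dev_full / df_resid;          /* scale parameter estimate */
   num_df   = df_full - df_reduced;         /* numerator DF            */
   F = (dev_reduced - dev_full) / (num_df * scale);
   p = sdf('F', F, num_df, df_resid);       /* upper-tail p-value      */
   format F 8.4 p pvalue6.4;
run;

proc print data=f_test noobs;
   var F num_df df_resid p;
run;
```

The printed F statistic should agree with the hand calculation above (about 4.06 on 4 and 34 degrees of freedom).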

Note that the Analysis of Deviance table for the full model also provides a test of BaseDeficit (p=0.0854). However, that test covers only the nonlinear component of BaseDeficit, which accounts for three of its four degrees of freedom. The linear component, with one degree of freedom, is tested separately in the Regression Model Analysis Parameter Estimates table (p=0.0025). In effect, the Analysis of Deviance test compares the full model to the following model, which includes only the linear component of BaseDeficit:

   proc gam data=diabetes;
       model logCP = param(BaseDeficit) spline(Age);
   run;

The significance of the linear component, together with the marginal significance of the nonlinear component, suggests that BaseDeficit is associated with the response. The partial prediction plot for BaseDeficit in the documentation example shows a noticeable quadratic effect. The significant F test, which tests the overall (linear and nonlinear) association of BaseDeficit with the response, agrees that an association exists.

Comparing models using AIC

Models fit to the same observations with the same distribution can be compared by using the AIC statistic, whether or not they are nested as above. AIC orders the models by fit, with smaller values indicating a better fit. However, comparing AIC values does not constitute a formal statistical test.

AIC can be computed for GAM models as Deviance + 2p, where Deviance is reported in the Iteration Summary and Fit Statistics table and p is the model degrees of freedom, determined as described above.

This AIC statistic can be used only to compare models fit by PROC GAM. You cannot compare its values with AIC values computed by other procedures, such as PROC GENMOD (which provides AIC beginning in SAS 9.2) or PROC GLIMMIX, because those procedures compute AIC as -2LL + 2p, where LL is the log likelihood of the fitted model.
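As a sketch of the Deviance + 2p calculation, the following DATA step applies it to the two kyphosis models fit earlier, using the deviances and model degrees of freedom reported above (typed in by hand; the data set name AIC and its variable names are arbitrary), and then sorts the models so that the one with the smaller AIC is listed first.

```sas
/* AIC = Deviance + 2p for the two kyphosis (binary response) fits */
data aic;
   length model $ 20;
   input model $ deviance p;
   aic = deviance + 2*p;     /* p = model degrees of freedom */
   datalines;
Three-spline 46.61 10
Intercept-only 86.80 1
;

proc sort data=aic;
   by aic;                   /* smallest AIC first */
run;

proc print data=aic noobs;
run;
```

Here the three-spline model has AIC 46.61 + 2(10) = 66.61 and the intercept-only model has 86.80 + 2(1) = 88.80, so AIC favors the three-spline model, consistent with the chi-square test above.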

Operating System and Release Information

Product Family	Product	System	Reported Release	Fixed Release*
SAS System	SAS/STAT	z/OS
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows XP Professional
		Windows Millennium Edition (Me)
		Windows Vista
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Topic:	SAS Reference ==> Procedures ==> GAM; Analytics ==> Nonparametric Analysis; Analytics ==> Regression

Date Modified:	2019-05-03 14:57:05
Date Created:	2008-08-11 13:05:29
