Generalized Additive Models


Many nonparametric methods do not perform well when there are a large number of independent variables in the model. The sparseness of data in this setting inflates the variance of the estimates. The problem of rapidly increasing variance for increasing dimensionality is sometimes referred to as the "curse of dimensionality." Interpretability is another problem with nonparametric regression based on kernel and smoothing spline estimates (Hastie and Tibshirani 1990).

To overcome these difficulties, Stone (1985) proposed additive models. These models estimate an additive approximation to the multivariate regression function. The benefits of an additive approximation are at least twofold. First, because each of the individual additive terms is estimated using a univariate smoother, the curse of dimensionality is avoided, at the cost of not being able to approximate universally. Second, estimates of the individual terms explain how the dependent variable changes with the corresponding independent variables.

Hastie and Tibshirani (1990) proposed generalized additive models. These models assume that the mean of the dependent variable depends on an additive predictor through a nonlinear link function. Generalized additive models permit the response probability distribution to be a member of the exponential family of distributions. Many widely used statistical models belong to this general class, including additive models for Gaussian data, nonparametric logistic models for binary data, and nonparametric log-linear models for Poisson data.

The GAM Method

Suppose that $Y$ is a response random variable and $X_1, \ldots , X_ p$ is a set of predictor variables. A regression procedure can be viewed as a method of estimating how the value of $Y$ depends on the values of $X_1, \ldots , X_ p$. The standard linear regression model assumes that the expected value of $Y$ has a linear form:

\[ E(Y) = f(X_1, \ldots , X_ p) = \beta _0 + \beta _1 X_1 + \cdots + \beta _ p X_ p \]

Given a sample of values for $Y$ and $X$, estimates of $\beta _0, \beta _1, \ldots , \beta _ p$ are often obtained by the least squares method.

The additive model generalizes the linear model by modeling the expected value of Y as

\[ E(Y) = f(X_1, \ldots , X_ p) = s_0 + s_1(X_1) + \cdots + s_ p(X_ p) \]

where $s_ i(X)$, $i = 1, \ldots , p$, are smooth functions. These functions are estimated in a nonparametric fashion.

Generalized additive models extend traditional linear models in another way, namely by allowing for a link between $f(X_1, \ldots ,X_ p)$ and the expected value of $Y$. This amounts to allowing for an alternative distribution for the underlying random variation besides just the normal distribution. Although Gaussian models can be used in many statistical applications, there are types of problems for which they are not appropriate. For example, the normal distribution might not be adequate for modeling discrete responses such as counts or bounded responses such as proportions.

Generalized additive models consist of a random component, an additive component, and a link function that relates these two components to each other. The response $Y$, the random component, is assumed to have a density in the exponential family

\[ f_ Y(y;\theta ,\phi ) = \exp \left( \frac{y \theta - b(\theta )}{\alpha (\phi )} + c(y,\phi ) \right) \]

where $\theta $ is called the natural parameter and $\phi $ is the scale parameter. The normal, binomial, and Poisson distributions are all in this family. The quantity

\[ \eta = s_0 + \sum _{i=1}^{p} s_ i(X_ i) \]

where $s_1(\cdot ), \ldots , s_ p(\cdot )$ are smooth functions defines the additive component. Finally, the relationship between the mean $\mu $ of the response variable and $\eta $ is defined by a link function $g(\mu )=\eta $. The most commonly used link function is the canonical link, for which $\eta =\theta $.

A combination of backfitting and local scoring algorithms is used in the actual fitting of the model.

Generalized additive models and generalized linear models can be applied in similar situations, but they serve different analytic purposes. Generalized linear models emphasize estimation and inference for the parameters of the model; generalized additive models focus on exploring data nonparametrically.

SAS/STAT software provides two procedures that fit generalized additive models: the GAM procedure and the GAMPL procedure. The features, comparative differences, and an example that uses both procedures are presented in the discussion that follows.

The GAM Procedure

The GAM procedure implements the generalized additive model proposed by Hastie and Tibshirani (1990). PROC GAM does the following:

  • fits nonparametric or semiparametric additive models

  • supports the use of multidimensional data

  • supports multiple SCORE statements

  • enables you to specify the model degrees of freedom or smoothing parameter

PROC GAM can fit Gaussian, binomial, Poisson, gamma, and inverse Gaussian distributions. Although theoretically more than one link can exist for each distribution, PROC GAM always uses the canonical link. This is because the difference between link alternatives is less pronounced for nonparametric models, as a result of the flexibility of nonparametric model forms.

The GAMPL Procedure

The GAMPL procedure is a high-performance procedure that fits generalized additive models that are based on low-rank regression splines (Wood 2006). Each spline term is constructed by the thin-plate regression spline technique (Wood 2003). A roughness penalty is applied to each spline term by a smoothing parameter that controls the balance between goodness of fit and the roughness of the spline curve. PROC GAMPL fits models for standard distributions in the exponential family, such as binomial, gamma, Gaussian, inverse Gaussian, Poisson, and negative binomial distributions.

PROC GAMPL runs in either single-machine mode or distributed mode. Note: Distributed mode requires SAS High-Performance .

PROC GAMPL offers the following basic features:

  • estimates the regression parameters of a generalized additive model that has fixed smoothing parameters by using penalized likelihood estimation

  • estimates the smoothing parameters of a generalized additive model by using either the performance iteration method or the outer iteration method

  • estimates the regression parameters of a generalized linear model by using maximum likelihood techniques

  • tests the total contribution of each spline term based on the Wald statistic

  • provides model-building syntax in the CLASS statement and effect-based parametric effects in the MODEL statement, which are used in other SAS/STAT analytic procedures (in particular, the GLM, LOGISTIC, GLIMMIX, and MIXED procedures)

  • provides response-variable options

  • enables you to construct a spline term by using multiple variables

  • provides control options for constructing a spline term, such as fixed degrees of freedom, initial smoothing parameter, fixed smoothing parameter, smoothing parameter search range, user-supplied knot values, and so on

  • provides multiple link functions for any distribution

  • provides a WEIGHT statement for weighted analysis

  • provides a FREQ statement for grouped analysis

  • provides an OUTPUT statement to produce a data set that has predicted values and other observationwise statistics

  • produces graphs via ODS Graphics

The GAMPL procedure is a high-performance analytical procedure. It does everything in the preceding list, plus it also does the following:

  • enables you to run in distributed mode on a cluster of machines that distribute the data and the computations

  • enables you to run in single-machine mode on the server where SAS is installed

  • exploits all the available cores and concurrent threads, regardless of execution mode

PROC GAMPL Contrasted with PROC GAM

Both the GAMPL procedure and the GAM procedure in SAS/STAT software fit generalized additive models. However, the GAMPL procedure uses different approaches for constructing spline basis expansions, fitting generalized additive models, and testing smoothing components. The GAMPL procedure focuses on automatic smoothing parameter selection by using global model-evaluation criteria to find optimal models. The GAM procedure focuses on constructing models by fitting partial residuals against each smoothing term. In general, you should not expect similar results from the two procedures. The following sections summarize the differences.

Constructing Spline Basis Expansions

The GAMPL procedure uses thin-plate regression splines to construct basis expansions for each spline term, and each term allows multiple variables. The GAM procedure uses univariate or bivariate smoothing splines to construct basis expansions, and each term allows only one or two variables. The thin-plate regression splines that PROC GAMPL uses are low-rank approximations of multivariate smoothing splines. PROC GAM also allows loess smoothers.

Fitting Generalized Additive Models

The GAMPL procedure fits a generalized additive model that has fixed smoothing parameters by using a global design matrix and a roughness penalty matrix. The GAM procedure uses partial residuals to fit against single smoothing terms. For models that have unknown smoothing parameters, PROC GAMPL estimates smoothing parameters simultaneously by optimizing global criteria such as generalized cross validation (GCV) and the unbiased risk estimator (UBRE). PROC GAM estimates each smoothing parameter by optimizing the local criterion GCV one spline term at a time.

Distribution Families and Link Functions

The GAMPL procedure supports all the distribution families and all the link functions that the GAM procedure supports. In addition, PROC GAMPL fits models in the negative binomial family. PROC GAMPL supports any link function for each distribution, whereas PROC GAM supports only the canonical link for each distribution.

Testing Smoothing Components

The GAMPL procedure tests the total contribution for a spline term, including both linear and nonlinear trends. The GAM procedure tests the existence of nonlinearity for a spline term beyond the linear trend.

Model Inference

A global Bayesian posterior covariance matrix is available for models that are fitted by the GAMPL procedure. The confidence limits for prediction of each observation are available, in addition to componentwise confidence limits. For generalized additive models that are fitted by the GAM procedure, only the componentwise confidence limits are available, and they are based on the partial residuals for each smoothing term. The degrees of freedom for generalized additive models that are fitted by PROC GAMPL is defined as the trace of the degrees-of-freedom matrix. The degrees of freedom for generalized additive models that are fitted by PROC GAM is approximated by summing the trace of the smoothing matrix for each smoothing term.

Example: Kyphosis Study

The data in this example are based on a study by Bell et al. (1994). Bell and his associates studied the result of multiple-level thoracic and lumbar laminectomy, a corrective spinal surgery commonly performed on children. The data in the study consist of retrospective measurements on 83 patients. The specific outcome of interest is the presence (1) or absence (0) of kyphosis, defined as a forward flexion of the spine of at least 40 degrees from vertical. The available predictor variables are age in months at time of the operation (Age), the starting of vertebrae levels involved in the operation (StartVert), and the number of levels involved (NumVert). The goal of this analysis is to identify risk factors for kyphosis.

The following statements fit a logistic additive model by using the GAM procedure. The model consists of the binary dependent variable Kyphosis and the ordinary independent variables NumVert, Age, and StartVert. Each term is fit by using a univariate smoothing spline with three degrees of freedom. Of these three degrees of freedom, one is taken up by the linear portion of the fit and two are left for the nonlinear spline portion. Although this might seem to be an unduly modest amount of flexibility, it is better to be conservative with a data set this small.

ods graphics on;
proc gam data=kyphosis plots=components(clm commonaxes);
  model kyphosis (event='1')= spline(NumVert,df=3)

The PLOTS=COMPONENTS option in the PROC GAM statement produces plots of the individual smoothing components. The CLM suboption in the PLOTS=COMPONENTS option adds a curvewise Bayesian confidence band to each smoothing component, and the COMMONAXES suboption forces all three smoothing component plots to share the same vertical axis limits, allowing a visual judgment of the relative nonparametric effect sizes.

The MODEL statement requests an additive model with a univariate smoothing spline for each term. The response variable option EVENT= chooses Kyphosis=1 (presence) as the event so that the probability of the presence of kyphosis is modeled. The option DIST=BINARY with binary responses specifies a logistic model.

Output 1 displays the model summary statistics.

Output 1: Summary Statistics

The GAM Procedure
Dependent Variable: Kyphosis
Smoothing Model Component(s): spline(NumVert) spline(Age) spline(StartVert)
Summary of Input Data Set
Number of Observations 83
Number of Missing Observations 0
Distribution Binomial
Link Function Logit

Response Profile
Kyphosis Total
1 0 65
2 1 18

Note: PROC GAM is modeling the probability that Kyphosis=1. One way to change this to model the probability that Kyphosis=0 is to specify the response variable option EVENT='0'.
Iteration Summary and Fit Statistics
Number of local scoring iterations 9
Local scoring convergence criterion 2.6613916E-9
Final Number of Backfitting Iterations 1
Final Backfitting Criterion 5.2335391E-9
The Deviance of the Final Estimate 46.610920737

The local scoring algorithm converged.

The critical part of the PROC GAM results is the "Analysis of Deviance" table, shown in Output 2. For each smoothing effect in the model, this table gives a $\chi ^2$ test that compares the deviance between the full model and the model without the nonparametric component of this variable. The analysis of deviance results indicate that the nonparametric effect of Age is highly significant, the nonparametric effect of StartVert is nearly significant, and the nonparametric effect of NumVert is insignificant at the 5% level.

Output 2: Model Fit Statistics

Regression Model Analysis
Parameter Estimates
Parameter Parameter
t Value Pr > |t|
Intercept -2.01532 1.45620 -1.38 0.1706
Linear(NumVert) 0.38347 0.19102 2.01 0.0484
Linear(Age) 0.01213 0.00794 1.53 0.1308
Linear(StartVert) -0.18615 0.07628 -2.44 0.0171

Smoothing Model Analysis
Fit Summary for Smoothing Components
Component Smoothing
Spline(NumVert) 0.921756 2.000000 20.143745 10
Spline(Age) 0.999996 2.000000 328.510512 66
Spline(StartVert) 0.999551 2.000000 317.644335 16

Smoothing Model Analysis
Analysis of Deviance
Source DF Sum of Squares Chi-Square Pr > ChiSq
Spline(NumVert) 2.00000 2.184519 2.1845 0.3355
Spline(Age) 2.00000 10.494389 10.4944 0.0053
Spline(StartVert) 2.00000 5.494851 5.4949 0.0641

The plots in Figure 1 show that the partial predictions corresponding to both Age and StartVert have a quadratic pattern, whereas NumVert has a more complicated but ultimately nonsignificant pattern.

Figure 1: Partial Prediction for Each Predictor

Partial Prediction for Each Predictor

The following statements fit a generalized additive model to the kyphosis study data by using the GAMPL procedure:

proc gampl data=kyphosis plots seed=12345;
  model kyphosis (event='1')= spline(NumVert)

The PLOTS option in the PROC GAMPL statement produces a panel of plots of partial prediction curves or surfaces of smoothing components. The SEED= option specifies an integer that is used to start the pseudorandom number generator for subset sampling from observations to form knots if necessary, and for truncated eigendecomposition.

The MODEL statement requests an additive model with a univariate smoothing spline for each term. PROC GAMPL uses an optimization algorithm to determine the optimal smoothing parameter. The response variable option EVENT= chooses Kyphosis=1 (presence) as the event so that the probability of the presence of kyphosis is modeled. The option DIST=BINARY with binary responses specifies a logistic model.

Output 3 displays the model summary statistics.

Output 3: Summary Statistics

The GAMPL Procedure
Performance Information
Execution Mode Single-Machine
Number of Threads 4

Data Access Information
Data Engine Role Path
WORK.KYPHOSIS V9 Input On Client

Model Information
Response Variable Kyphosis
Distribution Binary
Link Function Logit
Fitting Method Performance Iteration
Fitting Criterion UBRE
Optimization Technique for Smoothing Newton-Raphson
Random Number Seed 12345

Number of Observations Read 83
Number of Observations Used 83

Response Profile
Kyphosis Total
1 0 65
2 1 18

You are modeling the probability that Kyphosis='1'.

The performance iteration converged after 29 steps.

Output 4 displays the model fit statistics, which include spline transformations of all variables.

Output 4: Fit Statistics

Fit Statistics
Penalized Log Likelihood -26.94994
Roughness Penalty 2.68899
Effective Degrees of Freedom 6.12778
Effective Degrees of Freedom for Error 75.87099
AIC (smaller is better) 63.46646
AICC (smaller is better) 64.61781
BIC (smaller is better) 78.28859
UBRE (smaller is better) -0.23419

Output 5 displays the parameter estimates and test for the smoothing components. PROC GAMPL reports that the effective degrees of freedom value for spline transformations of NumVert is quite close to 1. This suggests a strictly linear form for NumVert. For Age and StartVert, the degrees of freedom values are much larger than 1. This suggests a nonlinear pattern in the dependency of the response on Age and StartVert.

The "Tests for Smoothing Components" table shows that the spline transformations for NumVert and StartVert are significant in predicting diabetes testing results at the 5% level; the significance test for Age is borderline.

Output 5: Parameter Estimates and Tests

Parameter Estimates
Parameter DF Estimate Standard
Chi-Square Pr > ChiSq
Intercept 1 -2.158016 0.474521 20.6822 <.0001

Estimates for Smoothing Components
Component Effective
Number of
Rank of
Number of
Spline(NumVert) 1.00000 22215788 2.034E-7 9 10 10
Spline(Age) 2.11043 24744.0 1.6007 9 10 66
Spline(StartVert) 2.01735 24.3226 1.0883 9 10 16

Tests for Smoothing Components
Component Effective
DF for Test
Chi-Square Pr > ChiSq
Spline(NumVert) 1.00000 1 3.9377 0.0472
Spline(Age) 2.11043 3 7.7823 0.0507
Spline(StartVert) 2.01735 3 9.0739 0.0283

The smoothing component panel (which is produced by the PLOTS option and shown in Output 42.2.9) visualizes the spline transformations for the three variables in addition to 95% Bayesian curvewise confidence bands. For NumVert, the spline transformation is almost a straight line. For Age, the spline transformation shows a nonmonotonic nonlinear trend and that it is unambiguously nonzero only within a fairly narrow range of values. For NumVert, the dependency is slightly nonlinear and monotonically decreasing.

Output 6: Smoothing Components Panel

Smoothing Components Panel

For More Information

For more information, see the following:

  • The GAM Procedure in SAS/STAT User's Guide

  • The GAMPL Procedure in SAS/STAT User's Guide: High-Performance Procedures or SAS/STAT User's Guide


  • Hastie, T. J., and Tibshirani, R. J. (1990). Generalized Additive Models. New York: Chapman & Hall.

  • Stone, C. J. (1985). “Additive Regression and Other Nonparametric Models.” Annals of Statistics 13:689–705.

  • Wood, S. (2003). “Thin Plate Regression Splines.” Journal of the Royal Statistical Society, Series B 65:95–114.

  • Wood, S. (2006). Generalized Additive Models. Boca Raton, FL: Chapman & Hall/CRC.