The GAM Procedure

Additive Models and Generalized Additive Models

This section describes the methodology and the fitting procedure behind generalized additive models.

Let Y be a response random variable and X_1, X_2, ... , X_p be a set of predictor variables. A regression procedure can be viewed as a method for estimating the expected value of Y given the values of X_1, X_2, ... , X_p. The standard linear regression model assumes a linear form for the conditional expectation

E(Y| X_1, X_2, ... , X_p)=\beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_p X_p

Given a sample, estimates of \beta_0, \beta_1, ... , \beta_p are usually obtained by the least squares method.
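
As a minimal sketch of the least squares step, the following Python code fits the linear model above to simulated data by solving the normal equations. The simulated data, the true coefficients (1.0, 2.0, -3.0), and the helper names solve and least_squares are assumptions made for illustration only; in practice a numerical library would normally do this work.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def least_squares(X, y):
    """Return [beta_0, beta_1, ..., beta_p] minimizing the residual sum of squares."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(ZtZ, Zty)  # normal equations: (Z'Z) beta = Z'y

# Simulated sample with hypothetical true coefficients (1.0, 2.0, -3.0)
random.seed(0)
X = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
y = [1.0 + 2.0 * x1 - 3.0 * x2 + random.gauss(0, 0.1) for x1, x2 in X]

beta = least_squares(X, y)
print([round(b, 2) for b in beta])
```

The estimates should land close to the coefficients used to simulate the data, with the gap shrinking as the sample grows.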

The additive model generalizes the linear model by modeling the conditional expectation as

E(Y| X_1, X_2, ... , X_p)=s_0 + s_1(X_1) + s_2(X_2) + ... + s_p(X_p)

where s_i(X_i), i = 1, 2, ... , p, are smooth functions.

To be estimable, the smooth functions s_j must satisfy standardizing conditions such as E[s_j(X_j)] = 0; otherwise a constant could be moved between s_0 and any s_j without changing the sum. These functions are not given a parametric form but are instead estimated nonparametrically.
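
One common way to estimate the smooth functions is backfitting: each s_j is re-estimated in turn by smoothing the partial residuals left after removing the other terms, and is then centered so that its sample mean is zero. The Python sketch below assumes a Gaussian kernel smoother with a fixed bandwidth as a stand-in for whatever smoother is actually used. The function names, the bandwidth, the iteration count, and the simulated data are illustrative assumptions, not the procedure's implementation.

```python
import math
import random

def kernel_smooth(x, r, bandwidth=0.3):
    """Nadaraya-Watson smoother: Gaussian-weighted local mean of r at each x[i]."""
    fitted = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2) for xj in x]
        fitted.append(sum(wj * rj for wj, rj in zip(w, r)) / sum(w))
    return fitted

def backfit(columns, y, n_iter=10):
    """Estimate s_0 and the values s_j(x_ij) of an additive model by backfitting."""
    n, p = len(y), len(columns)
    s0 = sum(y) / n                     # intercept: sample mean of y
    s = [[0.0] * n for _ in range(p)]   # start from s_j = 0
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            # Partial residuals: remove the intercept and all other smooth terms
            partial = [
                y[i] - s0 - sum(s[k][i] for k in range(p) if k != j)
                for i in range(n)
            ]
            fit = kernel_smooth(xj, partial)
            mean_fit = sum(fit) / n
            # Center so the sample analogue of E[s_j(X_j)] = 0 holds
            s[j] = [f - mean_fit for f in fit]
    return s0, s

# Simulated data (assumption): Y = 2 + sin(X_1) + X_2^2 + noise
random.seed(1)
n = 200
x1 = [random.uniform(-2, 2) for _ in range(n)]
x2 = [random.uniform(-2, 2) for _ in range(n)]
y = [2.0 + math.sin(a) + b * b + random.gauss(0, 0.2) for a, b in zip(x1, x2)]

s0, (s1, s2) = backfit([x1, x2], y)
```

Because each estimated function is centered, the constant part of X_2^2 is absorbed into s0, which illustrates why the standardizing condition is needed for the terms to be uniquely determined.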

Although traditional linear models and additive models are suitable for many statistical data analyses, there are types of problems for which they are not appropriate. For example, the normal distribution may not be adequate for modeling discrete responses such as counts or bounded responses such as proportions.

Generalized additive models address these difficulties by extending additive models to distributions other than the normal. Thus, generalized additive models can be applied to a much wider range of data analysis problems.

Like generalized linear models, generalized additive models consist of a random component, an additive component, and a link function relating the two components. The response Y, the random component, is assumed to have a density in the exponential family

f_Y(y; \theta, \phi)=\exp \left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}

where \theta is called the natural parameter and \phi is the scale parameter. The mean \mu of the response variable is related to the set of covariates X_1, X_2, ... , X_p by g(\mu)=\eta. Here, \eta is defined as

\eta=s_0 + \sum_{i=1}^p s_i(X_i)

where s_1(·), ... , s_p(·) are smooth functions, the quantity \eta is the additive component, and g(·) is the link function. The most commonly used link for a given density f_Y is called the canonical link, for which \eta=\theta.
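
As a concrete instance, the Poisson distribution with mean \mu fits the exponential family form with \theta = log(\mu), b(\theta) = exp(\theta), a(\phi) = 1, and c(y, \phi) = -log(y!), so its canonical link is the log link. The Python sketch below checks this numerically against the usual Poisson probability formula; the function names are hypothetical and chosen only for this illustration.

```python
import math

def expfam_density(y, theta, phi, a, b, c):
    """Evaluate exp{ (y*theta - b(theta)) / a(phi) + c(y, phi) }."""
    return math.exp((y * theta - b(theta)) / a(phi) + c(y, phi))

def log_link(mu):
    """Canonical link for the Poisson case: eta = g(mu) = log(mu) = theta."""
    return math.log(mu)

def inverse_log_link(eta):
    """Map the additive component eta back to the mean mu."""
    return math.exp(eta)

def poisson_density(y, mu):
    """Poisson probability written in exponential family form."""
    return expfam_density(
        y,
        theta=log_link(mu),                        # natural parameter
        phi=1.0,                                   # scale parameter is fixed at 1
        a=lambda phi: 1.0,
        b=math.exp,                                # b(theta) = exp(theta)
        c=lambda y, phi: -math.lgamma(y + 1),      # -log(y!)
    )

def poisson_pmf(y, mu):
    """Textbook Poisson probability, for comparison."""
    return mu ** y * math.exp(-mu) / math.factorial(y)

# The two expressions should agree up to floating-point rounding
difference = abs(poisson_density(3, 2.5) - poisson_pmf(3, 2.5))
```

With the canonical link, an additive component value \eta is converted to a mean by inverse_log_link, which guarantees a positive mean for any real \eta.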

Generalized additive models and generalized linear models can be applied in similar situations, but they serve different analytic purposes. Generalized linear models emphasize estimation and inference for the parameters of the model, while generalized additive models focus on exploring data nonparametrically. Generalized additive models are therefore better suited to visualizing the relationship between the dependent variable and the independent variables.


Copyright © 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.