Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
The GAM Procedure

Back-Fitting and Local Scoring Algorithms

Consider the estimation of the smoothing terms s0, s1(·), ... , sp(·) in the additive model

\mu(X)=s_0 + \sum_{i=1}^p s_j(X_j)
where E[sj(Xj)] = 0 for every j. Since the algorithm for additive models is the basis for fitting generalized additive models, the algorithm for additive models is discussed first.

Many ways are available to approach the formulation and estimation of additive models. The back-fitting algorithm is a general algorithm that can fit an additive model using any regression-type fitting mechanisms.

Define the partial residual as

R_j=Y - s_0 - \sum_{k \ne j} s_k(X_k)
Then E(Rj|Xj) = sj(Xj). This observation provides a way for estimating each smoothing function sj(·) given estimates \{\hat{s}_i(\cdot), i \ne j\} for all the others. The resulting iterative procedure is known as the back-fitting algorithm (Friedman and Stuetzle 1981).

The Back-Fitting Algorithm

  1. Initialization:
    s0 = E(Y), s11 = s21 = ... = sp1 = 0, m = 0.
  2. Iterate:
    m = m+1
    for j=1 to p do:
    R_j=Y-s_0-\sum_{k=1}^{j-1}s_k^m(X_k) -\sum_{k=j+1}^p s_k^{m-1}(X_k)
    Sjm = E(Rj|Xj).
  3. Until:
    {RSS}={Avg}(Y-s_0-\sum^p_{j=1}s_j^m(X_j))^2 fails to decrease.

In the above notation, sjm(·) denotes the estimate of sj(·) at the mth iteration. It can be shown that RSS never increases at any step, which implies that the algorithm always converges. However, the individual functions need not be unique, since dependence among the covariates can lead to more than one representation for the same fitted surface.

A weighted back-fitting algorithm has the same form as for the unweighted case, except that the smoothers are weighted. The weights might represent the relative precision of each observation or might arise as part of another iterative procedure. For example, weights are used in the local scoring procedure described later in this section.

The algorithm so far described fits just additive models. The algorithm for generalized additive models is a little more complicated. Generalized additive models extend generalized linear models in the same manner that additive models extend linear regression models, that is, by replacing form \alpha+\sum_j X_j \beta_j with the additive form \alpha+\sum_j f_j(X_j). Thus, it is helpful to review the iteratively reweighted least-square procedure for computing the maximum likelihood estimates in a generalized linear model.

For generalized linear models, the maximum likelihood estimate of \beta is defined by the score equations

\sum_{i=1}^n x_{ij} (\frac{\partial \mu_i}{\partial\eta_i}) V^{-1}_i (y_i-\mu_i),j=0, 1, ... ,p
where Vi is the variance matrix for Yi. The Fisher scoring procedure is the standard method for solving these equations. It involves a Newton-Raphson algorithm using the expected (as opposed to the observed) information matrix. An equivalent procedure that is convenient for this problem is called dependent variable regression and is a form of iteratively reweighted least squares. Given a current coefficient vector \beta^0,with corresponding linear predictor \eta^0 and fitted values \mu^0, construct the adjusted dependent variable
z_i=\eta^0_i + (y_i - \mu_i^0) (\frac{ \partial \eta_i}{\partial \mu_i})_0
Define weights wi by
w_i ^{-1}=(\frac{\partial \eta_i}{\partial \mu_i})_0^2 V_i^0
The algorithm proceeds by regressing zi on x with weight wi to obtain a revised estimate \beta. Then a new \mu^0 and \eta^0 are computed, new zis are computed, and the process is repeated until the change in the deviance
D(y; \hat{\mu})=2(l(\mu_{max};y)-l(\hat{\mu}'y))
is sufficiently small.

Some adjusted dependent variables and weights for commonly used models are listed in the following table.

Distribution Link Adjusted Dependent(Z) Weights(w)
Normalidentityy1
{Bin}(n,\mu)logit\eta+(y-\mu)/n\mu(1-\mu)n\mu(1-\mu)
Gammalog\eta+(y-\mu)/\mu1
Poissonlog\eta+(y-\mu)/\mu\mu

Generalized additive models differ from generalized linear models in that an additive predictor replaces the linear predictor. Estimation of the additive terms is accomplished by replacing the weighted linear regression in the adjusted dependent variable regression by the weighted back-fitting algorithm for fitting a weighted additive mode. This results in the algorithm described below as the local scoring algorithm. The name "local scoring" derives from the fact that local averaging is used to generalize the Fisher scoring procedure.

The General Local Scoring Algorithm

  1. Initialization:
    si = g(E(y)), s10 = s20 = ... = sp0 = 0, m = 0.
  2. Iterate:
    m = m+1
    Form the adjusted dependent variable, predicator, and mean based on the previous iteration
     Z=\eta^{m-1}+(Y-\mu^{m-1}) (\partial\eta/\partial\mu^{m-1})
    \eta^{m-1}=s_0+\sum_{j=1}^p s_j^{m-1}(X_j)
    \mu^{m-1}=g^{-1}(\eta^{m-1}).

    Form the weights
    w_i=(\partial\mu ^{m-1}/\partial \eta^{m-1})^2V^{-1}_i.
    Fit an additive model to Z using the back-fitting algorithm with weights W to obtain estimated functions sjm(·).

  3. Until:
    Avg(D(Y,\mu^m)) fails to decrease, where Avg(D(Y,\mu^m)) is an average of the deviance of estimate \mu^m.

The estimating procedure for generalized additive models consists of two loops. Inside each step of the local scoring algorithm (outer loop), a weighted back-fitting algorithm (inner loop) is used until convergence, that is, until the RSS fails to decrease. Then, based on the estimates from this weighted back-fitting algorithm, a new set of weights is calculated and the next iteration of the scoring algorithm starts. The scoring algorithm stops when the deviance of the estimates ceases to decrease.

Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
Top
Top

Copyright © 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.