The GAM Procedure

Distribution Family and Canonical Link

In general, there is not just one reasonable link function for a given response variable distribution. For parametric models, the choice of link function can lead to substantively different estimates and tests. However, the inherent flexibility of nonparametric models makes them less likely to be sensitive to the precise choice of link function. Thus, for simplicity and computational efficiency, the GAM procedure uses only the canonical link for each distribution, as discussed in the following sections.

The Gaussian Model

For a Gaussian model, the link function is the identity function, and the generalized additive model is the same as the additive model. The Gaussian model is selected by default or when you specify the DIST=GAUSSIAN option in the MODEL statement.

The Binomial Model

The binomial model is selected by specifying the DIST=BINOMIAL option in the MODEL statement. A binomial response model assumes that the number of successes Y in n trials has a $\mathrm{Bi}(n, p(x))$ distribution, that is, a binomial distribution with parameters n and $p(x)$, where $p(x)$ is the probability of success. Often the data are binary, in which case n = 1. The canonical link is the logit of the success probability:

\[  g(p) = \log \frac{p}{1-p} = \eta  \]
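As a minimal sketch of what the logit link does in the binary case (n = 1), the following Python snippet maps a probability to the additive-predictor scale and back. The function names `logit` and `inv_logit` are illustrative and are not part of PROC GAM.

```python
import math

def logit(p):
    """Logit link: maps a probability in (0, 1) to the real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse link: maps an additive predictor eta back to a probability."""
    # Branch on the sign of eta so that exp() never overflows.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

eta = logit(0.8)         # predictor scale
p_back = inv_logit(eta)  # back on the probability scale
print(eta, p_back)
```

Because the inverse link always returns a value between 0 and 1, any real-valued additive predictor corresponds to a valid probability.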

By default, PROC GAM models the probability of the response level with the lower Ordered Value. Ordered Values are assigned to response levels in ascending sorted order and are displayed in the Response Profiles table. For binary data, if your event category has the higher Ordered Value, then the nonevent is modeled by default. Modeling the nonevent instead of the event reverses the signs of the estimated coefficients for the linear terms in the model. You can change which probability is modeled by specifying the EVENT=, DESCENDING, or ORDER= response variable options in the MODEL statement.
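The sign change follows from a property of the logit: the logit of the nonevent probability 1 - p is the negative of the logit of the event probability p. The short Python check below illustrates this with a hypothetical probability of 0.8; it is a numerical illustration only, not PROC GAM output.

```python
import math

def logit(p):
    """Logit of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

p_event = 0.8                         # hypothetical probability of the event
eta_event = logit(p_event)            # predictor when the event is modeled
eta_nonevent = logit(1.0 - p_event)   # predictor when the nonevent is modeled

# The two predictors differ only in sign (up to rounding error).
print(eta_event, eta_nonevent)
```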

The Poisson Model

The Poisson model is selected by specifying the DIST=POISSON option in the MODEL statement. The link function for the Poisson model is the log function. Assuming that the mean of the Poisson distribution is $\mu (x)$, the dependence of $\mu (x)$ on the independent variables $x_1,\ldots ,x_k$ is

\[  g(\mu ) = \log (\mu ) = \eta  \]
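One consequence of the log link is that terms that add on the predictor scale multiply on the mean scale. The Python sketch below assumes that the additive predictor is a sum of an intercept and two smooth-term contributions; the numeric values and the names `log_link` and `inv_log_link` are hypothetical.

```python
import math

def log_link(mu):
    """Log link: maps a positive mean to the real line."""
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    return math.log(mu)

def inv_log_link(eta):
    """Inverse link: maps an additive predictor to a positive mean."""
    return math.exp(eta)

# Hypothetical contributions at one point x: intercept and two smooth terms.
s0, s1, s2 = 0.5, 0.3, -0.1
eta = s0 + s1 + s2
mu = inv_log_link(eta)

# Each additive contribution acts as a multiplicative factor on the mean.
mu_factored = math.exp(s0) * math.exp(s1) * math.exp(s2)
print(mu, mu_factored)
```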

The Gamma Model

The gamma model is selected by specifying the DIST=GAMMA option in the MODEL statement. Let the mean of the gamma distribution be $\mu (x)$. The canonical link function for the gamma distribution is $-1/\mu (x)$. Note that this link function is the negative of the default link function in PROC GENMOD for a gamma model. The relationship between $\mu (x)$ and the independent variables $x_1,\ldots ,x_k$ is

\[  g(\mu ) = -\frac{1}{\mu } = \eta  \]
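The following Python sketch illustrates this link and its inverse. Because a gamma mean is positive, the predictor $\eta$ must be negative for the inverse to return a valid mean. The function names are illustrative; `reciprocal_link` is the link $1/\mu$, of which the canonical link is the negative.

```python
def gamma_canonical_link(mu):
    """Canonical gamma link -1/mu for a positive mean mu."""
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    return -1.0 / mu

def gamma_canonical_inverse(eta):
    """Inverse of the canonical link; valid only for negative eta."""
    if eta >= 0.0:
        raise ValueError("eta must be negative to give a positive mean")
    return -1.0 / eta

def reciprocal_link(mu):
    """Reciprocal link 1/mu, the negative of the canonical link."""
    return 1.0 / mu

mu = 2.5                              # hypothetical mean
eta = gamma_canonical_link(mu)        # -0.4
print(eta, gamma_canonical_inverse(eta), reciprocal_link(mu))
```

A predictor fitted under the reciprocal link therefore has the opposite sign from one fitted under the canonical link, while the implied mean is the same.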

The Inverse Gaussian Model

The inverse Gaussian model is selected by specifying the DIST=IGAUSSIAN option in the MODEL statement. Let the mean of the inverse Gaussian distribution be $\mu (x)$. The canonical link function for the inverse Gaussian distribution is $1/\mu ^2$. Therefore, the relationship between $\mu (x)$ and the independent variables $x_1,\ldots ,x_k$ is

\[  g(\mu ) = \frac{1}{\mu ^2} = \eta  \]
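As a small illustration, the Python sketch below evaluates this link and its inverse, $\mu = \eta ^{-1/2}$. The inverse is defined only for a positive predictor, so it rejects other values. The function names are illustrative and not part of PROC GAM.

```python
import math

def igaussian_link(mu):
    """Link 1/mu^2 for a positive mean mu."""
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    return 1.0 / (mu * mu)

def igaussian_inverse(eta):
    """Inverse link mu = eta^(-1/2); valid only for positive eta."""
    if eta <= 0.0:
        raise ValueError("eta must be positive to give a real, positive mean")
    return 1.0 / math.sqrt(eta)

mu = 2.0                         # hypothetical mean
eta = igaussian_link(mu)         # 0.25
print(eta, igaussian_inverse(eta))
```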