All probability distributions that the GAMPL procedure fits are members of an exponential family of distributions. For the specification of an exponential family, see the section "Generalized Linear Models Theory" in Chapter 44: The GENMOD Procedure.
Table 42.11 lists and defines some common notation that is used in the context of generalized linear models and generalized additive models.
Table 42.11: Common Notation
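| Notation | Meaning |
|---|---|
| $\ell$ | Log likelihood |
| $\ell_p$ | Penalized log likelihood |
| $D$ | Deviance |
| $D_p$ | Penalized deviance |
| $g$ | Link function |
| $g^{-1}$ | Inverse link function |
| $\boldsymbol{\mu}$ | Response mean, $\boldsymbol{\mu} = g^{-1}(\boldsymbol{\eta})$ |
| $\boldsymbol{\eta}$ | Linear predictor, $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}$ |
| $\phi$ | Dispersion parameter |
| $\mathbf{z}$ | Column of adjusted response variable |
| $\boldsymbol{\nu}$ | Column of response variance |
| $\omega_i$ | Prior weight for each observation |
| $w_i$ | Adjusted weight for each observation |
| $\mathbf{W}$ | Diagonal matrix of adjusted weights |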
The GAMPL procedure supports the following distributions: binary, binomial, gamma, inverse Gaussian, negative binomial, normal (Gaussian), and Poisson.
For the forms of the log-likelihood functions for each of the probability distributions, see the section "Log-Likelihood Functions" in Chapter 44: The GENMOD Procedure. For the forms of the deviance for each of the probability distributions, see the section "Goodness of Fit" in Chapter 44: The GENMOD Procedure.
Generalized additive models are extensions of generalized linear models (Nelder and Wedderburn 1972). For each observation that has a response $Y_i$ and a row vector $\mathbf{x}_i$ of the model matrix $\mathbf{X}$, both generalized additive models and generalized linear models assume the model additivity
$$g(\mu_i) = f_1(x_{i1}) + \cdots + f_p(x_{ip})$$
where $\mu_i = E(Y_i)$ and $Y_i$ is independently distributed in some exponential family. Generalized linear models further assume model linearity by $f_j(x_{ij}) = x_{ij}\beta_j$ for $j = 1, \ldots, p$. Generalized additive models relax the linearity assumption by allowing some smoothing functions $f_j$ to characterize the dependency. The GAMPL procedure constructs the smoothing functions by using thin-plate regression splines. For more information about generalized additive models and other types of smoothing functions, see Chapter 41: The GAM Procedure.
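Consider a generalized additive model that has some linear terms $\mathbf{X}_L$ with coefficients $\boldsymbol{\beta}_L$ and $p$ smoothing functions $f_j$, $j = 1, \ldots, p$. Each smoothing function can be constructed by thin-plate regression splines with a smoothing parameter $\lambda_j$. Using the notations described in the section Low-Rank Approximation, you can characterize each smoothing function by a coefficient vector $\boldsymbol{\beta}_j$, a model matrix $\mathbf{X}_j$, and a penalty matrix $\mathbf{S}_j$, so that the smoothing function is evaluated as $\mathbf{X}_j\boldsymbol{\beta}_j$ and its roughness is penalized by $\lambda_j \boldsymbol{\beta}_j' \mathbf{S}_j \boldsymbol{\beta}_j$.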
Notice that each smoothing function representation contains a zero-degree polynomial that corresponds to a constant. Having multiple constant terms makes the smoothing functions unidentifiable. The solution is to include a global constant term (that is, the intercept) in the model and to enforce a centering constraint on each smoothing function. You can write the constraint as
$$\mathbf{1}'\mathbf{X}_j\boldsymbol{\beta}_j = 0$$
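By using an approach similar to the one for the linear constraint for thin-plate regression splines, you obtain the orthogonal column basis $\mathbf{V}_j$ via the QR decomposition of $\mathbf{X}_j'\mathbf{1}$ such that $\mathbf{1}'\mathbf{X}_j\mathbf{V}_j = \mathbf{0}$. Each smoothing function can be reparameterized as $\tilde{\mathbf{X}}_j = \mathbf{X}_j\mathbf{V}_j$ with coefficients $\tilde{\boldsymbol{\beta}}_j$ such that $\boldsymbol{\beta}_j = \mathbf{V}_j\tilde{\boldsymbol{\beta}}_j$.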
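The centering step can be illustrated numerically. The following is a minimal NumPy sketch, not the GAMPL implementation: the matrix `X_j` is random stand-in data rather than a real thin-plate spline basis, and the names `X_j`, `V_j`, and `X_tilde` are chosen here only to mirror the notation in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 6

# Hypothetical model matrix for one smoothing function; the last column
# plays the role of the zero-degree (constant) polynomial term.
X_j = np.column_stack([rng.normal(size=(n, k - 1)), np.ones(n)])

# QR decomposition of the k x 1 vector X_j' 1. With mode="complete",
# Q is k x k and its first column spans X_j' 1.
c = X_j.T @ np.ones((n, 1))
Q, _ = np.linalg.qr(c, mode="complete")

# The remaining k - 1 columns of Q are orthogonal to X_j' 1.
V_j = Q[:, 1:]

# Reparameterized (centered) model matrix: every column sums to zero.
X_tilde = X_j @ V_j
col_sums = np.ones(n) @ X_tilde
print(np.max(np.abs(col_sums)))
```

The reparameterized matrix has one column fewer than `X_j`, which reflects the single degree of freedom that the centering constraint removes.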
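Let $\mathbf{X} = [\mathbf{X}_L : \tilde{\mathbf{X}}_1 : \cdots : \tilde{\mathbf{X}}_p]$ and $\boldsymbol{\beta}' = [\boldsymbol{\beta}_L' : \tilde{\boldsymbol{\beta}}_1' : \cdots : \tilde{\boldsymbol{\beta}}_p']$. Then the generalized additive model can be represented as $g(\boldsymbol{\mu}) = \mathbf{X}\boldsymbol{\beta}$. The roughness penalty matrix is represented as a block diagonal matrix:
$$\mathbf{S}_{\boldsymbol{\lambda}} = \begin{bmatrix} \mathbf{0} & & & \\ & \lambda_1\tilde{\mathbf{S}}_1 & & \\ & & \ddots & \\ & & & \lambda_p\tilde{\mathbf{S}}_p \end{bmatrix}$$
Here $\tilde{\mathbf{S}}_j$ is the penalty matrix of the $j$th smoothing function after reparameterization, and the leading zero block corresponds to the unpenalized linear terms.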
Then the roughness penalty is measured in the quadratic form $\boldsymbol{\beta}'\mathbf{S}_{\boldsymbol{\lambda}}\boldsymbol{\beta}$.
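As a rough illustration of how the block diagonal penalty and its quadratic form fit together, the sketch below assembles a penalty matrix from per-smoother blocks. The helper `block_diag_penalty` and the small matrices `S1` and `S2` are hypothetical and exist only for this example; they are not part of the GAMPL procedure.

```python
import numpy as np

def block_diag_penalty(n_linear, penalties, lambdas):
    """Assemble the block diagonal penalty with a zero block for linear terms."""
    total = n_linear + sum(S.shape[0] for S in penalties)
    S_lambda = np.zeros((total, total))
    offset = n_linear  # linear coefficients are left unpenalized
    for S, lam in zip(penalties, lambdas):
        k = S.shape[0]
        S_lambda[offset:offset + k, offset:offset + k] = lam * S
        offset += k
    return S_lambda

# Two hypothetical smoother penalty blocks and their smoothing parameters.
S1 = np.diag([1.0, 2.0])
S2 = np.array([[2.0, 1.0], [1.0, 2.0]])
S_lambda = block_diag_penalty(2, [S1, S2], [0.5, 2.0])

# Coefficients ordered as [linear terms, smoother 1, smoother 2].
beta = np.array([1.0, -1.0, 1.0, 1.0, 1.0, -1.0])
penalty = beta @ S_lambda @ beta
print(penalty)  # 0.5 * 3 + 2.0 * 2 = 5.5
```

Because the leading block is zero, changing the first two entries of `beta` leaves the penalty unchanged.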
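Given a fixed vector of smoothing parameters, $\boldsymbol{\lambda} = [\lambda_1 \ldots \lambda_p]'$, you can fit the generalized additive models by penalized likelihood estimation. In contrast to maximum likelihood estimation, penalized likelihood estimation obtains an estimate for $\boldsymbol{\beta}$ by maximizing the penalized log likelihood,
$$\ell_p(\boldsymbol{\beta}) = \ell(\boldsymbol{\beta}) - \frac{1}{2}\boldsymbol{\beta}'\mathbf{S}_{\boldsymbol{\lambda}}\boldsymbol{\beta}$$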
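Any optimization technique that you can use for maximum likelihood estimation can also be used for penalized likelihood estimation. If first-order derivatives are required for the optimization technique, you can compute the gradient as
$$\frac{\partial \ell_p}{\partial \boldsymbol{\beta}} = \frac{\partial \ell}{\partial \boldsymbol{\beta}} - \mathbf{S}_{\boldsymbol{\lambda}}\boldsymbol{\beta}$$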
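If second-order derivatives are required for the optimization technique, you can compute the Hessian as
$$\frac{\partial^2 \ell_p}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'} = \frac{\partial^2 \ell}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'} - \mathbf{S}_{\boldsymbol{\lambda}}$$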
In the gradient and Hessian forms, $\partial \ell / \partial \boldsymbol{\beta}$ and $\partial^2 \ell / \partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'$ are the corresponding gradient and Hessian, respectively, for the log likelihood for generalized linear models. For more information about optimization techniques, see the section Choosing an Optimization Technique.
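To make the penalized gradient and Hessian concrete, here is a minimal sketch of a plain Newton iteration for a Poisson model with a log link and a fixed penalty matrix. It is an illustration under stated assumptions, not the optimizer that PROC GAMPL uses: the function `penalized_newton_poisson`, the simulated data, and the diagonal penalty are all hypothetical, the penalty on the log likelihood is taken to be one half of the quadratic form, and no step-halving or other safeguards are included.

```python
import numpy as np

def penalized_newton_poisson(X, y, S_lambda, n_iter=50, tol=1e-10):
    """Maximize l(beta) - 0.5 * beta' S_lambda beta for a Poisson log-link model."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        # GLM gradient X'(y - mu) minus the penalty term S_lambda beta.
        grad = X.T @ (y - mu) - S_lambda @ beta
        # GLM Hessian -X' diag(mu) X minus the penalty matrix S_lambda.
        hess = -(X.T * mu) @ X - S_lambda
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (assumption): intercept plus two basis columns.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([np.ones_like(x), x, x**2])
y = rng.poisson(np.exp(0.5 + 0.8 * x - 0.5 * x**2))

# The intercept is unpenalized; the other two coefficients share one penalty.
S_lambda = 5.0 * np.diag([0.0, 1.0, 1.0])

beta_pen = penalized_newton_poisson(X, y, S_lambda)
beta_mle = penalized_newton_poisson(X, y, np.zeros((3, 3)))
print(beta_pen, beta_mle)
```

Setting the penalty matrix to zero recovers ordinary maximum likelihood, so comparing `beta_pen` with `beta_mle` shows how the penalty shrinks the penalized coefficients while leaving the intercept free.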