The HPGENSELECT Procedure

Exponential Family Distributions

Many of the probability distributions that the HPGENSELECT procedure fits are members of an exponential family of distributions, whose probability density or mass functions can be expressed as follows for some functions $b$ and $c$ that determine the specific distribution:

\[  f(y) = \exp \left\{  \frac{y\theta - b(\theta )}{\phi } + c(y,\phi ) \right\}   \]

Here $\theta $ is the natural (canonical) parameter and $\phi $ is the dispersion parameter. For fixed $\phi $, this is a one-parameter exponential family of distributions. The response variable can be discrete or continuous, so $f(y)$ represents either a probability mass function or a probability density function. For generalized linear models, it is more useful to parameterize the distribution by its mean and variance:

\begin{eqnarray*}  \mr {E}(Y) &  = &  b^{\prime }(\theta ) \\ \mr {Var}(Y) &  = &  b^{\prime \prime }(\theta ) \phi \\ \end{eqnarray*}
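
As an illustration, consider the Poisson distribution with mean $\mu $. This is a standard textbook case, worked here only as a sketch and not taken from the procedure's own derivations. The Poisson probability mass function fits the exponential family form with $\theta = \log (\mu )$, $b(\theta ) = e^{\theta }$, $\phi = 1$, and $c(y,\phi ) = -\log (y!)$:

\[  f(y) = \frac{\mu ^ y e^{-\mu }}{y!} = \exp \left\{  y \log (\mu ) - \mu - \log (y!) \right\}  = \exp \left\{  \frac{y\theta - e^{\theta }}{1} - \log (y!) \right\}   \]

Differentiating $b(\theta ) = e^{\theta }$ then recovers the familiar Poisson moments:

\begin{eqnarray*}  \mr {E}(Y) &  = &  b^{\prime }(\theta ) = e^{\theta } = \mu \\ \mr {Var}(Y) &  = &  b^{\prime \prime }(\theta ) \phi = e^{\theta } = \mu \end{eqnarray*}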

In generalized linear models, the mean of the response distribution is related to linear regression parameters through a link function,

\[  g(\mu _ i) = \mb {x}_ i^\prime \bbeta  \]

for the $i$th observation, where $\mb {x}_ i$ is a fixed known vector of explanatory variables and $\bbeta $ is a vector of regression parameters. The HPGENSELECT procedure parameterizes models in terms of the regression parameters $\bbeta $ and either the dispersion parameter $\phi $ or a parameter that is related to $\phi $, depending on the model. For exponential family models, the distribution variance is $\mr {Var}(Y) = \phi \mr {V}(\mu )$, where $\mr {V}(\mu )$ is a variance function that depends only on $\mu $.
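
The relationship between the linear predictor, the link function, and the variance function can be sketched numerically. The following Python fragment is an illustration only and does not show how the HPGENSELECT procedure computes anything. It assumes a Poisson response with a log link, so that $g(\mu ) = \log (\mu )$, $\mr {V}(\mu ) = \mu $, and $\phi = 1$. The function names, the covariate values, and the regression parameters are all hypothetical.

```python
import math


def linear_predictor(x, beta):
    """Compute x' beta for a single observation."""
    return sum(xj * bj for xj, bj in zip(x, beta))


def inverse_log_link(eta):
    """Invert the log link g(mu) = log(mu), giving mu = exp(eta)."""
    return math.exp(eta)


def poisson_variance(mu, phi=1.0):
    """Var(Y) = phi * V(mu), with V(mu) = mu for the Poisson distribution."""
    return phi * mu


def poisson_pmf(y, mu):
    """Poisson probability mass function, evaluated on the log scale."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


# Hypothetical observation: an intercept term and one covariate.
x_i = [1.0, 2.0]
# Hypothetical regression parameters (made up for illustration).
beta = [0.5, 0.25]

eta_i = linear_predictor(x_i, beta)   # 1.0 * 0.5 + 2.0 * 0.25 = 1.0
mu_i = inverse_log_link(eta_i)        # mean implied by the log link
var_i = poisson_variance(mu_i)        # phi * V(mu) with phi = 1

# Cross-check the implied mean and variance by summing over the pmf.
# The support is truncated at 100, which is far into the tail for this mean.
support = range(0, 100)
mean_num = sum(y * poisson_pmf(y, mu_i) for y in support)
var_num = sum((y - mean_num) ** 2 * poisson_pmf(y, mu_i) for y in support)

print("linear predictor:", eta_i)
print("mean from link:", mu_i, "mean from pmf:", mean_num)
print("phi * V(mu):", var_i, "variance from pmf:", var_num)
```

Under these assumptions, the mean obtained by inverting the link and the variance obtained from $\phi \mr {V}(\mu )$ should agree with the moments computed directly from the probability mass function, up to floating-point error.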

The zero-inflated models and the multinomial models are not exponential family models, but they are closely related to them and are useful in practice, so the HPGENSELECT procedure includes them.