# The HPGENSELECT Procedure

### Log-Likelihood Functions

The HPGENSELECT procedure forms the log-likelihood functions of the various models as

$$L = \sum_{i=1}^{n} f_i\, l(\mu_i; y_i, w_i)$$

where $l(\mu_i; y_i, w_i)$ is the log-likelihood contribution of the $i$th observation that has weight $w_i$, and $f_i$ is the value of the frequency variable. For the determination of $w_i$ and $f_i$, see the WEIGHT and FREQ statements. The individual log-likelihood contributions for the various distributions are as follows.
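The aggregation step can be sketched in Python as follows. This is an illustration only, not SAS code: the function name `total_loglik` is hypothetical, and the per-observation contribution is passed in as an arbitrary callable so that any of the distribution-specific formulas below can be plugged in. It shows that a frequency value $f_i$ simply multiplies the contribution of observation $i$.

```python
from typing import Callable, Sequence

# Hypothetical helper (illustration only, not the procedure's implementation).
# A contribution is any function l(mu_i, y_i, w_i) returning a float.
Contribution = Callable[[float, float, float], float]


def total_loglik(contrib: Contribution, mu: Sequence[float], y: Sequence[float],
                 w: Sequence[float], f: Sequence[float]) -> float:
    """Return L = sum_i f_i * l(mu_i; y_i, w_i)."""
    if not (len(mu) == len(y) == len(w) == len(f)):
        raise ValueError("mu, y, w, and f must have the same length")
    # Each observation's contribution is scaled by its frequency value f_i.
    return sum(fi * contrib(mi, yi, wi) for mi, yi, wi, fi in zip(mu, y, w, f))
```

Because $f_i$ enters only as a multiplier, an observation with frequency 2 contributes exactly as much as two identical copies of that observation with frequency 1.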

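In the following, the mean parameter $\mu_i$ for each observation $i$ is related to the regression parameters $\boldsymbol{\beta}$ through the linear predictor $\mathbf{x}_i'\boldsymbol{\beta}$ by

$$\mu_i = g^{-1}(\mathbf{x}_i'\boldsymbol{\beta})$$

where $g$ is the link function.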

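There are two link functions and linear predictors that are associated with zero-inflated Poisson and zero-inflated negative binomial distributions: one for the zero-inflation probability $\omega$, and another for the parameter $\lambda$, which is the Poisson or negative binomial mean if there is no zero-inflation. Each of these parameters is related to regression parameters through an individual link function,

$$\begin{aligned}
\eta_i &= \mathbf{x}_i'\boldsymbol{\beta}, & \lambda_i &= g^{-1}(\eta_i) \\
\zeta_i &= \mathbf{z}_i'\boldsymbol{\gamma}, & \omega_i &= h^{-1}(\zeta_i)
\end{aligned}$$

where $\mathbf{z}_i$ and $\boldsymbol{\gamma}$ are the covariate vector and regression parameters of the zero-inflation part of the model, and $h$ is one of the following link functions that are associated with binary data: complementary log-log, log-log, logit, or probit. These link functions are also shown in Table 7.8.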
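The inverse forms of the four binary link functions, and the mapping of the two zero-inflation linear predictors to $(\lambda_i, \omega_i)$, can be sketched as follows. The link definitions are the standard ones; Table 7.8 remains the authoritative list. All function names are hypothetical, and the log link used for $\lambda_i$ in `zero_inflation_params` is an assumption of this sketch (in the procedure, $g$ is whatever link is in effect for the model).

```python
import math

# Hypothetical helpers; standard inverse link functions for binary-type data.


def inv_logit(eta: float) -> float:
    """Logit link g(mu) = log(mu / (1 - mu)); evaluated in an overflow-safe form."""
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def inv_probit(eta: float) -> float:
    """Probit link g(mu) = Phi^{-1}(mu), with Phi the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cloglog(eta: float) -> float:
    """Complementary log-log link g(mu) = log(-log(1 - mu))."""
    return 1.0 - math.exp(-math.exp(eta))


def inv_loglog(eta: float) -> float:
    """Log-log link g(mu) = -log(-log(mu))."""
    return math.exp(-math.exp(-eta))


def zero_inflation_params(eta: float, zeta: float, h_inv=inv_logit):
    """Return (lambda_i, omega_i); assumes a log link for lambda (illustrative)."""
    return math.exp(eta), h_inv(zeta)
```

For example, `zero_inflation_params(math.log(3.0), 0.0)` gives $\lambda_i = 3$ and, with the logit link for $h$, $\omega_i = 0.5$.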

#### Binary Distribution

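The HPGENSELECT procedure computes the log-likelihood function $l(\mu_i; y_i)$ for the $i$th binary observation as

$$\begin{aligned}
\eta_i &= \mathbf{x}_i'\boldsymbol{\beta} \\
\mu_i &= g^{-1}(\eta_i) \\
l(\mu_i; y_i) &= y_i \log(\mu_i) + (1 - y_i)\log(1 - \mu_i)
\end{aligned}$$

Here, $\mu_i$ is the probability of an event, and the variable $y_i$ takes on the value 1 for an event and the value 0 for a non-event. The inverse link function $g^{-1}(\cdot)$ maps from the scale of the linear predictor $\eta_i$ to the scale of the mean. For example, for the logit link (the default),

$$\mu_i = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}$$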

You can control which binary outcome in your data is modeled as the event by specifying the response-options in the MODEL statement, and you can choose the link function by specifying the LINK= option in the MODEL statement.

If a WEIGHT statement is specified and $w_i$ denotes the weight for the current observation, the log-likelihood function is computed as

$$l(\mu_i; y_i, w_i) = w_i\, l(\mu_i; y_i)$$
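A minimal Python sketch of the weighted binary contribution with the default logit inverse link follows. The function names are hypothetical and the code is an illustration of the formula, not the procedure's implementation.

```python
import math

# Hypothetical helpers illustrating l(mu; y, w) = w * (y*log(mu) + (1 - y)*log(1 - mu)).


def binary_loglik(mu: float, y: int, w: float = 1.0) -> float:
    """Weighted binary log likelihood; y must be 1 (event) or 0 (non-event)."""
    if y not in (0, 1):
        raise ValueError("a binary response must be coded 0 (non-event) or 1 (event)")
    # log1p(-mu) computes log(1 - mu) accurately when mu is small.
    return w * (y * math.log(mu) + (1 - y) * math.log1p(-mu))


def logit_mean(eta: float) -> float:
    """mu = exp(eta) / (1 + exp(eta)), the logit inverse link."""
    return math.exp(eta) / (1.0 + math.exp(eta))
```

With $w_i = 1$, the event and non-event probabilities $\exp\{l(\mu_i; 1)\}$ and $\exp\{l(\mu_i; 0)\}$ sum to 1 for any $\mu_i$ in $(0, 1)$.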

#### Binomial Distribution

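The HPGENSELECT procedure computes the log-likelihood function $l(\mu_i; y_i, w_i)$ for the $i$th binomial observation as

$$l(\mu_i; y_i, w_i) = w_i\left[y_i \log(\mu_i) + (n_i - y_i)\log(1 - \mu_i)\right] + w_i\left[\log\Gamma(n_i + 1) - \log\Gamma(y_i + 1) - \log\Gamma(n_i - y_i + 1)\right]$$

where $y_i$ and $n_i$ are the values of the events and trials of the $i$th observation, respectively. The parameter $\mu_i$ measures the probability of events (successes) in the underlying Bernoulli distribution whose aggregate follows the binomial distribution.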
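The binomial contribution can be sketched in Python as shown next; the function name is hypothetical. The log-gamma terms form the logarithm of the binomial coefficient, which does not depend on $\mu_i$ but is needed for the contribution to be a proper log probability.

```python
import math

# Hypothetical helper illustrating the weighted binomial contribution.


def binomial_loglik(mu: float, y: int, n: int, w: float = 1.0) -> float:
    """w*[y*log(mu) + (n - y)*log(1 - mu)] + w*log(n! / (y! (n - y)!))."""
    kernel = y * math.log(mu) + (n - y) * math.log1p(-mu)
    # log of the binomial coefficient, computed with log-gamma to avoid overflow
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return w * kernel + w * log_choose
```

With $w_i = 1$, exponentiating the contribution gives the binomial probability of $y_i$ events in $n_i$ trials; for example, `binomial_loglik(0.3, 2, 5)` equals $\log\left(\binom{5}{2}\, 0.3^2\, 0.7^3\right)$.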

#### Gamma Distribution

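The HPGENSELECT procedure computes the log-likelihood function $l(\mu_i; y_i, w_i)$ for the $i$th observation as

$$l(\mu_i; y_i, w_i) = \frac{w_i}{\phi}\log\left(\frac{w_i y_i}{\phi\,\mu_i}\right) - \frac{w_i y_i}{\phi\,\mu_i} - \log(y_i) - \log\Gamma\left(\frac{w_i}{\phi}\right)$$

For the gamma distribution, $\phi$ is the estimated dispersion parameter that is displayed in the output.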
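The gamma contribution is the log density of a gamma distribution with shape $w_i/\phi$ and mean $\mu_i$ (so its variance is $\phi\mu_i^2/w_i$). The Python sketch below, whose function name is hypothetical, evaluates the formula directly; the usage note checks it against the textbook shape/scale form.

```python
import math

# Hypothetical helper illustrating the weighted gamma contribution.


def gamma_loglik(mu: float, y: float, phi: float, w: float = 1.0) -> float:
    """(w/phi)*log(w*y/(phi*mu)) - w*y/(phi*mu) - log(y) - log Gamma(w/phi)."""
    a = w / phi  # gamma shape parameter; the scale is mu / a
    return a * math.log(a * y / mu) - a * y / mu - math.log(y) - math.lgamma(a)
```

For shape $a$ and scale $s = \mu/a$, the textbook log density is $-\log\Gamma(a) - a\log s + (a - 1)\log y - y/s$, which agrees with `gamma_loglik(mu, y, phi=1/a)`. Doubling both $w_i$ and $\phi$ leaves the contribution unchanged, because only their ratio enters.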

#### Inverse Gaussian Distribution

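The HPGENSELECT procedure computes the log-likelihood function $l(\mu_i; y_i, w_i)$ for the $i$th observation as

$$l(\mu_i; y_i, w_i) = -\frac{1}{2}\left[\frac{w_i (y_i - \mu_i)^2}{\phi\, y_i\, \mu_i^2} + \log\left(\frac{\phi\, y_i^3}{w_i}\right) + \log(2\pi)\right]$$

where $\phi$ is the dispersion parameter.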
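The inverse Gaussian contribution can be sketched as below; the function name is hypothetical. Under this parameterization the variance is $\phi\mu_i^3/w_i$. The usage note checks numerically that the implied density integrates to 1 and has mean $\mu_i$.

```python
import math

# Hypothetical helper illustrating the weighted inverse Gaussian contribution.


def inverse_gaussian_loglik(mu: float, y: float, phi: float, w: float = 1.0) -> float:
    """-0.5*[w*(y - mu)^2/(phi*y*mu^2) + log(phi*y^3/w) + log(2*pi)]; requires y > 0."""
    return -0.5 * (w * (y - mu) ** 2 / (phi * y * mu ** 2)
                   + math.log(phi * y ** 3 / w)
                   + math.log(2.0 * math.pi))
```

A midpoint-rule sum of $\exp\{l(\mu; y, 1)\}$ over a fine grid of $y$ values (for example, $\mu = 1$, $\phi = 0.5$, step 0.001 on $(0, 40]$) comes out very close to 1, and the corresponding first moment comes out very close to $\mu$.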

#### Multinomial Distribution

The multinomial distribution that is modeled by the HPGENSELECT procedure is a generalization of the binary distribution; it is the distribution of a single draw from a discrete distribution with $J$ possible values. The log-likelihood function $l(\boldsymbol{\mu}_i; \mathbf{y}_i, w_i)$ for the $i$th observation is

$$l(\boldsymbol{\mu}_i; \mathbf{y}_i, w_i) = w_i \sum_{j=1}^{J} y_{ij} \log(\mu_{ij})$$

In this expression, $J$ denotes the number of response categories (the number of possible outcomes), $\mu_{ij}$ is the probability that the $i$th observation takes on the response value that is associated with category $j$, and $y_{ij}$ equals 1 if the $i$th observation falls in category $j$ and 0 otherwise. The category probabilities must satisfy

$$\sum_{j=1}^{J} \mu_{ij} = 1$$

and the constraint is satisfied by modeling $J - 1$ categories. In models that have ordered response categories, the probabilities are expressed in cumulative form, so that the last category is redundant. In generalized logit models (multinomial models that have unordered categories), one category is chosen as the reference category and the linear predictor in the reference category is set to 0.
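For a generalized logit model, fixing the reference category's linear predictor at 0 and normalizing the exponentiated predictors yields probabilities that satisfy the sum-to-one constraint automatically. The Python sketch below illustrates this; the function names are hypothetical, and placing the reference category last is an assumption of the sketch, not necessarily the procedure's default ordering.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating the generalized logit (unordered) case.


def generalized_logit_probs(etas: Sequence[float]) -> List[float]:
    """Probabilities for J categories from the J - 1 non-reference linear predictors.

    The reference category (assumed last here) has its linear predictor set to 0.
    """
    full = list(etas) + [0.0]
    top = max(full)  # subtract the largest predictor so exp() cannot overflow
    exps = [math.exp(e - top) for e in full]
    total = sum(exps)
    return [e / total for e in exps]


def multinomial_loglik(probs: Sequence[float], category: int, w: float = 1.0) -> float:
    """w * sum_j y_ij * log(mu_ij); only the observed category has y_ij = 1."""
    return w * math.log(probs[category])
```

Each non-reference predictor is the log odds of its category relative to the reference category; for example, with predictors $(0.5, -1)$, $\log(\mu_{i1}/\mu_{i3}) = 0.5$.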

#### Negative Binomial Distribution

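The HPGENSELECT procedure computes the log-likelihood function $l(\mu_i; y_i, w_i)$ for the $i$th observation as

$$l(\mu_i; y_i, w_i) = y_i \log\left(\frac{k\mu_i}{w_i}\right) - \left(y_i + \frac{w_i}{k}\right)\log\left(1 + \frac{k\mu_i}{w_i}\right) + \log\left(\frac{\Gamma(y_i + w_i/k)}{\Gamma(y_i + 1)\,\Gamma(w_i/k)}\right)$$

where $k$ is the negative binomial dispersion parameter that is displayed in the output.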

#### Normal Distribution

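The HPGENSELECT procedure computes the log-likelihood function $l(\mu_i; y_i, w_i)$ for the $i$th observation as

$$l(\mu_i; y_i, w_i) = -\frac{1}{2}\left[\frac{w_i (y_i - \mu_i)^2}{\phi} + \log\left(\frac{\phi}{w_i}\right) + \log(2\pi)\right]$$

where $\phi$ is the dispersion parameter.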
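The normal contribution is the log density of a normal distribution with mean $\mu_i$ and variance $\phi/w_i$, as the Python sketch below (hypothetical function name) makes explicit.

```python
import math

# Hypothetical helper illustrating the weighted normal contribution.


def normal_loglik(mu: float, y: float, phi: float, w: float = 1.0) -> float:
    """-0.5*[w*(y - mu)^2/phi + log(phi/w) + log(2*pi)]: a normal with variance phi/w."""
    return -0.5 * (w * (y - mu) ** 2 / phi + math.log(phi / w) + math.log(2.0 * math.pi))
```

For instance, an observation with weight 4 and dispersion 1 contributes the same as an unweighted observation with dispersion 0.25.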

#### Poisson Distribution

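The HPGENSELECT procedure computes the log-likelihood function $l(\mu_i; y_i, w_i)$ for the $i$th observation as

$$l(\mu_i; y_i, w_i) = w_i\left[y_i \log(\mu_i) - \mu_i - \log(y_i!)\right]$$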
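The Poisson contribution can be sketched in Python as follows (hypothetical function name); $\log(y_i!)$ is computed as $\log\Gamma(y_i + 1)$.

```python
import math

# Hypothetical helper illustrating the weighted Poisson contribution.


def poisson_loglik(mu: float, y: int, w: float = 1.0) -> float:
    """w * [y*log(mu) - mu - log(y!)], with log(y!) = lgamma(y + 1)."""
    return w * (y * math.log(mu) - mu - math.lgamma(y + 1))
```

For example, `poisson_loglik(3.0, 2)` equals $2\log 3 - 3 - \log 2$, and with $w_i = 1$ the values $\exp\{l(\mu; y, 1)\}$ sum to 1 over $y = 0, 1, 2, \ldots$.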

#### Tweedie Distribution

The Tweedie distribution does not in general have a closed-form log-likelihood function in terms of the mean $\mu$, dispersion $\phi$, and power $p$ parameters. The form of the log likelihood is

$$L = \sum_{i=1}^{n} f_i\, l(\mu_i; y_i, w_i)$$

where

$$l(\mu_i; y_i, w_i) = \log f(y_i; \mu_i, \phi/w_i, p)$$

and $f(y; \mu, \phi, p)$ is the Tweedie probability distribution, which is described in the section Tweedie Distribution. Evaluation of the Tweedie log likelihood for model fitting is performed numerically as described in Dunn and Smyth (2005, 2008).
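To illustrate why the evaluation is numerical, the sketch below computes $\log f(y; \mu, \phi, p)$ for $1 < p < 2$ from the compound Poisson-gamma representation: $Y$ is a sum of $N \sim \text{Poisson}(\lambda)$ gamma variables, with $\lambda = \mu^{2-p}/(\phi(2-p))$, gamma shape $(2-p)/(p-1)$, and gamma scale $\phi(p-1)\mu^{p-1}$. The density for $y > 0$ is then an infinite series over $N$. This is a naive fixed-truncation sum (`jmax` terms), not the more careful summation of Dunn and Smyth that the procedure relies on; function and parameter names are hypothetical.

```python
import math

# Hypothetical helpers: naive truncated series for the Tweedie density, 1 < p < 2.


def tweedie_logpdf(y: float, mu: float, phi: float, p: float, jmax: int = 200) -> float:
    """log f(y; mu, phi, p) by summing the compound Poisson-gamma series."""
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only handles 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson mean of the number of summands N
    alpha = (2.0 - p) / (p - 1.0)                # gamma shape of each summand
    scale = phi * (p - 1.0) * mu ** (p - 1.0)    # gamma scale of each summand
    if y == 0:
        return -lam                              # P(Y = 0) = P(N = 0) = exp(-lam)
    log_lam, log_y, log_scale = math.log(lam), math.log(y), math.log(scale)
    # term j: log of P(N = j) times the Gamma(j * alpha, scale) density at y
    terms = [
        -lam + j * log_lam - math.lgamma(j + 1.0)
        + (j * alpha - 1.0) * log_y - y / scale
        - math.lgamma(j * alpha) - j * alpha * log_scale
        for j in range(1, jmax + 1)
    ]
    top = max(terms)                             # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def tweedie_loglik(mu: float, y: float, phi: float, p: float, w: float = 1.0) -> float:
    """l(mu; y, w) = log f(y; mu, phi / w, p)."""
    return tweedie_logpdf(y, mu, phi / w, p)
```

The point mass at zero plus the integral of the continuous part over $y > 0$ should equal 1, and the mean should equal $\mu$; both can be checked with a midpoint-rule sum over a fine grid.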

##### Quasi-likelihood

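The extended quasi-likelihood (EQL) is constructed according to the definition of McCullagh and Nelder (1989, Chapter 9) as

$$Q^{+} = \sum_{i=1}^{n} f_i\, Q_i^{+}(\mu_i; y_i, w_i)$$

where the contribution from an observation is

$$Q_i^{+}(\mu_i; y_i, w_i) = -\frac{1}{2}\log\left(\frac{2\pi\phi\, y_i^{p}}{w_i}\right) - \frac{w_i}{2\phi}\, d(y_i; \mu_i)$$

where $d(y_i; \mu_i) = 2\left(\dfrac{y_i^{2-p}}{(1-p)(2-p)} - \dfrac{y_i\,\mu_i^{1-p}}{1-p} + \dfrac{\mu_i^{2-p}}{2-p}\right)$ is the Tweedie unit deviance. This EQL is used to compute initial values for the iterative maximization of the Tweedie log likelihood, as specified by the OPTMETHOD= Tweedie option in Table 7.5. If you specify the OPTMETHOD=EQL Tweedie-optimization-option in Table 7.5, then the parameter estimates are computed by using the EQL instead of the log likelihood.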
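A Python sketch of the EQL contribution, written from the standard McCullagh and Nelder form with Tweedie variance function $V(\mu) = \mu^p$, is given below. The function names are hypothetical, and the sketch requires $y_i > 0$ in the logarithmic term; how zero responses are handled in the procedure is not covered here. The unit deviance is 0 when $\mu_i = y_i$ and positive otherwise, so each contribution is maximized at $\mu_i = y_i$.

```python
import math

# Hypothetical helpers illustrating the extended quasi-likelihood for Tweedie variance.


def tweedie_unit_deviance(y: float, mu: float, p: float) -> float:
    """d(y; mu) for variance function mu**p, with p not equal to 1 or 2."""
    return 2.0 * (y ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
                  - y * mu ** (1.0 - p) / (1.0 - p)
                  + mu ** (2.0 - p) / (2.0 - p))


def eql_contribution(mu: float, y: float, phi: float, p: float, w: float = 1.0) -> float:
    """-0.5*log(2*pi*phi*y**p / w) - w*d(y; mu)/(2*phi); this sketch requires y > 0."""
    return (-0.5 * math.log(2.0 * math.pi * phi * y ** p / w)
            - w * tweedie_unit_deviance(y, mu, p) / (2.0 * phi))
```

Because the deviance term is a smooth function of $\mu_i$ with no series to sum, the EQL is much cheaper to evaluate than the Tweedie log likelihood, which makes it a convenient source of starting values.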

#### Zero-Inflated Negative Binomial Distribution

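The HPGENSELECT procedure computes the log-likelihood function $l(\mu_i; y_i, w_i)$ for the $i$th observation as

$$l(\mu_i; y_i, w_i) =
\begin{cases}
\log\left[\omega_i + (1 - \omega_i)\left(1 + \dfrac{k\lambda_i}{w_i}\right)^{-w_i/k}\right] & y_i = 0 \\[2ex]
\log(1 - \omega_i) + y_i \log\left(\dfrac{k\lambda_i}{w_i}\right) - \left(y_i + \dfrac{w_i}{k}\right)\log\left(1 + \dfrac{k\lambda_i}{w_i}\right) + \log\left(\dfrac{\Gamma(y_i + w_i/k)}{\Gamma(y_i + 1)\,\Gamma(w_i/k)}\right) & y_i > 0
\end{cases}$$

where $k$ is the zero-inflated negative binomial dispersion parameter that is displayed in the output.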
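The two branches of the zero-inflated negative binomial contribution can be sketched in Python as follows (hypothetical function name). A zero can arise either from the zero-inflation component, with probability $\omega_i$, or from the negative binomial component, so the $y_i = 0$ branch mixes the two.

```python
import math

# Hypothetical helper illustrating the zero-inflated negative binomial contribution.


def zinb_loglik(omega: float, lam: float, y: int, k: float, w: float = 1.0) -> float:
    """ZINB contribution with zero-inflation probability omega and NB mean lam."""
    r = w / k            # weight-adjusted inverse dispersion
    t = k * lam / w      # weight-adjusted k * lambda
    if y == 0:
        # structural zero, or a zero produced by the negative binomial component
        return math.log(omega + (1.0 - omega) * (1.0 + t) ** (-r))
    return (math.log1p(-omega) + y * math.log(t) - (y + r) * math.log1p(t)
            + math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r))
```

With $w_i = 1$ the probabilities sum to 1 over $y = 0, 1, 2, \ldots$, and the overall mean is $(1 - \omega_i)\lambda_i$.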

#### Zero-Inflated Poisson Distribution

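The HPGENSELECT procedure computes the log-likelihood function $l(\mu_i; y_i, w_i)$ for the $i$th observation as

$$l(\mu_i; y_i, w_i) =
\begin{cases}
w_i \log\left[\omega_i + (1 - \omega_i)\exp(-\lambda_i)\right] & y_i = 0 \\[1ex]
w_i\left[\log(1 - \omega_i) + y_i \log(\lambda_i) - \lambda_i - \log(y_i!)\right] & y_i > 0
\end{cases}$$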
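A Python sketch of the zero-inflated Poisson contribution follows. The function names are hypothetical; `zip_loglik_from_predictors` additionally assumes a logit link for $\omega_i$ and a log link for $\lambda_i$, which is only one of the possible link choices.

```python
import math

# Hypothetical helpers illustrating the zero-inflated Poisson contribution.


def zip_loglik(omega: float, lam: float, y: int, w: float = 1.0) -> float:
    """ZIP contribution with zero-inflation probability omega and Poisson mean lam."""
    if y == 0:
        # structural zero, or a zero from the Poisson component
        return w * math.log(omega + (1.0 - omega) * math.exp(-lam))
    return w * (math.log1p(-omega) + y * math.log(lam) - lam - math.lgamma(y + 1))


def zip_loglik_from_predictors(zeta: float, eta: float, y: int, w: float = 1.0) -> float:
    """Assumes omega = logistic(zeta) and lambda = exp(eta) (illustrative link choice)."""
    omega = 1.0 / (1.0 + math.exp(-zeta))
    return zip_loglik(omega, math.exp(eta), y, w)
```

Setting $\omega_i = 0$ recovers the ordinary Poisson contribution, and with $w_i = 1$ the overall mean of the zero-inflated distribution is $(1 - \omega_i)\lambda_i$.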