Poisson Regression

The **SHIP** data shown in Figure 17.2 represent damage caused by waves to the forward section of certain cargo-carrying vessels. The purpose of the investigation was to set standards for future hull construction. In order to do so, the investigators needed to know the risk of damage associated with five ship types (**TYPE**), year of construction (**YEAR**), and period of operation (**PERIOD**). These three variables are the classification variables. **MONTHS** is the aggregate number of months in service and is an explanatory variable. **Y** is the response variable and represents the number of damage incidents (McCullagh and Nelder 1989).

**Figure 17.2:** **SHIP** Data Set

Recall from Chapter 16 that the generalized linear model has three basic components:

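- a linear function of explanatory variables. For this example, the function is

  $$\beta_0 + \log(\mathrm{MONTHS}) + \gamma_i + \tau_j + \delta_k + (\gamma\tau)_{ij} + (\gamma\delta)_{ik} + (\tau\delta)_{jk}$$

  where $\beta_0$ is the intercept and $\log(\mathrm{MONTHS})$ is a variable whose coefficient $\beta_1$ is believed to be 1. An effect such as this is commonly referred to as an *offset*. $\gamma_i$ is the effect of the *i*th level of **TYPE**, $\tau_j$ is the effect of the *j*th level of **YEAR**, $\delta_k$ is the effect of the *k*th level of **PERIOD**, $(\gamma\tau)_{ij}$ is the effect of the *ij*th level of the **TYPE** by **YEAR** interaction, $(\gamma\delta)_{ik}$ is the effect of the *ik*th level of the **TYPE** by **PERIOD** interaction, and $(\tau\delta)_{jk}$ is the effect of the *jk*th level of the **YEAR** by **PERIOD** interaction.
- a probability function for the response variable that depends on the mean and sometimes other parameters as well. For this example, the probability function of the response variable is Poisson.
- a link function that relates the mean to the linear function of explanatory variables. For this example, the link function is the log:

  $$\log(\text{expected number of damage incidents}) = \beta_0 + \log(\mathrm{MONTHS}) + \gamma_i + \tau_j + \delta_k + (\gamma\tau)_{ij} + (\gamma\delta)_{ik} + (\tau\delta)_{jk}$$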
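The role of the offset can be illustrated with a minimal Python sketch. This is not part of SAS/INSIGHT; the function name `expected_incidents` and the numeric values are hypothetical and serve only to show that, with a log link and a coefficient fixed at 1 for log(MONTHS), the expected number of incidents is proportional to MONTHS.

```python
import math

def expected_incidents(months, effects_sum):
    """Mean of a log-link model that uses log(months) as an offset.

    effects_sum stands for the intercept plus all TYPE, YEAR, PERIOD,
    and interaction effects for one cell (a hypothetical value here).
    """
    eta = math.log(months) + effects_sum  # linear function; offset coefficient fixed at 1
    return math.exp(eta)                  # inverse of the log link

# Hypothetical effect total; doubling the months in service doubles the mean.
rate_effects = -6.0
mu_1000 = expected_incidents(1000, rate_effects)
mu_2000 = expected_incidents(2000, rate_effects)
print(mu_2000 / mu_1000)
```

Under these assumptions, exp(effects_sum) acts as an incident rate per month of service, which is why the offset is entered on the log scale.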

Open the SHIP data set.

Recall from the previous equation that the expected value of **Y** is assumed to be directly proportional to **MONTHS**. Since the log of the expected value of **Y** is being modeled, you need to carry out a log transformation on **MONTHS**. Follow these steps to create a new variable that represents the log of **MONTHS**.

Select MONTHS in the data window.

Choose Edit:Variables:log( Y ).

**Figure 17.3:** Edit:Variables Menu

A new variable, **L_MONTHS**, now appears in the data window.

**Figure 17.4:** Data Window with **L_MONTHS** Added

Deselect L_MONTHS in the data window.

Some values of **MONTHS** are **0**, meaning that this kind of ship has not seen service. You need to exclude these observations from the model fit. The log transformation does this automatically, since *log*(**MONTHS**) becomes a missing value for the observations with a value of **0** for **MONTHS**. Observations with missing values for the explanatory variables or the response variable are not used in the model fit.
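The effect of the log transformation on zero values can be sketched in Python as follows. This is an illustration only, not how SAS/INSIGHT is implemented; the helper `log_or_missing`, the use of `None` for a missing value, and the sample values are all assumptions.

```python
import math

def log_or_missing(value):
    """Return log(value), or None (missing) when the log is undefined."""
    if value is None or value <= 0:
        return None
    return math.log(value)

# Hypothetical MONTHS values; 0 means the ship type saw no service.
months = [127, 63, 0, 1095]
l_months = [log_or_missing(m) for m in months]

# Only rows with a non-missing offset would enter the model fit.
used_rows = [i for i, v in enumerate(l_months) if v is not None]
print(used_rows)  # [0, 1, 3]
```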

Now you are ready to begin the analysis.

Choose Analyze:Fit ( Y X ) to display the fit variables dialog.

Select Y in the list at the left, then click the Y button.

**Y** appears in the **Y** variables list.

Select TYPE, YEAR, and PERIOD, then click the Expand button.

**TYPE, YEAR**, and **PERIOD**, along with all two-way interaction effects, appear in the **X** variables list. Your variables dialog should now appear as shown in Figure 17.5.

**Figure 17.5:** Fit Variables Dialog with Variable Roles Assigned

The **Expand** button provides a convenient way to specify interactions of any order. The order **2** is the default. You can change the order by entering a different value to replace the **2** or by clicking on the buttons to the right or left of the **2** to increase or decrease the order, respectively.
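The set of effects that an expansion of a given order produces can be sketched in Python. The function `expand_effects` and the `*` separator used to name interactions are hypothetical choices for this illustration and are not taken from SAS/INSIGHT.

```python
from itertools import combinations

def expand_effects(variables, order=2):
    """List main effects and all interactions up to the given order."""
    effects = []
    for k in range(1, order + 1):
        for combo in combinations(variables, k):
            effects.append("*".join(combo))
    return effects

print(expand_effects(["TYPE", "YEAR", "PERIOD"]))
# ['TYPE', 'YEAR', 'PERIOD', 'TYPE*YEAR', 'TYPE*PERIOD', 'YEAR*PERIOD']
```

With `order=3`, the same sketch would also include the three-way term, giving seven effects in total.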

Click the Method button to display the fit method dialog.

This dialog enables you to specify the probability function or the quasi-likelihood function for the response variable and the link function.

*Overdispersion* is a phenomenon that occurs occasionally with binomial and Poisson data. For Poisson data, it occurs when the variance of the response *Y* exceeds the Poisson variance $\mathrm{Var}(y) = \mu$. To account for the overdispersion that might occur in the **SHIP** data set, a quasi-likelihood function with variance function $\mathrm{Var}(\mu) = \mu$ (Poisson variance) will be used for the response variable. The variance is given by

$$\mathrm{Var}(y) = \sigma^2 \mu$$

where $\sigma^2$ is the dispersion parameter, with a value greater than 1 for overdispersion.

Select the check box for Quasi-Likelihood.

Click on Poisson under Response Dist.

This uses the Poisson variance function $\mathrm{Var}(\mu) = \mu$ for the quasi-likelihood function.

Click on Pearson under Scale.

This uses the scale parameter based on the Pearson $\chi^2$ statistic.
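A common way to obtain a Pearson-based dispersion estimate is to divide the Pearson chi-square statistic by its residual degrees of freedom. The Python sketch below assumes that definition and the Poisson variance function; the function name `pearson_dispersion`, the counts, and the fitted means are hypothetical and are not taken from the **SHIP** data.

```python
def pearson_dispersion(y, mu, n_params):
    """Estimate dispersion as Pearson chi-square / residual degrees of freedom.

    Uses the Poisson variance function, so each squared residual
    (y - mu)^2 is scaled by mu.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    dof = len(y) - n_params
    return chi2 / dof

# Hypothetical observed counts and fitted means from a 2-parameter model.
y = [0, 4, 18, 11, 39]
mu = [2.0, 4.0, 12.0, 16.0, 36.0]
print(pearson_dispersion(y, mu, n_params=2))
```

An estimate noticeably greater than 1 would point toward overdispersion relative to the Poisson variance.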

Select L_MONTHS in the list at the left, then click the Offset button.

**L_MONTHS** appears in the **Offset** variables list. Your method dialog should now appear as shown in Figure 17.6.

**Figure 17.6:** Fit Method Dialog

It is not necessary to specify a **Link Function**. **Canonical** is the default and allows **Fit ( Y X )** to choose an appropriate link. For this example, it is equivalent to choosing **Log** as the **Link Function**.

Click the OK button to close both dialogs and display the analysis.

**Figure 17.7:** Fit Window

By default, the window includes many tables, but only a few are shown in Figure 17.7. These tables are described in the following sections. For more information about the other tables and graphs in the window, see Chapter 39, "Fit Analyses."

**Note:** A warning message ("The negative of the Hessian is not positive definite. The convergence is questionable.") appears when the specified model does not converge, as in this example. The output tables, graphs, and variables are based on the results from the last iteration.

Copyright © 2007 by SAS Institute Inc., Cary, NC, USA. All rights reserved.