Iteratively Reweighted Least Squares Algorithm (Fisher Scoring)

Let Y be the response variable that takes values $1, \ldots , k, k+1$ $(k\ge 1)$. Let j index all observations and $Y_ j$ be the value of response for the jth observation. Consider the multinomial variable $\mb {Z}_ j=(Z_{1j}, \ldots ,Z_{kj})’$ such that

\begin{eqnarray*}  Z_{ij}= &  \left\{  \begin{array}{ll} 1 &  \mbox{if } Y_ j=i \\ 0 &  \mbox{otherwise} \end{array} \right. \\ \end{eqnarray*}

and $Z_{(k+1)j}=1-\sum _{i=1}^ k Z_{ij}$. With $\pi _{ij}$ denoting the probability that the jth observation has response value i, the expected value of $\mb {Z}_ j$ is $\bpi _ j=(\pi _{1j}, \ldots ,\pi _{kj})^\prime $, and $\pi _{(k+1)j}=1-\sum _{i=1}^ k\pi _{ij}$. The covariance matrix of $\mb {Z}_ j$ is $\mb {V}_ j$, which is the covariance matrix of a multinomial random variable for one trial with parameter vector $\bpi _ j$. Let $\btheta $ be the vector of regression parameters—for example, $\btheta = (\alpha _1, \ldots , \alpha _ k, \bbeta ’)’$ for cumulative logit model. Let $\mb {D}_ j$ be the matrix of partial derivatives of $\bpi _ j$ with respect to $\btheta $. The estimating equation for the regression parameters is

\[  \sum _ j\mb {D}’_ j\mb {W}_ j(\mb {Z}_ j-\bpi _ j)=\mb {0}  \]

where $ {\bW }_ j=w_ j f_ j {\bV }_ j^{-1}$, and $w_ j$ and $f_ j$ are the WEIGHT and FREQ values of the jth observation.

With a starting value of $\btheta ^{(0)}$, the pseudo-estimate of $\btheta $ is obtained iteratively as

\[  \btheta ^{(i+1)}=\btheta ^{(i)}+(\sum _ j \mb {D}’_ j\mb {W}_ j\mb {D}_ j)^{-1} \sum _ j\mb {D}’_ j\mb {W}_ j(\mb {Z}_ j-\bpi _ j)  \]

where $\mb {D}_ j$, $\mb {W}_ j$, and $\bpi _ j$ are evaluated at the ith iteration $\btheta ^{(i)}$. The expression after the plus sign is the step size. If the log likelihood evaluated at $\btheta ^{(i+1)}$ is less than that evaluated at $\btheta ^{(i)}$, then $\btheta ^{(i+1)}$ is recomputed by step-halving or ridging. The iterative scheme continues until convergence is obtained—that is, until $\btheta ^{(i+1)}$ is sufficiently close to $\btheta ^{(i)}$. Then the maximum likelihood estimate of $\btheta $ is $\hat{\btheta }=\btheta ^{(i+1)}$.

By default, starting values are zero for the slope parameters, and starting values are the observed cumulative logits (that is, logits of the observed cumulative proportions of response) for the intercept parameters. Alternatively, the starting values can be specified with the INEST= option in the PROC SURVEYLOGISTIC statement.