Example 9.5 Categorical Linear Models

This example fits a linear model to a function of the response probabilities

\[  \mb {K} \log \pi = \mb {X \beta } + e  \]

where $\mb {K}$ is a matrix that compares each response category to the last. Data are from Kastenbaum and Lamphiear (1959). First, the Grizzle-Starmer-Koch (1969) approach is used to obtain generalized least squares estimates of $\mb {\beta }$. These form the initial values for the Newton-Raphson solution for the maximum likelihood estimates. The CATMOD procedure can also be used to analyze these binary data (see Cox (1970)). Here is the program.

   /* Categorical Linear Models                    */
   /* by Least Squares and Maximum Likelihood      */
   /*  CATLIN                                      */
   /*  Input:                                      */
   /*     n the s by p matrix of response counts   */
   /*     x the s by r design matrix               */

proc iml ;
start catlin;

   /*---find dimensions---*/
   s = nrow(n);                     /* number of populations */
   r = ncol(n);                       /* number of responses */
   q = r-1;                     /* number of function values */
   d = ncol(x);               /* number of design parameters */
   qd = q*d;                   /* total number of parameters */

   /*---get probability estimates---*/
   rown = n[,+];                               /* row totals */
   pr = n/(rown*repeat(1,1,r));     /* probability estimates */
   p = shape(pr[,1:q] ,0,1);     /* cut and shaped to vector */
   print "INITIAL PROBABILITY ESTIMATES" ,pr; 

      /*   estimate by the GSK method   */

      /* function of probabilities */
   f = log(p)-log(pr[,r])@repeat(1,q,1);

      /* inverse covariance of f */
   si = (diag(p)-p*p`)#(diag(rown)@repeat(1,q,q));
   z = x@i(q);                     /* expanded design matrix */
   h = z`*si*z;                      /* crossproducts matrix */
   g = z`*si*f;                              /* cross with f */
   beta = solve(h,g);              /* least squares solution */
   stderr = sqrt(vecdiag(inv(h)));        /* standard errors */
   run prob;
   print ,"GSK ESTIMATES" , beta stderr ,pi;

      /*   iterations for ML solution   */
   crit = 1;
   do it = 1 to 8 while(crit>.0005);/* iterate until converge*/

      /* block diagonal weighting  */
      si = (diag(pi)-pi*pi`)#(diag(rown)@repeat(1,q,q));
      g = z`*(rown@repeat(1,q,1)#(p-pi));        /* gradient */
      h = z`*si*z;                                /* hessian */
      delta = solve(h,g);            /* solve for correction */
      beta = beta+delta;             /* apply the correction */
      run prob;                    /* compute prob estimates */
      crit = max(abs(delta));       /* convergence criterion */
   end;
   stderr = sqrt(vecdiag(inv(h)));        /* standard errors */
   print , "ML Estimates", beta stderr, pi;
   print , "Iterations" it "Criterion" crit;
finish catlin;

   /*   subroutine to compute new prob estimates @ parameters  */
start prob;
   la = exp(x*shape(beta,0,q));
   pi = la/((1+la[,+] )*repeat(1,1,q));
   pi = shape(pi,0,1);
finish prob;

   /*---prepare frequency data and design matrix---*/
n= { 58 11 05,
     75 19 07,
     49 14 10,
     58 17 08,
     33 18 15,
     45 22 10,
     15 13 15,
     39 22 18,
     04 12 17,
     05 15 08};     /* frequency counts*/

x= { 1 1 1 0 0 0,
     1 -1 1 0 0 0,
     1 1 0 1 0 0,
     1 -1 0 1 0 0,
     1 1 0 0 1 0,
     1 -1 0 0 1 0,
     1 1 0 0 0 1,
     1 -1 0 0 0 1,
     1 1 -1 -1 -1 -1,
     1 -1 -1 -1 -1 -1};    /* design matrix*/

run catlin;

The maximum likelihood estimates are shown in Output 9.5.1.

Output 9.5.1: Maximum Likelihood Estimates

INITIAL PROBABILITY ESTIMATES

pr
0.7837838 0.1486486 0.0675676
0.7425743 0.1881188 0.0693069
0.6712329 0.1917808 0.1369863
0.6987952 0.2048193 0.0963855
0.5 0.2727273 0.2272727
0.5844156 0.2857143 0.1298701
0.3488372 0.3023256 0.3488372
0.4936709 0.278481 0.2278481
0.1212121 0.3636364 0.5151515
0.1785714 0.5357143 0.2857143

GSK ESTIMATES

beta stderr
0.9454429 0.1290925
0.4003259 0.1284867
-0.277777 0.1164699
-0.278472 0.1255916
1.4146936 0.267351
0.474136 0.294943
0.8464701 0.2362639
0.1526095 0.2633051
0.1952395 0.2214436
0.0723489 0.2366597
-0.514488 0.2171995
-0.400831 0.2285779

pi
0.7402867
0.1674472
0.7704057
0.1745023
0.6624811
0.1917744
0.7061615
0.2047033
0.516981
0.2648871
0.5697446
0.2923278
0.3988695
0.2589096
0.4667924
0.3034204
0.1320359
0.3958019
0.1651907
0.4958784

ML Estimates

beta stderr
0.9533597 0.1286179
0.4069338 0.1284592
-0.279081 0.1156222
-0.280699 0.1252816
1.4423195 0.2669357
0.4993123 0.2943437
0.8411595 0.2363089
0.1485875 0.2635159
0.1883383 0.2202755
0.0667313 0.236031
-0.527163 0.216581
-0.414965 0.2299618

pi
0.7431759
0.1673155
0.7723266
0.1744421
0.6627266
0.1916645
0.7062766
0.2049216
0.5170782
0.2646857
0.5697771
0.292607
0.3984205
0.2576653
0.4666825
0.3027898
0.1323243
0.3963114
0.165475
0.4972044

  it   crit
Iterations 3 Criterion 0.0004092