• Print  |
  • Feedback  |

Knowledge Base


TS-131

Some Examples of Using the IPF function in IML


IPF users have occasionally faced the following problem.  You have one or more
sets of marginal frequencies available from one source, and you want to use
these margins to adjust the frequencies in a contigency table derived from a
second source.  The IPF function requires that the marginal tables be combined
into a single "complete" contigency table (called TABLE in the IPF parameter
list).  The question is:  how do you form this "complete" table on the basis
of marginal subtables?

This document demonstrates that the IML procedure makes it easy to build a
"complete" table based on the margins of the available tables, and that IML
can then proceed in a straghtforward manner to get the ML estimates from the
IPF function.  Two examples are used to illustrate the method.

1.   The first example is a simple one in which you have 2 sets of margins,
     but no data from a second source.  (In IPF notation, there is no INIT
     matrix because there are no higher-order interactions to preserve from a
     second source of data).  Suppose you have 2 two-dimensional tables:  X*Y
     and Y*Z, and you want to infer the ML estimates of the X*Z table.

2.   The second example is the 5-variable model given in Bishop, Fienberg,
     and Holland (hereafter BF&H), pp. 107-111.  You have a 4-way table
     (SEX*SCHOOL*AGE*REGION), for which you want to preserve the given
     interactions.  You have 3 new marginal tables:  each one a cross-
     classification of a fifth variable (PARTY) with one of the original
     variables (SEX, SCHOOL, or AGE).  You want to infer the ML estimates of
     a subtable of PARTY*REGION, preserving all of the given margins in all
     of the marginal tables.


******************************************************************************


EXAMPLE 1:

The data and the corresponding analysis are on pages 3 and 4.  The program has
numerous comments and is pretty much self-explanatory, but I would note the
following:

1.   The computation of the "complete" table requires 7 lines, and uses
     nothing more complicated than the min() function.  It is done under Step
     2 in the section entitled "COMBINE THE 2 JOINT DISTRIBUTIONS INTO A 3-
     WAY TABLE." 

2.   I included a number of extra sections in the program that are not
     strictly necessary.  The only purpose of the end of Step 2 is to set up
     anm output data set so that one can verify easily (with proc FREQ in
     Step 3) that the given margins are preserved.  Clearly, one could skip
     that, and go directly to the IPF step without leaving IML.

EXAMPLE 2:

The code to input the four marginal tables is on page 5.  The IML step to
compute the fitted values is on page 7.  The results are on page 7.  Note the
following:

1.   The computation of the "complete" table requires 9 lines, and uses
     nothing more complicated than the min() function.  It is done under Step
     2 in the section entitled "COMBINE THE 3 MARGINS INTO A 3-WAY TABLE."
2.   The method of computation in Method 1 in BF&H, page 109.

3.   The TABLE matrix contains the new 1-way margins for PARTY members.  The
     IPF function specifies only these 1-way marginals so that any higher-
     order interactions inadvertently built into TABLE are not propogated
     into the fitted estimates.  Instead, the higher-order interactions are
     obtained from the INIT matrix, which is nothing more than the original
     4-way table.

4.   The results on page 7 are different from the results in BF&H, page 111. 
     This discrepancy undoubtedly arises from the following fact.  Strictly
     speaking, the method of IPF is invalidated when zero marginals must be
     propogated into the internal cells of a table that originally had
     nonzero margins.  BF&H allude to this problem in the second sentence on
     page 101, but fail to point out that this exact problem occurs in the
     current data due to the zero marginal for party members in the table of
     PARTY*SCHOOL (Table 3.7-5 in BF&H, p. 109).

     IPF is supposed to maintain the higher-order interactions of the initial
     table (in this case, the initial 4-way table).  However,  one glance at
     the estimates on page 111 makes it clear that this is not possible; any
     odds ratio comparing the first 2 levels of SCHOOL must be undefined
     because of the zeros in column 1.  The best that one can do is to
     preserve the odds ratios for those comparisons that do not involve zero
     frequencies.  The SAS IPF algorithm does that, as shown in the following
     item.

5.   To verify that IPF is preserving interactions to the extent possible
     with zero frequencies, the CATMOD procedure was used to compute the 6
     logg odds ratios for each AGE*SCHOOL table within the 4 categories of
     SEX*REGION.  This was done for (1) the original 4-way table, (2) the SAS
     IPF estimates for party members, and (3) the BF&H estimates for party
     members.  Log 0 is defined to be 0.

6.   The results are shown on page 8 (there are no BF&H estimates available
     for rural region).  The results show that the original log odds ratios
     are preserved by the SAS IPF function estimates, except for the ones
     that are restricted to be zero because of the zero marginal.  It is not
     clear how BF&H arrived at their results, which preserve none of the
     original logg-odss ratios. 

****************************************************************;
*             Example 1 for IPF                                 ;
****************************************************************;
****************************************************************;
*   Step 1: Input the data                                      ;
****************************************************************;
/*---Table of A*B---*/
data ab;
   input a b wt @@;
   cards;
1 1 8  1 2 2  2 1 4  2 2 4
;
run;

proc freq;
   weight wt;
   table a*b / list;
run;


/*---Table of C*B---*/
data cb;
   input c b wt @@;
   cards;
1 1 6  1 2 0  2 1 6  2 2 6
;
run;

proc freq;
   weight wt;
   table c*b / list;
run;


****************************************************************;
*   Step 2: Use IML to create the "complete" table              ;
****************************************************************;
proc iml;

   /*----compute joint distribution for a by b----*/
   use ab;
   read all into abyb;
   ab=shape(abyb[,3], 2, 2);

   /*----compute joint distribution for c by b---*/
   use cb;
   read all into cbyb;
   cb=shape(cbyb[,3], 2, 2);

   /*----combine the 2 joint distributions into a 3-way table----*/
   i=1;
   tab=shape(0,1,8);
   do a=1 to 2;
      do b=1 to 2;
         do c=1 to 2;
            tab[i]=min(ab[a,b], cb[c,b]);
            ab[a,b]=ab[a,b]-tab[i];
            cb[c,b]=cb[c,b]-tab[i];
            i=i+1;
         end;
      end;
   end;

   /*----set up the variable profiles and the variable names----*/
   r1={1 1 1 1 2 2 2 2};
   r2={1 1 2 2 1 1 2 2};
   r3={1 2 1 2 1 2 1 2};
   cname={ a b c wt };
   matrix=r1//r2//r3//tab;
   matrix=matrix`;

   /*---output to a sas data set---*/
   create out from matrix[colname=cname];
   append from matrix;

quit;

****************************************************************;
* Step 3: verify that "complete" table has correct 2-way margins;
****************************************************************;
proc print data=out;
run;

proc freq data=out;
   weight wt;
   table a*b c*b / list;
run;

***************************************************************;
*   Step 4: use IML to do IPF for model (ab, cb)               ;
***************************************************************;
proc iml;
   use out;
   read all into matrix;

   /*  var3     var2    var1  */
   /* -----    -----   -----  */
   /*   c        b       a    */

   table=matrix[,4];
   dim={2 2 2};
   config={1 3, 2 2};
   call ipf(fit, status, dim, table, config);

   matrix[,4]=fit;
   cname={ a b c wt };
   create fit from matrix[colname=cname];
   append from matrix;

quit;

***************************************************************;
*   Step 5: verify that the "fitted" table has correct 2-way   ;
*       margins.  Project the fitted table onto the A*C plane. ;
***************************************************************;
proc print data=fit;
run;

proc freq data=fit;
   weight wt;
   table a*b c*b a*c / list;
run;

******************************************************************;
*             Example 2 for IPF                                   ;
******************************************************************;
******************************************************************;
*   Step 1: Input the data                                        ;
******************************************************************;

data t374;
   do sex=1 to 2;
      do age=1 to 3;
         do region=1 to 2;
            do school=1 to 4;
               input wt @;
               output;
            end;
         end;
      end;
   end;
   cards;
 631571  4381454  7557356   915307  1719046  5266179  5718852  153669
1176623  4174797  5384310  1055949  2386652  4297528  3050642  316853
2376214  1807879  1187537   501741  5378665  1862132   376084   68963
 664308  2843925  9180042  1274867  2536078  4246961  5886728  232464
3825831  4481894  6873450  1070910  7181386  4653138  3089527  241507
7432953  1888644  1312793   275948 13089113  1071037   207498   24283
;
run;

/*---a table containing 1 or more old vars, & 1 or more new vars---*/
data t375;
   do party=1 to 2;
      do school=1 to 4;
         input wt @;
         output;
      end;
   end;
   cards;
       0   2290478   4580957   1367696
48398440  38685090  45243862   4764765
;
run;

data t376;
   do party=1 to 2;
      do sex=1 to 2;
         input wt @;
         output;
      end;
   end;
   cards;
 6632500   1606631
55113503  81978654
;
run;

data t377;
   do party=1 to 2;
      do age=1 to 3;
         input wt @;
         output;
      end;
   end;
   cards;
 1623109   4663348   1952674
51585698  48597649  36908810
;
run;

/*---reorder the initial data in a data set called init---*/
proc freq data=t374;
   table school*sex*age*region / noprint out=init;
   weight wt;
run;

******************************************************************;
*   Step 2: run IML to do iterative proportional fitting          ;
******************************************************************;
proc iml;
   /*---compute marginals for school (party members only)---*/
   use t375;
   read all into t375;
   schoolp=shape(t375[,3], 2, 4);
   schpy=schoolp[1,];

   /*---compute marginals for sex (party members only)---*/
   use t376;
   read all into t376;
   sexp=shape(t376[,3], 2, 2);
   sexpy=sexp[1,];

   /*---compute marginals for age (party members only)---*/
   use t377;
   read all into t377;
   agep=shape(t377[,3], 2, 3);
   agepy=agep[1,];

   /*---combine marginals into a 3-way table---*/
   i=1;
   tab=shape(0,1,24);
   do school=1 to 4;
      do sex=1 to 2;
         do age=1 to 3;
            a=min(schpy[school], sexpy[sex]);
            a=min(a,agepy[age]);
            tab[i]=a;
            i=i+1;
            schpy[school]=schpy[school]-a;
            sexpy[sex]=sexpy[sex]-a;
            agepy[age]=agepy[age]-a;
         end;
      end;
   end;

   /*---introduce region, creating a 4-way table---*/
   table=tab`@{.5, .5};

   /*---read in 1959 initial 4-way table, & adjust for new margins---*/
   use init;
   read all into mat;
   init=mat[,5];
   dim={ 2 3 2 4 };
   /*       var4  var3  var2   var1        */
   /*      ------ ----  ----  ------       */
   cname={ school  sex   age  region count };
   config={ 2 3 4 };
   call ipf(fit, status, dim, table, config, init);

   /*---output the profiles and the fitted frequencies---*/
   mat[,5]=fit;
   mat=mat[,1:5];
   create party from mat[colname=cname];
   append from mat;

quit;


******************************************************************;
*   Step 3: print the results for region=1 (urban) only          ;
******************************************************************;
options ls=71;
data ex2step3;
   set party;
   if region=1;
   output;
   drop region;
run;

proc print;
run;

               OBS    SCHOOL    SEX    AGE         COUNT

                 1       1       1      1           0.00
                 2       1       1      2           0.00
                 3       1       1      3           0.00
                 4       1       2      1           0.00
                 5       1       2      2           0.00
                 6       1       2      3           0.00
                 7       2       1      1      130144.93
                 8       2       1      2      469654.23
                 9       2       1      3      314525.54
                10       2       2      1       19866.07
                11       2       2      2      118573.93
                12       2       2      3       77271.87
                13       3       1      1      502528.35
                14       3       1      2     1355984.17
                15       3       1      3      462503.89
                16       3       2      1      143555.59
                17       3       2      2      407084.06
                18       3       2      3      120240.08
                19       4       1      1      108406.69
                20       4       1      2      473659.36
                21       4       1      3      348053.85
                22       4       2      1       35509.04
                23       4       2      2      112969.42
                24       4       2      3       45017.24





                      Example 2, Continued
           Estimated Log Odds Ratios for AGE * SCHOOL
                        By SEX and REGION

                                    Orignal        SAS IPF         BF&H
    Sex      Region       #       4-way Table     Estimates      Estimates
   ------  -----------   ----     -----------     ---------      ---------
     1         1          1        -0.67051        0.00000        0.00000
     1         1          2        -2.21028        0.00000        0.00000
     1         1          3        -0.29072       -0.29072       -0.44965
     1         1          4        -0.96541       -0.96541       -1.39497
     1         1          5         0.48197        0.48197        0.50422
     1         1          6         1.24946        1.24946        1.19495
                                                              
     1         2          1        -0.53139        0.00000        -------
     1         2          2        -2.18025        0.00000        -------
     1         2          3        -0.42515       -0.42515        -------
     1         2          4        -1.68213       -1.68213        -------
     1         2          5         1.35205        1.35205        -------
     1         2          6         1.92048        1.92048        -------
                                                         
     2         1          1        -1.29592        0.00000        0.00000
     2         1          2        -2.82426        0.00000        0.00000
     2         1          3        -0.74423       -0.74423       -0.69594
     2         1          4        -1.53555       -1.53555       -1.47673
     2         1          5         0.11503        0.11503        0.21765
     2         1          6         0.41449        0.41449        0.57172
                                                               
     2         2          1        -0.94954        0.00000        -------
     2         2          2        -3.01874        0.00000        -------
     2         2          3        -0.73602       -0.73602        -------
     2         2          4        -1.96776       -1.96776        -------
     2         2          5         0.68285        0.68285        -------
     2         2          6         1.08638        1.08638        -------