Input Data Sets

Data to be analyzed by PROC CATMOD must be in a SAS data set containing one of the following:

  • raw data values (variable values for every subject)

  • frequency counts and the corresponding variable values

  • response function values and their covariance matrix

If you specify a WEIGHT statement, then PROC CATMOD uses the values of the WEIGHT variable as the frequency counts. If the READ function is specified in the RESPONSE statement, then the procedure expects the input data set to contain the values of response functions and their covariance matrix. Otherwise, PROC CATMOD assumes that the SAS data set contains raw data values.

Raw Data Values

If you use raw data, PROC CATMOD first counts the number of observations having each combination of values for all variables specified in the MODEL or POPULATION statement. For example, suppose the variables A and B each take on the values 1 and 2, and their frequencies can be represented as follows:

   

             A
           1   2
   B   1   2   1
       2   3   1

The SAS data set Raw containing the raw data might be as follows:

   Observation   A   B
        1        1   1
        2        1   1
        3        1   2
        4        1   2
        5        1   2
        6        2   1
        7        2   2

The statements for PROC CATMOD are as follows:

proc catmod data=Raw;
   model A=B;
run;
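To run the preceding statements, the Raw data set must already exist. A minimal sketch of a DATA step that would create it is shown here, assuming the seven observations are entered inline exactly as listed for Raw, with one line per subject:

```sas
/* Sketch: one observation per subject, in the order listed for Raw */
data Raw;
   input A B;
   datalines;
1 1
1 1
1 2
1 2
1 2
2 1
2 2
;
run;
```

With this input, PROC CATMOD counts the observations for each combination of A and B itself, so no WEIGHT statement is needed.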

For a discussion of how to handle structural and random zeros when the input is raw data, see the section Zero Frequencies and Example 29.5.

Frequency Counts

If your data set contains frequency counts, then use the WEIGHT statement to specify the variable containing the frequencies. For example, you could create and analyze the Summary data set as follows:

data Summary;
   input A B Count;
   datalines;
1 1 2
1 2 3
2 1 1
2 2 1
;
proc catmod data=Summary;
   weight Count;
   model A=B;
run;

The data set Summary can also be created from the data set Raw by using the FREQ procedure, whose OUT= data set stores each cell frequency in the variable Count:

proc freq data=Raw;
   tables A*B / out=Summary;
run;
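As a variation, the two steps can be chained so that the counts produced by PROC FREQ feed directly into PROC CATMOD. The following is only a sketch: the NOPRINT option and the DROP= data set option are optional tidying choices, not requirements of PROC CATMOD.

```sas
/* Sketch: build Summary from Raw without printing the crosstabulation. */
/* The OUT= data set contains A, B, Count, and Percent; Percent is      */
/* dropped here because the analysis uses only the counts.              */
proc freq data=Raw noprint;
   tables A*B / out=Summary(drop=Percent);
run;

/* Analyze the summarized data, using Count as the cell frequencies */
proc catmod data=Summary;
   weight Count;
   model A=B;
run;
```

This should give the same analysis as running PROC CATMOD on Raw, since both data sets describe the same four cell frequencies.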

Inputting Response Functions and Covariances Directly

If you want to read in the response functions and their covariance matrix, rather than have PROC CATMOD compute them, create a TYPE=EST data set. In addition to one variable for each response function, the data set should contain two more variables, _TYPE_ and _NAME_, both character variables of length 8. The variable _TYPE_ should have the value 'PARMS' when the observation contains the response functions and the value 'COV' when the observation contains elements of the covariance matrix of the response functions. The variable _NAME_ is used only when _TYPE_='COV', in which case it should contain the name of the variable whose covariance elements are stored in that observation. In the following data set, for example, the covariance between the second and fourth response functions is 0.000102:

data direct(type=est);
   input b1-b4 _type_ $ _name_ $8.;
   datalines;
0.590463   0.384720   0.273269   0.136458   PARMS     .
0.001690   0.000911   0.000474   0.000432   COV       B1
0.000911   0.001823   0.000031   0.000102   COV       B2
0.000474   0.000031   0.001056   0.000477   COV       B3
0.000432   0.000102   0.000477   0.000396   COV       B4
;
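Because the INPUT statement above mixes list input with the $8. informat, it can be worth checking that the data set was read as intended before passing it to PROC CATMOD. A minimal sketch of such a check, using only standard procedures, might look like this:

```sas
/* Confirm the data set type (expected: EST) and the variable attributes */
proc contents data=direct;
run;

/* List the observations to check the _TYPE_ and _NAME_ value on each row */
proc print data=direct;
run;
```

The listing should show one PARMS observation followed by four COV observations, with _NAME_ identifying the row of the covariance matrix.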

In order to tell PROC CATMOD that the input data set contains the values of response functions and their covariance matrix, do the following:

  • specify the READ function in the RESPONSE statement

  • specify _F_ as the dependent variable in the MODEL statement

For example, suppose the response functions correspond to four populations that represent the cross-classification of two age groups by two race groups. You can use the FACTORS statement to identify these two factors and to name the effects in the model. The following statements are required to fit a main-effects model to these data:

proc catmod data=direct;
   response read b1-b4;
   model _f_=_response_;
   factors age 2, race 2 / _response_=age race;
run;