Data to be analyzed by PROC CATMOD must be in a SAS data set containing one of the following:
- raw data values (variable values for every subject)
- frequency counts and the corresponding variable values
- response function values and their covariance matrix
If you specify a WEIGHT statement, then PROC CATMOD uses the values of the WEIGHT variable as the frequency counts. If the READ function is specified in the RESPONSE statement, then the procedure expects the input data set to contain the values of response functions and their covariance matrix. Otherwise, PROC CATMOD assumes that the SAS data set contains raw data values.
If you use raw data, PROC CATMOD first counts the number of observations having each combination of values for all variables specified in the MODEL or POPULATION statement. For example, suppose the variables A and B each take on the values 1 and 2, and their frequencies can be represented as follows:
|     | A=1 | A=2 |
|---|---|---|
| B=1 | 2 | 1 |
| B=2 | 3 | 1 |
The SAS data set Raw containing the raw data might be as follows:
| Observation | A | B |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 2 |
| 4 | 1 | 2 |
| 5 | 1 | 2 |
| 6 | 2 | 1 |
| 7 | 2 | 2 |
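One way to build the data set Raw from the listing above is a DATA step with in-stream data lines. This is a minimal sketch: the list-style INPUT statement and the one-observation-per-line layout are assumptions made for illustration, not something the source prescribes.

```sas
/* Hypothetical DATA step: one line per observation, value of A then B */
data Raw;
   input A B;
   datalines;
1 1
1 1
1 2
1 2
1 2
2 1
2 2
;
```

Each data line corresponds to one row of the listing, so the seven observations reproduce the cell frequencies 2, 3, 1, and 1.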
The statements for PROC CATMOD are as follows:

    proc catmod data=Raw;
       model A=B;
    run;

For discussions of how to handle structural and random zeros with raw data as input data, see the section Zero Frequencies and Example 32.5.
If your data set contains frequency counts, then use the WEIGHT statement to specify the variable containing the frequencies. For example, you could create and analyze the Summary data set as follows:

    data Summary;
       input A B Count;
       datalines;
    1 1 2
    1 2 3
    2 1 1
    2 2 1
    ;

    proc catmod data=Summary;
       weight Count;
       model A=B;
    run;

The data set Summary can also be created from the data set Raw by using the FREQ procedure:

    proc freq data=Raw;
       tables A*B / out=Summary;
    run;

If you want to read in the response functions and their covariance matrix, rather than have PROC CATMOD compute them, create a TYPE=EST data set. In addition to having one variable name for each function, the data set should have two additional variables: _TYPE_ and _NAME_, both character variables of length 8. The variable _TYPE_ should have the value 'PARMS' when the observation contains the response functions; it should have the value 'COV' when the observation contains elements of the covariance matrix of the response functions. The variable _NAME_ is used only when _TYPE_=COV, in which case it should contain the name of the variable that has its covariance elements stored in that observation. In the following data set, for example, the covariance between the second and fourth response functions is 0.000102:

    data direct(type=est);
       input b1-b4 _type_ $ _name_ $8.;
       datalines;
    0.590463 0.384720 0.273269 0.136458 PARMS .
    0.001690 0.000911 0.000474 0.000432 COV B1
    0.000911 0.001823 0.000031 0.000102 COV B2
    0.000474 0.000031 0.001056 0.000477 COV B3
    0.000432 0.000102 0.000477 0.000396 COV B4
    ;

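To see how this layout stores the covariance matrix, the following sketch reads the observation for B2 and writes its b4 element to the SAS log. It assumes the data set direct created above already exists. The DATA _NULL_ step is only an illustrative check and is not part of the PROC CATMOD workflow.

```sas
/* Illustrative check: look up the covariance of the 2nd and 4th functions */
data _null_;
   set direct;
   where _type_ = 'COV' and _name_ = 'B2';  /* observation holding covariances with b2 */
   put 'Cov(b2, b4) = ' b4;                 /* the b4 column of that observation */
run;
```

Because a covariance matrix is symmetric, the same value also appears in the b2 column of the observation whose _NAME_ value is B4.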
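In order to tell PROC CATMOD that the input data set contains the values of response functions and their covariance matrix, do the following:
- specify the READ function in the RESPONSE statement
- specify _F_ as the dependent variable in the MODEL statement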
For example, suppose the response functions correspond to four populations that represent the cross-classification of two age groups by two race groups. You can use the FACTORS statement to identify these two factors and to name the effects in the model. The following statements are required to fit a main-effects model to these data:

    proc catmod data=direct;
       response read b1-b4;
       model _f_=_response_;
       factors age 2, race 2 / _response_=age race;
    run;