Input Data Sets
Data to be analyzed by PROC CATMOD must be in a SAS data set containing one of the following:
raw data values (variable values for every subject)
frequency counts and the corresponding variable values
response function values and their covariance matrix
If you specify a WEIGHT statement, then PROC CATMOD uses the values of the WEIGHT variable as the frequency counts. If the READ function is specified in the RESPONSE statement, then the procedure expects the input data set to contain the values of response functions and their covariance matrix. Otherwise, PROC CATMOD assumes that the SAS data set contains raw data values.
If you use raw data, PROC CATMOD first counts the number of observations having each combination of values for all variables specified in the MODEL or POPULATION statement. For example, suppose the variables A and B each take on the values 1 and 2, and their frequencies can be represented as follows:
B | A = 1 | A = 2 |
---|---|---|
1 | 2 | 1 |
2 | 3 | 1 |
The SAS data set Raw containing the raw data might be as follows:
Observation | A | B |
---|---|---|
1 | 1 | 1 |
2 | 1 | 1 |
3 | 1 | 2 |
4 | 1 | 2 |
5 | 1 | 2 |
6 | 2 | 1 |
7 | 2 | 2 |
And the statements for PROC CATMOD are as follows:
```sas
proc catmod data=Raw;
   model A=B;
run;
```
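The Raw data set is shown above only as a table. A minimal sketch of a DATA step that could create it, assuming list input and the variable names A and B (the observation number is the implicit row index, not a variable), might look like this:

```sas
/* Sketch: build the seven-observation Raw data set tabulated above. */
data Raw;
   input A B;   /* one subject per data line */
   datalines;
1 1
1 1
1 2
1 2
1 2
2 1
2 2
;
```

With this input, PROC CATMOD should tabulate the four cell counts 2, 3, 1, and 1 before fitting the model.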
For discussions of how to handle structural and random zeros with raw data as input data, see the section Zero Frequencies and Example 29.5.
If your data set contains frequency counts, then use the WEIGHT statement to specify the variable containing the frequencies. For example, you could create and analyze the Summary data set as follows:
```sas
data Summary;
   input A B Count;
   datalines;
1 1 2
1 2 3
2 1 1
2 2 1
;

proc catmod data=Summary;
   weight Count;
   model A=B;
run;
```
The data set Summary can also be created from the data set Raw by using the FREQ procedure:
```sas
proc freq data=Raw;
   tables A*B / out=Summary;
run;
```
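PROC FREQ stores the cell frequencies in a variable named COUNT (alongside PERCENT) in the OUT= data set, which is why the WEIGHT statement from the earlier example works unchanged. The following is a brief sketch of the full round trip, assuming Raw already exists. The NOPRINT option and the PROC PRINT step are additions made here only to suppress the displayed table and to inspect the result:

```sas
/* Collapse Raw to one row per A*B cell; OUT= holds A, B, COUNT, PERCENT. */
proc freq data=Raw noprint;
   tables A*B / out=Summary;
run;

/* Optional check: for the Raw data above, expect four rows
   with COUNT = 2, 3, 1, 1. */
proc print data=Summary;
run;

/* SAS variable names are case-insensitive, so Count refers to COUNT. */
proc catmod data=Summary;
   weight Count;
   model A=B;
run;
```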
If you want to read in the response functions and their covariance matrix, rather than have PROC CATMOD compute them, create a TYPE=EST data set. Besides one variable for each response function, the data set needs two more variables, _TYPE_ and _NAME_, both character variables of length 8. The variable _TYPE_ should have the value 'PARMS' when the observation contains the response functions and the value 'COV' when the observation contains elements of the covariance matrix of the response functions. The variable _NAME_ is used only when _TYPE_='COV', in which case it should contain the name of the variable whose covariance elements are stored in that observation. In the following data set, for example, the covariance between the second and fourth response functions is 0.000102:
```sas
data direct(type=est);
   input b1-b4 _type_ $ _name_ $8.;
   datalines;
0.590463   0.384720   0.273269   0.136458   PARMS   .
0.001690   0.000911   0.000474   0.000432   COV     B1
0.000911   0.001823   0.000031   0.000102   COV     B2
0.000474   0.000031   0.001056   0.000477   COV     B3
0.000432   0.000102   0.000477   0.000396   COV     B4
;
```
In order to tell PROC CATMOD that the input data set contains the values of response functions and their covariance matrix, do the following:
specify the READ function in the RESPONSE statement
specify _F_ as the dependent variable in the MODEL statement
For example, suppose the response functions correspond to four populations that represent the cross-classification of two age groups by two race groups. You can use the FACTORS statement to identify these two factors and to name the effects in the model. The following statements are required to fit a main-effects model to these data:
```sas
proc catmod data=direct;
   response read b1-b4;
   model _f_=_response_;
   factors age 2, race 2 / _response_=age race;
run;
```