The purpose of the SUBJECT= option in the REPEATED statement of PROC GENMOD is simply to distinguish those observations that are correlated from those that aren't. That is, it defines the clusters of correlated observations. Observations with the same value of the SUBJECT= effect belong to the same cluster and are assumed to be correlated. The SUBJECT= option does not determine the correlation structure of the correlated measurements — that's what the TYPE= option does. And it has no effect on the model structure — that's done entirely in the MODEL statement.
For example, in a study of physicians, if physicians are considered independent and multiple measurements are obtained from each in the form of a measurement on each of several of their patients, then observations within physicians are correlated and observations from different physicians are not. Therefore, you need to specify SUBJECT=PHYSICIAN, where PHYSICIAN is a variable in the data set that uniquely identifies each physician.
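A minimal sketch of this specification follows. The data set name STUDY, the response OUTCOME, and the covariate DOSE are hypothetical names used only for illustration, and TYPE=EXCH is just one possible choice of working correlation structure.

```sas
/* Hypothetical data set STUDY: one row per patient, with the   */
/* treating physician identified by the variable PHYSICIAN.     */
proc genmod data=study;
   class physician;                         /* SUBJECT= variables must be listed in CLASS */
   model outcome = dose / dist=normal;      /* the model structure is defined here only   */
   repeated subject=physician / type=exch;  /* each physician is one cluster              */
run;
```

All rows that share a value of PHYSICIAN are treated as one cluster of correlated observations; rows from different physicians are treated as independent.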
Now suppose only a set of patients from one physician is observed. If patients are considered independent and if each patient is repeatedly measured (such as over time), then you need to specify SUBJECT=PATIENT, where PATIENT is a variable that uniquely identifies each patient.
If repeated measurements are taken on each patient, and if several patients from many physicians are observed, then all observations within a given physician are correlated and you would again specify SUBJECT=PHYSICIAN. You might believe that measurements within a patient are more correlated than observations from different patients of the same physician and that SUBJECT=PATIENT(PHYSICIAN) should be specified, but this is describing the correlation structure of the correlated measurements. The SUBJECT= effect only needs to distinguish the correlated observations from the uncorrelated. See below for more on nested correlation structures.
SUBJECT=PATIENT(PHYSICIAN) specifies that patients are independent, even when they share a physician, because observations from different patients of the same physician have different values of PATIENT(PHYSICIAN). If that is the assumption you want and every patient in the study has a unique value of the PATIENT variable, then SUBJECT=PATIENT is equivalent. You need SUBJECT=PATIENT(PHYSICIAN) only if the PATIENT variable does not uniquely identify patients throughout the data set. If the variable identifies patients uniquely only within a physician and reuses the values for other physicians (for example, if patients are numbered 1, 2, 3, ... within every physician), then specify SUBJECT=PATIENT(PHYSICIAN) or SUBJECT=PATIENT*PHYSICIAN. One of these effects involving both the physician and patient identifiers is needed in this situation to uniquely identify the patients (clusters). With reused patient numbers, specifying only SUBJECT=PATIENT would incorrectly group together as a single cluster all measurements for the several different patients with PATIENT=1, and likewise for all PATIENT=2 patients, all PATIENT=3 patients, and so on.
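The reused-identifier situation can be sketched as follows. The data set STUDY, the variables VISIT, OUTCOME, and DOSE, and all data values are made up purely to show the layout; they are not results from any study, and a real analysis would need many more clusters.

```sas
/* Hypothetical data: patient numbers restart at 1 for each physician */
data study;
   input physician patient visit outcome dose;
   datalines;
1 1 1 5.1 10
1 1 2 5.4 10
1 2 1 4.8 20
1 2 2 5.0 20
2 1 1 6.2 10
2 1 2 6.0 10
2 2 1 5.7 20
2 2 2 5.9 20
;
run;

proc genmod data=study;
   class physician patient;
   model outcome = dose / dist=normal;
   /* PATIENT(PHYSICIAN) keeps patient 1 of physician 1 and      */
   /* patient 1 of physician 2 in separate clusters.             */
   repeated subject=patient(physician) / type=exch;
run;
```

With these data, SUBJECT=PATIENT(PHYSICIAN) defines four clusters of two visits each, whereas SUBJECT=PATIENT alone would define only two clusters, each mixing visits from patients of different physicians.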
Nested correlation structures
The correlation structures available with the GEE method in GENMOD (specified by the TYPE= option in the REPEATED statement) do not include any nested structures such as patients within physicians within hospitals. However, the GEE method is robust to the choice of working correlation structure. Even if you select a structure that doesn't match the true structure, the regression parameter estimates remain statistically consistent, provided that the mean model in the MODEL statement is correctly specified. A poorly matched structure can cost some efficiency, but not consistency. As a result, you can use the method without needing to precisely define the nested correlation structure. Many analysts routinely use simple structures such as TYPE=IND or TYPE=EXCH.
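One way to see this robustness in practice is to fit the same model under two simple working structures and compare the results. The sketch below assumes a hypothetical data set STUDY with variables PHYSICIAN, OUTCOME, and DOSE; none of these names come from an actual study.

```sas
/* Fit 1: independence working correlation */
proc genmod data=study;
   class physician;
   model outcome = dose / dist=normal;
   repeated subject=physician / type=ind;
run;

/* Fit 2: exchangeable working correlation */
proc genmod data=study;
   class physician;
   model outcome = dose / dist=normal;
   /* CORRW prints the estimated working correlation matrix */
   repeated subject=physician / type=exch corrw;
run;
```

If the GEE parameter estimates and their empirical standard errors are similar across the two fits, the choice of working structure is probably not driving the conclusions.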