The following DATA step creates an artificial data set, Test, to be used in this section. There are four variables in Test: the variable T contains the failure times; the variable Status is the censoring indicator variable with the value 1 for an uncensored failure time and the value 0 for a censored time; the variable A is a categorical variable with values 1, 2, and 3 representing three different categories; and the variable MirrorT is an exact copy of T.
data Test;
   input T Status A @@;
   MirrorT = T;
   datalines;
23 1 1    7 0 1   23 1 1   10 1 1   20 0 1   13 0 1   24 1 1   10 1 1
18 1 2    6 1 2   18 0 2    6 1 2   13 0 2   13 1 2    9 0 2   15 1 2
 8 1 3    6 1 3   12 0 3    4 1 3   11 1 3    8 1 1    6 1 3    7 1 3
 7 1 3   12 1 3    9 1 2   15 1 2    3 1 2   14 0 3    6 1 1   13 1 2
;
When the time variable is explicitly used in an explanatory effect in the MODEL statement, the effect is not time-dependent. In the following specification, T is the time variable, but T does not play the role of the time variable in the explanatory effect T*A:
proc phreg data=Test;
   class A;
   model T*Status(0)=T*A;
run;
The parameter estimates of this model are shown in Figure 73.12.
To verify that the effect T*A in the MODEL statement is not time-dependent, T is replaced by MirrorT, which is an exact copy of T, as in the following statements:
proc phreg data=Test;
   class A;
   model T*Status(0)=A*MirrorT;
run;
The results of fitting this model (Figure 73.13) are identical to those of the previous model (Figure 73.12), except for the parameter names and labels. The effect A*MirrorT is not time-dependent, so neither is A*T.
In PROC PHREG, the levels of CLASS variables are determined by the CLASS statement and the input data and are not affected by user-supplied programming statements. Consider the following statements, which produce the results in Figure 73.14. Variable A is declared as a CLASS variable in the CLASS statement. By default, the reference parameterization is used with A=3 as the reference level. Two regression coefficients are estimated for the two dummy variables of A.
proc phreg data=Test;
   class A;
   model T*Status(0)=A;
run;
Figure 73.14 shows the dummy variables of A and the regression coefficient estimates.
Now consider a programming statement that attempts to change the value of the CLASS variable A, as in the following specification:
proc phreg data=Test;
   class A;
   model T*Status(0)=A;
   if A=3 then A=2;
run;
Results of this analysis are shown in Figure 73.15 and are identical to those in Figure 73.14. The if A=3 then A=2 programming statement has no effect on the design variables for A, which have already been determined.
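If the goal is actually to combine levels 2 and 3 of A, one option is to recode the variable in a DATA step before calling PROC PHREG, so that the CLASS statement sees the recoded values. The following is a minimal sketch of that approach; the data set name TestRecoded is an illustrative choice and is not part of the original example.

```sas
/* Recode A before the procedure runs (TestRecoded is a hypothetical name) */
data TestRecoded;
   set Test;
   if A=3 then A=2;   /* merge level 3 into level 2 */
run;

/* The CLASS levels are now determined from the recoded input data */
proc phreg data=TestRecoded;
   class A;
   model T*Status(0)=A;
run;
```

With this input, A has only two levels, so the reference parameterization generates a single design variable.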
Additionally, any variable used in a programming statement that has already been declared in the CLASS statement is not treated as a collection of the corresponding design variables. Consider the following statements:
proc phreg data=Test;
   class A;
   model T*Status(0)=A X;
   X=T*A;
run;
The CLASS variable A generates two design variables as explanatory variables. The variable X created by the X=T*A programming statement is a single time-dependent covariate whose values are evaluated using the exact values of A given in the data, not the dummy-coded values that represent the levels of A. In data set Test, A assumes the values of 1, 2, and 3, and these are the exact values that are used in producing X. If A were a character variable with values 'Bird', 'Cat', and 'Dog', the programming statement X=T*A would have produced an error in the attempt to multiply a number by a character value.
To generalize the simple test of the proportional hazards assumption for the design variables of A (as in the section Classical Method of Maximum Likelihood), you specify the following statements, which are not the same as in the preceding program or as in the specification in the section Time Variable on the Right Side of the MODEL Statement:
proc phreg data=Test;
   class A;
   model T*Status(0)=A X1 X2;
   X1= T*(A=1);
   X2= T*(A=2);
run;
The Boolean parenthetical expressions (A=1) and (A=2) resolve to a value of 1 or 0, depending on whether the expression is true or false, respectively.
Results of this test are shown in Figure 73.17.
Figure 73.17: Simple Test of Proportional Hazards Assumption

Analysis of Maximum Likelihood Estimates

| Parameter | | DF | Parameter Estimate | Standard Error | Chi-Square | Pr > ChiSq | Hazard Ratio | Label |
|---|---|---|---|---|---|---|---|---|
| A | 1 | 1 | -0.00766 | 1.69435 | 0.0000 | 0.9964 | 0.992 | A 1 |
| A | 2 | 1 | -0.88132 | 1.64298 | 0.2877 | 0.5917 | 0.414 | A 2 |
| X1 | | 1 | -0.15522 | 0.20174 | 0.5920 | 0.4417 | 0.856 | |
| X2 | | 1 | 0.01155 | 0.18858 | 0.0037 | 0.9512 | 1.012 | |
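The same indicator technique can be applied when the CLASS variable is a character variable, such as the 'Bird', 'Cat', and 'Dog' case mentioned earlier, because a comparison against a character constant also evaluates to 1 or 0. The following is a sketch under stated assumptions: the data set TestChar and the variable Animal are hypothetical names introduced here for illustration, derived from A in Test.

```sas
/* Hypothetical character version of A (names are illustrative only) */
data TestChar;
   set Test;
   length Animal $4;
   select (A);
      when (1) Animal = 'Bird';
      when (2) Animal = 'Cat';
      otherwise Animal = 'Dog';
   end;
run;

proc phreg data=TestChar;
   class Animal;
   model T*Status(0)=Animal X1 X2;
   /* Each comparison yields 1 or 0, so no number is multiplied by a string */
   X1 = T*(Animal='Bird');
   X2 = T*(Animal='Cat');
run;
```

Because Animal is built directly from A, the indicators (Animal='Bird') and (Animal='Cat') take the same values as (A=1) and (A=2) in the numeric version.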
In general, when your model contains a categorical explanatory variable that is time-dependent, it might be necessary to use hardcoded dummy variables to represent the categories of the categorical variable. Alternatively, you might consider using the counting-process style of input where you break up the covariate history of an individual into a number of records with nonoverlapping start and stop times and declare the categorical time-dependent variable in the CLASS statement.
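The counting-process alternative can be sketched as follows. Everything in the data set Long (its name, the variables ID, Start, Stop, Event, and Stage, and all data values) is made up for illustration and does not come from the original example; the data are far too small for meaningful estimates and serve only to show the layout of the input and the form of the MODEL statement.

```sas
/* Hypothetical data: one record per interval (Start, Stop] over which
   Stage is constant. Event=1 if the interval ends in a failure, else 0. */
data Long;
   input ID Start Stop Event Stage $;
   datalines;
1  0  6 0 Low
1  6 14 1 High
2  0  9 1 Low
3  0  4 0 Low
3  4 11 0 Mid
3 11 20 1 High
4  0 15 0 Mid
5  0  7 0 Low
5  7 10 1 Mid
6  0 12 0 High
;

/* Stage changes over time across a subject's records, yet it can be
   declared in the CLASS statement because each record holds one value */
proc phreg data=Long;
   class Stage;
   model (Start, Stop)*Event(0)=Stage;
run;
```

In this layout the time dependence is carried by the records themselves, so no programming statements or hand-built dummy variables are needed for Stage.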