The PHREG Procedure
Clarification of the Time and CLASS Variables Usage
The following DATA step creates an artificial data set, Foo, to be used in this section. There are four variables in Foo: the variable T contains the failure times; the variable Status is the censoring indicator variable with the value 1 for an uncensored failure time and the value 0 for a censored time; the variable A is a categorical variable with values 1, 2, and 3 representing three different categories; and the variable MirrorT is an exact copy of T.
data Foo;
   input T Status A @@;
   MirrorT = T;
   datalines;
23 1 1    7 0 1   23 1 1   10 1 1   20 0 1   13 0 1   24 1 1   10 1 1
18 1 2    6 1 2   18 0 2    6 1 2   13 0 2   13 1 2    9 0 2   15 1 2
 8 1 3    6 1 3   12 0 3    4 1 3   11 1 3    8 1 1    6 1 3    7 1 3
 7 1 3   12 1 3    9 1 2   15 1 2    3 1 2   14 0 3    6 1 1   13 1 2
;
When the time variable is explicitly used in an explanatory effect in the MODEL statement, the effect is NOT time dependent. In the following specification, T is the time variable, but T does not play the role of the time variable in the explanatory effect T*A:
proc phreg data=Foo;
   class A;
   model T*Status(0)=T*A;
run;
The parameter estimates of this model are shown in Figure 64.12.
To verify that the effect T*A in the MODEL statement is not time dependent, T is replaced by MirrorT, which is an exact copy of T, as in the following statements:
proc phreg data=Foo;
   class A;
   model T*Status(0)=A*MirrorT;
run;
The results of fitting this model (Figure 64.13) are identical to those of the previous model (Figure 64.12), except for the parameter names and labels. The effect A*MirrorT is not time dependent, so neither is A*T.
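The distinction can be written out as a short sketch. The notation is introduced here for illustration only and does not appear in the PROC PHREG output: $d_{ik}$ denotes the $k$th design variable of A for subject $i$, $T_i$ is that subject's own observed time, and $\lambda_0(t)$ is the baseline hazard.

```latex
% Effect T*A in the MODEL statement: each subject carries one fixed value,
% built from its own observed time T_i.
\lambda_i(t) = \lambda_0(t)\,\exp\Bigl(\sum_k \beta_k\, T_i\, d_{ik}\Bigr)

% A time-dependent effect would instead be re-evaluated at the running time t.
\lambda_i(t) = \lambda_0(t)\,\exp\Bigl(\sum_k \beta_k\, t\, d_{ik}\Bigr)
```

In the first form, the covariate value that subject $i$ contributes to any risk set is always $T_i\,d_{ik}$, which is why replacing T with its copy MirrorT leaves the fit unchanged.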
In PROC PHREG, the levelization of CLASS variables is determined by the CLASS statement and the input data and is not affected by user-supplied programming statements. Consider the following statements, which produce the results in Figure 64.14. Variable A is declared as a CLASS variable in the CLASS statement. By default, the reference parameterization is used with A=3 as the reference level. Two regression coefficients are estimated for the two dummy variables of A.
proc phreg data=Foo;
   class A;
   model T*Status(0)=A;
run;
Figure 64.14 shows the dummy variables of A and the regression coefficient estimates.
Now consider the programming statement that attempts to change the value of the CLASS variable A as in the following specification:
proc phreg data=Foo;
   class A;
   model T*Status(0)=A;
   if A=3 then A=2;
run;
Results of this analysis are shown in Figure 64.15 and are identical to those in Figure 64.14. The if A=3 then A=2 programming statement has no effect on the design variables for A, which have already been determined.
Additionally, a variable that has been declared in the CLASS statement is not treated as a collection of its design variables when it is used in a programming statement. Consider the following statements:
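If the intent really is to merge two categories, one option is to recode the variable in a DATA step so that the change is already present in the input data when PROC PHREG levelizes A. The following is a minimal sketch of that approach; the data set name Foo2 is hypothetical and is used only for this illustration.

```sas
/* Foo2 is a hypothetical data set name used only for this sketch */
data Foo2;
   set Foo;
   if A=3 then A=2;   /* merge category 3 into category 2 before the analysis */
run;

proc phreg data=Foo2;
   class A;           /* levels now come from the recoded input data: 1 and 2 */
   model T*Status(0)=A;
run;
```

Because the recoded variable has only two levels, the default reference parameterization would be expected to produce a single design variable for A in this fit.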
proc phreg data=Foo;
   class A;
   model T*Status(0)=A X;
   X=T*A;
run;
The CLASS variable A generates two design variables as explanatory variables. The variable X created by the X=T*A programming statement is a single time-dependent covariate whose values are evaluated using the exact values of A given in the data, not the dummy-coded values that represent A. In data set Foo, A assumes the values 1, 2, and 3, and these are the exact values that are used in producing X. If A were a character variable with values 'Bird', 'Cat', and 'Dog', the programming statement X=T*A would have produced an error in the attempt to multiply a number by a character value.
To generalize the simple test of the proportional hazards assumption for the design variables of A (as in the section Classical Method of Maximum Likelihood), you specify the following statements, which are not the same as in the preceding program or the specification in the section Time Variable on the Right Side of the MODEL Statement:
proc phreg data=Foo;
   class A;
   model T*Status(0)=A X1 X2;
   X1=T*(A=1);
   X2=T*(A=2);
run;
The Boolean parenthetical expressions (A=1) and (A=2) resolve to a value of 1 or 0, depending on whether the expression is true or false, respectively.
Results of this test are shown in Figure 64.17.
Analysis of Maximum Likelihood Estimates

Parameter | | DF | Parameter Estimate | Standard Error | Chi-Square | Pr > ChiSq | Hazard Ratio | Label |
---|---|---|---|---|---|---|---|---|
A | 1 | 1 | -0.00766 | 1.69435 | 0.0000 | 0.9964 | 0.992 | A 1 |
A | 2 | 1 | -0.88132 | 1.64298 | 0.2877 | 0.5917 | 0.414 | A 2 |
X1 | | 1 | -0.15522 | 0.20174 | 0.5920 | 0.4417 | 0.856 | |
X2 | | 1 | 0.01155 | 0.18858 | 0.0037 | 0.9512 | 1.012 | |
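The model behind this test can be sketched as follows. The notation is an assumption made here for illustration: $d_{i1}$ and $d_{i2}$ are the design variables of A under the default reference coding, $\beta_1$ and $\beta_2$ are their coefficients, and $\gamma_1$ and $\gamma_2$ are the coefficients of X1 and X2.

```latex
\lambda_i(t) = \lambda_0(t)\,\exp\bigl(\beta_1 d_{i1} + \beta_2 d_{i2}
               + \gamma_1\, t\, d_{i1} + \gamma_2\, t\, d_{i2}\bigr)

% Hazard ratio of category k (k = 1, 2) relative to the reference category A=3:
\frac{\lambda(t \mid A = k)}{\lambda(t \mid A = 3)} = \exp(\beta_k + \gamma_k t)

% Proportional hazards for A corresponds to
H_0 : \gamma_1 = \gamma_2 = 0
```

Read this way, the Wald tests for X1 and X2 in the table (p-values 0.4417 and 0.9512) give no indication that the hazard ratios for A change with time.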
In general, when your model contains a categorical explanatory variable that is time-dependent, it might be necessary to use hardcoded dummy variables to represent the categories of the categorical variable. Alternatively, you might consider using the counting-process style of input where you break up the covariate history of an individual into a number of records with nonoverlapping start and stop times and declare the categorical time-dependent variable in the CLASS statement.
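The counting-process alternative might look like the following sketch. Everything in it is hypothetical and shown only to illustrate the record layout: the data set FooCP, the variables ID, Start, Stop, and Stage, and the data values themselves, which are far too few to support a meaningful fit. Each record covers one interval during which the category Stage is constant, and Status is 1 only on the record whose interval ends in an event.

```sas
/* Hypothetical data: one record per interval with a constant Stage value */
data FooCP;
   input ID Start Stop Status Stage @@;
   datalines;
1 0  5 0 1   1 5 12 1 2
2 0  8 1 1
3 0  4 0 1   3 4  9 0 2   3 9 15 1 3
4 0 10 0 1
;

proc phreg data=FooCP;
   class Stage;                          /* time-dependent categorical variable */
   model (Start,Stop)*Status(0)=Stage;   /* counting-process style of input */
run;
```

With this layout, the design variables for Stage are built from the input records, so no programming statements are needed to make the categorical covariate change over time.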