The following DATA step creates an artificial data set, Test, to be used in this section. There are six variables in Test: the variable T contains the failure times; the variable Status is the censoring indicator variable with the value 1 for an uncensored failure time and the value 0 for a censored time; the variable A is a categorical variable with values 1, 2, and 3 representing three different categories; the variable MirrorT is an exact copy of T; the variable W is the observation weight; and the variable S is the strata indicator.
   data Test;
      input T Status A W S @@;
      MirrorT = T;
      datalines;
   23 1 1 10 1     7 0 1 20 2
   23 1 1 10 1    10 1 1 20 2
   20 0 1 10 1    13 0 1 20 2
   24 1 1 10 1    10 1 1 20 2
   18 1 2 10 1     6 1 2 20 2
   18 0 2 10 1     6 1 2 20 2
   13 0 2 10 1    13 1 2 20 2
    9 0 2 10 1    15 1 2 20 2
    8 1 3 10 1     6 1 3 20 2
   12 0 3 10 1     4 1 3 20 2
   11 1 3 10 1     8 1 1 20 2
    6 1 3 10 1     7 1 3 20 2
    7 1 3 10 1    12 1 3 20 2
    9 1 2 10 1    15 1 2 20 2
    3 1 2 10 1    14 0 3 20 2
    6 1 1 10 1    13 1 2 20 2
   ;
The time variable cannot be used explicitly as an explanatory effect in the MODEL statement. The following statements produce an error message:
   proc surveyphreg data=Test;
      weight W;
      strata S;
      class A;
      model T*Status(0)=T*A;
   run;
To use the time variable as an explanatory effect, replace T in the effect by MirrorT, which is an exact copy of T, as in the following statements:
   proc surveyphreg data=Test;
      weight W;
      strata S;
      class A;
      model T*Status(0)=A*MirrorT;
   run;
Note that neither T*A nor MirrorT*A in the MODEL statement is time-dependent. The results of fitting this model are shown in Figure 100.3.
In PROC SURVEYPHREG, the levels of CLASS variables are determined by the CLASS statement and the input data and are not affected by user-supplied programming statements. Consider the following statements, which produce the results in Figure 100.4. Variable A is declared as a CLASS variable in the CLASS statement.
   proc surveyphreg data=Test;
      weight W;
      strata S;
      class A;
      model T*Status(0)=A;
   run;
Figure 100.4 shows the parameters that correspond to A and their respective regression coefficient estimates.
Now consider a programming statement that attempts to change the value of the CLASS variable A, as in the following specification:
   proc surveyphreg data=Test;
      weight W;
      strata S;
      class A;
      model T*Status(0)=A;
      if A=3 then A=2;
   run;
Results of this analysis are shown in Figure 100.5 and are identical to those in Figure 100.4. The programming statement if A=3 then A=2; has no effect on the design variables for A, which have already been determined.
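If the intent really is to merge levels 2 and 3 of A, one possible approach (a sketch, not part of the original example) is to recode A in a DATA step before the procedure runs, so that the CLASS statement sees the recoded values in the input data. The data set name Test2 is a hypothetical name chosen for illustration.

```sas
/* Sketch: recode A before the procedure, because CLASS levels are
   determined from the input data. Test2 is a hypothetical name. */
data Test2;
   set Test;
   if A=3 then A=2;   /* merge level 3 into level 2 */
run;

proc surveyphreg data=Test2;
   weight W;
   strata S;
   class A;           /* A now has only the levels 1 and 2 */
   model T*Status(0)=A;
run;
```

Unlike the preceding specification, this version changes the input data itself, so its results would not be expected to match Figure 100.4.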
Additionally, a variable that has been declared in the CLASS statement is not treated as a collection of the corresponding design variables when it is used in a programming statement. Consider the following statements:
   proc surveyphreg data=Test;
      class A;
      model T*Status(0)=A X;
      X=T*A;
   run;
The CLASS variable A generates two design variables as explanatory variables. The variable X created by the X=T*A programming statement is a single time-dependent covariate whose values are evaluated using the exact values of A given in the data, not the dummy coded values that represent A. In the data set Test, A has the values of 1, 2, and 3, and these values are multiplied by the values of T to produce X. If A were a character variable with values 'Bird', 'Cat', and 'Dog', the programming statement X=T*A would have produced an error in the attempt to multiply a number with a character value.
The preceding program therefore does not create one time-dependent covariate for each level of A. If you want to create time-dependent covariates from the values of a CLASS variable, you could use syntax like the following:
   proc surveyphreg data=Test;
      class A;
      model T*Status(0)=A X1 X2;
      X1= T*(A=1);
      X2= T*(A=2);
   run;
The Boolean parenthetical expressions (A=1) and (A=2) resolve to a value of 1 or 0, depending on whether the expression is true or false, respectively.
Results of this test are shown in Figure 100.7.
Figure 100.7: Simple Test of Proportional Hazards Assumption
Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
A 1 | 31 | -0.007655 | 1.284875 | -0.01 | 0.9953 | 0.992 |
A 2 | 31 | -0.881383 | 1.834533 | -0.48 | 0.6343 | 0.414 |
A 3 | 31 | 0 | . | . | . | 1.000 |
X1 | 31 | -0.155220 | 0.172914 | -0.90 | 0.3763 | 0.856 |
X2 | 31 | 0.011554 | 0.198796 | 0.06 | 0.9540 | 1.012 |
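The difference between multiplying by the raw value of A and multiplying by a Boolean expression can be checked outside the procedure. The following DATA step is only an illustrative sketch: the fixed value T=10 is an assumption that stands in for a single event time, and the step writes to the log instead of fitting a model.

```sas
/* Sketch: compare X=T*A with the indicator-based X1 and X2.
   T=10 is an arbitrary stand-in for one event time. */
data _null_;
   T = 10;
   do A = 1 to 3;
      X  = T*A;        /* uses the raw numeric value of A */
      X1 = T*(A=1);    /* (A=1) is 1 when true, 0 when false */
      X2 = T*(A=2);    /* (A=2) is 1 when true, 0 when false */
      put A= X= X1= X2=;
   end;
run;

/* Log output:
   A=1 X=10 X1=10 X2=0
   A=2 X=20 X1=0 X2=10
   A=3 X=30 X1=0 X2=0
*/
```

For the reference level A=3, both X1 and X2 are 0, which mirrors the way the design variables for A treat the last level as the reference.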
In general, when your model contains a categorical explanatory variable that is time-dependent, it might be necessary to use hardcoded dummy variables to represent the categories of the categorical variable.