The following DATA step creates an artificial data set, Test, to be used in this section. There are six variables in Test: the variable T contains the failure times; the variable Status is the censoring indicator variable with the value 1 for an uncensored failure time and the value 0 for a censored time; the variable A is a categorical variable with values 1, 2, and 3 representing three different categories; the variable MirrorT is an exact copy of T; the variable W is the observation weight; and the variable S is the strata indicator.
data Test;
   input T Status A W S @@;
   MirrorT = T;
   datalines;
23 1 1 10 1    7 0 1 20 2
23 1 1 10 1   10 1 1 20 2
20 0 1 10 1   13 0 1 20 2
24 1 1 10 1   10 1 1 20 2
18 1 2 10 1    6 1 2 20 2
18 0 2 10 1    6 1 2 20 2
13 0 2 10 1   13 1 2 20 2
 9 0 2 10 1   15 1 2 20 2
 8 1 3 10 1    6 1 3 20 2
12 0 3 10 1    4 1 3 20 2
11 1 3 10 1    8 1 1 20 2
 6 1 3 10 1    7 1 3 20 2
 7 1 3 10 1   12 1 3 20 2
 9 1 2 10 1   15 1 2 20 2
 3 1 2 10 1   14 0 3 20 2
 6 1 1 10 1   13 1 2 20 2
;
The time variable cannot be used explicitly as an explanatory effect in the MODEL statement. The following statements produce an error message:
proc surveyphreg data=Test;
   weight W;
   strata S;
   class A;
   model T*Status(0) = T*A;
run;
To use the time variable as an explanatory effect, replace T in the effect with MirrorT, which is an exact copy of T, as in the following statements:
proc surveyphreg data=Test;
   weight W;
   strata S;
   class A;
   model T*Status(0) = A*MirrorT;
run;
Note that neither T*A nor MirrorT*A in the MODEL statement is time-dependent. The results of fitting this model are shown in Figure 113.3.
Figure 113.3: T*A Effect
In PROC SURVEYPHREG, the levels of CLASS variables are determined by the CLASS statement and the input data and are not affected by user-supplied programming statements. Consider the following statements, which produce the results in Figure 113.4. The variable A is declared as a CLASS variable in the CLASS statement.
proc surveyphreg data=Test;
   weight W;
   strata S;
   class A;
   model T*Status(0) = A;
run;
Figure 113.4 shows the parameters that correspond to A and their respective regression coefficient estimates.
Figure 113.4: Design Variable and Regression Coefficient Estimates
Now consider a programming statement that attempts to change the value of the CLASS variable A, as in the following specification:
proc surveyphreg data=Test;
   weight W;
   strata S;
   class A;
   model T*Status(0) = A;
   if A=3 then A=2;
run;
Results of this analysis are shown in Figure 113.5 and are identical to those in Figure 113.4. The if A=3 then A=2 programming statement has no effect on the design variables for A, which have already been determined.
Figure 113.5: Design Variable and Regression Coefficient Estimates
Additionally, when a variable that has been declared in the CLASS statement is used in a programming statement, it is not treated there as a collection of the corresponding design variables. Consider the following statements:
proc surveyphreg data=Test;
   class A;
   model T*Status(0) = A X;
   X = T*A;
run;
The CLASS variable A generates two design variables as explanatory variables. The variable X created by the X=T*A programming statement is a single time-dependent covariate whose values are evaluated using the exact values of A given in the data, not the dummy coded values that represent A. In the data set Test, A has the values of 1, 2, and 3, and these values are multiplied by the values of T to produce X. If A were a character variable with values 'Bird', 'Cat', and 'Dog', the programming statement X=T*A would have produced an error in the attempt to multiply a number by a character value.
Figure 113.6: Single Time-Dependent Variable X*A
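The arithmetic behind X can be sketched outside the procedure. The following DATA step is an illustration only: the data set name Check is hypothetical, and the step evaluates T*A once per observation at the observed time, whereas inside PROC SURVEYPHREG the programming statement is evaluated with T taking the event times being processed. It does show that the stored codes 1, 2, and 3 enter the product directly, so an observation with A=3 contributes three times its value of T.

```sas
/* Illustration only: Check is a hypothetical data set name */
data Check;
   set Test;
   X = T*A;   /* uses the stored numeric value of A, not design variables */
run;

/* Compare T, A, and the product for the first few observations */
proc print data=Check(obs=4);
   var T A X;
run;
```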
To create time-dependent covariates from the levels of a CLASS variable, you can use statements like the following. These statements are not equivalent to the preceding program:
proc surveyphreg data=Test;
   class A;
   model T*Status(0) = A X1 X2;
   X1 = T*(A=1);
   X2 = T*(A=2);
run;
The Boolean parenthetical expressions (A=1) and (A=2) resolve to a value of 1 or 0, depending on whether the expression is true or false, respectively.
Results of this test are shown in Figure 113.7.
Figure 113.7: Simple Test of Proportional Hazards Assumption
Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio
---|---|---|---|---|---|---
A 1 | 31 | -0.007655 | 1.221122 | -0.01 | 0.9950 | 0.992
A 2 | 31 | -0.881383 | 1.743507 | -0.51 | 0.6168 | 0.414
A 3 | 31 | 0 | . | . | . | 1.000
X1 | 31 | -0.155220 | 0.164334 | -0.94 | 0.3522 | 0.856
X2 | 31 | 0.011554 | 0.188932 | 0.06 | 0.9516 | 1.012
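One way to read the X1 and X2 rows is through the model that these statements fit. As a sketch, write β1 and β2 for the coefficients of A 1 and A 2, and γ1 and γ2 for the coefficients of X1 and X2, with A=3 as the reference level. This notation is introduced here for illustration and does not appear in the procedure output.

```latex
\lambda(t \mid A) = \lambda_0(t)\,
  \exp\{\beta_1 I(A=1) + \beta_2 I(A=2)
        + \gamma_1\, t\, I(A=1) + \gamma_2\, t\, I(A=2)\}
```

Under this form, the log hazard ratio of A=1 relative to A=3 at time t is β1 + γ1 t, and that of A=2 relative to A=3 is β2 + γ2 t. Proportional hazards across the levels of A therefore corresponds to γ1 = γ2 = 0, which is what the X1 and X2 rows of Figure 113.7 assess.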
In general, when your model contains a categorical explanatory variable that is time-dependent, it might be necessary to use hardcoded dummy variables to represent the categories of the categorical variable.
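As a sketch of what hardcoded dummy variables might look like for this example, the following statements create indicator variables in a DATA step and use them in place of the CLASS variable. The names Test2, A1, and A2 are hypothetical and are introduced only for illustration; level 3 of A is taken as the reference level, to match the parameterization shown in Figure 113.7.

```sas
/* Hypothetical names: Test2, A1, A2 */
data Test2;
   set Test;
   A1 = (A=1);   /* 1 if A=1, otherwise 0 */
   A2 = (A=2);   /* 1 if A=2, otherwise 0 */
run;

proc surveyphreg data=Test2;
   model T*Status(0) = A1 A2 X1 X2;
   X1 = T*A1;    /* time-dependent covariate for level 1 */
   X2 = T*A2;    /* time-dependent covariate for level 2 */
run;
```

Because the indicators are ordinary numeric variables, no CLASS statement is needed, and the programming statements operate on the same 0/1 values that enter the model.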