The following DATA step creates an artificial data set, Test, to be used in this section. There are six variables in Test: the variable T contains the failure times; the variable Status is the censoring indicator variable, with the value 1 for an uncensored failure time and the value 0 for a censored time; the variable A is a categorical variable with values 1, 2, and 3 that represent three different categories; the variable MirrorT is an exact copy of T; the variable W is the observation weight; and the variable S is the strata indicator.
data Test;
   input T Status A W S @@;
   MirrorT = T;
   datalines;
23 1 1 10 1    7 0 1 20 2
23 1 1 10 1   10 1 1 20 2
20 0 1 10 1   13 0 1 20 2
24 1 1 10 1   10 1 1 20 2
18 1 2 10 1    6 1 2 20 2
18 0 2 10 1    6 1 2 20 2
13 0 2 10 1   13 1 2 20 2
 9 0 2 10 1   15 1 2 20 2
 8 1 3 10 1    6 1 3 20 2
12 0 3 10 1    4 1 3 20 2
11 1 3 10 1    8 1 1 20 2
 6 1 3 10 1    7 1 3 20 2
 7 1 3 10 1   12 1 3 20 2
 9 1 2 10 1   15 1 2 20 2
 3 1 2 10 1   14 0 3 20 2
 6 1 1 10 1   13 1 2 20 2
;
The time variable cannot be used explicitly as an explanatory effect in the MODEL statement. The following statements produce an error message:
proc surveyphreg data=Test;
   weight W;
   strata S;
   class A;
   model T*Status(0)=T*A;
run;
To use the time variable as an explanatory effect, replace T in the effect with MirrorT, which is an exact copy of T, as in the following statements:
proc surveyphreg data=Test;
   weight W;
   strata S;
   class A;
   model T*Status(0)=A*MirrorT;
run;
Note that neither T*A nor MirrorT*A in the MODEL statement is time-dependent. The results of fitting this model are shown in Figure 97.3.
Figure 97.3: T*A Effect

Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
MirrorT*A 1 | 30 | -17.560699 | 0.337239 | -52.07 | <.0001 | 0.000 |
MirrorT*A 2 | 30 | -17.424235 | 0.331186 | -52.61 | <.0001 | 0.000 |
MirrorT*A 3 | 30 | -17.448672 | 0.290159 | -60.13 | <.0001 | 0.000 |
In PROC SURVEYPHREG, the levels of CLASS variables are determined by the CLASS statement and the input data and are not affected by user-supplied programming statements. Consider the following statements, which produce the results in Figure 97.4. The variable A is declared as a CLASS variable in the CLASS statement.
proc surveyphreg data=Test;
   weight W;
   strata S;
   class A;
   model T*Status(0)=A;
run;
Figure 97.4 shows the parameters that correspond to A and their respective regression coefficient estimates.
Figure 97.4: Design Variable and Regression Coefficient Estimates

Class Level Information

Class | Levels | Values |
---|---|---|
A | 3 | 1 2 3 |

Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
A 1 | 30 | -1.162184 | 0.655136 | -1.77 | 0.0862 | 0.313 |
A 2 | 30 | -0.616962 | 0.521841 | -1.18 | 0.2464 | 0.540 |
A 3 | 30 | 0 | . | . | . | 1.000 |
Now consider a programming statement that attempts to change the value of the CLASS variable A, as in the following specification:
proc surveyphreg data=Test;
   weight W;
   strata S;
   class A;
   model T*Status(0)=A;
   if A=3 then A=2;
run;
Results of this analysis are shown in Figure 97.5 and are identical to those in Figure 97.4. The if A=3 then A=2 programming statement has no effect on the design variables for A, which have already been determined.
Figure 97.5: Design Variable and Regression Coefficient Estimates

Class Level Information

Class | Levels | Values |
---|---|---|
A | 3 | 1 2 3 |

Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
A 1 | 30 | -1.162184 | 0.655136 | -1.77 | 0.0862 | 0.313 |
A 2 | 30 | -0.616962 | 0.521841 | -1.18 | 0.2464 | 0.540 |
A 3 | 30 | 0 | . | . | . | 1.000 |
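If the goal is actually to combine levels 2 and 3 of A, one option is to recode the variable in a DATA step before calling the procedure, so that the CLASS statement sees the recoded values. The following is a minimal sketch of that idea; the data set name TestCollapsed and the variable name ACollapsed are hypothetical and are not part of the original example.

```sas
/* Hypothetical names: TestCollapsed and ACollapsed are for illustration only. */
/* Recode A in a DATA step so the change happens before CLASS levels are set.  */
data TestCollapsed;
   set Test;
   ACollapsed = A;                          /* keep the original A intact    */
   if ACollapsed = 3 then ACollapsed = 2;   /* merge level 3 into level 2    */
run;

proc surveyphreg data=TestCollapsed;
   weight W;
   strata S;
   class ACollapsed;                        /* levels come from recoded data */
   model T*Status(0) = ACollapsed;
run;
```

With this approach, the Class Level Information table should list two levels for ACollapsed rather than three, because the recoding is already present in the input data.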
Additionally, when a variable that has been declared in the CLASS statement is used in a programming statement, it is not treated as a collection of the corresponding design variables. Consider the following statements:
proc surveyphreg data=Test;
   class A;
   model T*Status(0)=A X;
   X=T*A;
run;
The CLASS variable A generates two design variables as explanatory variables. The variable X created by the X=T*A programming statement is a single time-dependent covariate whose values are evaluated using the exact values of A given in the data, not the dummy-coded values that represent A. In the data set Test, A has the values 1, 2, and 3, and these values are multiplied by the values of T to produce X. If A were a character variable with values 'Bird', 'Cat', and 'Dog', the programming statement X=T*A would have produced an error in the attempt to multiply a number by a character value. The results are shown in Figure 97.6.
Figure 97.6: Single Time-Dependent Variable X*A

Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
A 1 | 31 | 0.158010 | 1.222654 | 0.13 | 0.8980 | 1.171 |
A 2 | 31 | 0.008993 | 0.674629 | 0.01 | 0.9894 | 1.009 |
A 3 | 31 | 0 | . | . | . | 1.000 |
X | 31 | 0.092679 | 0.073746 | 1.26 | 0.2182 | 1.097 |
If you want to create time-dependent covariates from the values of a CLASS variable, you could use syntax like the following, which is not equivalent to the preceding program:
proc surveyphreg data=Test;
   class A;
   model T*Status(0)=A X1 X2;
   X1= T*(A=1);
   X2= T*(A=2);
run;
The Boolean parenthetical expressions (A=1) and (A=2) resolve to a value of 1 or 0, depending on whether the expression is true or false, respectively. Results of this test are shown in Figure 97.7.
Figure 97.7: Simple Test of Proportional Hazards Assumption

Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
A 1 | 31 | -0.007655 | 1.284875 | -0.01 | 0.9953 | 0.992 |
A 2 | 31 | -0.881383 | 1.834533 | -0.48 | 0.6343 | 0.414 |
A 3 | 31 | 0 | . | . | . | 1.000 |
X1 | 31 | -0.155220 | 0.172914 | -0.90 | 0.3763 | 0.856 |
X2 | 31 | 0.011554 | 0.198796 | 0.06 | 0.9540 | 1.012 |
In general, when your model contains a categorical explanatory variable that is time-dependent, it might be necessary to use hardcoded dummy variables to represent the categories of the categorical variable.
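As a sketch of the hardcoded approach, the dummy variables can be created in a DATA step and then used directly in the MODEL statement and in programming statements, without a CLASS statement. The data set name TestDummy and the variable names A1 and A2 are hypothetical. The WEIGHT and STRATA statements are carried over from the earlier examples in this section as an assumption about the intended design, and no results are reported here for this specification.

```sas
/* Hypothetical names: TestDummy, A1, and A2 are for illustration only. */
/* Hardcode the dummy variables; level 3 of A is the reference level.   */
data TestDummy;
   set Test;
   A1 = (A=1);   /* 1 if A is 1, otherwise 0 */
   A2 = (A=2);   /* 1 if A is 2, otherwise 0 */
run;

proc surveyphreg data=TestDummy;
   weight W;
   strata S;
   model T*Status(0) = A1 A2 X1 X2;
   X1 = T*A1;    /* time-dependent covariate for level 1 of A */
   X2 = T*A2;    /* time-dependent covariate for level 2 of A */
run;
```

Because A1 and A2 are ordinary numeric variables, they can appear in programming statements with exactly the coding that the model uses, which avoids the mismatch between CLASS design variables and raw data values described earlier.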