Time and CLASS Variables Usage
The following DATA step creates an artificial data set, Test, to be used in this section. There are six variables in Test: the variable T contains the failure times; the variable Status is the censoring indicator variable with the value 1 for an uncensored failure time and the value 0 for a censored time; the variable A is a categorical variable with values 1, 2, and 3 representing three different categories; the variable MirrorT is an exact copy of T; the variable W is the observation weight; and the variable S is the strata indicator.
    data Test;
       input T Status A W S @@;
       MirrorT = T;
       datalines;
    23 1 1 10 1    7 0 1 20 2
    23 1 1 10 1   10 1 1 20 2
    20 0 1 10 1   13 0 1 20 2
    24 1 1 10 1   10 1 1 20 2
    18 1 2 10 1    6 1 2 20 2
    18 0 2 10 1    6 1 2 20 2
    13 0 2 10 1   13 1 2 20 2
     9 0 2 10 1   15 1 2 20 2
     8 1 3 10 1    6 1 3 20 2
    12 0 3 10 1    4 1 3 20 2
    11 1 3 10 1    8 1 1 20 2
     6 1 3 10 1    7 1 3 20 2
     7 1 3 10 1   12 1 3 20 2
     9 1 2 10 1   15 1 2 20 2
     3 1 2 10 1   14 0 3 20 2
     6 1 1 10 1   13 1 2 20 2
    ;
The time variable cannot be used explicitly as an explanatory effect in the MODEL statement. The following statements produce an error message:
    proc surveyphreg data=Test;
       weight W;
       strata S;
       class A;
       model T*Status(0)=T*A;
    run;
To use the failure time as an explanatory effect, replace T in the effect with MirrorT, which is an exact copy of T, as in the following statements:
    proc surveyphreg data=Test;
       weight W;
       strata S;
       class A;
       model T*Status(0)=A*MirrorT;
    run;
Note that neither T*A nor MirrorT*A in the MODEL statement is time-dependent. The results of fitting this model are shown in Figure 89.3.
Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
MirrorT*A 1 | 30 | -17.560700 | 57689160 | -0.00 | 1.0000 | 0.000 |
MirrorT*A 2 | 30 | -17.424235 | 57689159 | -0.00 | 1.0000 | 0.000 |
MirrorT*A 3 | 30 | -17.448673 | 57689160 | -0.00 | 1.0000 | 0.000 |
In PROC SURVEYPHREG, the levels of CLASS variables are determined by the CLASS statement and the input data and are not affected by user-supplied programming statements. Consider the following statements, which produce the results in Figure 89.4. Variable A is declared as a CLASS variable in the CLASS statement.
    proc surveyphreg data=Test;
       weight W;
       strata S;
       class A;
       model T*Status(0)=A;
    run;
Figure 89.4 shows the parameters that correspond to A and their respective regression coefficient estimates.
Class Level Information

Class | Levels | Values |
---|---|---|
A | 3 | 1 2 3 |

Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
A 1 | 30 | -1.162184 | 0.655136 | -1.77 | 0.0862 | 0.313 |
A 2 | 30 | -0.616962 | 0.521841 | -1.18 | 0.2464 | 0.540 |
A 3 | 30 | 0 | . | . | . | 1.000 |
Now consider the programming statement that attempts to change the value of the CLASS variable A as in the following specification:
    proc surveyphreg data=Test;
       weight W;
       strata S;
       class A;
       model T*Status(0)=A;
       if A=3 then A=2;
    run;
Results of this analysis are shown in Figure 89.5 and are identical to those in Figure 89.4. The if A=3 then A=2 programming statement has no effect on the design variables for A, which have already been determined.
Class Level Information

Class | Levels | Values |
---|---|---|
A | 3 | 1 2 3 |

Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
A 1 | 30 | -1.162184 | 0.655136 | -1.77 | 0.0862 | 0.313 |
A 2 | 30 | -0.616962 | 0.521841 | -1.18 | 0.2464 | 0.540 |
A 3 | 30 | 0 | . | . | . | 1.000 |
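If the intent really is to merge level 3 of A into level 2, one possible approach is to recode A in a DATA step before the procedure runs, so that the recoded values are part of the input data from which the CLASS levels are determined. The following is a minimal sketch under that assumption; the data set name TestRecoded is hypothetical and is not part of the original example.

```sas
/* Hypothetical recoded copy of Test: A=3 is merged into A=2 */
/* before PROC SURVEYPHREG reads the data.                   */
data TestRecoded;
   set Test;
   if A=3 then A=2;
run;

/* The CLASS levels are now determined from the recoded input data. */
proc surveyphreg data=TestRecoded;
   weight W;
   strata S;
   class A;
   model T*Status(0)=A;
run;
```

With this input, the "Class Level Information" table would be expected to list only the levels 1 and 2 for A.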
Additionally, a variable that has already been declared in the CLASS statement is not treated as a collection of the corresponding design variables when it is used in a programming statement. Consider the following statements:
    proc surveyphreg data=Test;
       class A;
       model T*Status(0)=A X;
       X=T*A;
    run;
The CLASS variable A generates two design variables as explanatory variables. The variable X created by the X=T*A programming statement is a single time-dependent covariate whose values are evaluated using the exact values of A given in the data, not the dummy-coded values that represent A. In the data set Test, A has the values of 1, 2, and 3, and these values are multiplied by the values of T to produce X. If A were a character variable with values 'Bird', 'Cat', and 'Dog', the programming statement X=T*A would have produced an error in the attempt to multiply a number by a character value.
Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
A 1 | 31 | 0.158010 | 95.546316 | 0.00 | 0.9987 | 1.171 |
A 2 | 31 | 0.008993 | 43.630439 | 0.00 | 0.9998 | 1.009 |
A 3 | 31 | 0 | . | . | . | 1.000 |
X | 31 | 0.092679 | 5.905522 | 0.02 | 0.9876 | 1.097 |
To create time-dependent covariates from the values of a CLASS variable, you can use syntax like the following, which is not equivalent to the preceding X=T*A specification:
    proc surveyphreg data=Test;
       class A;
       model T*Status(0)=A X1 X2;
       X1= T*(A=1);
       X2= T*(A=2);
    run;
The Boolean parenthetical expressions (A=1) and (A=2) resolve to a value of 1 or 0, depending on whether the expression is true or false, respectively.
Results of this analysis are shown in Figure 89.7.
Analysis of Maximum Likelihood Estimates

Parameter | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Hazard Ratio |
---|---|---|---|---|---|---|
A 1 | 31 | -0.007655 | 5.411713 | -0.00 | 0.9989 | 0.992 |
A 2 | 31 | -0.881383 | 4.263923 | -0.21 | 0.8376 | 0.414 |
A 3 | 31 | 0 | . | . | . | 1.000 |
X1 | 31 | -0.155220 | 0.602329 | -0.26 | 0.7983 | 0.856 |
X2 | 31 | 0.011554 | 0.454220 | 0.03 | 0.9799 | 1.012 |
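The 0/1 behavior of the Boolean expressions (A=1) and (A=2) that define X1 and X2 can be checked in isolation with a small DATA step. The following sketch is only an illustration; the variable names I1 and I2 are hypothetical, and nothing here is part of the SURVEYPHREG analysis itself.

```sas
/* Evaluate the comparison expressions for each value of A */
/* and write the results to the SAS log.                   */
data _null_;
   do A = 1 to 3;
      I1 = (A=1);   /* 1 when A equals 1, otherwise 0 */
      I2 = (A=2);   /* 1 when A equals 2, otherwise 0 */
      put A= I1= I2=;
   end;
run;
```

The log should show I1=1 only for A=1 and I2=1 only for A=2, with both equal to 0 for A=3. Multiplying T by these indicators therefore yields T for observations in the matching level and 0 otherwise.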
In general, when your model contains a categorical explanatory variable that is time-dependent, it might be necessary to use hardcoded dummy variables to represent the categories of the categorical variable.
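As a rough illustration of hardcoded dummy variables, the following sketch drops the CLASS statement and builds the indicators for A directly in programming statements. The names A1 and A2 are hypothetical. Because A itself does not change over time in the data set Test, the sketch shows only the coding pattern, with level 3 serving as the reference level by omission; it is not a worked example of a truly time-dependent categorical variable.

```sas
proc surveyphreg data=Test;
   /* No CLASS statement: A is represented by explicit 0/1 indicators. */
   model T*Status(0)=A1 A2 X1 X2;
   A1 = (A=1);     /* indicator for level 1 */
   A2 = (A=2);     /* indicator for level 2 */
   X1 = T*A1;      /* time-dependent covariate for level 1 */
   X2 = T*A2;      /* time-dependent covariate for level 2 */
run;
```

For a categorical variable whose level changes during follow-up, the same pattern would apply, with the conditions inside the parentheses also depending on the time variable.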