## Example 66.3 Modeling with Categorical Predictors

Consider the data for the Veterans Administration lung cancer trial presented in Appendix 1 of Kalbfleisch and Prentice (1980). In this trial, males with advanced inoperable lung cancer were randomized to a standard therapy and a test chemotherapy. The primary endpoint for the therapy comparison was time to death in days, represented by the variable Time. Negative values of Time are censored values. The data include information about a number of explanatory variables: Therapy (type of therapy: standard or test), Cell (type of tumor cell: adeno, large, small, or squamous), Prior (prior therapy: 0=no, 10=yes), Age (age, in years), Duration (months from diagnosis to randomization), and Kps (Karnofsky performance scale). A censoring indicator variable, Censor, is created from the data, with the value 1 indicating a censored time and the value 0 indicating an event time. The following DATA step saves the data in the data set VALung.

```proc format;
value yesno 0='no' 10='yes';
run;

data VALung;
drop check m;
retain Therapy Cell;
infile cards column=column;
length Check \$ 1;
label Time='time to death in days'
Kps='Karnofsky performance scale'
Duration='months from diagnosis to randomization'
Age='age in years'
Prior='prior therapy'
Cell='cell type'
Therapy='type of treatment';
format Prior yesno.;
M=Column;
input Check \$ @@;
if M>Column then M=1;
if Check='s'|Check='t' then do;
input @M Therapy \$ Cell \$;
delete;
end;
else do;
input @M Time Kps Duration Age Prior @@;
Status=(Time>0);
Time=abs(Time);
end;
datalines;
standard squamous
72 60  7 69  0   411 70  5 64 10   228 60  3 38  0   126 60  9 63 10
118 70 11 65 10    10 20  5 49  0    82 40 10 69 10   110 80 29 68  0
314 50 18 43  0  -100 70  6 70  0    42 60  4 81  0     8 40 58 63 10
144 30  4 63  0   -25 80  9 52 10    11 70 11 48 10
standard small
30 60  3 61  0   384 60  9 42  0     4 40  2 35  0    54 80  4 63 10
13 60  4 56  0  -123 40  3 55  0   -97 60  5 67  0   153 60 14 63 10
59 30  2 65  0   117 80  3 46  0    16 30  4 53 10   151 50 12 69  0
22 60  4 68  0    56 80 12 43 10    21 40  2 55 10    18 20 15 42  0
139 80  2 64  0    20 30  5 65  0    31 75  3 65  0    52 70  2 55  0
287 60 25 66 10    18 30  4 60  0    51 60  1 67  0   122 80 28 53  0
27 60  8 62  0    54 70  1 67  0     7 50  7 72  0    63 50 11 48  0
392 40  4 68  0    10 40 23 67 10
8 20 19 61 10    92 70 10 60  0    35 40  6 62  0   117 80  2 38  0
132 80  5 50  0    12 50  4 63 10   162 80  5 64  0     3 30  3 43  0
95 80  4 34  0
standard large
177 50 16 66 10   162 80  5 62  0   216 50 15 52  0   553 70  2 47  0
278 60 12 63  0    12 40 12 68 10   260 80  5 45  0   200 80 12 41 10
156 70  2 66  0  -182 90  2 62  0   143 90  8 60  0   105 80 11 66  0
103 80  5 38  0   250 70  8 53 10   100 60 13 37 10
test squamous
999 90 12 54 10   112 80  6 60  0   -87 80  3 48  0  -231 50  8 52 10
242 50  1 70  0   991 70  7 50 10   111 70  3 62  0     1 20 21 65 10
587 60  3 58  0   389 90  2 62  0    33 30  6 64  0    25 20 36 63  0
357 70 13 58  0   467 90  2 64  0   201 80 28 52 10     1 50  7 35  0
30 70 11 63  0    44 60 13 70 10   283 90  2 51  0    15 50 13 40 10
test small
25 30  2 69  0  -103 70 22 36 10    21 20  4 71  0    13 30  2 62  0
87 60  2 60  0     2 40 36 44 10    20 30  9 54 10     7 20 11 66  0
24 60  8 49  0    99 70  3 72  0     8 80  2 68  0    99 85  4 62  0
61 70  2 71  0    25 70  2 70  0    95 70  1 61  0    80 50 17 71  0
51 30 87 59 10    29 40  8 67  0
24 40  2 60  0    18 40  5 69 10   -83 99  3 57  0    31 80  3 39  0
51 60  5 62  0    90 60 22 50 10    52 60  3 43  0    73 60  3 70  0
8 50  5 66  0    36 70  8 61  0    48 10  4 81  0     7 40  4 58  0
140 70  3 63  0   186 90  3 60  0    84 80  4 62 10    19 50 10 42  0
45 40  3 69  0    80 40  4 63  0
test large
52 60  4 45  0   164 70 15 68 10    19 30  4 39 10    53 60 12 66  0
15 30  5 63  0    43 60 11 49 10   340 80 10 64 10   133 75  1 65  0
111 60  5 64  0   231 70 18 67 10   378 80  4 65  0    49 30  3 37  0
;
```

The following statements use the PHREG procedure to fit the Cox proportional hazards model to these data. The variables Prior, Cell, and Therapy, which are categorical variables, are declared in the CLASS statement. By default, PROC PHREG parameterizes the CLASS variables by using the reference coding with the last category as the reference category. However, you can explicitly specify the reference category of your choice. Here, Prior=no is chosen as the reference category for prior therapy, Cell=large is chosen as the reference category for type of tumor cell, and Therapy=standard is chosen as the reference category for the type of therapy. In the MODEL statement, the term Prior|Therapy is just another way of specifying the main effects Prior, Therapy, and the Prior*Therapy interaction.

```proc phreg data=VALung;
class Prior(ref='no') Cell(ref='large') Therapy(ref='standard');
model Time*Status(0) = Kps Duration Age Cell Prior|Therapy;
run;
```

Coding of the CLASS variables is displayed in Output 66.3.1. There is one dummy variable for Prior and one for Therapy, since both variables are binary. The dummy variable has a value of 0 for the reference category (Prior=no, Therapy=standard). The variable Cell has four categories and is represented by three dummy variables. Note that the reference category, Cell=large, has a value of 0 for all three dummy variables.

Output 66.3.1 Reference Coding of CLASS Variables
The PHREG Procedure

Class Level Information
Class Value Design Variables
Prior no 0
yes 1
large 0 0 0
small 0 1 0
squamous 0 0 1
Therapy standard 0
test 1

The test results of individual model effects are shown in Output 66.3.2. There is a strong prognostic effect of Kps on patient’s survivivorship (), and the survival times for patients of different Cell types differ significantly (). The Prior*Therapy interaction is marginally significant (p=0.0416)—that is, prior therapy might play a role in whether one treatment is more effective than the other.

Output 66.3.2 Wald Tests of Individual Effects
Type 3 Tests
Effect DF Wald Chi-Square Pr > ChiSq
Kps 1 35.5051 <.0001
Duration 1 0.1159 0.7335
Age 1 1.9772 0.1597
Cell 3 18.5339 0.0003
Prior 1 2.5296 0.1117
Therapy 1 5.2349 0.0221
Prior*Therapy 1 4.1528 0.0416

In the Cox proportional hazards model, the effects of the covariates are to act multiplicatively on the hazard of the survival time, and therefore it is a little easier to interpret the corresponding hazard ratios than the regression parameters. For a parameter that corresponds to a continuous variable, the hazard ratio is the ratio of hazard rates for a increase of one unit of the variable. From Output 66.3.3, the hazard ratio estimate for Kps is 0.968, meaning that an increase of 10 units in Karnofsky performance scale will shrink the hazard rate by =28%. For a CLASS variable parameter, the hazard ratio presented in the Output 66.3.3 is the ratio of the hazard rates between the given category and the reference category. The hazard rate of Cell=adeno is 219% that of Cell=large, the hazard rate of Cell=small is 62% that of Cell=large, and the hazard rate of Cell=squamous is only 66% that of Cell=large. Hazard ratios for Prior and Therapy are missing since the model contains the Prior*Therapy interaction. You can use the HAZARDRATIO statement to obtain the hazard ratios for a main effect in the presence of interaction as shown later in this example.

Output 66.3.3 Parameters Estimates with Reference Coding
Analysis of Maximum Likelihood Estimates
Parameter     DF Parameter
Estimate
Standard
Error
Chi-Square Pr > ChiSq Hazard
Ratio
Label
Kps     1 -0.03300 0.00554 35.5051 <.0001 0.968 Karnofsky performance scale
Duration     1 0.00323 0.00949 0.1159 0.7335 1.003 months from diagnosis to randomization
Age     1 -0.01353 0.00962 1.9772 0.1597 0.987 age in years
Cell small   1 0.48230 0.26537 3.3032 0.0691 1.620 cell type small
Cell squamous   1 -0.40770 0.28363 2.0663 0.1506 0.665 cell type squamous
Prior yes   1 0.45914 0.28868 2.5296 0.1117 . prior therapy yes
Therapy test   1 0.56662 0.24765 5.2349 0.0221 . type of treatment test
Prior*Therapy yes test 1 -0.87579 0.42976 4.1528 0.0416 . prior therapy yes * type of treatment test

The following PROC PHREG statements illustrate the use of the backward elimination process to identify the effects that affect the survivorship of the lung cancer patients. The option SELECTION=BACKWARD is specified to carry out the backward elimination. The option SLSTAY=0.1 specifies the significant level for retaining the effects in the model.

```proc phreg data=VALung;
class Prior(ref='no') Cell(ref='large') Therapy(ref='standard');
model Time*Status(0) = Kps Duration Age Cell Prior|Therapy
/ selection=backward slstay=0.1;
run;
```

Results of the backward elimination process are summarized in Output 66.3.4. The effect Duration was eliminated first and was followed by Age.

Output 66.3.4 Effects Eliminated from the Model
The PHREG Procedure

Summary of Backward Elimination
Step Effect
Removed
DF Number
In
Wald
Chi-Square
Pr > ChiSq Effect
Label
1 Duration 1 6 0.1159 0.7335 months from diagnosis to randomization
2 Age 1 5 2.0458 0.1526 age in years

Output 66.3.5 shows the Type 3 analysis of effects and the maximum likelihood estimates of the regression coefficients of the model. Without controlling for Age and Duration, KPS and Cell remain significant, but the Prior*Therapy interaction is less prominent than before (p=0.0871) though still significant at 0.1 level.

Output 66.3.5 Type 3 Effects and Parameter Estimates for the Selected Model
Type 3 Tests
Effect DF Wald Chi-Square Pr > ChiSq
Kps 1 35.9218 <.0001
Cell 3 17.4134 0.0006
Prior 1 2.3113 0.1284
Therapy 1 3.8030 0.0512
Prior*Therapy 1 2.9269 0.0871

Analysis of Maximum Likelihood Estimates
Parameter     DF Parameter
Estimate
Standard
Error
Chi-Square Pr > ChiSq Hazard
Ratio
Label
Kps     1 -0.03111 0.00519 35.9218 <.0001 0.969 Karnofsky performance scale
Cell small   1 0.44265 0.26168 2.8614 0.0907 1.557 cell type small
Cell squamous   1 -0.41145 0.28309 2.1125 0.1461 0.663 cell type squamous
Prior yes   1 0.41755 0.27465 2.3113 0.1284 . prior therapy yes
Therapy test   1 0.45670 0.23419 3.8030 0.0512 . type of treatment test
Prior*Therapy yes test 1 -0.69443 0.40590 2.9269 0.0871 . prior therapy yes * type of treatment test

Finally, the following statements refit the previous model and computes hazard ratios at settings beyond those displayed in the "Analysis of Maximum Likelihood Estimates" table. You can use either the HAZARDRATIO statement or the CONTRAST statement to obtain hazard ratios. Using the CONTRAST statement to compute hazard ratios for CLASS variables can be a daunting task unless you are familiar with the parameterization schemes (see the section Parameterization of Model Effects of Chapter 19, Shared Concepts and Topics, for details), but you have control over which specific hazard ratios you want to compute. HAZARDRATIO statements, on the other hand, are designed specifically to provide hazard ratios. They are easy to use and you can also request both the Wald confidence limits and the profile-likelihood confidence limits; the latter is not available for the CONTRAST statements. Three HAZARDRATIO statements are specified; each has the CL=BOTH option to request both the Wald confidence limits and the profile-likelihood limits. The first HAZARDRATIO statement, labeled ’H1’, estimates the hazard ratio for an increase of 10 units in the KPS; the UNITS= option specifies the number of units increase. The second HAZARDRATIO statement, labeled ’H2’ computes the hazard ratios for comparing any pairs of tumor Cell types. The third HAZARDRATIO statement, labeled ’H3’, compares the test therapy with the standard therapy. The DIFF=REF option specifies that each nonreference category is compared to the reference category. The purpose of using DIFF=REF here is to ensure that the hazard ratio is comparing the test therapy to the standard therapy instead of the other way around. Three CONTRAST statements, labeled ’C1’, ’C2’, and ’C3’, parallel to the HAZARDRATIO statements ’H1’, ’H2’, and ’H3’, respectively, are specified. The ESTIMATE=EXP option specifies that the linear predictors be estimated in the exponential scale, which are precisely the hazard ratios.

```proc phreg data=VALung;
class Prior(ref='no') Cell(ref='large') Therapy(ref='standard');
model Time*Status(0) = Kps Cell Prior|Therapy;
hazardratio 'H1' Kps / units=10 cl=both;
hazardratio 'H2' Cell / cl=both;
hazardratio 'H3' Therapy / diff=ref cl=both;
contrast 'C1' Kps 10 / estimate=exp;
contrast 'C2' cell 1  0  0, /* adeno vs large    */
cell 1 -1  0, /* adeno vs small    */
cell 1  0 -1, /* adeno vs squamous */
cell 0 -1  0, /* large vs small    */
cell 0  0 -1, /* large vs Squamous */
cell 0  1 -1  /* small vs squamous */
/ estimate=exp;
contrast 'C3' Prior 0 Therapy 1  Prior*Therapy 0,
Prior 0 Therapy 1  Prior*Therapy 1  / estimate=exp;
run;
```

Output 66.3.6 displays the results of the three HAZARDRATIO statements in separate tables. Results of the three CONTRAST statements are shown in one table in Output 66.3.7. However, point estimates and the Wald confidence limits for the hazard ratio agree in between the two outputs.

Output 66.3.6 Results from HAZARDRATIO Statements
The PHREG Procedure

H1: Hazard Ratios for Kps
Description Point Estimate 95% Wald Confidence Limits 95% Profile Likelihood
Confidence Limits
Kps Unit=10 0.733 0.662 0.811 0.662 0.811

H2: Hazard Ratios for Cell
Description Point Estimate 95% Wald Confidence Limits 95% Profile Likelihood
Confidence Limits
Cell adeno vs large 2.115 1.164 3.843 1.162 3.855
Cell adeno vs small 1.359 0.798 2.312 0.791 2.301
Cell adeno vs squamous 3.192 1.773 5.746 1.770 5.768
Cell large vs small 0.642 0.385 1.073 0.380 1.065
Cell large vs squamous 1.509 0.866 2.628 0.863 2.634
Cell small vs squamous 2.349 1.387 3.980 1.399 4.030

H3: Hazard Ratios for Therapy
Description Point Estimate 95% Wald Confidence Limits 95% Profile Likelihood
Confidence Limits
Therapy test vs standard At Prior=no 1.579 0.998 2.499 0.998 2.506
Therapy test vs standard At Prior=yes 0.788 0.396 1.568 0.390 1.560

Output 66.3.7 Results from CONTRAST Statements
Contrast Estimation and Testing Results by Row
Contrast Type Row Estimate Standard
Error
Alpha Confidence Limits Wald
Chi-Square
Pr > ChiSq
C1 EXP 1 0.7326 0.0380 0.05 0.6618 0.8111 35.9218 <.0001
C2 EXP 1 2.1150 0.6443 0.05 1.1641 3.8427 6.0457 0.0139
C2 EXP 2 1.3586 0.3686 0.05 0.7982 2.3122 1.2755 0.2587
C2 EXP 3 3.1916 0.9575 0.05 1.7727 5.7462 14.9629 0.0001
C2 EXP 4 0.6423 0.1681 0.05 0.3846 1.0728 2.8614 0.0907
C2 EXP 5 1.5090 0.4272 0.05 0.8664 2.6282 2.1125 0.1461
C2 EXP 6 2.3493 0.6318 0.05 1.3868 3.9797 10.0858 0.0015
C3 EXP 1 1.5789 0.3698 0.05 0.9977 2.4985 3.8030 0.0512
C3 EXP 2 0.7884 0.2766 0.05 0.3964 1.5680 0.4593 0.4980