This example uses data from a study of the analgesic effects of treatments on elderly patients with neuralgia. Its purpose is to show how PROC PLM behaves in different situations when BY-group processing is present. Two test treatments and a placebo are compared, and the response is whether the patient reported pain. For each patient, the age, the gender, and the duration of complaint before the treatment began were recorded. The following DATA step creates the data set named `Neuralgia`:
    data Neuralgia;
       input Treatment $ Sex $ Age Duration Pain $ @@;
       datalines;
    P F 68  1 No   B M 74 16 No   P F 67 30 No
    P M 66 26 Yes  B F 67 28 No   B F 77 16 No
    A F 71 12 No   B F 72 50 No   B F 76  9 Yes
    A M 71 17 Yes  A F 63 27 No   A F 69 18 Yes
    B F 66 12 No   A M 62 42 No   P F 64  1 Yes
    A F 64 17 No   P M 74  4 No   A F 72 25 No
    P M 70  1 Yes  B M 66 19 No   B M 59 29 No
    A F 64 30 No   A M 70 28 No   A M 69  1 No
    B F 78  1 No   P M 83  1 Yes  B F 69 42 No
    B M 75 30 Yes  P M 77 29 Yes  P F 79 20 Yes
    A M 70 12 No   A F 69 12 No   B F 65 14 No
    B M 70  1 No   B M 67 23 No   A M 76 25 Yes
    P M 78 12 Yes  B M 77  1 Yes  B F 69 24 No
    P M 66  4 Yes  P F 65 29 No   P M 60 26 Yes
    A M 78 15 Yes  B M 75 21 Yes  A F 67 11 No
    P F 72 27 No   P F 70 13 Yes  A M 75  6 Yes
    B F 65  7 No   P F 68 27 Yes  P M 68 11 Yes
    P M 67 17 Yes  B M 70 22 No   A M 65 15 No
    P F 67  1 Yes  A M 67 10 No   P F 72 11 Yes
    A F 74  1 No   B M 80 21 Yes  A F 69  3 No
    ;

The data set contains five variables. `Treatment` is a classification variable that has three levels: A and B represent the two test treatments, and P represents the placebo treatment. `Sex` is a classification variable that indicates each patient’s gender. `Age` is a continuous variable that indicates the age in years of each patient when a treatment began. `Duration` is a continuous variable that indicates the duration of complaint in months. The last variable, `Pain`, is the response variable with two levels: ‘Yes’ if pain was reported, ‘No’ if no pain was reported.

Suppose there is some preliminary belief that the dependency of `Pain` on the explanatory variables is different for male and female patients, which leads to a separate model for each gender. You also believe that some of the explanatory variables might carry redundant information for predicting the probability of `Pain`, so you want to perform model selection to eliminate unnecessary effects. You can use the following statements:

    proc sort data=Neuralgia;
       by sex;
    run;

    proc logistic data=Neuralgia;
       class Treatment / param=glm;
       model pain = Treatment Age Duration / selection=backward;
       by sex;
       store painmodel;
       title 'Logistic Model on Neuralgia';
    run;

PROC SORT is called to sort the data by the variable `Sex`. The LOGISTIC procedure is then called to fit the probability of no pain. Three variables are specified for the full model: `Treatment`, `Age`, and `Duration`. Backward elimination is used as the model selection method. The BY statement fits separate models for male and female patients. Finally, the STORE statement specifies that the fitted results be saved to an item store named `painmodel`.

Output 87.5.1 lists parameter estimates from the two models after backward elimination is performed. In the model for female patients, `Treatment` is the only effect that remains, and treatments A and B have the same positive effect on the probability of no pain. In the model for male patients, both `Treatment` and `Age` are included in the selected model. Treatments A and B have different positive effects, whereas `Age` has a negative effect on the probability of no pain.

Output 87.5.1: Parameter Estimates for Male and Female Patients

Logistic Model on Neuralgia

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

Parameter | | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq
---|---|---|---|---|---|---
Intercept | | 1 | -0.4055 | 0.6455 | 0.3946 | 0.5299
Treatment | A | 1 | 2.6027 | 1.2360 | 4.4339 | 0.0352
Treatment | B | 1 | 2.6027 | 1.2360 | 4.4339 | 0.0352
Treatment | P | 0 | 0 | . | . | .

Now the fitted models are saved to the item store `painmodel`. Suppose you want to use it to score several new observations. The following DATA steps create three data sets for scoring:

    data score1;
       input Treatment $ Sex $ Age;
       datalines;
    A F 20
    B F 30
    P F 40
    A M 20
    B M 30
    P M 40
    ;

    data score2;
       set score1(drop=sex);
    run;

    data score3;
       set score2(drop=Age);
    run;

The first score data set, `score1`, contains six observations and all the variables that are specified in the full model. The second score data set, `score2`, is a duplicate of `score1` except that `Sex` is dropped. The third score data set, `score3`, is a duplicate of `score2` except that `Age` is dropped. You can use the following statements to score the three data sets:

    proc plm restore=painmodel;
       score data=score1 out=score1out predicted;
       score data=score2 out=score2out predicted;
       score data=score3 out=score3out predicted;
    run;

Output 87.5.2 lists the store information that PROC PLM reads from the item store `painmodel`. The "Model Effects" entry lists all three variables that are specified in the full model before the BY-group processing.

Output 87.5.2: Item Store Information for `painmodel`

Logistic Model on Neuralgia

The PLM Procedure

Store Information | |
---|---|
Item Store | WORK.PAINMODEL |
Data Set Created From | WORK.NEURALGIA |
Created By | PROC LOGISTIC |
Date Created | 06APR15:20:21:00 |
By Variable | Sex |
Response Variable | Pain |
Link Function | Logit |
Distribution | Binary |
Class Variables | Treatment Pain |
Model Effects | Intercept Treatment Age Duration |
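
Because the "Model Effects" entry reflects the full model rather than the effects that survive selection in each BY group, it can help to look at the stored estimates directly. The following is a minimal sketch, assuming the item store `painmodel` created by the earlier PROC LOGISTIC step is still available in the session; it uses the SHOW statement of PROC PLM to display the stored parameter estimates:

```sas
/* Display the parameter estimates saved in the item store. */
/* Assumes WORK.PAINMODEL exists from the PROC LOGISTIC run. */
proc plm restore=painmodel;
   show parameters;
run;
```

The displayed estimates can then be compared with the values in Output 87.5.1.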

With the three SCORE statements, three data sets are thus produced: `score1out`, `score2out`, and `score3out`. They contain the linear predictors in addition to all original variables. The data set `score1out` contains the values shown in Output 87.5.3.

Output 87.5.3: Values of Data Set `score1out`

Linear predictors are computed for all six observations. Because the BY variable `Sex` is available in `score1`, PROC PLM uses separate models to score observations of male and female patients. As a result, observations that have the same `Treatment` and `Age` values receive different linear predictors for different genders.
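
The PREDICTED keyword above requests values on the linear predictor (logit) scale. If predicted probabilities are more convenient, one possible variation, offered here as a sketch rather than as part of the original example, is to add the ILINK option to the SCORE statement so that the inverse link function is applied. The output data set name `score1prob` and the variable name `ProbNoPain` are hypothetical names chosen for illustration:

```sas
/* Score score1 on the probability scale instead of the logit scale. */
/* score1prob and ProbNoPain are illustrative names, not from the    */
/* original example.                                                  */
proc plm restore=painmodel;
   score data=score1 out=score1prob predicted=ProbNoPain / ilink;
run;
```

Because the models fit the probability of no pain, the resulting values would be interpreted as predicted probabilities of no pain.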

The data set `score2out` contains the values shown in Output 87.5.4.

Output 87.5.4: Values of Data Set `score2out`

The second score data set, `score2`, does not contain the BY variable `Sex`. PROC PLM therefore scores the full data set twice, once with the fitted model for each BY group. In the output data set, `Sex` is added as the first column to indicate the BY group. The first six entries correspond to the model for female patients, and the next six entries correspond to the model for male patients. `Age` is not included in the first model, and treatments A and B have the same parameter estimate, so observations 1, 2, 4, and 5 have the same linear predictor value.
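
This equality can be checked by hand. The following worked calculation assumes that the estimates shown in Output 87.5.1 belong to the model for female patients, which the description of that model implies. The symbol $\eta$ for the linear predictor and the conversion to a probability through the inverse logit are notation introduced here for illustration:

```latex
\begin{aligned}
\eta_{A} = \eta_{B} &= -0.4055 + 2.6027 = 2.1972, \\
\eta_{P} &= -0.4055 + 0 = -0.4055, \\
\Pr(\text{no pain} \mid A \text{ or } B) &= \frac{1}{1 + e^{-2.1972}} \approx 0.90, \\
\Pr(\text{no pain} \mid P) &= \frac{1}{1 + e^{0.4055}} \approx 0.40.
\end{aligned}
```

Under that assumption, observations 1, 2, 4, and 5 share the value 2.1972, and the two placebo observations scored by the same model share the value -0.4055.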

The data set `score3out` contains the values shown in Output 87.5.5.

Output 87.5.5: Values of Data Set `score3out`

The third score data set, `score3`, does not contain the BY variable `Sex`, so PROC PLM again scores the full data set twice, once with each model. It also does not contain the variable `Age`, which is a selected effect in the model for male patients. Thus, PROC PLM computes linear predictor values for `score3` when it uses the first model (female patients), and sets the linear predictor to missing when it uses the second model (male patients).
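
One way to confirm this behavior is to count missing and nonmissing predictions in each BY group. The sketch below assumes that `score3out` contains the BY variable `Sex`, as described for `score2out`, and that the linear predictor is stored under the default name `Predicted` because no name was given with the PREDICTED keyword:

```sas
/* Count nonmissing (N) and missing (NMISS) linear predictors by gender. */
/* Assumes score3out has the variables Sex and Predicted.                */
proc means data=score3out n nmiss;
   class Sex;
   var Predicted;
run;
```

If the description above holds, the group scored with the model for male patients should show only missing values.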