The COUNTREG Procedure

SCORE Statement

  • SCORE <OUT=SAS-data-set> <output-options>;

The SCORE statement enables you to compute predicted values and other statistics for a SAS data set. As with the OUTPUT statement, the new data set that is created contains all the variables in the input data set and, optionally, the estimates of $\mathbf{x}_{i}'\boldsymbol{\beta}$, the expected value of the response variable, and the probability that the response variable takes the current value or other values that you specify. In a zero-inflated model, you can additionally request that the output data set contain the estimates of $\mathbf{z}_{i}'\boldsymbol{\gamma}$ and the probability that the response is zero as a result of the zero-generating process. For the Conway-Maxwell-Poisson model, the estimates of $\mathbf{g}_{i}'\boldsymbol{\delta}$, $\lambda$, $\nu$, $\mu$, mode, variance, and dispersion are also available. Except for the probability of the current value, these statistics can be computed for all observations in which the regressors are not missing, even if the response is missing.

The following example fits a Poisson model to the DocVisit data set and then uses the SCORE statement to compute expected values for additional observations in the additionalPatients data set. The data in the additionalPatients data set are used only for scoring, not for fitting.

You score a data set in two separate steps. In the first step, you fit the model and use the STORE statement to preserve it in the DocVisitPoisson item store, as shown in the following statements:

   proc countreg data=docvisit;
      model doctorvisits=sex illness income / dist=poisson;
      store docvisitPoisson;
   run;

In the second step, you retrieve the content of the DocVisitPoisson item store and use it to calculate expected values by using the SCORE statement for the additionalPatients data set as follows:

   proc countreg restore=docvisitPoisson data=additionalPatients;
      score out=outScores mean=meanPoisson probability=prob;
   run;

By retrieving the model from the item store and using it in a postprocessing step, you can separate the fitting and scoring stages and use data for scoring that might not be available at the time when the model was fitted.
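
As a quick check of the scoring step, you can list the first few observations of the scored data set. The following statements are a minimal sketch that assumes the outScores data set created by the preceding step; the OBS=10 limit is an arbitrary choice for illustration:

   /* List the first 10 scored observations (all variables).        */
   /* outScores, meanPoisson, and prob come from the SCORE step.    */
   proc print data=outScores(obs=10);
   run;

For any observation in additionalPatients whose response is missing, you would expect prob to be missing while meanPoisson is still computed, because only the probability of the current value depends on the response.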

You can specify only one SCORE statement. You can specify the following output-options:

DISPERSION=name

names the variable that contains the value of dispersion for the Conway-Maxwell-Poisson distribution.

GDELTA=name

names the variable that contains estimates of $\mathbf{g}_{i}'\boldsymbol{\delta}$ for the Conway-Maxwell-Poisson distribution.

LAMBDA=name

names the variable that contains the estimate of $\lambda $ for the Conway-Maxwell-Poisson distribution.

MODE=name

names the variable that contains the integral part of $\mu $ (mode) for the Conway-Maxwell-Poisson distribution.

MU=name

names the variable that contains the estimate of $\mu $ for the Conway-Maxwell-Poisson distribution.

NU=name

names the variable that contains the estimate of $\nu $ for the Conway-Maxwell-Poisson distribution.

OUT=SAS-data-set

names the output data set.

PRED=name
MEAN=name

names the variable that contains the predicted value of the response variable.

PROB=name

names the variable that contains the probability that the response variable will take the current value, $\Pr(Y=y_{i})$.

PROBCOUNT(value1 <value2...>)

outputs the probability that the response variable will take particular values. Each value should be a nonnegative integer. Nonintegers are rounded to the nearest integer. The value can also be a list of the form X TO Y BY Z. For example, PROBCOUNT(0 1 2 TO 10 BY 2 15) requests predicted probabilities for counts 0, 1, 2, 4, 6, 8, 10, and 15. This option is not available for the fixed-effects and random-effects panel models.

PROBZERO=name

names the variable that contains the value of $\varphi_{i}$, the probability that the response variable takes the value zero as a result of the zero-generating process. This variable is written to the output data set only if the model is zero-inflated. It is not the overall probability of a zero response; that probability is provided by the PROBCOUNT(0) option.

VARIANCE=name

names the variable that contains the estimate of variance for the Conway-Maxwell-Poisson distribution.

XBETA=name

names the variable that contains estimates of $\mathbf{x}_{i}'\boldsymbol{\beta}$.

ZGAMMA=name

names the variable that contains estimates of $\mathbf{z}_{i}'\boldsymbol{\gamma}$.
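
To illustrate how several output-options combine for a zero-inflated model, the following statements sketch a zero-inflated Poisson fit followed by a scoring step. This is an illustration under stated assumptions and is not part of the original example: the item store name docvisitZIP, the output data set name outScoresZIP, the output variable names, and the choice of illness as the only zero-model regressor are all hypothetical.

   /* Step 1: fit a zero-inflated Poisson model and store it.       */
   /* docvisitZIP is a hypothetical item store name.                */
   proc countreg data=docvisit;
      model doctorvisits=sex illness income / dist=zip;
      zeromodel doctorvisits ~ illness;
      store docvisitZIP;
   run;

   /* Step 2: restore the model and score the additional data.      */
   /* xb, zg, phi, and meanZIP are hypothetical variable names.     */
   proc countreg restore=docvisitZIP data=additionalPatients;
      score out=outScoresZIP xbeta=xb zgamma=zg probzero=phi
            mean=meanZIP probcount(0 1 2);
   run;

In this sketch, phi holds the probability of a zero that arises from the zero-generating process alone, whereas the probability that PROBCOUNT reports for the count 0 is the overall probability of a zero response, which also includes zeros that arise from the Poisson count process.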