Contents: | Purpose / History / Requirements / Usage / Details / Limitations / Missing Values / References |
Capabilities of this macro are now available from the SCORE statement in SAS/STAT® PROC PLM or the OUTPUT statement in SAS/ETS® PROC COUNTREG.
Version
|
Update Notes
|
1.2 | Added PROC= and INZEROMODEL= options to allow PROBCOUNTS to work with poisson, negative binomial, and (beginning in SAS 9.2) zero-inflated poisson models (not involving CLASS variables) fit by PROC GENMOD. |
1.1 | Added PRED= option. Removed need for MODEL= option. Renamed PARMS= to INMODEL=. Remove RESTRICT lines from COUNTREG parameters table. |
1.0 | Initial coding. |
Follow the instructions in the Downloads tab of this sample to save the PROBCOUNTS macro definition. Replace the text within quotes in the following statement with the location of the PROBCOUNTS macro definition file on your system. In your SAS program or in the SAS editor window, specify this statement to define the PROBCOUNTS macro and make it available for use:
%inc "<location of your file containing the PROBCOUNTS macro>";
Following this statement, you may call the PROBCOUNTS macro. See the Results tab for an example.
The following are required:
One or both of the following must be specified:
counts=%str(0, 1, 2 to 10 by 2, 15)requests predicted probabilities for values 0, 1, 2, 4, 6, 8, 10, and 15. If omitted, no predicted count probabilities are computed.
The following are optional:
The version of the PROBCOUNTS macro that you are using is displayed when you specify version (or any string) as the first argument. For example:
%probcounts(version, data=mydata, inmodel=mymodel, pred=p)
For ZIP and ZINB models, two processes work together to produce the response: a Poisson or negative binomial process and a zero-inflation process. The first process generates zeros and positive integer values from a Poisson or negative binomial process. The second process is a binomial process which generates extra zeros. The probability of a zero in this second process, φ, is modeled via either a logit or probit model. The predictors, x, for the model of the first process, μ=exp(x'β), are specified in the MODEL statement of PROC COUNTREG or PROC GENMOD. The predictors, z, for the model of the second process, φ=F(z'γ) are specified in the ZEROMODEL statement of either procedure, and the model type (logit or probit) is specified in the LINK= option. The ZEROLINK= option in the PROBCOUNTS macro should match the LINK= option. If not, the computation of predicted counts and count probabilities will be incorrect. The ParameterEstimates table available from PROC COUNTREG or PROC NLMIXED contains the estimates of β and γ. In PROC GENMOD, the ParameterEstimates table contains the estimates of β and the ZeroParameterEstimates table contains the estimates of γ. In order to use the PROBCOUNTS macro with a model fit by PROC GENMOD, no CLASS variables can be specified in either the MODEL or ZEROMODEL statement.
The PROBCOUNTS macro requires as input the data set for which you want predicted counts and/or predicted count probabilities. This data set (specified in the DATA= option) may be the data set used to fit the model or another data set for which you want predicted values. In this way, the PROBCOUNTS macro can be used to score new observations using a previously fit model. The DATA= data set must contain all predictors specified in the MODEL and ZEROMODEL statements (including any offsets) when the model was fit. Also required as input is the data set of parameters from the fitted model, β and γ. For PROC COUNTREG or PROC NLMIXED, this data set (specified in the INMODEL= option) is created by including the following statement when fitting the model:
ods output ParameterEstimates = <SAS-data-set>;
For models fit by PROC GENMOD, the β parameters are input via the INMODEL= option and the γ parameters, in the case of ZIP models, are input via the INZEROMODEL= option. Both data sets can be created with this statement when fitting the model:
ods output ParameterEstimates = <SAS-data-set> ZeroParameterEstimates = <SAS-data-set>;
Finally, you must specify either or both of the PRED= and COUNTS= options to request the desired predicted values.
The predicted count computed by the PROBCOUNTS macro for a given observation is an estimate of μ for Poisson and negative binomial models, or of μ(1-φ) for ZIP and ZINB models. For Poisson and negative binomial models, the predicted count probability for count c, p(c)=Pr(Y=c|x), is obtained directly from the Poisson or negative binomial probability mass function using the estimated mean for the given observation, μ. For ZIP or ZINB models, the predicted count probability is p(c)(1-φ)+I0φ, where I0=1 if c=0 and 0 otherwise.
The PROBCOUNTS macro produces no displayed results. It only produces an output data set.
The PROBCOUNTS macro attempts to check for a later version of itself. If it is unable to do this (such as if there is no active internet connection available), the macro will issue the following message:
PROBCOUNTS: Unable to check for newer version
The computation of predicted values is not affected by the appearance of this message.
Long, J. S. (1997), Regression Models for Categorical and Limited Dependent Variables, Thousand Oaks: Sage Publications, Inc.
SAS Institute Inc. (2006), "The COUNTREG Procedure (Experimental)," Cary, NC: SAS Institute Inc.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
The following statements use PROC FREQ to compute the proportion of scientists publishing each observed number of articles.
proc freq data=articles; table art / out=obs; run;
|
Note that the observed proportion publishing zero articles is 0.3005. If a single Poisson process generated these data, then the estimated Poisson mean is 1.69 (the average number of articles), and the probability of zero articles is Pr(art=0) = 1.690e-1.69/0! = e-1.69 = 0.18. This considerably underestimates the observed proportion. Under a single negative binomial process with the same estimated mean and with estimated dispersion parameter 0.5861 (which can be obtained by fitting an intercept-only model in PROC COUNTREG), the probability of zero articles is Pr(art=0) = 0.3085 which is much closer to the observed value. Given the parameters of the Poisson or negative binomial distribution, you can use the PDF function to obtain the probability of any number of articles.
The following statements use PROC COUNTREG to fit the Poisson model and the PROBCOUNTS macro to compute predicted count probabilities for counts 0 through 10 as variables POI0, POI1, ... , POI10 and saves them in data set PREDPOI. Under the Poisson model, the predicted proportion of zero articles is 0.2092 which, not surprisingly, is much closer to the probability from the single Poisson process (0.18) and considerably underestimates the observed proportion (0.30).
Note that syntax for the experimental SAS 9.1 version of PROC COUNTREG is shown below. The PROC COUNTREG syntax changed beginning in SAS 9.2. This example using production syntax is shown in the SAS/ETS User's Guide.
proc countreg data=articles type=poisson; model art = fem mar kid5 phd ment; ods output ParameterEstimates=pe; run; %probcounts(data=articles, inmodel=pe, counts=0 to 10, prefix=poi, out=predpoi)
The negative binomial model fit by the following statements more closely approximates the observed proportion of zeros (0.3036) as expected. Also, the test of the negative binomial dispersion parameter, _Alpha, in the negative binomial model indicates significant overdispersion in the Poisson model (p < .0001). As a result, the negative binomial model is preferred over the Poisson model. Predicted count probabilities are saved in data set PREDNB.
proc countreg data=articles type=negbin method=qn; model art = fem mar kid5 phd ment; ods output ParameterEstimates=pe; run; %probcounts(data=predpoi, inmodel=pe, counts=0 to 10, prefix=nb, out=prednb)
Another way to account for the large number of zeros in these data is to fit a zero-inflated Poisson (ZIP) or a zero-inflated negative binomial (ZINB) model. The following statements fit the ZIP model. In this ZIP model, all variables used to model the article counts are also used to model the extra zeros probability φ. The proportion of zeros predicted by the ZIP model is 0.2986 which is much closer to the observed proportion than was the Poisson model.
proc countreg data=articles type=zip; model art = fem mar kid5 phd ment / zi(var=fem mar kid5 phd ment); ods output ParameterEstimates=pe; run; %probcounts(data=prednb, inmodel=pe, counts=0 to 10, prefix=zip, out=predzip)
The following statements fit the ZINB model which also does a good job of estimating the proportion of zeros (0.3119) and it follows the other observed proportions well, though possibly not as well as the negative binomial model. The test for overdispersion again indicates a preference for the negative binomial version of the zero-inflated model (p < .0001).
proc countreg data=articles type=zinb method=qn; model art = fem mar kid5 phd ment / zi(var=fem mar kid5 phd ment); ods output ParameterEstimates=pe; run; %probcounts(data=predzip, inmodel=pe, counts=0 to 10, prefix=zinb, out=predzinb)
The following statements compute the average predicted count probability across all scientists for each count 0, 1, ... , 10. The averages for each model, along with the observed proportions, are then arranged for plotting.
/* For each model, compute the average probability at each count */ proc summary data=predzinb; var poi0-poi10 nb0-nb10 zip0-zip10 zinb0-zinb10; output out=mnpoi mean(poi0-poi10) =mn0-mn10; output out=mnnb mean(nb0-nb10) =mn0-mn10; output out=mnzip mean(zip0-zip10) =mn0-mn10; output out=mnzinb mean(zinb0-zinb10)=mn0-mn10; run; /* Arrange means for plotting and include the observed proportions */ data means; set mnpoi mnnb mnzip mnzinb; drop _type_ _freq_; run; proc transpose data=means out=tmeans; run; data allpred; merge obs(where=(art<=10)) tmeans; obs=percent/100; run;
These statements use the ODS Graphics Template Language to produce the graph of the predicted probabilities for counts 0 through 10.
ods html; ods graphics on; proc template; define statgraph NumArt / store = SASUSER.TEMPLAT; notes "Several models for number of published articles"; layout lattice ; columnheader; entrytitle "Models for number of published articles" / padtop=10px padbottom=20px fontsize=16pt; endcolumnheader; layout overlay / xaxisopts=(label="Number of Articles") yaxisopts=(label="Probability"); series y=obs x=art / name="obs" legendlabel="Observed" linecolor=black linethickness=4px; series y=col1 x=art / name="poi" legendlabel="Poisson" linecolor=blue; series y=col2 x=art/ name="nb" legendlabel="Negative Binomial" linecolor=red; series y=col3 x=art/ name="zip" legendlabel="ZIP" linecolor=blue linepattern=dashshort; series y=col4 x=art/ name="zinb" legendlabel="ZINB" linecolor=red linepattern=dashshort; discretelegend "poi" "zip" "nb" "zinb" "obs" / title="Models:" halign=right valign=top across=2 down=3 border=on; endlayout; endlayout; end; run; title; data _null_; set allpred; file print ods=(template='NumArt'); put _ods_; run; ods graphics off; ods html close;
For each of the four fitted models, the graph shows the average predicted count probability for each article count across all scientists. The poisson model clearly underestimates the proportion of zero articles published while the other three model are quite accurate at zero. All of the models fit the observed proportions well at the larger number of articles.
A similar plot can be created as follows with PROC GPLOT:
symbol1 v=none i=join c=black w=2; /* observed */ symbol2 v=none i=join c=green; /* obs'd univariate poisson */ symbol3 v=none i=join c=blue; /* poisson model */ symbol4 v=none i=join c=red; /* NB model */ symbol5 v=none i=join c=blue line=2; /* ZIP model */ symbol6 v=none i=join c=red line=2; /* ZINB model */ axis1 label=(angle=90 font=swissb "Probability") order=0 to 0.4 by .1 minor=none; axis2 label=(font=swissb "Number of Articles") order=0 to 10 minor=none; legend1 position=(middle center inside) mode=protect across=1 label=("Models:") offset=(,8 pct) frame value=("Observed" "Poisson" "ZIP" "NB" "ZINB"); proc gplot; plot obs*art=1 col1*art=3 col3*art=5 col2*art=4 col4*art=6 / overlay vaxis=axis1 haxis=axis2 legend=legend1; title "Models for number of published articles"; run; quit;
proc means data=articles; var ment phd; run;
|
PROC FREQ reveals the distribution of young children for married and unmarried scientists.
proc freq data=articles; table mar*kid5; run;
|
For married scientists, the mean number of young children is about 0.75.
proc means data=articles; where mar=1; var kid5; run;
|
The following creates the data set to be scored using the fitted negative binomial model. The predicted number of articles will be obtained for married and unmarried scientists of both genders at the averages of mentor articles, program prestige, and number of children given their marital status.
data toscore; input fem ment phd mar kid5; datalines; 0 8.76 3.10 0 0 0 8.76 3.10 1 .75 1 8.76 3.10 0 0 1 8.76 3.10 1 .75 ;
The negative binomial model is again fit to the data and the estimated model parameters are stored in data set PE.
proc countreg data=articles type=negbin method=qn; model art = fem mar kid5 phd ment; ods output ParameterEstimates=pe; run;
The PROBCOUNTS macro is called using the PRED= option to create a variable, PREDCOUNT, containing the predicted counts for each observation in the TOSCORE data set. The results are saved in data set NBPREDS.
%probcounts(data=toscore, inmodel=pe, pred=predcount, out=NBpreds)
The results show that male scientists produce a larger number of articles than female scientists, and that married scientists produce more articles than unmarried scientists.
proc print noobs; run;
|
proc genmod data=articles; model art = fem mar kid5 phd ment / dist=zip; zeromodel fem mar kid5 phd ment; ods output ParameterEstimates=pe ZeroParameterEstimates=zpe; run; %probcounts(data=articles, proc=genmod, inmodel=pe, inzeromodel=zpe, pred=p, counts=%str(0,1,2), out=GenZIP)
Right-click on the link below and select Save to save
the PROBCOUNTS macro definition
to a file. It is recommended that you name the file
probcounts.sas
.
Download and save probcounts.sas
Type: | Sample |
Topic: | SAS Reference ==> Procedures ==> COUNTREG SAS Reference ==> Procedures ==> GENMOD Analytics ==> Categorical Data Analysis Analytics ==> Regression SAS Reference ==> Procedures ==> NLMIXED |
Date Modified: | 2008-11-17 15:01:39 |
Date Created: | 2007-08-14 03:03:16 |
Product Family | Product | Host | SAS Release | |
Starting | Ending | |||
SAS System | SAS/STAT | All | 9.1 TS1M3 | n/a |
SAS System | SAS/ETS | All | 9.1 TS1M3 | n/a |