With a sample of data from a single population, you can estimate the parameter(s) of several distributions by fitting an interceptonly model in the GENMOD, GLIMMIX, FMM or other procedure. The parameters displayed by the procedure (intercept, scale, dispersion or other parameters) are the estimated parameters of the distribution you specified in the procedure (usually in the DIST= option). However, the form (parameterization) of the distribution might not be the same as appears in statistical literature and might not be the same as the parameterization used in other procedures or in the RAND, PDF, QUANTILE, or related functions available in the DATA step. The parameterization used by the modeling procedure is typically shown in the Details section of the procedure's documentation.
If predictors are specified in the model, then the settings of the predictors in the observations define multiple distributions of the type specified in the procedure with each distribution having a different mean. The predicted value for an observation (usually available from an OUTPUT statement) provides the estimated mean of that observation's distribution.
The following examples illustrate using data from a single population to estimate the parameters of various theoretical distributions. The estimated parameters in the distribution parameterization used by the modeling procedure are then transformed as needed to those used in the parameterization used in the RAND, PDF, and related functions as described in the Base SAS^{®} documentation.
Suppose you have Poisson data values in variable Y, where the Poisson mass function is f(y) = λ^{y}e^{λ}/y! as documented in the PDF and RAND functions. The mean and variance of the Poisson distribution equal λ. The Poisson distribution is a discrete distribution that is commonly used to model count data. Examples of using the Poisson distribution in modeling can be seen in this note.
The following statements fit the interceptonly Poisson model to the Y data. Since the default link function for the Poisson distribution is the log link, the model is log(λ) = intercept. Exponentiating the intercept provides an estimate of λ. The OUTPUT statement saves the mean estimate, along with 95% confidence limits, in data set OUT. The ODS OUTPUT statement saves the intercept estimate in data set PE. In the DATA step below, a point estimate for λ is computed from the intercept. The PDF function is then used to obtain an estimate of the probability that Y=0. The 90th percentile is estimated using the QUANTILE function. Finally, ten random observations are produced from the Poisson distribution with mean λ.
proc genmod; model y = / dist=poisson; ods output parameterestimates=pe; output out=out p=lambda l=lower u=upper; run; proc print data=out(obs=1); var lambda lower upper; run; proc transpose data=pe out=tpe; var estimate; id parameter; run; data tpe; set; lambda = exp(intercept); mean = lambda; var = lambda; PrYeq0 = pdf("poisson",0,lambda); Pct90 = quantile("poisson",.90,lambda); do i=1 to 10; RandomY = rand("poisson",lambda); output; end; run; proc print; run;
Suppose you have negative binomial data values in variable Y. The negative binomial mass function is f(y) = (^{y+k1}C_{k1})p^{k}(1p)^{y} as documented in the PDF and RAND functions, where (^{y+k1}C_{k1}) is the number of combinations of y+k1 items taken k1 at a time. The negative binomial distribution is a discrete distribution that is commonly used to model overdispersed count data. Examples of using the negative binomial distribution in modeling can be seen in this note.
The following PROC GENMOD step fits the interceptonly negative binomial model and estimates an intercept and dispersion parameter. As for the Poisson distribution above, the log is the default link function for this distribution. The model is log(mean) = intercept so exponentiating the intercept provides an estimate of the mean. The DATA step transforms the intercept and dispersion parameter estimates into the parameters (NB_p and NB_k) in the parameterization used by the RAND, PDF, and related functions. These parameters are then used to compute estimates of the mean (in two equivalent ways) and variance of the distribution. Estimates of the probability that Y=0 and the 90th percentile are produced by the PDF and QUANTILE functions. Ten random observations from the distribution are generated using the RAND function.
proc genmod; model y = / dist=negbin; ods output parameterestimates=pe; run; proc transpose data=pe out=tpe; var estimate; id parameter; run; data tpe; set; NB_k = 1/dispersion; NB_p = 1/(1+exp(intercept)*dispersion); NB_mean_1 = exp(intercept); NB_mean_2 = nb_k*(1nb_p)/nb_p; NB_var = nb_k*(1nb_p)/nb_p**2; PrYeq0 = pdf("negbinomial",0,nb_p,nb_k); Pct90 = quantile("negbinomial",.90,nb_p,nb_k); do i=1 to 10; RandomY = rand("negbinomial",nb_p,nb_k); output; end; run; proc print; run;
The geometric mass function takes two similar forms: f(x) = p(1p)^{x1}, x=1,2,... as documented in the RAND function with mean 1/p, or f(y) = p(1p)^{y}, y=0,1,2,... as documented in the PDF function with mean (1/p)1. The variance of both distributions is (1p)/p^{2}. Note that these distributions are shifted relative to one another with y = x1. Also note that the RAND function generates values starting at 1 rather than 0, but support for the distribution used in the PDF function begins at 0. The geometric distribution is equivalent to the negative binomial when the negative binomial dispersion parameter, k, equals one. The GENMOD procedure uses yet another parameterization that includes 0 in its support.
The following statements generate values from each of the two forms of the geometric distribution with parameter p=0.5. Note that values from the zerobased form (variable Y) can be generated by subtracting 1 from values generated by RAND("geometric",p) as shown below or with RAND("negbinomial",p,1.).
data GeoSamp; do n=1 to 10000; call streaminit(23422); x=rand('geometric',.5); *x=1,2,3,...; y=x1; *y=0,1,2,...; output; end; run;
The DATA step below computes the true population mean, variance, probability of 1, and 90th percentile using the true population parameter, p=0.5. Since the PDF and QUANTILE functions use the parameterization for Y=X1, the 90th percentile for X is obtained by adding 1 to the 90th of Y. Similarly, the probability that X=1 is obtained by computing the probability that Y=0. In the MEANS and FREQ steps that follow, estimates of the means, variances, and 90th percentiles as well as counts at each observed value are computed from the generated sample of data for both X and Y.
data parms; p=0.5; Variable='X'; Mean = 1/p; Var = (1p)/p**2; PrEq1 = pdf("geometric",0,p); Pct90 = quantile("geometric",.90,p)+1; output; Variable='Y'; Mean = 1/p  1; Var = (1p)/p**2; PrEq1 = pdf("geometric",1,p); Pct90 = quantile("geometric",.90,p); output; run; proc print data=parms; id Variable; var p mean var preq1 pct90; title "Geometric population statistics"; run; proc means data=GeoSamp mean var min max p90; var x y; title "Geometric sample statistics"; run; proc freq data=GeoSamp; tables x y; title "Geometric sample statistics"; run;
Notice that the sample statistics computed from the 10,000 observations in the generated sample closely match the true population values.
The MEANS Procedure

The following PROC GENMOD step fits the interceptonly geometric model to the X data. The log is the default link function for this distribution. As a result, the model is log(mean) = log(1/p) = intercept. Exponentiating the intercept provides an estimate of the mean (also available from the P= option in the OUTPUT statement). The reciprocal of the exponentiated intercept is an estimate of p. In the DATA step, the above statistics are computed using the estimate of p from GENMOD on the generated sample.
proc genmod data=GeoSamp; model x = / dist=geometric; ods output parameterestimates=pe; run; proc transpose data=pe out=tpe; var estimate; id parameter; run; data Parms; set tpe; Variable = 'X'; p = 1/exp(intercept); Mean = exp(intercept); Var = (1p)/p**2; PrXeq1 = pdf("geometric",0,p); Pct90 = quantile("geometric",.90,p)+1; run; proc print; id Variable; var intercept p mean var prxeq1 pct90; title "Estimated geometric statistics from sample data"; title2 "X = 1,2,3, ..."; run;
Notice that the estimates of the mean, variance, probability of 1 and 90th percentile for variable X all closely match the values computed by the MEANS and FREQ procedures and the true population values shown above.

The following PROC GENMOD step fits the interceptonly geometric model to the Y data. The model is the same as above, but since Y=X1, the mean of Y is also shifted down and is now (1/p)1. As a result, the geometric parameter, p, is now estimated by 1/(1+exp(intercept)). The DATA step again computes the above statistics using the estimate of p from GENMOD on the generated sample.
proc genmod data=GeoSamp; model y = / dist=geometric; ods output parameterestimates=pe; run; proc transpose data=pe out=tpe; var estimate; id parameter; run; data Parms; set tpe; Variable = 'Y'; p = 1/(1+exp(intercept)); Mean = 1/p  1; Var = (1p)/p**2; PrYeq1 = pdf("geometric",1,p); Pct90 = quantile("geometric",.90,p); run; proc print; id Variable; var intercept p mean var pryeq1 pct90; title "Estimated geometric statistics from sample data"; title2 "Y = 0,1,2, ..."; run;
As for variable X, the estimates of the mean, variance, probability of 1 and 90th percentile for variable Y all closely match the values computed by the MEANS and FREQ procedures and the true population values shown above.

Examples of using the gamma distribution in modeling can be seen in this note. The gamma distribution for modeling positive continuous data and a modification of it are discussed in this note.
The most common parameterization for the gamma distribution is as a function of a shape parameter, a, and a scale parameter, λ, as shown in the description of the RAND('gamma') function. This is the parameterization used for the gamma distribution in PROC NLMIXED, which directly reports estimates for those parameters. The mean of the gamma distribution is aλ in this parameterization and its variance is aλ^{2}. Note that sometimes the scale parameter is β=1/λ.
proc nlmixed; model y ~ gamma(a,lambda); run;
Estimates for the a and λ parameters can also be obtained from other procedures such as GENMOD or GLIMMIX, which use a slightly different parameterization for the gamma distribution as described in detail in this note. The following statements fit the interceptonly gamma model. The default link function for the gamma distribution is the inverse, but the log link is most commonly used. With the log link, the model is log(mean) = log(aλ) = intercept. In the DATA step below, the mean and variance are estimated using the intercept and scale parameters provided by GENMOD and using the a and λ parameters derived from them. The median (50th percentile) is then estimated using the QUANTILE function. The SDF function is also used to estimate the probability that Y>20. Note that the similar CDF function computes the probability of smaller values than specified rather than larger as provided by the SDF function. Finally, ten random observations are generated by the RAND function using the estimated gamma parameters.
proc genmod; model y= / dist=gamma link=log; ods output parameterestimates=pe; run; proc transpose data=pe out=tpe; id parameter; var estimate; run; data gamma; set tpe; a = scale; lambda = exp(intercept)/scale; mean_alam = lambda*a; mean_munu = exp(intercept); var_alam = a*lambda**2; var_munu = exp(intercept)**2/scale; median = quantile('gamma',.5,a,lambda); PrGT20 = sdf("gamma",20,a,lambda); do i=1 to 10; RanGam = rand("gamma",a,lambda); output; end; run; proc print; run;
Suppose you have generalized Poisson data values in variable Y. The generalized Poisson mass function is shown in the documentation of the PDF function and in the Details section of the FMM procedure documentation. Like the Poisson and negative binomial distributions, the generalized Poisson distribution is a discrete distribution that is commonly used to model count data and can accommodate under and overdispersion in the data. Examples of using the generalized Poisson distribution can be seen in this note and this note.
The following PROC FMM step fits the interceptonly generalized Poisson model and estimates an intercept and scale parameter. The log is the default link function for this distribution. The model is log(mean) = intercept so exponentiating the intercept provides an estimate of the mean. The DATA step below transforms the intercept and dispersion parameter estimates into the parameters (θ and η) in the parameterization used by the RAND, PDF, and related functions. These parameters are then used to compute estimates of the mean and variance of the distribution. Estimates of the probability that Y=0, the probability that Y>15, and the median are produced by the SDF, PDF, and QUANTILE functions. Ten random observations from the distribution are generated using the RAND function.
proc fmm; model y= / dist=genpoisson; ods output parameterestimates=pe; run; proc transpose data=pe out=tpe; id parameter; var estimate; run; data genpoi; set tpe; theta = exp(intercept)*exp(scale); eta = 1exp(scale); mean = exp(intercept); var = mean/(1eta)**2; PrYeq0 = pdf("genpoisson",0,theta,eta); med = quantile("genpoisson",0.5,theta,eta); PrGT15 = sdf("genpoisson",15,theta,eta); do i=1 to 10; RanGP = rand("genpoisson",theta,eta); output; end; run; proc print; run;
Suppose you have Tweedie data values in variable Y. The Tweedie density function is shown in the documentation of the PDF function and in the Details section of the GENMOD procedure documentation. Details of the distribution are further discussed in "Tweedie Distribution for Generalized Linear Models" in the Details section of the GENMOD documentation. The Tweedie distribution is a continuous distribution for positive data that might also have a point mass at zero. Examples of using the generalized Poisson distribution can be seen in this note and this note.
The following PROC GENMOD step fits the interceptonly Tweedie model and estimates an intercept and power (p) and dispersion (φ) parameters. The log is the default link function for this distribution. The model is log(mean) = intercept so exponentiating the intercept provides an estimate of the mean. These parameters can be used directly in the RAND, PDF, and related functions. The DATA step below uses these parameters to compute estimates of the mean and variance of the distribution. Estimates of the probability that Y>60 and the median are produced by the PDF and QUANTILE functions. Ten random observations from the distribution are generated using the RAND function. Some rounding of the power parameter is done to avoid computational problems in generating random values.
proc genmod; model y= / dist=tweedie; ods output parameterestimates=pe; run; proc transpose data=pe out=tpe; id parameter; var estimate; run; data twee; set tpe; mean = exp(intercept); p = power; phi = dispersion; var = phi*mean**p; med = quantile("tweedie",0.5,p,mean,phi); PrGT60 = sdf("tweedie",60,p,mean,phi); do i=1 to 1000; RanTW = rand("tweedie",round(p,.001),mean,phi); output; end; run; proc print; run;
Product Family  Product  System  SAS Release  
Reported  Fixed*  
SAS System  SAS/STAT  All  n/a 
Type:  Usage Note 
Priority:  low 
Topic:  SAS Reference ==> Procedures ==> GENMOD Analytics ==> Distribution Analysis Analytics ==> Descriptive Statistics Analytics ==> Simulation SAS Reference ==> Procedures ==> NLMIXED SAS Reference ==> Functions ==> Probability ==> PDF SAS Reference ==> Functions ==> Probability ==> SDF SAS Reference ==> Quantile ==> Quantile ==> QUANTILE SAS Reference ==> Procedures ==> FMM SAS Reference ==> Procedures ==> GLIMMIX SAS Reference ==> Functions ==> Random Number ==> RAND SAS Reference ==> Functions ==> Probability ==> CDF SAS Reference ==> Functions ==> Probability ==> LOGCDF SAS Reference ==> Functions ==> Probability ==> LOGPDF SAS Reference ==> Functions ==> Probability ==> LOGSDF SAS Reference ==> Functions ==> Probability ==> POISSON SAS Reference ==> Functions ==> Probability ==> PROBGAM SAS Reference ==> Functions ==> Probability ==> PROBNEGB 
Date Modified:  20210903 13:34:12 
Date Created:  20040913 14:29:35 