ALPHA=number
requests that the procedure construct confidence intervals that have confidence level 1 - number for each parameter estimate. The value of number must be between 0 and 1; the default is 0.05.
CL
requests that PROC FMM construct confidence limits for each parameter estimate. The confidence level is 0.95 by default; you can change it by using the ALPHA= option.
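As a minimal sketch of how these two options combine, the following statements request 90% confidence limits for a two-component normal regression mixture. The data set work.mix, its variables, and the simulation settings are hypothetical and serve only to make the example self-contained.

```sas
/* Simulated two-group data; the data set and variable names are hypothetical */
data work.mix;
   call streaminit(2718);
   do i = 1 to 200;
      x = rand('uniform');
      if rand('bernoulli', 0.4) then y = 1 + 2*x + rand('normal');
      else                          y = 6 - x   + rand('normal');
      output;
   end;
   drop i;
run;

proc fmm data=work.mix;
   /* CL requests confidence limits; ALPHA=0.1 sets the level to 1 - 0.1 = 0.90 */
   model y = x / k=2 cl alpha=0.1;
run;
```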
DISTRIBUTION=keyword
DIST=keyword
specifies the probability distribution for a mixture component.
If you specify the DIST= option and you do not specify a link function by using the LINK= option, a default link function is chosen (see Table 39.5). If you do not specify a distribution, the FMM procedure defaults to the normal distribution for continuous response variables and to the binary distribution for classification or character variables, unless you use the events/trials syntax in the MODEL statement. If you use the events/trials syntax, PROC FMM defaults to the binomial distribution.
Table 39.5 lists keywords that you can specify for the DISTRIBUTION= option and their corresponding default link functions. For generalized linear models with these distributions, you can find expressions for the log-likelihood functions in the section Log-Likelihood Functions for Response Distributions.
Table 39.5: Keyword Values of the DIST= Option

DIST= Keyword         Alias             Distribution                 Default Link Function
--------------------  ----------------  ---------------------------  ---------------------------
BETA                                    Beta                         Logit
BETABINOMIAL          BETABIN, BB       Beta-binomial                Logit
BINARY                BERNOULLI         Binary                       Logit
BINOMIAL              BIN               Binomial                     Logit
BINOMCLUSTER          BCLUS             Binomial cluster             Logit
CONSTANT<(c)>         DEGENERATE<(c)>   Degenerate                   N/A
DIRICHLETMULTINOMIAL  DIRIMULT, DM      Dirichlet-multinomial        Generalized logit
EXPONENTIAL           EXPO              Exponential                  Log
FOLDEDNORMAL          FNORMAL           Folded normal                Identity
GAMMA                 GAM               Gamma                        Log
GAUSSIAN              NORMAL            Normal                       Identity
GENPOISSON            GPOISSON          Generalized Poisson          Log
GEOMETRIC             GEOM              Geometric                    Log
INVGAUSS              IGAUSSIAN, IG     Inverse Gaussian             Inverse squared (power(-2))
LOGNORMAL             LOGN              Lognormal                    Identity
MULTINOMIAL           MULTI             Multinomial                  Generalized logit
MULTINOMCLUSTER       MCLUS             Multinomial cluster          Logit
NEGBINOMIAL           NEGBIN, NB        Negative binomial            Log
POISSON               POI               Poisson                      Log
T<(ν)>                STUDENT<(ν)>      t                            Identity
TRUNCEXPO<(a,b)>      TEXPO<(a,b)>      Truncated exponential        Log
TRUNCLOGN<(a,b)>      TLOGN<(a,b)>      Truncated lognormal          Identity
TRUNCNEGBIN           TNEGBIN, TNB      Truncated negative binomial  Log
TRUNCNORMAL<(a,b)>    TNORMAL<(a,b)>    Truncated normal             Identity
TRUNCPOISSON          TPOISSON, TPOI    Truncated Poisson            Log
UNIFORM<(a,b)>        UNIF<(a,b)>       Uniform                      N/A
WEIBULL                                 Weibull                      Log

Note that the PROC FMM default link for the gamma or exponential distribution is not the canonical link (the reciprocal link).
The binomial cluster and multinomial cluster models are multiple-component models that are described in Morel and Nagaraj (1993); Morel and Neerchal (1997); Neerchal and Morel (1998). See Example 39.1 for an application of the binomial cluster model in a teratological experiment. See Example 39.4 for an application of the multinomial cluster model to housing survey data.
If you use the events/trials syntax, then the default distribution is the binomial distribution and only the following choices are available: DIST=BINOMIAL, DIST=BETABINOMIAL, and DIST=BINOMCLUSTER. The trials variable is ignored for all other distributions. This enables you to fit models in which some components have a binomial or binomial-like distribution. For example, suppose that the variable n is a binomial denominator and the variable logn is its logarithm. Then the following statements model a two-component mixture of a binomial and Poisson count model:
   model y/n = ;
   model   + / dist=Poisson offset=logn;
You use the OFFSET= option in the second MODEL statement to specify that the Poisson counts refer to different base counts, because the trials variable n is ignored in the second model.
If DIST=BINOMIAL is specified without the events/trials syntax, then n=1 is used for the default number of trials.
Similarly, if you specify multiple dependent variables by using the response ... response and event ... event /trials notation, the default distribution is the multinomial and only DIST=MULTINOMIAL, DIST=DIRICHLETMULTINOMIAL, and DIST=MULTINOMCLUSTER are available.
DIST=TRUNCNEGBIN and DIST=TRUNCPOISSON are zero-truncated versions of DIST=NEGBINOMIAL and DIST=POISSON, respectively; that is, only the value of 0 is excluded from the support.
For DIST=TRUNCEXPO, DIST=TRUNCLOGN, and DIST=TRUNCNORMAL, you must specify the lower (a) and upper (b) truncation points of the distribution. For example:
DIST=TRUNCEXPO<(a,b)>
DIST=TRUNCLOGN<(a,b)>
DIST=TRUNCNORMAL<(a,b)>
Each of these distributions is the conditional version of its corresponding nontruncated distribution that is confined to the support [a, b] (inclusive). You can specify a missing value (.) for either a or b to truncate only on the other side; that is, a=. indicates a right-truncated distribution, and b=. indicates a left-truncated distribution.
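The one-sided case can be sketched as follows. The statements fit a single normal component that is truncated on the left at 0 and unbounded on the right. The data set work.pos and the simulation settings are hypothetical; the sample is made left-truncated by discarding nonpositive draws.

```sas
/* Hypothetical data: normal draws, keeping only values greater than 0 */
data work.pos;
   call streaminit(1);
   do i = 1 to 300;
      do until (y > 0);
         y = rand('normal', 1, 2);   /* mean 1, standard deviation 2 */
      end;
      output;
   end;
   drop i;
run;

proc fmm data=work.pos;
   /* a = 0 and b = . : lower truncation point only (left-truncated) */
   model y = / dist=truncnormal(0, .);
run;
```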
For several distribution specifications, you can provide additional optional parameters to further define the distribution. These optional parameters are as follows:
For DIST=CONSTANT(c), the number c specifies the value where the mass is concentrated. The default is DIST=CONSTANT(0), so you can add zero-inflation to any model by adding a MODEL statement with DIST=CONSTANT.
For DIST=T(ν), the number ν specifies the degrees of freedom for the (shifted) t distribution. The default is DIST=T(3); this leads to a heavy-tailed distribution for which the variance is defined. See the section Log-Likelihood Functions for Response Distributions for the density function of the shifted t distribution.
For DIST=UNIFORM(a,b), the values a and b define the support of the uniform distribution, a < b. By default, a = 0 and b = 1.
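The zero-inflation remark for DIST=CONSTANT can be illustrated with a short sketch: a Poisson regression component plus a degenerate component at 0. The data set work.counts, the variable names, and the simulation settings are hypothetical.

```sas
/* Hypothetical data: about 30% structural zeros mixed with Poisson counts */
data work.counts;
   call streaminit(42);
   do i = 1 to 500;
      x = rand('uniform');
      if rand('bernoulli', 0.3) then y = 0;
      else y = rand('poisson', exp(0.5 + x));
      output;
   end;
   drop i;
run;

proc fmm data=work.counts;
   model y = x / dist=poisson;    /* count component with a log link */
   model   +   / dist=constant;   /* point mass at the default c = 0 */
run;
```

The second MODEL statement uses the MODEL + form, so it adds a component to the mixture rather than defining a new response.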
EQUATE=MEAN | SCALE | NONE | EFFECTS(effect-list)
specifies simple sets of parameter constraints across the components in a MODEL statement; the default is EQUATE=NONE. This option is available only for maximum likelihood estimation. If you specify EQUATE=MEAN, the parameters that determine the mean are reduced to a single set that is applicable to all components in the MODEL statement. If you specify EQUATE=SCALE, a single parameter represents the common scale for all components in the MODEL statement. The EFFECTS option enables you to force the parameters for the chosen model effects to be equal across components; however, the number of parameters is unaffected.
For example, the following statements fit a two-component multiple regression model in which the coefficients for variable logd vary by component and the intercepts and coefficients for variable dose are the same for the two components:
   proc fmm;
      model num = dose logd / equate=effects(int dose) k=2;
   run;
To fix all coefficients across the two components, you can write the MODEL statement as
model num = dose logd / equate=effects(int dose logd) k=2;
or
model num = dose logd / equate=mean k=2;
If you restrict all parameters in a k-component MODEL statement to be equal, the FMM procedure reduces the model to k=1.
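The examples above cover EQUATE=EFFECTS and EQUATE=MEAN. As a complementary sketch, the following statements use EQUATE=SCALE so that the two components keep separate regression coefficients but share one scale (variance) parameter. The choice of the SASHELP.CARS sample data set and its variables mpg_city and weight is an assumption made only for illustration.

```sas
proc fmm data=sashelp.cars;
   /* separate intercepts and slopes per component, one common variance */
   model mpg_city = weight / k=2 equate=scale;
run;
```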
K=number
specifies the number of components the MODEL statement contributes to the overall mixture. For binomial cluster models and multinomial cluster models, this option is not available because these are multiple-component models by definition.
KMAX=number
specifies the maximum number of components the MODEL statement contributes to the overall mixture.
If the maximum number of components in the mixture, as determined by all KMAX= options, is larger than the minimum number of components, the FMM procedure fits all possible models and displays summary fit information for the sequence of evaluated models. The "best" model according to the CRITERION= option in the PROC FMM statement is then chosen, and the remaining output and analyses performed by PROC FMM pertain to this "best" model.
When you use MCMC methods to estimate the parameters of a mixture, you need to ensure that the chain for a given value of k has converged; otherwise, comparisons among models that have varying numbers of components might not be meaningful. You can use the FITDETAILS option to display summary and diagnostic information for the MCMC chains from each model.
If you specify the KMIN= option but not the KMAX= option, then the default value for the KMAX= option is the value of the KMIN= option (unless KMIN=0, in which case the KMAX= option is set to 1).
KMIN=number
specifies the minimum number of components that the MODEL statement contributes to the overall mixture. When you use MCMC methods to estimate the parameters of a mixture, you need to ensure that the chain for a given value of k has converged; otherwise, comparisons among models that have varying numbers of components might not be meaningful.
KRESTART
requests that the starting values for each analysis (that is, for each unique number of components as determined by the KMIN= and KMAX= options) be determined separately, in the same way as if no other analyses were performed. If you do not specify the KRESTART option, then the starting values for each analysis are based on results from the previous analysis with one fewer component.
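A minimal sketch of how KMIN=, KMAX=, and KRESTART work together follows. It fits intercept-only normal mixtures with one, two, and three components, determines starting values for each fit separately, and lets the CRITERION= option in the PROC FMM statement pick the reported model. The SASHELP.CARS data set, the variable mpg_city, and the choice of BIC are assumptions for illustration only.

```sas
proc fmm data=sashelp.cars criterion=bic;
   /* evaluate k = 1, 2, 3; KRESTART recomputes starting values for each k */
   model mpg_city = / kmin=1 kmax=3 krestart;
run;
```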
LABEL='label'
specifies an optional label for the model that is used to identify the model in printed output, on graphics, and in data sets created from ODS tables.
LINK=keyword
specifies the link function in the model. The keywords and expressions for the associated link functions are shown in Table 39.6.
Table 39.6: Link Functions in MODEL Statement of the FMM Procedure

LINK= Keyword   Alias         Link Function
--------------  ------------  --------------------------
CLOGLOG         CLL           Complementary log-log
IDENTITY        ID            Identity
LOG                           Log
LOGIT                         Logit
LOGLOG                        Log-log
PROBIT          NORMIT        Probit
POWER(number)   POW(number)   Power with exponent number
POWERMINUS2                   Power with exponent -2
RECIPROCAL      INVERSE       Reciprocal

The default link functions for the various distributions are shown in Table 39.5.
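For reference, the usual textbook forms of these link functions can be written as g(μ) = η, where μ is the component mean and η is the linear predictor. The notation (g, μ, η, λ, Φ) is an assumption here, not taken from the table; Φ denotes the standard normal cumulative distribution function, and λ stands for the number in POWER(number). The power link with exponent 0 is conventionally read as the log link.

```latex
\begin{aligned}
\text{CLOGLOG:}        &\quad g(\mu) = \log\{-\log(1-\mu)\} \\
\text{IDENTITY:}       &\quad g(\mu) = \mu \\
\text{LOG:}            &\quad g(\mu) = \log(\mu) \\
\text{LOGIT:}          &\quad g(\mu) = \log\{\mu/(1-\mu)\} \\
\text{LOGLOG:}         &\quad g(\mu) = -\log\{-\log(\mu)\} \\
\text{PROBIT:}         &\quad g(\mu) = \Phi^{-1}(\mu) \\
\text{POWER}(\lambda): &\quad g(\mu) = \mu^{\lambda}, \quad \lambda \neq 0 \\
\text{POWERMINUS2:}    &\quad g(\mu) = 1/\mu^{2} \\
\text{RECIPROCAL:}     &\quad g(\mu) = 1/\mu
\end{aligned}
```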
NOINT
requests that no intercept be included in the model. An intercept is included by default, unless the distribution is DIST=CONSTANT or DIST=UNIFORM.
OFFSET=variable
specifies the offset variable for the linear predictor in the model. An offset variable can be thought of as a regressor variable whose regression coefficient is known to be 1. For example, you can use an offset in a Poisson model when counts have been obtained in time intervals of different lengths. With a log link function, you can model the counts as Poisson variables with the logarithm of the time interval as the offset variable.
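The time-interval case can be sketched as follows. Counts are observed over exposure times t of different lengths, and log(t) enters the log-linked linear predictor as an offset, so the fitted intercepts are log rates per unit time. The data set work.events, the variable names, and the two simulated rates are hypothetical.

```sas
/* Hypothetical data: counts from two event rates over varying exposure times */
data work.events;
   call streaminit(7);
   do i = 1 to 300;
      t    = 1 + 9*rand('uniform');   /* length of the observation interval */
      logt = log(t);                  /* offset on the log-link scale       */
      if rand('bernoulli', 0.5) then rate = 0.5;
      else                          rate = 3;
      y = rand('poisson', rate*t);
      output;
   end;
   drop i rate;
run;

proc fmm data=work.events;
   /* log(mean) = intercept_k + logt, with the coefficient of logt fixed at 1 */
   model y = / dist=poisson k=2 offset=logt;
run;
```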
PARMS(parameter-specification)
specifies starting values for the model parameters. If no PARMS option is specified, the FMM procedure determines starting values by using a data-dependent algorithm. To determine initial values for the Markov chain in Bayesian estimation, see the INITIAL= option in the BAYES statement. The specification of the parameters takes the following form: parameters in the mean function precede the scale parameters, and parameters for different components are separated by commas.
The following statements specify starting parameters for a two-component normal model. The initial values for the intercepts are 1 and -3; the initial values for the variances are 0.5 and 4.
   proc fmm;
      model y = / k=2 parms(1 0.5, -3 4);
   run;
You can specify missing values for parameters whose starting values are to be determined by the default method. Only values for parameters that participate in the optimization are specified. The values for model effects are specified on the linear (linked) scale.
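As a sketch of the missing-value form, the following statements supply starting values only for the two intercepts and leave both variances to the default method. The data set work.two and its simulation settings are hypothetical.

```sas
/* Hypothetical data: equal-weight mixture of N(1, 0.7**2) and N(-3, 2**2) */
data work.two;
   call streaminit(11);
   do i = 1 to 400;
      if rand('bernoulli', 0.5) then y = rand('normal', 1, 0.7);
      else                          y = rand('normal', -3, 2);
      output;
   end;
   drop i;
run;

proc fmm data=work.two;
   /* intercepts start at 1 and -3; each variance (.) uses the default start */
   model y = / k=2 parms(1 ., -3 .);
run;
```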