SAS/STAT Software

Bayesian Analysis

Bayesian methods treat parameters as random variables and define probability as "degrees of belief" (that is, the probability of an event is the degree to which you believe the event is true). It follows that probabilities are subjective and that you can make probability statements about parameters. When performing a Bayesian analysis, you begin with a prior belief regarding the probability distribution of an unknown parameter. After observing data, you update your belief about the unknown parameter and obtain a posterior distribution. In theory, Bayesian methods offer a conceptually simple alternative approach to statistical inference: all inferences follow from the posterior distribution. In practice, however, you can obtain the posterior distribution in closed form only for the most rudimentary problems. Most Bayesian analyses require sophisticated computations, including simulation methods: you generate samples from the posterior distribution and use these samples to estimate the quantities of interest.
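
The updating and simulation steps can be seen side by side in a problem simple enough to have a closed-form answer. The following Python sketch assumes a Beta prior on a success probability and binomial data; the prior parameters and data values are made up for illustration. Because the Beta prior is conjugate to the binomial likelihood, the exact posterior mean is known, and the Monte Carlo estimate computed from posterior draws should agree with it closely. This is an illustration of the idea only, not how any SAS procedure is implemented.

```python
import random

# Assumed prior: Beta(a, b) on a success probability theta.
a_prior, b_prior = 2.0, 2.0
# Made-up data: y successes in n Bernoulli trials.
n, y = 20, 14

# Conjugate update: the posterior is Beta(a + y, b + n - y), known in closed form.
a_post = a_prior + y
b_post = b_prior + n - y
exact_mean = a_post / (a_post + b_post)

# Simulation: draw from the posterior and estimate quantities of interest
# from the draws, as a sampling-based analysis would.
rng = random.Random(12345)
draws = [rng.betavariate(a_post, b_post) for _ in range(50_000)]
mc_mean = sum(draws) / len(draws)
prob_above_half = sum(d > 0.5 for d in draws) / len(draws)

print(f"exact posterior mean:    {exact_mean:.4f}")
print(f"Monte Carlo mean:        {mc_mean:.4f}")
print(f"P(theta > 0.5 | data) ~= {prob_above_half:.3f}")
```

In realistic models no closed form exists, and only the simulation half of this sketch remains available; that is the role of the MCMC methods used by the procedures below.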

The SAS/STAT Bayesian analysis procedures include the following:

BCHOICE Procedure

The BCHOICE procedure fits Bayesian discrete choice models by using Markov chain Monte Carlo (MCMC) methods. The procedure's capabilities include the following:

  • fits the following types of models:
    • multinomial logit
    • multinomial probit
    • nested logit
    • multinomial logit with random effects
    • multinomial probit with random effects
  • samples directly from the full conditional distribution when possible
  • supports the following sampling algorithms:
    • Metropolis-Hastings approach of Gamerman
    • random walk Metropolis
    • latent variables via the data augmentation method
  • provides a variety of Markov chain convergence diagnostics
  • works with the postprocessing autocall macros that are designed for Bayesian posterior samples
  • supports a CLASS statement for specifying classification variables
  • supports a RESTRICT statement, enabling you to specify boundary requirements and order constraints on fixed effects for logit models
  • supports multithreaded computation
  • creates an output data set that contains the posterior samples of all parameters
  • creates an output data set that contains random samples from the posterior predictive distribution of the choice probabilities
  • creates an output data set that corresponds to any output table
  • supports BY group processing
  • automatically produces graphs by using ODS Graphics
For further details, see BCHOICE Procedure
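
For the multinomial logit model, the probability of choosing alternative j from a choice set is exp(v_j) / sum_k exp(v_k), where v_j = x_j'β is the deterministic utility of alternative j. The Python sketch below only evaluates these choice probabilities for a fixed, assumed β; it does not perform the MCMC sampling of β that BCHOICE carries out. The helper name `mnl_choice_probabilities` and all numbers are hypothetical.

```python
import math

def mnl_choice_probabilities(utilities):
    """Multinomial logit: P(j) = exp(v_j) / sum_k exp(v_k).

    The maximum utility is subtracted before exponentiating for numerical
    stability; this leaves the probabilities unchanged.
    """
    m = max(utilities)
    expv = [math.exp(v - m) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

# Hypothetical choice set: three alternatives described by two attributes each.
beta = [0.8, -0.5]                       # assumed coefficients
x = [[1.0, 2.0], [0.0, 1.0], [2.0, 3.0]]
v = [sum(b * xi for b, xi in zip(beta, row)) for row in x]
probs = mnl_choice_probabilities(v)
print([round(p, 3) for p in probs])
```

In a Bayesian fit, each posterior draw of β yields its own set of choice probabilities, which is how a posterior predictive distribution of the choice probabilities can be formed.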

FMM Procedure

The FMM procedure fits statistical models to data for which the distribution of the response is a finite mixture of univariate distributions; that is, each response comes from one of several univariate distributions, and the component that generates a given response is chosen at random with unknown probabilities. The following are highlights of the FMM procedure's features:

  • model the component distributions in addition to the mixing probabilities
  • fit finite mixture models by maximum likelihood or Bayesian methods
  • fit finite mixtures of regression and generalized linear models
  • define the model effects for the mixing probabilities and their link function
  • model overdispersed data
  • estimate multimodal or heavy-tailed densities
  • fit zero-inflated or hurdle models to count data with excess zeros
  • fit regression models with complex error distributions
  • classify observations based on predicted component probabilities
  • choose from twenty-five different response distributions
  • impose linear equality and inequality constraints on model parameters
  • specify the response variable by using either the response syntax or the events/trials syntax
  • perform automated model selection for homogeneous mixtures
  • perform weighted estimation
  • control the performance characteristics of the procedure (for example, the number of CPUs, the number of threads for multithreading, and so on)
  • obtain separate analyses on observations in groups
  • create a data set that contains observationwise statistics that are computed after fitting the model
  • create a SAS data set corresponding to any output table
  • automatically create graphs by using ODS Graphics
For further details, see FMM Procedure
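
A finite mixture density has the form f(y) = sum_k π_k f_k(y), where the π_k are the mixing probabilities. The Python sketch below assumes a two-component normal mixture with made-up parameters. It evaluates the mixture density, simulates responses by first choosing a component at random and then drawing from it, and computes the posterior component probabilities π_k f_k(y) / f(y) that underlie classification of observations. It illustrates the model structure only, not FMM's estimation algorithms.

```python
import random
from statistics import NormalDist

# Hypothetical two-component normal mixture; weights are the mixing probabilities.
weights = [0.3, 0.7]
components = [NormalDist(mu=0.0, sigma=1.0), NormalDist(mu=4.0, sigma=1.5)]

def mixture_pdf(y):
    """Mixture density f(y) = sum_k pi_k * f_k(y)."""
    return sum(w * c.pdf(y) for w, c in zip(weights, components))

def component_probabilities(y):
    """Posterior probability that y came from each component (used for classification)."""
    joint = [w * c.pdf(y) for w, c in zip(weights, components)]
    total = sum(joint)
    return [j / total for j in joint]

def sample_mixture(rng, size):
    """Pick a component at random with the mixing probabilities, then draw from it."""
    draws = []
    for _ in range(size):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        draws.append(rng.gauss(components[k].mean, components[k].stdev))
    return draws

rng = random.Random(2024)
ys = sample_mixture(rng, 10_000)
print(f"f(2.0) = {mixture_pdf(2.0):.4f}")
print(f"sample mean = {sum(ys) / len(ys):.3f} (theoretical mean 2.8)")
print("P(component | y = 0.5):", [round(p, 3) for p in component_probabilities(0.5)])
```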

GENMOD Procedure

The GENMOD procedure fits generalized linear models, as defined by Nelder and Wedderburn (1972). The class of generalized linear models is an extension of traditional linear models that allows the mean of a population to depend on a linear predictor through a nonlinear link function and allows the response probability distribution to be any member of an exponential family of distributions. Many widely used statistical models are generalized linear models. These include classical linear models with normal errors, logistic and probit models for binary data, and log-linear models for multinomial data. Many other useful statistical models can be formulated as generalized linear models by the selection of an appropriate link function and response probability distribution. The following are highlights of the GENMOD procedure's features:

  • provides the following built-in distributions and associated variance functions:
    • normal
    • binomial
    • Poisson
    • gamma
    • inverse Gaussian
    • negative binomial
    • geometric
    • multinomial
    • zero-inflated Poisson
    • Tweedie
  • provides the following built-in link functions:
    • identity
    • logit
    • probit
    • power
    • log
    • complementary log-log
  • enables you to define your own link functions or distributions through DATA step programming statements used within the procedure
  • fits models to correlated responses by the generalized estimating equations (GEE) method
  • performs Bayesian analysis for generalized linear models
  • performs exact logistic regression
  • performs exact Poisson regression
  • enables you to fit a sequence of models and to perform Type I and Type III analyses between each successive pair of models
  • computes likelihood ratio statistics for user-defined contrasts
  • computes estimated values, standard errors, and confidence limits for user-defined contrasts and least squares means
  • computes confidence intervals for model parameters based on either the profile likelihood function or asymptotic normality
  • produces an overdispersion diagnostic plot for zero-inflated models
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates SAS data sets that correspond to most output tables
  • automatically generates graphs by using ODS Graphics
For further details, see GENMOD Procedure
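
In a generalized linear model, the link function g relates the mean μ to the linear predictor: g(μ) = η = x'β, so μ = g⁻¹(η). The Python sketch below writes out several of the built-in links listed above together with their inverses and shows how the same linear predictor maps to different means under the logit, probit, and complementary log-log links. The dictionary `LINKS`, the helper `power_link`, and the coefficients are hypothetical illustrations, not GENMOD syntax.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used by the probit link

# Each entry maps a link name to (g, g_inverse): eta = g(mu) and mu = g_inverse(eta).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1.0 - mu)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
}

def power_link(lam):
    """Return (g, g_inverse) for the power link g(mu) = mu ** lam, lam != 0."""
    return (lambda mu: mu ** lam, lambda eta: eta ** (1.0 / lam))

# Hypothetical linear predictor eta = b0 + b1 * x with assumed coefficients.
b0, b1 = -1.0, 0.5
for x in (0.0, 2.0, 4.0):
    eta = b0 + b1 * x
    row = {name: round(LINKS[name][1](eta), 4) for name in ("logit", "probit", "cloglog")}
    print(f"x={x}, eta={eta}: {row}")
```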

LIFEREG Procedure

The LIFEREG procedure fits parametric models to failure time data that can be uncensored, right censored, left censored, or interval censored. The models for the response variable consist of a linear effect composed of the covariates and a random disturbance term. The distribution of the random disturbance can be taken from a class of distributions that includes the extreme value, normal, logistic, and, by using a log transformation, the exponential, Weibull, lognormal, log-logistic, and three-parameter gamma distributions. The following are highlights of the LIFEREG procedure's features:

  • estimates the parameters by maximum likelihood with a Newton-Raphson algorithm
  • estimates the standard errors of the parameter estimates from the inverse of the observed information matrix
  • fits an accelerated failure time model that assumes that the effect of independent variables on an event time distribution is multiplicative on the event time
  • computes least squares means and least squares mean differences for classification effects
  • performs multiple comparison adjustments for the p-values and confidence limits for the least squares mean differences
  • estimates linear functions of the model parameters
  • tests hypotheses for linear combinations of the model parameters
  • performs sampling-based Bayesian analysis
  • performs weighted estimation
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates a SAS data set that contains the parameter estimates, the maximized log likelihood, and the estimated covariance matrix
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see LIFEREG Procedure
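
The accelerated failure time structure described above can be made concrete for the Weibull case: if log(T) = x'β + σε and ε follows the standard extreme value (minimum) distribution, then S(t | x) = exp(-(t / exp(x'β))^(1/σ)). A covariate change that increases x'β by β₁ rescales the whole event time distribution by exp(β₁). The Python sketch below checks this numerically for assumed parameter values; the function names and numbers are hypothetical, and the sketch evaluates the model rather than estimating it.

```python
import math

def weibull_aft_survival(t, x, beta, sigma):
    """S(t | x) for log(T) = x'beta + sigma * eps, eps ~ standard extreme value (minimum).

    This is a Weibull model: S(t | x) = exp(-(t / exp(x'beta)) ** (1 / sigma)).
    """
    lin = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(-((t / math.exp(lin)) ** (1.0 / sigma)))

def weibull_aft_median(x, beta, sigma):
    """Median event time: the t that solves S(t | x) = 0.5."""
    lin = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(lin) * math.log(2.0) ** sigma

# Hypothetical coefficients: intercept and one treatment indicator.
beta = [2.0, 0.4]
sigma = 0.8
control, treated = [1.0, 0.0], [1.0, 1.0]

accel = math.exp(beta[1])  # acceleration factor for the treatment
m0 = weibull_aft_median(control, beta, sigma)
m1 = weibull_aft_median(treated, beta, sigma)
print(f"median ratio = {m1 / m0:.4f}, exp(beta1) = {accel:.4f}")

# The treated survival curve equals the control curve with time rescaled by exp(beta1).
t = 5.0
print(weibull_aft_survival(t, treated, beta, sigma),
      weibull_aft_survival(t / accel, control, beta, sigma))
```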

MCMC Procedure

The MCMC procedure is a general-purpose Markov chain Monte Carlo (MCMC) simulation procedure that is designed to fit a wide range of Bayesian models. PROC MCMC enables you to do the following:

  • specify a likelihood function for the data, prior distributions for the parameters, and hyperprior distributions if you are fitting hierarchical models
  • obtain samples from the corresponding posterior distributions, produce summary and diagnostic statistics, and save the posterior samples in an output data set that can be used for further analysis
  • analyze data that have any likelihood, prior, or hyperprior as long as these functions are programmable by using SAS DATA step functions
  • enter parameters into a model linearly or in any nonlinear functional form
  • fit dynamic linear models, state space models, autoregressive models, or other models that have a conditionally dependent structure on either the random-effects parameters or the response variable
  • fit models that contain differential equations or models that require integration
  • use an adaptive blocked random-walk Metropolis algorithm that uses a normal or t proposal distribution by default
  • use a Hamiltonian Monte Carlo algorithm with a fixed step size and predetermined number of steps
  • use a No-U-Turn sampler with the Hamiltonian algorithm
  • create a user-defined sampler as an alternative to the default algorithms
  • create a data set that contains random samples from the posterior predictive distribution of the response variable
  • perform BY group processing, which enables you to obtain separate analyses on grouped observations
  • take advantage of multiple processors
  • create a SAS data set that corresponds to any output table
  • automatically create graphs by using ODS Graphics
For further details, see MCMC Procedure
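
To show the core idea behind a random-walk Metropolis sampler, the Python sketch below samples the posterior of a normal mean with known standard deviation under an assumed normal prior; the model, the data, and the tuning value `scale` are all made up. Unlike PROC MCMC's default, this version is neither adaptive nor blocked: it updates a single parameter with a fixed normal proposal. Because the model is conjugate, the exact posterior mean (10 / 8.01 for these data) is available as a check.

```python
import math
import random

def log_posterior(mu, data, prior_mean=0.0, prior_sd=10.0, sigma=1.0):
    """Unnormalized log posterior: normal prior on mu, normal likelihood with known sigma."""
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    log_lik = sum(-0.5 * ((y - mu) / sigma) ** 2 for y in data)
    return log_prior + log_lik

def random_walk_metropolis(data, n_iter=20_000, scale=0.5, init=0.0, seed=1):
    rng = random.Random(seed)
    current = init
    current_lp = log_posterior(current, data)
    samples, accepted = [], 0
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, scale)  # symmetric normal proposal
        proposal_lp = log_posterior(proposal, data)
        # Accept with probability min(1, p(proposal) / p(current)); 1 - random() avoids log(0).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples.append(current)  # a rejected move repeats the current state
    return samples, accepted / n_iter

data = [1.2, 0.8, 1.9, 1.4, 0.6, 1.1, 1.7, 1.3]  # made-up observations
samples, acc_rate = random_walk_metropolis(data)
kept = samples[2_000:]  # discard burn-in
post_mean = sum(kept) / len(kept)
print(f"acceptance rate: {acc_rate:.2f}, posterior mean: {post_mean:.3f}")
```

Only the unnormalized log posterior is needed, which is why a sampler of this kind can handle any likelihood and prior that can be programmed.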

PHREG Procedure

The PHREG procedure performs regression analysis of survival data based on the Cox proportional hazards model. Cox's semiparametric model is widely used in the analysis of survival data to explain the effect of explanatory variables on hazard rates. The following are highlights of the PHREG procedure's features:

  • fits a superset of the Cox model, known as the multiplicative hazards model or the Andersen-Gill model
  • fits frailty models
  • fits the competing-risks model of Fine and Gray
  • performs stratified analysis
  • includes four methods for handling ties in the failure times
  • provides four methods of variable selection
  • permits an offset in the model
  • performs weighted estimation
  • enables you to use SAS programming statements within the procedure to modify values of the explanatory variables or to create new explanatory variables
  • tests linear hypotheses about the regression parameters
  • estimates customized hazard ratios
  • performs graphical and numerical assessment of the adequacy of the Cox regression model
  • creates a new SAS data set that contains the baseline function estimates at the event times of each stratum for every specified set of covariates
  • outputs survivor function estimates, residuals, and regression diagnostics
  • performs conditional logistic regression analysis for matched case-control studies
  • fits multinomial logit choice models for discrete choice data
  • performs sampling-based Bayesian analysis
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates an output data set that contains parameter and covariance estimates
  • creates an output data set that contains user-specified statistics
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see PHREG Procedure
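
The Cox model is fit through its partial likelihood, which compares each subject who has an event with everyone still at risk at that time: log PL(β) = sum over events i of [x_i β - log sum over j in R(t_i) of exp(x_j β)]. The Python sketch below computes this for one covariate under the simplifying assumption of no tied event times (PHREG itself offers several methods for handling ties) and maximizes it with a crude grid search as a stand-in for a proper optimizer. The data and function name are hypothetical.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox log partial likelihood for one covariate, assuming no tied event times.

    For each subject i with an event, the risk set holds every subject whose
    observed time is >= times[i].
    """
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return ll

# Made-up data: follow-up times, event indicators (0 = censored), one covariate.
times = [5, 8, 12, 3, 9, 15]
events = [1, 1, 0, 1, 1, 0]
x = [1, 0, 1, 1, 0, 0]

grid = [k / 100 for k in range(-300, 301)]
beta_hat = max(grid, key=lambda b: cox_partial_loglik(b, times, events, x))
print(f"log PL(0) = {cox_partial_loglik(0.0, times, events, x):.4f}, beta_hat ~= {beta_hat}")
```

At β = 0 every subject in a risk set is equally likely to fail, so log PL(0) reduces to minus the sum of the log risk-set sizes. In a Bayesian analysis, the partial likelihood can play the role of the likelihood that is combined with a prior.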