There are two approaches to performing categorical data analyses. The first computes statistics based
on tables defined by categorical variables (variables that assume only a limited number of discrete values),
performs hypothesis tests about the association between these variables, and requires the assumption of a
randomized process; call these methods randomization procedures.
The other approach investigates the association by modeling a categorical response variable, regardless of
whether the explanatory variables are continuous or categorical; call these methods modeling procedures.
The SAS/STAT categorical data analysis procedures include the following:
 CATMOD Procedure: Categorical data modeling
 FREQ Procedure: One-way to n-way frequency and contingency (crosstabulation) tables
 FMM Procedure: Finite mixture models
 GENMOD Procedure: Generalized linear models
 LOGISTIC Procedure: Models with binary, ordinal, or nominal dependent variables
 PROBIT Procedure: Maximum likelihood estimates of regression parameters and the natural (or threshold) response rate for quantal response data from biological assays or other discrete event data
CATMOD Procedure
The CATMOD procedure performs categorical data modeling of data that can be represented by a contingency table.
PROC CATMOD fits linear models to functions of response frequencies, and it can be used for linear modeling,
loglinear modeling, logistic regression, and repeated measurement analysis.
The procedure enables you to do the following:
 estimate model parameters by using weighted least squares (WLS) for a wide range of general linear
models or maximum likelihood (ML) for loglinear models and the analysis of generalized logits
 supply raw data, where each observation is a subject, supply cell count data,
where each observation is a cell in a contingency table, or directly input a covariance matrix
 construct linear functions of the model parameters or loglinear effects and test the hypothesis that the linear combination equals zero
 perform constrained estimation
 perform BY group processing, which enables you to obtain separate analyses on grouped observations
 create a data set that contains the observed and predicted values of the response
functions, their standard errors, the residuals, and variables that describe the population and response
profiles. In addition, if you use the standard response functions, the data set includes observed
and predicted values for the cell frequencies or the cell probabilities, together with their standard errors and residuals.
 create a data set that contains the estimated parameter vector and its estimated covariance matrix
 create a data set that corresponds to any output table
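
The generalized logits mentioned above are one example of a response function computed from a contingency table. The following is a minimal Python sketch of that calculation, not SAS syntax and not PROC CATMOD's implementation. The cell counts and population names are hypothetical, and the last response level is assumed to be the reference level.

```python
import math


def generalized_logits(counts):
    """Return log(p_j / p_last) for every response level except the last."""
    total = sum(counts)
    probs = [c / total for c in counts]
    return [math.log(p / probs[-1]) for p in probs[:-1]]


# Hypothetical cell counts: one row per population, one column per response level.
table = {
    "treatment": [30, 50, 20],
    "placebo": [10, 40, 50],
}

# One vector of response functions per population (row of the table).
logits = {name: generalized_logits(row) for name, row in table.items()}

for name, values in logits.items():
    print(name, [round(v, 4) for v in values])
```

A model for these response functions would then be fit across populations; the sketch stops at computing the functions themselves.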

For further details, see CATMOD Procedure.
FREQ Procedure
The FREQ procedure produces one-way to n-way frequency and contingency (crosstabulation) tables.
For two-way tables, PROC FREQ computes tests and measures of association. For n-way tables, PROC FREQ provides
stratified analysis by computing statistics across, as well as within, strata.
The following are highlights of the FREQ procedure's features:
 computes goodness-of-fit tests for equal proportions or specified null proportions for one-way frequency tables
 provides confidence limits and tests for binomial proportions, including tests for noninferiority
and equivalence for one-way frequency tables
 computes various statistics to examine the relationships between two classification variables. The statistics for contingency
tables include the following:
   chi-square tests and measures
   measures of association
   risks (binomial proportions) and risk differences for 2 x 2 tables
   odds ratios and relative risks for 2 x 2 tables
   tests for trend
   tests and measures of agreement
   Cochran-Mantel-Haenszel statistics
 computes asymptotic standard errors, confidence intervals, and tests for measures
of association and measures of agreement
 computes score confidence limits for odds ratios
 computes exact p-values, exact mid p-values, and confidence intervals for many test statistics and measures
 performs BY group processing, which enables you to obtain separate analyses on grouped observations
 accepts either raw data or cell count data to produce frequency and crosstabulation tables
 creates a SAS data set that contains the computed statistics
 creates a SAS data set that corresponds to any output table
 automatically creates graphs by using ODS Graphics
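
To make two of the contingency-table statistics above concrete, the following Python sketch computes a Pearson chi-square statistic and an odds ratio for a 2 x 2 table. It is an illustration under stated assumptions, not SAS syntax: the cell counts are hypothetical, and the confidence limits are simple Wald limits on the log scale, which differ from the score limits listed above.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def odds_ratio_with_wald_limits(table, z=1.959964):
    """Odds ratio for a 2 x 2 table with Wald limits (z defaults to the 95% level)."""
    (a, b), (c, d) = table
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: rows are exposure groups, columns are outcome levels.
table = [[20, 10], [10, 20]]

chi2 = pearson_chi_square(table)
# With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
odds_ratio, lower, upper = odds_ratio_with_wald_limits(table)

print(round(chi2, 4), round(p_value, 4), odds_ratio, round(lower, 3), round(upper, 3))
```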

For further details, see FREQ Procedure.
FMM Procedure
The FMM procedure fits statistical models to data for which the distribution of the response
is a finite mixture of univariate distributions; that is, each response comes from one of
several random univariate distributions with unknown probabilities.
The following are highlights of the FMM procedure's features:
 model the component distributions in addition to the mixing probabilities
 fit finite mixture models by maximum likelihood or Bayesian methods
 fit finite mixtures of regression and generalized linear models
 define the model effects for the mixing probabilities and their link function
 model overdispersed data
 estimate multimodal or heavy-tailed densities
 fit zero-inflated or hurdle models to count data with excess zeros
 fit regression models with complex error distributions
 classify observations based on predicted component probabilities
 choose from 25 different response distributions
 apply linear equality and inequality constraints on model parameters
 specify the response variable by using either the response syntax or the events/trials syntax
 perform automated model selection for homogeneous mixtures
 perform weighted estimation
 control the performance characteristics of the procedure (for example, the number of CPUs, the number of threads for multithreading, and so on)
 obtain separate analyses on observations in groups
 create a data set that contains observationwise statistics that are computed after fitting the model
 create a SAS data set corresponding to any output table
 automatically create graphs by using ODS Graphics
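
A zero-inflated Poisson model is one of the simplest finite mixtures: a point mass at zero mixed with an ordinary Poisson distribution. The Python sketch below evaluates that mixture's probability mass function. It is not SAS syntax and not how PROC FMM estimates the model; the rate and mixing probability are hypothetical values chosen for illustration.

```python
import math


def poisson_pmf(k, rate):
    """Probability of observing count k under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def zero_inflated_poisson_pmf(k, rate, zero_prob):
    """Mixture of a point mass at zero (weight zero_prob) and a Poisson component."""
    poisson_part = (1 - zero_prob) * poisson_pmf(k, rate)
    return zero_prob + poisson_part if k == 0 else poisson_part


# Hypothetical parameters: Poisson mean 2.0 and 30% structural zeros.
rate, zero_prob = 2.0, 0.3

p_zero = zero_inflated_poisson_pmf(0, rate, zero_prob)
plain_p_zero = poisson_pmf(0, rate)
# Summing far into the tail should recover (almost exactly) total probability 1.
total = sum(zero_inflated_poisson_pmf(k, rate, zero_prob) for k in range(60))

print(round(p_zero, 4), round(plain_p_zero, 4), round(total, 6))
```

The mixture assigns more probability to zero than the plain Poisson does, which is the excess-zeros pattern these models are meant to capture.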

For further details, see FMM Procedure.
GENMOD Procedure
The GENMOD procedure fits generalized linear models, as defined by Nelder and Wedderburn (1972). The class of generalized
linear models is an extension of traditional linear models that allows the mean of a population to depend on a linear predictor
through a nonlinear link function and allows the response probability distribution to be any member of an exponential family of
distributions. Many widely used statistical models are generalized linear models. These include classical linear models with normal
errors, logistic and probit models for binary data, and loglinear models for multinomial data. Many other useful statistical models
can be formulated as generalized linear models by the selection of an appropriate link function and response probability distribution.
The following are highlights of the GENMOD procedure's features:
 provides the following built-in distributions and associated variance functions:
   normal
   binomial
   Poisson
   gamma
   inverse Gaussian
   negative binomial
   geometric
   multinomial
   zero-inflated Poisson
   Tweedie
 provides the following built-in link functions:
   identity
   logit
   probit
   power
   log
   complementary log-log
 enables you to define your own link functions or distributions through DATA step
programming statements used within the procedure
 fits models to correlated responses by the GEE method
 performs Bayesian analysis for generalized linear models
 performs exact logistic regression
 performs exact Poisson regression
 enables you to fit a sequence of models and to perform Type I and Type III analyses
between each successive pair of models
 computes likelihood ratio statistics for user-defined contrasts
 computes estimated values, standard errors, and confidence limits for user-defined
contrasts and least squares means
 computes confidence intervals for model parameters based on either the profile
likelihood function or asymptotic normality
 performs BY group processing, which enables you to obtain separate analyses on grouped observations
 creates SAS data sets that correspond to most output tables
 automatically generates graphs by using ODS Graphics
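
A link function maps the mean of the response to the scale of the linear predictor, and its inverse maps back. The Python sketch below writes out several of the link functions named above together with their inverses. It is an illustration only, not SAS syntax; the dictionary name `LINKS` and the helper `round_trip` are hypothetical, and the power link is omitted because it needs an extra exponent parameter.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# Each entry is (link, inverse link).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda p: math.log(p / (1 - p)),
        lambda eta: 1 / (1 + math.exp(-eta)),
    ),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "cloglog": (
        lambda p: math.log(-math.log(1 - p)),
        lambda eta: 1 - math.exp(-math.exp(eta)),
    ),
}


def round_trip(name, mean):
    """Apply a link and then its inverse; the result should equal the input mean."""
    link, inverse = LINKS[name]
    return inverse(link(mean))


for name in LINKS:
    link, _ = LINKS[name]
    print(name, round(link(0.3), 4), round(round_trip(name, 0.3), 4))
```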

For further details, see GENMOD Procedure.
LOGISTIC Procedure
The LOGISTIC procedure fits linear logistic regression models for discrete response data by the method of maximum likelihood.
It can also perform conditional logistic regression for binary response data and exact logistic regression for binary and nominal
response data. The maximum likelihood estimation is carried out with either the Fisher scoring algorithm or the Newton-Raphson
algorithm, and you can perform the bias-reducing penalized likelihood optimization as discussed by Firth (1993) and Heinze and
Schemper (2002). You can specify starting values for the parameter estimates. The logit link function in the logistic regression
models can be replaced by the probit function, the complementary log-log function, or the generalized logit function.
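
To illustrate the kind of iteration involved, the Python sketch below fits a binary logistic model with an intercept and one predictor by Newton-Raphson. It is a toy implementation under stated assumptions, not SAS syntax and not PROC LOGISTIC's code: the data are hypothetical, there is no step-halving or separation check, and the function name `fit_logistic` is made up for the example. For the logit link, Fisher scoring and Newton-Raphson produce the same steps, so one routine illustrates both.

```python
import math


def fit_logistic(x, y, max_iter=25, tol=1e-10):
    """Newton-Raphson for logit(p) = b0 + b1 * x with a binary response y."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0           # score vector (gradient of the log likelihood)
        h00 = h01 = h11 = 0.0   # information matrix entries
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2 x 2 system (information) * step = score.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data with overlapping responses, so the estimates are finite.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)
fitted = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
print(round(b0, 4), round(b1, 4))
```

At convergence the score equations hold, so the fitted probabilities sum to the number of observed events.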
The LOGISTIC procedure enables you to do the following:
 fit stratified conditional logistic regression of binary response data
 fit partial proportional odds logistic regression models
 add or relax constraints on parameters in nominal response models and partial proportional odds models
 compute the partial correlation statistic for each model parameter (excluding the intercept)
 control the ordering of the response categories
 compute a generalized R-square measure for the fitted model
 reclassify binary response observations according to their predicted response probabilities
 test linear hypotheses about the regression parameters
 perform exact tests of the parameters for the specified effects and optionally estimate
the parameters and exact conditional distributions
 specify contrasts to compare several receiver operating characteristic curves
 score a data set by using a previously fitted model
 specify units of change for continuous explanatory variables so that customized odds ratios can be estimated
 perform BY group processing, which enables you to obtain separate analyses on grouped observations
 perform weighted estimation
 create a data set for producing a receiver operating characteristic curve for each fitted model
 create a data set that contains the estimated response probabilities, residuals, and influence diagnostics
 create a data set that contains the estimated parameter vector and its estimated covariance matrix
 create a data set that corresponds to any output table
 automatically create graphs by using ODS Graphics

For further details, see LOGISTIC Procedure.
PROBIT Procedure
The PROBIT procedure calculates maximum likelihood estimates of regression parameters and the natural (or threshold) response
rate for quantal response data from biological assays or other discrete event data. This includes probit, logit, ordinal logistic,
and extreme value (or gompit) regression models. The following are highlights of the PROBIT procedure's features:
 performs chi-square tests for model effects that test Type I, Type II, or Type III hypotheses
 provides a mechanism for performing custom hypothesis tests
 produces a display of the fitted model and provides options for changing and enhancing the displays
 plots the predicted cumulative distribution function (CDF) of the multinomial response variable as a function of a single continuous independent variable (dose variable)
 plots the inverse of the predicted probability (IPP) against a single continuous variable (dose variable) for the binomial model
 plots the linear predictor (LPRED) x'b against a single continuous variable (dose variable) for either the binomial model or the multinomial model
 plots the predicted probability against a single continuous variable (dose variable) for both the binomial model and the multinomial model
 computes and compares least squares means (LS-means) of fixed effects
 provides custom hypothesis tests among least squares means
 performs a partitioned analysis of the LS-means for an interaction
 performs weighted estimation
 enables you to save the context and results of the statistical analysis in an item store, which can be processed with the PLM procedure
 creates a SAS data set that contains the parameter estimates and their estimated covariances
 creates a SAS data set that contains the input data, the fitted probabilities, the linear prediction and the estimate of its standard error
 creates a SAS data set that corresponds to any output table
 performs BY group processing, which enables you to obtain separate analyses on grouped observations
 automatically creates graphs by using ODS Graphics
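
The natural (or threshold) response rate can be read as a floor on the response probability: a proportion C of subjects respond regardless of dose, and the remainder respond according to the dose-response curve. The Python sketch below evaluates a probit curve of that form, p = C + (1 - C) * Phi(b0 + b1 * dose). It is an illustration only, not SAS syntax and not an estimation routine; the function name and all parameter values are hypothetical.

```python
from statistics import NormalDist


def response_probability(dose, intercept, slope, natural_rate=0.0):
    """Probit response probability with a natural response rate C:
    p = C + (1 - C) * Phi(intercept + slope * dose)."""
    linear_predictor = intercept + slope * dose
    return natural_rate + (1 - natural_rate) * NormalDist().cdf(linear_predictor)


# Hypothetical parameter values and dose levels.
doses = [0, 1, 2, 3, 4]
curve = [
    response_probability(d, intercept=-2.0, slope=1.0, natural_rate=0.1)
    for d in doses
]

for d, p in zip(doses, curve):
    print(d, round(p, 4))
```

With a positive slope the curve rises from just above the natural rate toward 1 as the dose increases.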

For further details, see PROBIT Procedure.