SAS/STAT Software

Categorical Data Analysis

There are two approaches to performing categorical data analyses. The first computes statistics based on tables defined by categorical variables (variables that assume only a limited number of discrete values), performs hypothesis tests about the association between these variables, and requires the assumption of a randomized process; these methods are called randomization procedures. The other approach investigates the association by modeling a categorical response variable, regardless of whether the explanatory variables are continuous or categorical; these methods are called modeling procedures.
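As a concrete illustration of the randomization approach, the following minimal Python sketch computes Fisher's exact test for a 2 × 2 table. It is not SAS code and not the PROC FREQ implementation. Once the row and column totals are fixed, the count in the upper-left cell follows a hypergeometric distribution, so no model for the response is needed. The function names are hypothetical. The two-sided p-value follows one common convention: it sums the probabilities of all tables that are no more likely than the observed table.

```python
from math import comb

def hypergeom_prob(a, row1, col1, n):
    """Probability of a 2 x 2 table whose (1,1) cell is a, given fixed margins."""
    return comb(row1, a) * comb(n - row1, col1 - a) / comb(n, col1)

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins whose
    probability does not exceed that of the observed table.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lo = max(0, col1 - (n - row1))  # smallest feasible (1,1) count
    hi = min(row1, col1)            # largest feasible (1,1) count
    p_obs = hypergeom_prob(a, row1, col1, n)
    probs = (hypergeom_prob(x, row1, col1, n) for x in range(lo, hi + 1))
    # A small relative tolerance keeps floating-point ties with p_obs.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))

print(fisher_exact_two_sided([[3, 1], [1, 3]]))
```

For the table [[3, 1], [1, 3]], the printed p-value is about 0.486 (34/70).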

The SAS/STAT categorical data analysis procedures include the following:

CATMOD Procedure


The CATMOD procedure performs categorical data modeling of data that can be represented by a contingency table. PROC CATMOD fits linear models to functions of response frequencies, and it can be used for linear modeling, log-linear modeling, logistic regression, and repeated measurement analysis. The procedure enables you to do the following:

  • estimate model parameters by using weighted least squares (WLS) for a wide range of general linear models or maximum likelihood (ML) for log-linear models and the analysis of generalized logits
  • supply raw data (where each observation is a subject), supply cell count data (where each observation is a cell in a contingency table), or directly input a covariance matrix
  • construct linear functions of the model parameters or log-linear effects and test the hypothesis that the linear combination equals zero
  • perform constrained estimation
  • perform BY group processing, which enables you to obtain separate analyses on grouped observations
  • create a data set that contains the observed and predicted values of the response functions, their standard errors, the residuals, and variables that describe the population and response profiles. In addition, if you use the standard response functions, the data set includes observed and predicted values for the cell frequencies or the cell probabilities, together with their standard errors and residuals.
  • create a data set that contains the estimated parameter vector and its estimated covariance matrix
  • create a data set that corresponds to any output table
For further details, see CATMOD Procedure
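The weighted least squares approach fits a linear model to response functions, such as the logits of observed proportions. Each function is weighted by the inverse of its estimated variance, which gives b = (X′WX)⁻¹X′WF. The minimal Python sketch below illustrates that idea for a binary response in three populations, using an intercept and one covariate x. The counts, the covariate, and the function name are hypothetical. PROC CATMOD's own parameterization, covariance computations, and output are considerably more general.

```python
from math import log

def wls_logit_fit(counts, x):
    """Weighted least squares fit of logit(p_i) = b0 + b1 * x_i.

    counts: (events, total) for each population; requires 0 < events < total.
    x: hypothetical covariate value for each population.
    """
    F, w = [], []
    for events, total in counts:
        p = events / total
        F.append(log(p / (1 - p)))      # response function: observed logit
        w.append(total * p * (1 - p))   # 1 / estimated variance of the logit
    # Solve the normal equations (X'WX) b = X'WF for X = [1, x] as a 2 x 2 system.
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * fi for wi, fi in zip(w, F))
    t1 = sum(wi * xi * fi for wi, xi, fi in zip(w, x, F))
    det = s0 * s2 - s1 * s1
    b0 = (s2 * t0 - s1 * t1) / det
    b1 = (s0 * t1 - s1 * t0) / det
    return (b0, b1), F, w

x = [0, 1, 2]
(b0, b1), F, w = wls_logit_fit([(5, 50), (12, 50), (24, 50)], x)
print(b0, b1)
```

Because the design includes an intercept, the weighted residuals sum to zero at the solution. With only two populations, the model is saturated and reproduces the observed logits exactly.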

FREQ Procedure


The FREQ procedure produces one-way to n-way frequency and contingency (crosstabulation) tables. For two-way tables, PROC FREQ computes tests and measures of association. For n-way tables, PROC FREQ provides stratified analysis by computing statistics across, as well as within, strata. The following are highlights of the FREQ procedure's features:

  • computes goodness-of-fit tests for equal proportions or specified null proportions for one-way frequency tables
  • provides confidence limits and tests for binomial proportions, including tests for noninferiority and equivalence for one-way frequency tables
  • computes various statistics to examine the relationships between two classification variables. The statistics for contingency tables include the following:
    • chi-square tests and measures
    • measures of association
    • risks (binomial proportions) and risk differences for 2 x 2 tables
    • odds ratios and relative risks for 2 x 2 tables
    • tests for trend
    • tests and measures of agreement
    • Cochran-Mantel-Haenszel statistics
  • computes asymptotic standard errors, confidence intervals, and tests for measures of association and measures of agreement
  • computes score confidence limits for odds ratios
  • computes exact p-values, exact mid-p-values, and confidence intervals for many test statistics and measures
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • accepts either raw data or cell count data to produce frequency and crosstabulation tables
  • creates a SAS data set that contains the computed statistics
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see FREQ Procedure
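The minimal Python sketch below applies textbook formulas to three of the 2 × 2 statistics listed above. These are the Pearson chi-square test, the odds ratio, and the relative risk of the column 1 outcome for row 1 versus row 2. It is not the PROC FREQ implementation, and the table and function name are hypothetical. The p-value uses the chi-square distribution with 1 degree of freedom, whose upper tail probability at x equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt

def two_by_two_stats(table):
    """Pearson chi-square, p-value, odds ratio, and column 1 relative risk for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chisq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            chisq += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(chisq / 2))                  # upper tail of chi-square with 1 df
    odds_ratio = (a * d) / (b * c)                   # undefined if b or c is zero
    relative_risk = (a / (a + b)) / (c / (c + d))    # risk of column 1: row 1 vs. row 2
    return chisq, p_value, odds_ratio, relative_risk

chisq, p_value, odds_ratio, relative_risk = two_by_two_stats([[20, 30], [10, 40]])
print(f"chi-square={chisq:.4f} p={p_value:.4f} OR={odds_ratio:.4f} RR={relative_risk:.4f}")
```

For this table, the sketch gives a chi-square of about 4.76, an odds ratio of about 2.67, and a relative risk of 2.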

FMM Procedure


The FMM procedure fits statistical models to data for which the distribution of the response is a finite mixture of univariate distributions; that is, each response comes from one of several random univariate distributions with unknown probabilities. The following are highlights of the FMM procedure's features:

  • model the component distributions in addition to the mixing probabilities
  • fit finite mixture models by maximum likelihood or Bayesian methods
  • fit finite mixtures of regression and generalized linear models
  • define the model effects for the mixing probabilities and their link function
  • model overdispersed data
  • estimate multimodal or heavy-tailed densities
  • fit zero-inflated or hurdle models to count data with excess zeros
  • fit regression models with complex error distributions
  • classify observations based on predicted component probabilities
  • choose from twenty-five different response distributions
  • impose linear equality and inequality constraints on model parameters
  • specify the response variable by using either the response syntax or the events/trials syntax
  • perform automated model selection for homogeneous mixtures
  • perform weighted estimation
  • control the performance characteristics of the procedure (for example, the number of CPUs, the number of threads for multithreading, and so on)
  • obtain separate analyses on observations in groups
  • create a data set that contains observationwise statistics that are computed after fitting the model
  • create a SAS data set corresponding to any output table
  • automatically create graphs by using ODS Graphics
For further details, see FMM Procedure
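A finite mixture model writes the response density as a weighted sum of component densities, f(y) = Σ πₖ fₖ(y). The posterior probability that an observation belongs to component k is πₖ fₖ(y) / f(y). The minimal Python sketch below fits a two-component normal mixture with the EM algorithm, a standard way to maximize a mixture likelihood. It then classifies each observation by its largest posterior probability. The data, starting values, and function names are hypothetical. EM is used here only for illustration and is not necessarily the optimization method that PROC FMM uses.

```python
from math import exp, log, pi, sqrt

def normal_pdf(y, mu, sigma):
    return exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def e_step(data, w, mu, sd):
    """Posterior probability that each observation came from each component."""
    post = []
    for y in data:
        parts = [w[k] * normal_pdf(y, mu[k], sd[k]) for k in range(len(w))]
        total = sum(parts)
        post.append([p / total for p in parts])
    return post

def log_likelihood(data, w, mu, sd):
    return sum(log(sum(w[k] * normal_pdf(y, mu[k], sd[k]) for k in range(len(w))))
               for y in data)

def em_step(data, w, mu, sd):
    """One EM iteration: E-step posteriors, then closed-form M-step updates."""
    post = e_step(data, w, mu, sd)
    new_w, new_mu, new_sd = [], [], []
    for k in range(len(w)):
        rk = [p[k] for p in post]
        nk = sum(rk)
        m = sum(r * y for r, y in zip(rk, data)) / nk
        v = sum(r * (y - m) ** 2 for r, y in zip(rk, data)) / nk
        new_w.append(nk / len(data))
        new_mu.append(m)
        new_sd.append(sqrt(v))
    return new_w, new_mu, new_sd

data = [1.1, 0.8, 1.3, 0.9, 1.0, 4.9, 5.2, 5.0, 4.7, 5.3]
w, mu, sd = [0.5, 0.5], [0.0, 3.0], [1.0, 1.0]
for _ in range(50):
    w, mu, sd = em_step(data, w, mu, sd)
# Classify each observation by its most probable component.
labels = [max(range(2), key=lambda k: p[k]) for p in e_step(data, w, mu, sd)]
print(mu, labels)
```

Each EM iteration cannot decrease the log likelihood. This monotonicity is one reason EM is a common choice for mixture models, although it can converge slowly and to local maxima.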

GENMOD Procedure


The GENMOD procedure fits generalized linear models, as defined by Nelder and Wedderburn (1972). The class of generalized linear models is an extension of traditional linear models that allows the mean of a population to depend on a linear predictor through a nonlinear link function and allows the response probability distribution to be any member of an exponential family of distributions. Many widely used statistical models are generalized linear models. These include classical linear models with normal errors, logistic and probit models for binary data, and log-linear models for multinomial data. Many other useful statistical models can be formulated as generalized linear models by the selection of an appropriate link function and response probability distribution. The following are highlights of the GENMOD procedure's features:

  • provides the following built-in distributions and associated variance functions:
    • normal
    • binomial
    • Poisson
    • gamma
    • inverse Gaussian
    • negative binomial
    • geometric
    • multinomial
    • zero-inflated Poisson
    • Tweedie
  • provides the following built-in link functions:
    • identity
    • logit
    • probit
    • power
    • log
    • complementary log-log
  • enables you to define your own link functions or distributions through DATA step programming statements used within the procedure
  • fits models to correlated responses by the GEE method
  • performs Bayesian analysis for generalized linear models
  • performs exact logistic regression
  • performs exact Poisson regression
  • enables you to fit a sequence of models and to perform Type I and Type III analyses between each successive pair of models
  • computes likelihood ratio statistics for user-defined contrasts
  • computes estimated values, standard errors, and confidence limits for user-defined contrasts and least squares means
  • computes confidence intervals for model parameters based on either the profile likelihood function or asymptotic normality
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates SAS data sets that correspond to most output tables
  • automatically generates graphs by using ODS Graphics
For further details, see GENMOD Procedure
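For a generalized linear model, maximum likelihood estimates can be computed by iteratively reweighted least squares (IRLS). IRLS is Fisher scoring written as a sequence of weighted regressions on a working response. The minimal Python sketch below fits a Poisson model with the log link, log(μᵢ) = β₀ + β₁xᵢ, to hypothetical counts. It is a textbook illustration with hypothetical names and data, not PROC GENMOD's implementation, and it omits convergence checks and standard errors.

```python
from math import exp, log

def poisson_irls(x, y, iterations=25):
    """Fit log(mu_i) = b0 + b1 * x_i for Poisson counts by IRLS."""
    b0, b1 = log(sum(y) / len(y)), 0.0     # start at the intercept-only fit
    for _ in range(iterations):
        eta = [b0 + b1 * xi for xi in x]
        mu = [exp(e) for e in eta]
        # Log link: d(mu)/d(eta) = mu and Var(y) = mu, so the weights are mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        w = mu
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s0 * s2 - s1 * s1
        # Weighted least squares of z on [1, x].
        b0, b1 = (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det
    return b0, b1

x = [0, 1, 2, 3, 4]
y = [2, 3, 6, 7, 12]
b0, b1 = poisson_irls(x, y)
print(b0, b1)
```

At convergence, the fitted means reproduce the observed total (sum of y equals sum of μ). This property holds for the log link whenever the model contains an intercept.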

LOGISTIC Procedure


The LOGISTIC procedure fits linear logistic regression models for discrete response data by the method of maximum likelihood. It can also perform conditional logistic regression for binary response data and exact logistic regression for binary and nominal response data. The maximum likelihood estimation is carried out with either the Fisher scoring algorithm or the Newton-Raphson algorithm, and you can perform the bias-reducing penalized likelihood optimization as discussed by Firth (1993) and Heinze and Schemper (2002). You can specify starting values for the parameter estimates. The logit link function in the logistic regression models can be replaced by the probit function, the complementary log-log function, or the generalized logit function. The LOGISTIC procedure enables you to do the following:

  • fit stratified conditional logistic regression of binary response data
  • fit partial proportional odds logistic regression models
  • add or relax constraints on parameters in nominal response models and partial proportional odds models
  • compute the partial correlation statistic for each model parameter (excluding the intercept)
  • control the ordering of the response categories
  • compute a generalized R² measure for the fitted model
  • reclassify binary response observations according to their predicted response probabilities
  • test linear hypotheses about the regression parameters
  • perform exact tests of the parameters for the specified effects and optionally estimate the parameters and exact conditional distributions
  • specify contrasts to compare several receiver operating characteristic curves
  • score a data set by using a previously fitted model
  • specify units of change for continuous explanatory variables so that customized odds ratios can be estimated
  • perform BY group processing, which enables you to obtain separate analyses on grouped observations
  • perform weighted estimation
  • create a data set for producing a receiver operating characteristic curve for each fitted model
  • create a data set that contains the estimated response probabilities, residuals, and influence diagnostics
  • create a data set that contains the estimated parameter vector and its estimated covariance matrix
  • create a data set that corresponds to any output table
  • automatically create graphs by using ODS Graphics
For further details, see LOGISTIC Procedure
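For the logit link, the Newton-Raphson and Fisher scoring algorithms mentioned above coincide, because the observed and expected information matrices are equal. The minimal Python sketch below fits a binary logistic regression with one continuous predictor by Newton-Raphson. It then computes a customized odds ratio for a change of u units as exp(u·β₁). The data, the unit u = 2, and the function name are hypothetical. The sketch omits the step-halving, convergence criteria, and separation checks that a production fitter needs.

```python
from math import exp

def logistic_newton(x, y, iterations=25):
    """Maximum likelihood fit of logit(p_i) = b0 + b1 * x_i by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        prob = [1 / (1 + exp(-(b0 + b1 * xi))) for xi in x]
        # Score (gradient of the log likelihood).
        g0 = sum(yi - pr for yi, pr in zip(y, prob))
        g1 = sum(xi * (yi - pr) for xi, yi, pr in zip(x, y, prob))
        # Information matrix X'VX with V = diag(p * (1 - p)).
        v = [pr * (1 - pr) for pr in prob]
        h00 = sum(v)
        h01 = sum(vi * xi for vi, xi in zip(v, x))
        h11 = sum(vi * xi * xi for vi, xi in zip(v, x))
        det = h00 * h11 - h01 * h01
        # Newton step: b += inverse(information) * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]   # overlapping classes, so the MLE exists
b0, b1 = logistic_newton(x, y)
odds_ratio_2 = exp(2 * b1)     # odds ratio for a 2-unit increase in x
print(b0, b1, odds_ratio_2)
```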

PROBIT Procedure


The PROBIT procedure calculates maximum likelihood estimates of regression parameters and the natural (or threshold) response rate for quantal response data from biological assays or other discrete event data. This includes probit, logit, ordinal logistic, and extreme value (or gompit) regression models. The following are highlights of the PROBIT procedure's features:

  • performs chi-square tests for model effects that test Type I, Type II, or Type III hypotheses
  • provides a mechanism for performing custom hypothesis tests
  • produces a display of the fitted model and provides options for changing and enhancing the displays
  • plots the predicted cumulative distribution function (CDF) of the multinomial response variable as a function of a single continuous independent variable (dose variable)
  • plots the inverse of the predicted probability (IPP) against a single continuous variable (dose variable) for the binomial model
  • plots the linear predictor (LPRED) x'b against a single continuous variable (dose variable) for either the binomial model or the multinomial model
  • plots the predicted probability against a single continuous variable (dose variable) for both the binomial model and the multinomial model
  • computes and compares least squares means (LS-means) of fixed effects
  • provides custom hypothesis tests among least squares means
  • performs a partitioned analysis of the LS-means for an interaction
  • performs weighted estimation
  • enables you to save the context and results of the statistical analysis in an item store, which can be processed with the PLM procedure
  • creates a SAS data set that contains the parameter estimates and their estimated covariances
  • creates a SAS data set that contains the input data, the fitted probabilities, the linear predictor, and the estimated standard error of the linear predictor
  • creates a SAS data set that corresponds to any output table
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • automatically creates graphs by using ODS Graphics
For further details, see PROBIT Procedure
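Models with a natural response rate C are commonly written as Pr(response) = C + (1 − C) F(x′b), where F is a cumulative distribution function. F is the normal distribution for probit, the logistic distribution for logit, and the extreme value distribution for gompit. The minimal Python sketch below evaluates that formula for a single dose variable. It also computes the dose at which F(x′b) = 0.5 for the symmetric probit and logit cases. The parameter values and function names are hypothetical, and the sketch only evaluates a model; it does not estimate one.

```python
from math import erf, exp, sqrt

def normal_cdf(t):
    return 0.5 * (1 + erf(t / sqrt(2)))

def logistic_cdf(t):
    return 1 / (1 + exp(-t))

def gompit_cdf(t):
    # Extreme value (complementary log-log) distribution function.
    return 1 - exp(-exp(t))

def response_probability(dose, b0, b1, natural_rate=0.0, cdf=normal_cdf):
    """Pr(response) = C + (1 - C) * F(b0 + b1 * dose), with natural response rate C."""
    return natural_rate + (1 - natural_rate) * cdf(b0 + b1 * dose)

b0, b1, c = -3.0, 1.5, 0.1
ed50 = -b0 / b1   # F(0) = 0.5 for the symmetric normal and logistic distributions
print(ed50, response_probability(ed50, b0, b1, c))
```

At that dose, the probability is C + (1 − C)/2, about 0.55 here. As the dose decreases, the probability falls toward the natural response rate C rather than toward zero.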