SAS/STAT Software

Exact Inference

Exact nonparametric methods have an advantage over asymptotic methods because they remain valid for very small sample sizes, as well as for data that are sparse, skewed, or heavily tied.

The SAS/STAT exact inference procedures include the following:

FREQ Procedure


The FREQ procedure produces one-way to n-way frequency and contingency (crosstabulation) tables. For two-way tables, PROC FREQ computes tests and measures of association. For n-way tables, PROC FREQ provides stratified analysis by computing statistics across, as well as within, strata. The following are highlights of the FREQ procedure's features:

  • computes goodness-of-fit tests for equal proportions or specified null proportions for one-way frequency tables
  • provides confidence limits and tests for binomial proportions, including tests for noninferiority and equivalence for one-way frequency tables
  • computes various statistics to examine the relationships between two classification variables. The statistics for contingency tables include the following:
    • chi-square tests and measures
    • measures of association
    • risks (binomial proportions) and risk differences for 2 x 2 tables
    • odds ratios and relative risks for 2 x 2 tables
    • tests for trend
    • tests and measures of agreement
    • Cochran-Mantel-Haenszel statistics
  • computes asymptotic standard errors, confidence intervals, and tests for measures of association and measures of agreement
  • computes score confidence limits for odds ratios
  • computes exact p-values, exact mid-p-values, and confidence intervals for many test statistics and measures
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • accepts either raw data or cell count data to produce frequency and crosstabulation tables
  • creates a SAS data set that contains the computed statistics
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see FREQ Procedure
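
To make the difference between an exact and an asymptotic p-value concrete, the following is a minimal Python sketch for a single 2 x 2 table. It is an illustration of the idea only, not PROC FREQ code or output. The function names and the example table are hypothetical. The exact p-value is computed from the hypergeometric distribution with the table margins held fixed (the usual two-sided Fisher exact test), and the asymptotic p-value comes from the Pearson chi-square statistic with 1 degree of freedom.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables that are no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


def pearson_chi_square_p(a, b, c, d):
    """Asymptotic p-value of the Pearson chi-square test (1 degree of freedom)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi-square > stat) = erfc(sqrt(stat / 2)).
    return erfc(sqrt(stat / 2))


# Hypothetical small table: 8 observations, 4 in each row and column.
table = (3, 1, 1, 3)
print("exact p-value:     ", fisher_exact_two_sided(*table))
print("asymptotic p-value:", pearson_chi_square_p(*table))
```

For this table the exact p-value is 34/70 (about 0.486), whereas the chi-square approximation gives about 0.157. With only eight observations the two answers differ substantially, which is the situation that exact methods are designed for.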

GENMOD Procedure


The GENMOD procedure fits generalized linear models, as defined by Nelder and Wedderburn (1972). The class of generalized linear models is an extension of traditional linear models that allows the mean of a population to depend on a linear predictor through a nonlinear link function and allows the response probability distribution to be any member of an exponential family of distributions. Many widely used statistical models are generalized linear models. These include classical linear models with normal errors, logistic and probit models for binary data, and log-linear models for multinomial data. Many other useful statistical models can be formulated as generalized linear models by the selection of an appropriate link function and response probability distribution. The following are highlights of the GENMOD procedure's features:

  • provides the following built-in distributions and associated variance functions:
    • normal
    • binomial
    • Poisson
    • gamma
    • inverse Gaussian
    • negative binomial
    • geometric
    • multinomial
    • zero-inflated Poisson
    • Tweedie
  • provides the following built-in link functions:
    • identity
    • logit
    • probit
    • power
    • log
    • complementary log-log
  • enables you to define your own link functions or distributions through DATA step programming statements used within the procedure
  • fits models to correlated responses by the GEE method
  • performs Bayesian analysis for generalized linear models
  • performs exact logistic regression
  • performs exact Poisson regression
  • enables you to fit a sequence of models and to perform Type I and Type III analyses between each successive pair of models
  • computes likelihood ratio statistics for user-defined contrasts
  • computes estimated values, standard errors, and confidence limits for user-defined contrasts and least squares means
  • computes confidence intervals for model parameters based on either the profile likelihood function or asymptotic normality
  • produces an overdispersion diagnostic plot for zero-inflated models
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates SAS data sets that correspond to most output tables
  • automatically generates graphs by using ODS Graphics
For further details, see GENMOD Procedure
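
The role of the link function can be sketched in a few lines of Python. This is an illustration of the definitions only, not PROC GENMOD syntax; the dictionary `LINKS` and the helpers `power_link` and `predicted_mean` are hypothetical names. Each link g maps the mean to the linear predictor, eta = g(mu), and its inverse maps the linear predictor back to the mean.

```python
from math import exp, log
from statistics import NormalDist

_STD_NORMAL = NormalDist()

# Each entry maps a link name to (g, g_inverse),
# where eta = g(mu) and mu = g_inverse(eta).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: log(mu), lambda eta: exp(eta)),
    "logit": (lambda mu: log(mu / (1 - mu)), lambda eta: 1 / (1 + exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "cloglog": (lambda mu: log(-log(1 - mu)), lambda eta: 1 - exp(-exp(eta))),
}


def power_link(lam):
    """Return (g, g_inverse) for the power link g(mu) = mu ** lam, lam != 0."""
    return (lambda mu: mu ** lam, lambda eta: eta ** (1 / lam))


def predicted_mean(link_name, intercept, slope, x):
    """Mean response implied by the linear predictor intercept + slope * x."""
    _, inverse = LINKS[link_name]
    return inverse(intercept + slope * x)


# Hypothetical coefficients: the same linear predictor gives different means
# depending on which link function is chosen.
for name in LINKS:
    print(name, predicted_mean(name, -1.0, 0.5, 1.0))
```

The choice of link is separate from the choice of response distribution. For example, a logit link is typically paired with a binomial response and a log link with a Poisson response, but the pairing is a modeling decision.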

LOGISTIC Procedure


The LOGISTIC procedure fits linear logistic regression models for discrete response data by the method of maximum likelihood. It can also perform conditional logistic regression for binary response data and exact logistic regression for binary and nominal response data. The maximum likelihood estimation is carried out with either the Fisher scoring algorithm or the Newton-Raphson algorithm, and you can perform the bias-reducing penalized likelihood optimization as discussed by Firth (1993) and Heinze and Schemper (2002). You can specify starting values for the parameter estimates. The logit link function in the logistic regression models can be replaced by the probit function, the complementary log-log function, or the generalized logit function. The LOGISTIC procedure also enables you to do the following:

  • fit stratified conditional logistic regression of binary response data
  • fit partial proportional odds logistic regression models
  • fit adjacent-category logit models to ordinal response data
  • add or relax constraints on parameters in nominal response models and partial proportional odds models
  • compute the partial correlation statistic for each model parameter (excluding the intercept)
  • control the ordering of the response categories
  • compute a generalized R² measure for the fitted model
  • reclassify binary response observations according to their predicted response probabilities
  • test linear hypotheses about the regression parameters
  • perform exact tests of the parameters for the specified effects and optionally estimate the parameters and exact conditional distributions
  • specify contrasts to compare several receiver operating characteristic curves
  • score a data set by using a previously fitted model
  • specify units of change for continuous explanatory variables so that customized odds ratios can be estimated
  • perform BY group processing, which enables you to obtain separate analyses on grouped observations
  • perform weighted estimation
  • create a data set for producing a receiver operating characteristic curve for each fitted model
  • create a data set that contains the estimated response probabilities, residuals, and influence diagnostics
  • create a data set that contains the estimated parameter vector and its estimated covariance matrix
  • create a data set that corresponds to any output table
  • automatically create graphs by using ODS Graphics
For further details, see LOGISTIC Procedure
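
As a rough illustration of maximum likelihood fitting by Newton-Raphson, the following Python sketch fits a binary logistic model with an intercept and one explanatory variable. It is a teaching sketch under simplifying assumptions, not the algorithm as implemented in PROC LOGISTIC: there is no step halving, no convergence diagnostics, and no check for separation. The function names and the data are hypothetical. For the logit link, the observed and expected information matrices are equal, so Newton-Raphson and Fisher scoring take the same steps in this case.

```python
from math import exp


def fit_logistic(x, y, iterations=25, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2 x 2 system: information * step = score.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


def odds_ratio(slope, units=1.0):
    """Odds ratio for a change of `units` in the explanatory variable."""
    return exp(units * slope)


# Hypothetical data with overlapping responses (no complete separation).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print("intercept, slope:", b0, b1)
print("odds ratio per 1 unit: ", odds_ratio(b1))
print("odds ratio per 2 units:", odds_ratio(b1, units=2))
```

The `odds_ratio` helper shows what a customized unit of change means: the odds ratio for a change of c units is exp(c × slope), which equals the one-unit odds ratio raised to the power c.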

MULTTEST Procedure


The MULTTEST procedure addresses the multiple testing problem by adjusting the p-values from a family of hypothesis tests. An adjusted p-value is defined as the smallest significance level for which the given hypothesis would be rejected, when the entire family of tests is considered. The decision rule is to reject the null hypothesis when the adjusted p-value is less than α. For most methods, this decision rule controls the familywise error rate at or below the α level. The false discovery rate (FDR) controlling methods instead control the false discovery rate at or below the α level. The following are highlights of the MULTTEST procedure's features:

  • provides the following p-value adjustments:
    • Bonferroni
    • Šidák
    • step-down methods
    • Hochberg
    • Hommel
    • Fisher combination
    • bootstrap
    • permutation
    • adaptive methods
    • false discovery rate
    • positive FDR
  • handles data arising from a multivariate one-way ANOVA model, possibly stratified, with continuous and discrete response variables; it can also accept raw p-values as input data
  • performs a t test for the mean for continuous data with or without a homogeneity assumption, and the following statistical tests for discrete data:
    • Cochran-Armitage linear trend test
    • Freeman-Tukey double arcsine test
    • Peto mortality-prevalence (log-rank) test
    • Fisher exact test
  • provides exact versions of the Cochran-Armitage and Peto tests that use permutation distributions, as well as asymptotic versions that use an optional continuity correction
  • enables you to use a stratification variable to construct Mantel-Haenszel-type tests
  • enables you to perform one- or two-sided tests
  • enables you to specify linear contrasts that compare means or proportions of the treated groups
  • creates output data sets containing raw and adjusted p-values, test statistics and other intermediate calculations, permutation distributions, and resampling information
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see MULTTEST Procedure
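
The definition of an adjusted p-value and the decision rule can be illustrated with a short Python sketch. It implements textbook versions of five of the adjustments named above (Bonferroni, Šidák, the Holm step-down Bonferroni method, Hochberg, and the Benjamini-Hochberg false discovery rate method) for a list of raw p-values. It is not PROC MULTTEST code, the function names are hypothetical, and the raw p-values are made up for illustration.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def sidak(pvals):
    """Sidak adjustment: 1 - (1 - p) ** m."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]


def holm(pvals):
    """Step-down Bonferroni (Holm): work upward from the smallest p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_max = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def hochberg(pvals):
    """Step-up (Hochberg): work downward from the largest p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted, running_min = [0.0] * m, 1.0
    for rank, i in enumerate(order):
        running_min = min(running_min, (rank + 1) * pvals[i])
        adjusted[i] = running_min
    return adjusted


def benjamini_hochberg(pvals):
    """Linear step-up false discovery rate adjustment."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted, running_min = [0.0] * m, 1.0
    for rank, i in enumerate(order):
        running_min = min(running_min, m / (m - rank) * pvals[i])
        adjusted[i] = running_min
    return adjusted


def reject(adjusted, alpha=0.05):
    """Decision rule: reject when the adjusted p-value is less than alpha."""
    return [p < alpha for p in adjusted]


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
for method in (bonferroni, sidak, holm, hochberg, benjamini_hochberg):
    adjusted = method(raw)
    print(method.__name__, adjusted, reject(adjusted))
```

With these four raw p-values, only the first hypothesis is rejected at α = 0.05 under every method shown, even though three of the raw p-values are below 0.05. The first four methods are aimed at the familywise error rate and the last at the false discovery rate, so their adjusted p-values are not directly comparable as measures of the same error.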

NPAR1WAY Procedure


The NPAR1WAY procedure performs nonparametric tests for location and scale differences across a one-way classification. PROC NPAR1WAY also provides a standard analysis of variance on the raw data and tests based on the empirical distribution function. The following are highlights of the NPAR1WAY procedure's features:

  • performs nonparametric tests for location and scale differences across a one-way classification based on the following scores of a response variable:
    • Wilcoxon
    • median
    • Van der Waerden (normal)
    • Savage
    • Siegel-Tukey
    • Ansari-Bradley
    • Klotz
    • Mood
    • Conover
    • raw data
  • computes tests based on simple linear rank statistics when the data are classified into two samples
  • computes tests based on one-way ANOVA statistics when the data are classified into more than two samples
  • provides asymptotic p-values, exact p-values, and exact mid-p-values for tests
  • provides the Hodges-Lehmann estimate of location shift, including exact confidence limits
  • provides tests based on Conover scores, including exact tests
  • provides stratified rank-based analysis of two-sample data
  • computes the following empirical distribution function (EDF) statistics:
    • Kolmogorov-Smirnov test
    • Cramér-von Mises test
    • Kuiper test
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see NPAR1WAY Procedure
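
For two small samples, an exact Wilcoxon rank-sum p-value and a Hodges-Lehmann shift estimate can be computed by brute force, as in the following Python sketch. It is an illustration of the definitions, not PROC NPAR1WAY code. The function names and data are hypothetical, and the direction of the shift (second sample minus first) is a convention assumed here. The exact p-value enumerates every way of assigning the pooled ranks to the first sample, so it is practical only for small samples.

```python
from itertools import combinations
from statistics import median


def midranks(values):
    """Rank the values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def exact_wilcoxon_two_sided(x, y):
    """Exact two-sided p-value for the rank sum of sample x."""
    ranks = midranks(list(x) + list(y))
    n1, n = len(x), len(x) + len(y)
    observed = sum(ranks[:n1])
    expected = n1 * (n + 1) / 2
    deviation = abs(observed - expected)
    # Permutation distribution: rank sums over all possible choices of n1 ranks.
    sums = [sum(ranks[i] for i in subset) for subset in combinations(range(n), n1)]
    extreme = sum(1 for s in sums if abs(s - expected) >= deviation - 1e-9)
    return extreme / len(sums)


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences y_j - x_i (assumed direction)."""
    return median(b - a for a in x for b in y)


# Hypothetical data: every value in y exceeds every value in x.
x = [1.1, 2.3, 3.0]
y = [4.2, 5.1, 6.4, 7.0]
print("exact two-sided p-value:", exact_wilcoxon_two_sided(x, y))
print("Hodges-Lehmann shift:   ", hodges_lehmann_shift(x, y))
```

For these data the exact two-sided p-value is 2/35 (about 0.057), because only 2 of the 35 possible rank assignments are as extreme as the observed one. With samples this small, no two-sided rank-sum test can produce an exact p-value below 2/35.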