Longitudinal Data Analysis

Longitudinal data (also known as panel data) arise when you measure a response variable of interest repeatedly through time for multiple subjects. Longitudinal data therefore combine the characteristics of cross-sectional data and time-series data. The response variables in longitudinal studies can be either continuous or discrete. The objective of a statistical analysis of longitudinal data is usually to model the expected value of the response variable as either a linear or nonlinear function of a set of explanatory variables. Such an analysis must account for possible between-subject heterogeneity and within-subject correlation. SAS/STAT software provides two approaches for modeling longitudinal data: marginal models (also known as population-average models) and mixed models (also known as subject-specific models).
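
The distinction between the two approaches can be written compactly. The notation below is a common textbook formulation and is an assumption of this sketch, not notation taken from the SAS documentation: y_ij is the response of subject i at occasion j, x_ij and z_ij are covariate vectors, g is a link function, and gamma_i is a vector of subject-specific random effects with covariance matrix G.

```latex
\begin{align*}
\text{Marginal (population-average):}\quad
  & g\bigl(\mathrm{E}[y_{ij}]\bigr) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}_{\mathrm{PA}} \\
\text{Mixed (subject-specific):}\quad
  & g\bigl(\mathrm{E}[y_{ij} \mid \boldsymbol{\gamma}_i]\bigr)
    = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}_{\mathrm{SS}} + \mathbf{z}_{ij}^{\top}\boldsymbol{\gamma}_i,
  \qquad \boldsymbol{\gamma}_i \sim N(\mathbf{0}, \mathbf{G})
\end{align*}
```

The coefficients carry separate subscripts because they answer different questions: the marginal coefficients describe the average response in the population, whereas the subject-specific coefficients describe the response of an individual subject conditional on that subject's random effects. With a nonlinear link the two generally differ in value.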

The SAS/STAT longitudinal data analysis procedures include the following:

GEE Procedure


The GEE procedure fits generalized linear models for longitudinal data by using the generalized estimating equations (GEE) estimation method of Liang and Zeger (1986). The GEE method fits a marginal model to longitudinal data and is commonly used to analyze longitudinal data when the population-average effect is of interest. The following are highlights of the GEE procedure's features:

  • performs weighted GEE estimation when data are missing at random (MAR)
  • supports the following response variable distributions:
    • binomial
    • gamma
    • inverse Gaussian
    • negative binomial
    • normal
    • Poisson
    • multinomial
  • supports the following link functions:
    • complementary log-log
    • identity
    • log
    • logit
    • probit
    • reciprocal
    • power with exponent -2
  • supports the following correlation structures:
    • first-order autoregressive
    • exchangeable
    • independent
    • m-dependent
    • unstructured
    • fixed (user specified)
  • performs alternating logistic regression analysis for ordinal and binary data
  • supports ESTIMATE, LSMEANS, and OUTPUT statements
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see GEE Procedure
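
As a minimal sketch of how a marginal model of this kind might be specified, the following step fits a population-average logistic model with an exchangeable working correlation. The data set `work.visits` and the variables `id`, `treatment`, `visit`, and `response` are hypothetical names chosen for illustration; the sketch assumes one row per subject per visit and a response coded 0/1.

```sas
/* Hypothetical long-format data: one row per subject (id) per visit. */
proc gee data=work.visits descending;
   /* id identifies the cluster; treatment is a categorical covariate. */
   class id treatment;
   /* Binary response, logit link: coefficients are population-average log odds.
      DESCENDING on the PROC statement models the higher response level (1). */
   model response = treatment visit / dist=bin link=logit;
   /* Repeated observations on the same id share an exchangeable
      working correlation; CORRW prints the estimated working matrix. */
   repeated subject=id / corr=exch corrw;
run;
```

Changing `corr=exch` to another of the listed structures (for example `corr=ar(1)` or `corr=un`) changes only the working correlation; the mean model is unchanged.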

GENMOD Procedure


The GENMOD procedure fits generalized linear models, as defined by Nelder and Wedderburn (1972). The class of generalized linear models is an extension of traditional linear models that allows the mean of a population to depend on a linear predictor through a nonlinear link function and allows the response probability distribution to be any member of an exponential family of distributions. Many widely used statistical models are generalized linear models. These include classical linear models with normal errors, logistic and probit models for binary data, and log-linear models for multinomial data. Many other useful statistical models can be formulated as generalized linear models by the selection of an appropriate link function and response probability distribution. The following are highlights of the GENMOD procedure's features:

  • provides the following built-in distributions and associated variance functions:
    • normal
    • binomial
    • Poisson
    • gamma
    • inverse Gaussian
    • negative binomial
    • geometric
    • multinomial
    • zero-inflated Poisson
    • Tweedie
  • provides the following built-in link functions:
    • identity
    • logit
    • probit
    • power
    • log
    • complementary log-log
  • enables you to define your own link functions or distributions through DATA step programming statements used within the procedure
  • fits models to correlated responses by the GEE method
  • performs Bayesian analysis for generalized linear models
  • performs exact logistic regression
  • performs exact Poisson regression
  • enables you to fit a sequence of models and to perform Type I and Type III analyses between each successive pair of models
  • computes likelihood ratio statistics for user-defined contrasts
  • computes estimated values, standard errors, and confidence limits for user-defined contrasts and least squares means
  • computes confidence intervals for model parameters based on either the profile likelihood function or asymptotic normality
  • produces an overdispersion diagnostic plot for zero-inflated models
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates SAS data sets that correspond to most output tables
  • automatically generates graphs by using ODS Graphics
For further details, see GENMOD Procedure
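
For illustration, a GEE fit through PROC GENMOD might look like the following sketch, in which the REPEATED statement is what switches the procedure from ordinary maximum likelihood to GEE estimation. The data set `work.seizures` and the variables `id`, `treatment`, `visit`, and `n_events` are hypothetical, and the rows are assumed to be sorted by visit within each subject so that the AR(1) ordering is meaningful.

```sas
/* Hypothetical count data: n_events recorded for each id at each visit. */
proc genmod data=work.seizures;
   class id treatment;
   /* Poisson distribution with a log link (a log-linear model for counts). */
   model n_events = treatment visit / dist=poisson link=log;
   /* REPEATED requests GEE estimation; TYPE=AR(1) assumes the rows are
      already ordered by visit within each id. */
   repeated subject=id / type=ar(1) corrw;
run;
```

Omitting the REPEATED statement would fit the same mean model as an ordinary generalized linear model that treats all observations as independent.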

GLIMMIX Procedure


The GLIMMIX procedure fits statistical models to data with correlations or nonconstant variability and where the response is not necessarily normally distributed. These models are known as generalized linear mixed models (GLMM). GLMMs, like linear mixed models, assume normal (Gaussian) random effects. Conditional on these random effects, data can have any distribution in the exponential family. The following are highlights of the GLIMMIX procedure's features:

  • provides the following built-in link functions:
    • cumulative complementary log-log
    • cumulative logit
    • cumulative log-log
    • cumulative probit
    • complementary log-log
    • generalized logit
    • identity
    • log
    • logit
    • log-log
    • probit
    • power with exponent λ = number
    • power with exponent -2
    • reciprocal
  • provides the following built-in distributions and associated variance functions:
    • beta
    • binary
    • binomial
    • exponential
    • gamma
    • normal
    • geometric
    • inverse Gaussian
    • lognormal
    • negative binomial
    • Poisson
    • t
  • enables you to use SAS programming statements within the procedure to compute model effects, weights, frequency, subject, group, and other variables, and to define mean and variance functions
  • fits covariance structures including:
    • ANTE(1)
    • AR(1)
    • ARH(1)
    • ARMA(1,1)
    • Cholesky
    • compound symmetry
    • heterogeneous compound symmetry
    • factor analytic
    • Huynh-Feldt
    • general linear
    • P-spline
    • radial smoother
    • simple
    • exponential spatial
    • Gaussian
    • Matérn
    • power
    • anisotropic power
    • spherical
    • Toeplitz
    • unstructured
  • permits subject and group effects that enable blocking and heterogeneity, respectively
  • permits weighted multilevel models for analyzing survey data that arise from multistage sampling
  • offers a choice of linearization approach or integral approximation (by quadrature or the Laplace method) for mixed models with nonlinear random effects or nonnormal distributions
  • offers a choice of linearization about expected values or expansion about current solutions of best linear unbiased predictors (BLUPs)
  • provides flexible covariance structures for random and residual random effects, including variance components, unstructured, autoregressive, and spatial structures
  • produces hypothesis tests and estimable linear combinations of effects
  • provides a mechanism to obtain inferences for the covariance parameters. Significance tests are based on the ratio of (residual) likelihoods or pseudo-likelihoods. Confidence limits and bounds are computed as Wald or likelihood ratio limits.
  • enables you to construct special collections of columns for the design matrices in your model. These special collections, which are referred to as constructed effects, can include the following:
    • COLLECTION is a collection effect defining one or more variables as a single effect with multiple degrees of freedom. The variables in a collection are considered as a unit for estimation and inference.
    • MULTIMEMBER | MM is a multimember classification effect whose levels are determined by one or more variables that appear in a CLASS statement.
    • POLYNOMIAL | POLY is a multivariate polynomial effect in the specified numeric variables.
    • SPLINE is a regression spline effect whose columns are univariate spline expansions of one or more variables. A spline expansion replaces the original variable with an expanded or larger set of new variables.
  • provides the following estimation methods:
    • RSPL
    • MSPL
    • RMPL
    • MMPL
    • Laplace
    • adaptive quadrature
  • enables you to exercise control over the numerical optimization. You can choose techniques, update methods, line search algorithms, convergence criteria, and more. Or, you can choose the default optimization strategies selected for the particular class of model you are fitting.
  • enables you to generate variables with SAS programming statements inside of PROC GLIMMIX (except for variables listed in the CLASS statement).
  • performs grouped data analysis
  • supports BY group processing, which enables you to obtain separate analyses on grouped observations
  • uses ODS to create a SAS data set that corresponds to any table
  • automatically generates graphs by using ODS Graphics
For further details, see GLIMMIX Procedure
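
A subject-specific counterpart to a marginal logistic model can be sketched as a random-intercept GLMM. The data set `work.visits` and the variables `id`, `treatment`, `visit`, and `response` are hypothetical names used only for illustration, and the choice of the Laplace method here is one option among the estimation methods listed above, not a recommendation.

```sas
/* Hypothetical long-format data: one row per subject (id) per visit. */
proc glimmix data=work.visits method=laplace;
   class id treatment;
   /* Conditional on the random intercept, the response is binary with a
      logit link; EVENT='1' models the probability that response equals 1. */
   model response(event='1') = treatment visit
         / dist=binary link=logit solution;
   /* One normally distributed random intercept per subject. */
   random intercept / subject=id;
run;
```

Replacing `method=laplace` with `method=quad` requests adaptive quadrature instead, and omitting the option leaves the procedure at its default pseudo-likelihood method.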

MIXED Procedure


The MIXED procedure fits a variety of mixed linear models to data and enables you to use these fitted models to make statistical inferences about the data. A mixed linear model is a generalization of the standard linear model used in the GLM procedure, the generalization being that the data are permitted to exhibit correlation and nonconstant variability. The mixed linear model, therefore, provides you with the flexibility of modeling not only the means of your data (as in the standard linear model) but their variances and covariances as well. The following are highlights of the MIXED procedure's features:

  • fits general linear models with fixed and random effects under the assumption that the data are normally distributed. The types of models include:
    • simple regression
    • multiple regression
    • analysis of variance for balanced or unbalanced data
    • analysis of covariance
    • response surface models
    • weighted regression
    • polynomial regression
    • multivariate analysis of variance (MANOVA)
    • partial correlation
    • repeated measures analysis of variance
  • fits covariance structures including:
    • variance components
    • compound symmetry
    • unstructured
    • AR(1) and ARMA(1,1)
    • Toeplitz
    • spatial
    • general linear
    • factor analytic
  • offers several estimation methods for the covariance parameters, including:
    • Restricted Maximum Likelihood (REML)
    • Maximum Likelihood (ML)
    • Method of Moments
    • MIVQUE0
    • Type I
    • Type II
    • Type III
  • uses PROC GLM-type syntax by using MODEL, RANDOM, and REPEATED statements for model specification and CONTRAST, ESTIMATE, and LSMEANS statements for inferences
  • provides appropriate standard errors for all specified estimable linear combinations of fixed and random effects, and corresponding t and F tests
  • enables you to construct custom hypothesis tests
  • enables you to construct custom scalar estimates and their confidence limits
  • computes least squares means and least squares mean differences for classification fixed effects
  • permits subject and group effects that enable blocking and heterogeneity, respectively
  • performs multiple comparisons of main effect means
  • accommodates unbalanced data
  • computes Type I, Type II, and Type III tests of fixed effects
  • performs sampling-based Bayesian analysis
  • performs weighted estimation
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see MIXED Procedure
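
The MODEL, REPEATED, and LSMEANS statements mentioned above might be combined as in the following sketch of a repeated measures analysis for a continuous response. The data set `work.growth` and the variables `id`, `treatment`, `visit`, and `y` are hypothetical, and the AR(1) structure and the Kenward-Roger degrees-of-freedom method are illustrative choices rather than defaults.

```sas
/* Hypothetical long-format data: continuous response y at each visit. */
proc mixed data=work.growth method=reml;
   class id treatment visit;
   /* Fixed effects for the mean: treatment, visit, and their interaction. */
   model y = treatment visit treatment*visit / solution ddfm=kr;
   /* Within-subject residual covariance: AR(1) across visits for each id. */
   repeated visit / subject=id type=ar(1);
   /* Least squares means for treatment and their pairwise differences. */
   lsmeans treatment / diff;
run;
```

Swapping the REPEATED statement for `random intercept / subject=id;` would instead model the within-subject correlation through a subject-level random effect.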