SAS/STAT Software

Longitudinal Data Analysis

Longitudinal data (also known as panel data) arises when you measure a response variable of interest repeatedly through time for multiple subjects. Thus, longitudinal data combines the characteristics of both cross-sectional data and time-series data. The response variables in longitudinal studies can be either continuous or discrete. The objective of a statistical analysis of longitudinal data is usually to model the expected value of the response variable as either a linear or nonlinear function of a set of explanatory variables. Such an analysis must account for possible between-subject heterogeneity and within-subject correlation. SAS/STAT software provides two approaches for modeling longitudinal data: marginal models (also known as population-average models) and mixed models (also known as subject-specific models).
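The difference between the two approaches can be sketched in standard textbook notation. The symbols here are assumptions introduced for illustration and do not come from the SAS/STAT documentation: y_ij is the response of subject i at occasion j, x_ij and z_ij are covariate vectors, g is a link function, β holds the fixed effects, and b_i holds the random effects for subject i.

```latex
% Marginal (population-average) model: the mean is modeled directly, and
% within-subject correlation is handled separately (for example, by a
% working correlation structure).
\[
  g\bigl(\mathrm{E}[y_{ij}]\bigr) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
\]

% Mixed (subject-specific) model: the mean is modeled conditionally on
% subject-level random effects, which induce the within-subject correlation.
\[
  g\bigl(\mathrm{E}[y_{ij} \mid \mathbf{b}_i]\bigr)
    = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \mathbf{z}_{ij}^{\top}\mathbf{b}_i,
  \qquad \mathbf{b}_i \sim N(\mathbf{0}, \mathbf{G})
\]
```

With an identity link the two sets of coefficients have the same interpretation. With a nonlinear link such as the logit they generally differ, which is why the choice between a population-average and a subject-specific model matters.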

The SAS/STAT longitudinal data analysis procedures include the following:

GEE Procedure

The GEE procedure fits generalized linear models for longitudinal data by using the generalized estimating equations (GEE) estimation method of Liang and Zeger (1986). The GEE method fits a marginal model to longitudinal data and is commonly used to analyze longitudinal data when the population-average effect is of interest. The following are highlights of the GEE procedure's features:

- performs weighted GEE estimation when there are missing data that are missing at random (MAR)
- supports the following response variable distributions: binomial, gamma, inverse Gaussian, negative binomial, normal, Poisson, multinomial
- supports the following link functions: complementary log-log, identity, log, logit, probit, reciprocal, power with exponent -2
- supports the following correlation structures: first-order autoregressive, exchangeable, independent, m-dependent, unstructured, fixed (user-specified)
- performs alternating logistic regression analysis for ordinal and binary data
- supports ESTIMATE, LSMEANS, and OUTPUT statements
- creates a SAS data set that corresponds to any output table
- automatically creates graphs by using ODS Graphics

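As a minimal sketch of a marginal model fit with PROC GEE, the following step fits a Poisson regression with an exchangeable working correlation. The data set work.visits and the variables id, treatment, week, and count are hypothetical names chosen for illustration.

```sas
/* Hypothetical long-format data: one row per subject per visit.      */
/* work.visits, id, treatment, week, and count are assumed names.     */
proc gee data=work.visits;
   class id treatment;
   /* Marginal mean model: log link, Poisson variance function */
   model count = treatment week / dist=poisson link=log;
   /* Observations that share the same id form one cluster;    */
   /* TYPE=EXCH requests an exchangeable working correlation,  */
   /* and CORRW displays the estimated working correlations.   */
   repeated subject=id / type=exch corrw;
run;
```

The REPEATED statement is what identifies the clusters. Changing TYPE= selects another of the correlation structures listed above.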
For further details, see GEE Procedure

GENMOD Procedure

The GENMOD procedure fits generalized linear models, as defined by Nelder and Wedderburn (1972). The class of generalized linear models is an extension of traditional linear models that allows the mean of a population to depend on a linear predictor through a nonlinear link function and allows the response probability distribution to be any member of an exponential family of distributions. Many widely used statistical models are generalized linear models. These include classical linear models with normal errors, logistic and probit models for binary data, and log-linear models for multinomial data. Many other useful statistical models can be formulated as generalized linear models by the selection of an appropriate link function and response probability distribution. The following are highlights of the GENMOD procedure's features:

- provides the following built-in distributions and associated variance functions: normal, binomial, Poisson, gamma, inverse Gaussian, negative binomial, geometric, multinomial, zero-inflated Poisson, Tweedie
- provides the following built-in link functions: identity, logit, probit, power, log, complementary log-log
- enables you to define your own link functions or distributions through DATA step programming statements used within the procedure
- fits models to correlated responses by the GEE method
- performs Bayesian analysis for generalized linear models
- performs exact logistic regression
- performs exact Poisson regression
- enables you to fit a sequence of models and to perform Type I and Type III analyses between each successive pair of models
- computes likelihood ratio statistics for user-defined contrasts
- computes estimated values, standard errors, and confidence limits for user-defined contrasts and least squares means
- computes confidence intervals for model parameters based on either the profile likelihood function or asymptotic normality
- produces an overdispersion diagnostic plot for zero-inflated models
- performs BY group processing, which enables you to obtain separate analyses on grouped observations
- creates SAS data sets that correspond to most output tables
- automatically generates graphs by using ODS Graphics

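The GEE capability of PROC GENMOD can be illustrated with a short sketch for a binary response. The data set work.visits and the variables id, treatment, week, and response are hypothetical, and the data are assumed to be sorted by visit within each subject so that the AR(1) structure follows the time order.

```sas
/* Hypothetical data, assumed sorted by visit within id.           */
/* DESCENDING models the probability of the higher response level  */
/* (response = 1 when the response is coded 0/1).                  */
proc genmod data=work.visits descending;
   class id treatment;
   /* Logistic marginal model; TYPE3 requests Type III tests */
   model response = treatment week / dist=binomial link=logit type3;
   /* The REPEATED statement switches estimation to GEE with */
   /* a first-order autoregressive working correlation.      */
   repeated subject=id / type=ar(1) corrw;
run;
```

Without the REPEATED statement, the same MODEL statement would fit an ordinary generalized linear model that treats all observations as independent.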
For further details, see GENMOD Procedure

GLIMMIX Procedure

The GLIMMIX procedure fits statistical models to data with correlations or nonconstant variability and where the response is not necessarily normally distributed. These models are known as generalized linear mixed models (GLMMs). GLMMs, like linear mixed models, assume normal (Gaussian) random effects. Conditional on these random effects, data can have any distribution in the exponential family. The following are highlights of the GLIMMIX procedure's features:

- provides the following built-in link functions: cumulative complementary log-log, cumulative logit, cumulative log-log, cumulative probit, complementary log-log, generalized logit, identity, log, logit, log-log, probit, power with exponent λ = number, power with exponent -2, reciprocal
- provides the following built-in distributions and associated variance functions: beta, binary, binomial, exponential, gamma, normal, geometric, inverse Gaussian, lognormal, negative binomial, Poisson, t
- enables you to use SAS programming statements within the procedure to compute model effects, weights, frequency, subject, group, and other variables, and to define mean and variance functions
- fits covariance structures including: ANTE(1), AR(1), ARH(1), ARMA(1,1), Cholesky, compound symmetry, heterogeneous compound symmetry, factor analytic, Huynh-Feldt, general linear, P-spline, radial smoother, simple, spatial (exponential, Gaussian, Matérn, power, anisotropic power, spherical), Toeplitz, unstructured
- permits subject and group effects that enable blocking and heterogeneity, respectively
- permits weighted multilevel models for analyzing survey data that arise from multistage sampling
- offers a choice of linearization approach or integral approximation by quadrature or Laplace method for mixed models with nonlinear random effects or nonnormal distribution
- offers a choice of linearization about expected values or expansion about current solutions of best linear unbiased predictors (BLUP)
- provides flexible covariance structures for random and residual random effects, including variance components, unstructured, autoregressive, and spatial structures
- produces hypothesis tests and estimable linear combinations of effects
- provides a mechanism to obtain inferences for the covariance parameters. Significance tests are based on the ratio of (residual) likelihoods or pseudo-likelihoods. Confidence limits and bounds are computed as Wald or likelihood ratio limits.
- constructs special collections of columns for the design matrices in your model. These special collections, which are referred to as constructed effects, can include the following:
  - COLLECTION is a collection effect defining one or more variables as a single effect with multiple degrees of freedom. The variables in a collection are considered as a unit for estimation and inference.
  - MULTIMEMBER | MM is a multimember classification effect whose levels are determined by one or more variables that appear in a CLASS statement.
  - POLYNOMIAL | POLY is a multivariate polynomial effect in the specified numeric variables.
  - SPLINE is a regression spline effect whose columns are univariate spline expansions of one or more variables. A spline expansion replaces the original variable with an expanded or larger set of new variables.
- provides the following estimation methods: RSPL, MSPL, RMPL, MMPL, Laplace, adaptive quadrature
- enables you to exercise control over the numerical optimization. You can choose techniques, update methods, line search algorithms, convergence criteria, and more. Or, you can choose the default optimization strategies selected for the particular class of model you are fitting.
- enables you to generate variables with SAS programming statements inside of PROC GLIMMIX (except for variables listed in the CLASS statement)
- performs grouped data analysis
- supports BY group processing, which enables you to obtain separate analyses on grouped observations
- uses ODS to create a SAS data set corresponding to any table
- automatically generates graphs by using ODS Graphics

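A subject-specific counterpart to a GEE analysis might look like the following random-intercept logistic model. This is a sketch only: the data set work.visits and the variables id, treatment, week, and response are hypothetical, and the choice of adaptive quadrature with seven quadrature points is an illustrative assumption rather than a recommendation.

```sas
/* Hypothetical long-format data with a 0/1 response.              */
/* METHOD=QUAD requests adaptive quadrature, one of the estimation */
/* methods listed above; QPOINTS=7 is an arbitrary choice here.    */
proc glimmix data=work.visits method=quad(qpoints=7);
   class id treatment;
   /* Conditional on the random intercept, the response is binary */
   /* with a logit link; EVENT='1' models Pr(response = 1).       */
   model response(event='1') = treatment week
         / dist=binary link=logit solution;
   /* One normally distributed random intercept per subject */
   random intercept / subject=id;
run;
```

The fixed-effect estimates from this model are conditional on the subject-level random intercept, so they are not directly comparable to population-average estimates from a marginal model.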
For further details, see GLIMMIX Procedure

MIXED Procedure

The MIXED procedure fits a variety of mixed linear models to data and enables you to use these fitted models to make statistical inferences about the data. A mixed linear model is a generalization of the standard linear model used in the GLM procedure, the generalization being that the data are permitted to exhibit correlation and nonconstant variability. The mixed linear model, therefore, provides you with the flexibility of modeling not only the means of your data (as in the standard linear model) but their variances and covariances as well. The following are highlights of the MIXED procedure's features:

- fits general linear models with fixed and random effects under the assumption that the data are normally distributed. The types of models include: simple regression, multiple regression, analysis of variance for balanced or unbalanced data, analysis of covariance, response surface models, weighted regression, polynomial regression, multivariate analysis of variance (MANOVA), partial correlation, repeated measures analysis of variance
- fits covariance structures including: variance components, compound symmetry, unstructured, AR(1) and ARMA(1,1), Toeplitz, spatial, general linear, factor analytic
- offers six estimation methods for the covariance parameters: restricted maximum likelihood (REML), maximum likelihood (ML), MIVQUE0, and the method-of-moments estimators Type I, Type II, and Type III
- uses PROC GLM-type syntax by using MODEL, RANDOM, and REPEATED statements for model specification and CONTRAST, ESTIMATE, and LSMEANS statements for inferences
- provides appropriate standard errors for all specified estimable linear combinations of fixed and random effects, and corresponding t and F tests
- enables you to construct custom hypothesis tests
- enables you to construct custom scalar estimates and their confidence limits
- computes least squares means and least squares mean differences for classification fixed effects
- permits subject and group effects that enable blocking and heterogeneity, respectively
- performs multiple comparison of main effect means
- accommodates unbalanced data
- computes Type I, Type II, and Type III tests of fixed effects
- performs sampling-based Bayesian analysis
- performs weighted estimation
- performs BY group processing, which enables you to obtain separate analyses on grouped observations
- creates a SAS data set that corresponds to any output table
- automatically creates graphs by using ODS Graphics

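The MODEL, REPEATED, and LSMEANS statements mentioned above can be combined in a repeated measures analysis such as the following sketch. The data set work.growth and the variables id, treatment, visit, and y are hypothetical, and the Kenward-Roger degrees-of-freedom method is an illustrative choice.

```sas
/* Hypothetical long-format data with a continuous response y. */
proc mixed data=work.growth method=reml;
   class id treatment visit;
   /* Fixed effects for the mean: treatment, visit, and interaction */
   model y = treatment visit treatment*visit
         / solution ddfm=kenwardroger;
   /* Within-subject covariance: AR(1) across visits for each id */
   repeated visit / subject=id type=ar(1);
   /* Least squares means for treatment, with differences and CLs */
   lsmeans treatment / diff cl;
run;
```

Replacing the REPEATED statement with a statement such as `random intercept / subject=id;` would instead model the within-subject correlation through a random effect.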
For further details, see MIXED Procedure