Analysis of Variance for Categorical Data and Generalized Linear Models

A categorical variable is defined as one that can assume only a limited number of values. For example, a person’s gender is a categorical variable that can assume one of two values. Variables with levels that simply name a group are said to be measured on a nominal scale. Categorical variables can also be measured using an ordinal scale, which means that the levels of the variable are ordered in some way. For example, responses to an opinion poll are usually measured on an ordinal scale, with levels ranging from strongly disagree to no opinion to strongly agree.

For two categorical variables, one measured on an ordinal scale and one measured on a nominal scale, you can assign scores to the levels of the ordinal variable and test whether the mean scores for the different levels of the nominal variable are significantly different. This process is analogous to performing an analysis of variance on continuous data, and it can be performed by PROC CATMOD. If there are n nominal variables rather than one, PROC CATMOD can perform an n-way analysis of variance of the mean scores.
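To make the mean-score idea concrete, here is a minimal Python sketch for the one-way case (a single nominal variable). It assumes the analyst has chosen scores for the ordinal levels (1, 2, 3 in the usage lines). It estimates each group's mean score and the variance of that mean from the observed proportions, then forms a weighted least squares (Wald) chi-square for the hypothesis that all group means are equal. The function name `mean_scores_test` and the data are hypothetical. This illustrates the weighted least squares approach to mean-score response functions; it is not PROC CATMOD's implementation. It also assumes every group has a nonzero variance: a group whose responses all fall in one level would make its weight undefined.

```python
from math import fsum


def mean_scores_test(counts, scores):
    """Weighted least squares test that mean scores are equal across groups.

    counts[g][k]: number of subjects in nominal group g at ordinal level k.
    scores[k]:    numeric score assigned to ordinal level k (analyst's choice).
    Returns (mean scores, variances of the means, chi-square, degrees of freedom).
    Assumes each group has at least one subject and a nonzero variance.
    """
    means, variances = [], []
    for row in counts:
        n = sum(row)
        props = [c / n for c in row]
        mean = fsum(p * a for p, a in zip(props, scores))
        second_moment = fsum(p * a * a for p, a in zip(props, scores))
        means.append(mean)
        # Variance of the sample mean score, from the multinomial covariance of the proportions.
        variances.append((second_moment - mean * mean) / n)

    # Inverse-variance weights; the pooled mean is the estimate under "all means equal".
    weights = [1.0 / v for v in variances]
    pooled = fsum(w * m for w, m in zip(weights, means)) / fsum(weights)

    # Weighted sum of squared deviations: Wald chi-square with (groups - 1) df.
    chi_sq = fsum(w * (m - pooled) ** 2 for w, m in zip(weights, means))
    return means, variances, chi_sq, len(counts) - 1


# Hypothetical opinion-poll counts for two groups across three ordered levels.
means, variances, chi_sq, df = mean_scores_test([[10, 20, 30], [30, 20, 10]], [1, 2, 3])
print(round(chi_sq, 6), df)  # → 24.0 1
```

With more than one nominal variable, the same idea extends to a linear model for the vector of mean scores, which is what an n-way analysis of the mean scores tests.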

For two categorical variables measured on a nominal scale, you can test whether the distribution of the first variable differs significantly across the levels of the second variable. This process is an analysis of variance of proportions rather than of means, and it can also be performed by PROC CATMOD. When there are n nominal explanatory variables, PROC CATMOD can likewise perform the corresponding n-way analysis of variance of proportions.
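For the one-way case, the classical way to ask whether the distribution of a nominal response differs across groups is the Pearson chi-square test of homogeneity. The sketch below computes it from a table of counts (rows are groups, columns are response levels). The function name `homogeneity_chi_square` is hypothetical, and the test is shown only to illustrate the hypothesis being tested. PROC CATMOD instead fits models to response functions such as generalized logits, so its statistics need not equal the Pearson value. The sketch assumes every row and column total is positive.

```python
def homogeneity_chi_square(counts):
    """Pearson chi-square test that the response distribution is the same in every group.

    counts[i][j]: number of subjects in group i (row) with response level j (column).
    Returns (chi-square statistic, degrees of freedom).
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    total = sum(row_totals)

    chi_sq = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            # Expected count if the response distribution does not depend on the group.
            expected = row_totals[i] * col_totals[j] / total
            chi_sq += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_sq, df


# Hypothetical table: two groups, three nominal response categories.
stat, df = homogeneity_chi_square([[10, 20, 30], [30, 20, 10]])
print(stat, df)
```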

See Chapter 8: Introduction to Categorical Data Analysis Procedures, and Chapter 30: The CATMOD Procedure, for more information.

The GENMOD procedure uses maximum likelihood estimation to fit generalized linear models. This family includes models for categorical data, such as logistic, probit, and complementary log-log regression for binomial data and Poisson regression for count data, as well as models for continuous data, such as ordinary linear, gamma, and inverse Gaussian regression. PROC GENMOD performs analysis of variance through likelihood ratio and Wald tests of fixed effects in generalized linear models, and it provides contrasts and estimates for customized hypothesis tests. It also analyzes repeated measures data by using generalized estimating equation (GEE) methods.
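As a language-neutral illustration of maximum likelihood fitting and a likelihood ratio test of a fixed effect, the Python sketch below fits a Poisson regression with log link, log(mu) = b0 + b1*x, for a single numeric covariate x by Newton-Raphson. It then compares that fit with the intercept-only model. The function names and the data are hypothetical, and this is not PROC GENMOD's code or syntax. It assumes the counts are not all zero and that x is not constant.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood without the -sum(log(y!)) constant, which cancels in LR tests."""
    return math.fsum(yi * math.log(m) - m for yi, m in zip(y, mu))


def fit_poisson(x, y, iterations=50, tol=1e-10):
    """Maximum likelihood fit of log(mu) = b0 + b1*x by Newton-Raphson."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = math.fsum(yi - m for yi, m in zip(y, mu))
        g1 = math.fsum((yi - m) * xi for yi, m, xi in zip(y, mu, x))
        # Information matrix; for the canonical log link, observed and expected
        # information coincide, so this step is also Fisher scoring (IRLS).
        h00 = math.fsum(mu)
        h01 = math.fsum(m * xi for m, xi in zip(mu, x))
        h11 = math.fsum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * step = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def likelihood_ratio_test(x, y):
    """LR chi-square (1 df) for H0: b1 = 0, with its p-value."""
    b0, b1 = fit_poisson(x, y)
    full = poisson_loglik(y, [math.exp(b0 + b1 * xi) for xi in x])
    null = poisson_loglik(y, [sum(y) / len(y)] * len(y))  # intercept-only MLE is the mean
    stat = max(2.0 * (full - null), 0.0)  # guard against tiny negative rounding
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value


x = [0, 0, 0, 1, 1, 1]   # hypothetical group indicator
y = [2, 3, 4, 6, 9, 12]  # hypothetical counts
stat, p = likelihood_ratio_test(x, y)
print(f"LR chi-square = {stat:.4f}, p = {p:.4f}")
```

Because x here is a 0/1 indicator, the fitted means equal the two group means (3 and 9), so the likelihood ratio statistic plays the role of a one-way analysis of variance for counts. A Wald test would instead divide the squared estimate of b1 by its estimated variance.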

See Chapter 8: Introduction to Categorical Data Analysis Procedures, and Chapter 40: The GENMOD Procedure, for more information.