The GENMOD Procedure

STRATA Statement

  • STRATA variable <(option)> …<variable <(option)>> </ options>;

The STRATA statement names the variables that define strata or matched sets to use in a stratified exact logistic regression of binary response data or a stratified exact Poisson regression of count data. You must also specify an EXACT statement.
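As a minimal sketch of how the STRATA and EXACT statements fit together, the following step fits a stratified exact logistic regression to a small 1:1 matched data set. The data set name Matched, the variables SetID, Outcome, and Exposure, and all data values are hypothetical and serve only to illustrate the syntax.

```sas
/* Hypothetical 1:1 matched data: SetID identifies the matched set, */
/* Outcome is 1 for the case and 0 for the control.                 */
data Matched;
   input SetID Outcome Exposure @@;
   datalines;
1 1 1  1 0 0
2 1 1  2 0 1
3 1 0  3 0 0
4 1 1  4 0 0
5 1 0  5 0 1
6 1 1  6 0 0
;

proc genmod data=Matched;
   /* Exact logistic regression requires DIST=BIN with LINK=LOGIT */
   model Outcome(event='1') = Exposure / dist=bin link=logit;
   /* Each distinct value of SetID defines one stratum */
   strata SetID;
   /* The EXACT statement requests the exact analysis of Exposure */
   exact Exposure / estimate;
run;
```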

Observations that have the same variable values are in the same matched set. For a stratified logistic model, you can analyze 1:1, 1:n, m:n, and general $m_i$:$n_i$ matched sets where the number of cases and controls varies across strata. For a stratified Poisson model, you can have any number of observations in each stratum. You must specify at least one variable to invoke the stratified analysis; when you do, the usual unconditional asymptotic analysis is not performed. The stratified logistic model has the form

\[ \mathrm{logit}(\pi_{hi}) = \alpha_{h} + \mathbf{x}_{hi}'\boldsymbol{\beta} \]

where $\pi_{hi}$ is the event probability for the $i$th observation in stratum $h$ with covariates $\mathbf{x}_{hi}$ and where the stratum-specific intercepts $\alpha_{h}$ are the nuisance parameters that are to be conditioned out.
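As a sketch of what conditioning out the intercepts means, consider standard conditional-likelihood theory (this derivation is not part of the statement syntax, and the symbols $y_{hi}$, $n_h$, $m_h$, and $\mathcal{S}_h$ are notation introduced here). Suppose stratum $h$ contains $n_h$ observations with binary responses $y_{hi}$, of which $m_h$ are events. Conditioning on $m_h$ gives the stratum contribution

\[ L_h(\boldsymbol{\beta}) = \frac{\exp\left(\sum_{i:\, y_{hi}=1} \mathbf{x}_{hi}'\boldsymbol{\beta}\right)}{\sum_{S \in \mathcal{S}_h} \exp\left(\sum_{j \in S} \mathbf{x}_{hj}'\boldsymbol{\beta}\right)} \]

where $\mathcal{S}_h$ is the collection of all subsets of $\{1, \ldots, n_h\}$ that have size $m_h$. The factor $\exp(m_h \alpha_h)$ appears in both the numerator and the denominator and cancels, so $L_h$ does not depend on $\alpha_h$. For a 1:1 matched pair in which observation 1 is the event, this reduces to

\[ L_h(\boldsymbol{\beta}) = \frac{\exp(\mathbf{x}_{h1}'\boldsymbol{\beta})}{\exp(\mathbf{x}_{h1}'\boldsymbol{\beta}) + \exp(\mathbf{x}_{h2}'\boldsymbol{\beta})} \]

If $m_h = 0$ or $m_h = n_h$, then $\mathcal{S}_h$ contains a single subset and $L_h = 1$ for every $\boldsymbol{\beta}$, so such a stratum carries no information about $\boldsymbol{\beta}$.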

STRATA variables can also be specified in the MODEL statement as classification or continuous covariates; however, the effects are nondegenerate only when crossed with a nonstratification variable. Specifying several STRATA statements is the same as specifying one STRATA statement that contains all the strata variables. The STRATA variables can be either character or numeric, and the formatted values of the STRATA variables determine the levels. Thus, you can also use formats to group values into levels; see the discussion of the FORMAT procedure in the Base SAS Procedures Guide.
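Because the formatted values determine the strata levels, a format can collapse a numeric variable into a few strata. The following sketch illustrates the idea under stated assumptions: the format AgeBand, the data set Study, its variables, and all data values are hypothetical.

```sas
/* Hypothetical format that groups ages into three bands */
proc format;
   value AgeBand low -< 40  = 'Under 40'
                 40  -< 60  = '40 to 59'
                 60  - high = '60 and over';
run;

/* Hypothetical data: made-up values that only illustrate the syntax */
data Study;
   input Age Outcome Exposure @@;
   datalines;
34 1 1  37 0 0  29 1 0  38 0 1
45 1 1  52 0 0  58 1 1  49 0 0
63 1 0  71 0 0  66 1 1  68 0 1
;

proc genmod data=Study;
   model Outcome(event='1') = Exposure / dist=bin link=logit;
   /* The FORMAT statement makes the three age bands the strata, */
   /* rather than each distinct value of Age                     */
   strata Age;
   format Age AgeBand.;
   exact Exposure / estimate;
run;
```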

The "Strata Summary" table is displayed by default. For an exact logistic regression, it displays the number of strata that have a specific number of events and non-events. For example, if you are analyzing a 1:5 matched study, this table enables you to verify that every stratum in the analysis has exactly one event and five non-events. Strata that contain only events or only non-events are reported in this table, but such strata are uninformative and are not used in the analysis. For an exact Poisson regression, the "Strata Summary" table displays the number of strata that contain a specific number of observations, which enables you to check whether every stratum in the analysis has the same number of observations.
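If you want to cross-check the stratum composition before you run the analysis, a rough equivalent of the logistic summary can be tabulated by hand. The following sketch assumes a hypothetical data set Matched that contains a stratum identifier SetID and a 0/1 response Outcome; it is an independent check and not how the procedure builds its table.

```sas
proc sql;
   /* Inner query: count events and non-events within each stratum. */
   /* Outer query: count strata that share each (Events, NonEvents) */
   /* pattern.                                                      */
   select Events, NonEvents, count(*) as NumStrata
      from (select SetID,
                   sum(Outcome = 1) as Events,
                   sum(Outcome = 0) as NonEvents
               from Matched
               group by SetID)
      group by Events, NonEvents;
quit;
```

For a clean 1:5 matched study, this query would return a single row with Events equal to 1 and NonEvents equal to 5.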

The ASSESSMENT, BAYES, CONTRAST, EFFECTPLOT, ESTIMATE, LSMEANS, LSMESTIMATE, OUTPUT, REPEATED, SLICE, and STORE statements are not available with a STRATA statement. Exact analyses are not performed when you specify a WEIGHT statement, or a model other than LINK=LOGIT with DIST=BIN or LINK=LOG with DIST=POISSON. An OFFSET= variable is not available with exact logistic regression.

The following option can be specified for a stratification variable by enclosing the option in parentheses after the variable name, or it can be specified globally for all STRATA variables after a slash (/).

MISSING

treats missing values (., ._, .A, …, .Z for numeric variables and blanks for character variables) as valid STRATA variable values.
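The two placements of the MISSING option can be sketched as follows. These are statement fragments that belong inside a PROC GENMOD step, and the variable names Clinic and Region are hypothetical.

```sas
/* Per-variable form: missing values of Clinic are a valid level; */
/* the option does not apply to Region                            */
strata Clinic(missing) Region;

/* Global form: missing values of every STRATA variable are valid levels */
strata Clinic Region / missing;
```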

The following strata options are also available after the slash:

CHECKDEPENDENCY | CHECK=keyword

specifies which variables are to be tested for dependency before the analysis is performed. The available keywords are as follows:

NONE

performs no dependence checking. If the covariates are linearly dependent, a message about a singular information matrix is typically displayed. You can identify the dependent covariates after the analysis by noting any missing parameter estimates.

COVARIATES

checks dependence between covariates and an added intercept. Dependent covariates are removed from the analysis. However, covariates that are linear functions of the strata variable might not be removed, which results in a singular information matrix message being displayed in the SAS log. This is the default.

ALL

checks dependence between all the strata and covariates. This option can adversely affect performance if you have a large number of strata.

NOSUMMARY

suppresses the display of the "Strata Summary" table.

INFO

displays the "Strata Information" table, which includes the stratum number, the levels of the STRATA variables that define the stratum, and the total frequency for each stratum. Because the number of strata can be very large, this table is displayed only by request.
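As a sketch of how the options after the slash combine, the following step checks dependence among all strata and covariates, requests the "Strata Information" table, and suppresses the "Strata Summary" table. The data set Matched and the variables SetID, Outcome, and Exposure are hypothetical.

```sas
proc genmod data=Matched;
   model Outcome(event='1') = Exposure / dist=bin link=logit;
   /* CHECKDEPENDENCY=ALL can be slow when there are many strata */
   strata SetID / checkdependency=all info nosummary;
   exact Exposure / estimate;
run;
```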