PROC MI: Discriminant Function Method for Monotone Missing Data

The MI Procedure

The discriminant function method is the default imputation method for classification variables in a data set with a monotone missing pattern.

For a nominal classification variable $\text{[math]}$ with responses 1, ..., $\text{[math]}$ and a set of effects from its preceding variables, if the covariates $\text{[math]}$, $\text{[math]}$, ..., $\text{[math]}$ associated with these effects within each group are approximately multivariate normal and the within-group covariance matrices are approximately equal, the discriminant function method (Brand 1999, pp. 95–96) can be used to impute missing values for the variable $\text{[math]}$.

Denote the group-specific means for covariates $\text{[math]}$, $\text{[math]}$, ..., $\text{[math]}$ by

$\text{[math]}$

The pooled covariance matrix is then computed as

$\text{[math]}$

where $\text{[math]}$ is the within-group covariance matrix, $\text{[math]}$ is the group-specific sample size, and $\text{[math]}$ is the total sample size.
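
The pooling step can be illustrated with a minimal Python sketch. It assumes the usual pooled estimator, in which each within-group covariance matrix (divisor `n_t - 1`) is weighted by `(n_t - 1) / (n - g)`, where `g` is the number of groups. The function names and the toy data are hypothetical and are not part of PROC MI.

```python
def group_mean(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def within_group_cov(rows):
    """Unbiased sample covariance matrix (divisor n_t - 1) for one group."""
    n, k = len(rows), len(rows[0])
    mean = group_mean(rows)
    cov = [[0.0] * k for _ in range(k)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mean)]
        for i in range(k):
            for j in range(k):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def pooled_cov(groups):
    """Pool within-group covariances, weighting each by (n_t - 1) / (n - g).

    The weighting is an assumption (the standard pooled estimator).
    """
    g = len(groups)
    n = sum(len(rows) for rows in groups.values())
    k = len(next(iter(groups.values()))[0])
    pooled = [[0.0] * k for _ in range(k)]
    for rows in groups.values():
        s_t = within_group_cov(rows)
        w = (len(rows) - 1) / (n - g)
        for i in range(k):
            for j in range(k):
                pooled[i][j] += w * s_t[i][j]
    return pooled


# Toy data: two groups, two covariates (hypothetical values).
groups = {
    1: [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]],
    2: [[6.0, 1.0], [8.0, 3.0], [7.0, 5.0], [7.0, 3.0]],
}
S = pooled_cov(groups)
```

For the toy data, the group means are (2, 3) and (7, 3), and the pooled matrix is the sum of the within-group cross-product matrices divided by `n - g = 5`.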

In each imputation, new values of the group-specific means ($\text{[math]}$), the pooled covariance matrix ($\text{[math]}$), and the prior probabilities of group membership ($\text{[math]}$) can be drawn from their corresponding posterior distributions (Schafer 1997, p. 356).

Pooled Covariance Matrix and Group-Specific Means

For each imputation, the MI procedure uses either the fixed observed pooled covariance matrix (PCOV=FIXED) or a pooled covariance matrix drawn from its posterior distribution under a noninformative prior (PCOV=POSTERIOR). In the latter case,

$\text{[math]}$

where $\text{[math]}$ is an inverted Wishart distribution.

The group-specific means are then drawn from their posterior distributions with a noninformative prior

$\text{[math]}$

See the section "Bayesian Estimation of the Mean Vector and Covariance Matrix" for a complete description of the inverted Wishart distribution and posterior distributions that use a noninformative prior.
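
The draw of a group mean can be sketched as follows for the PCOV=FIXED case. The sketch assumes that the posterior of a group mean is multivariate normal, centered at the observed group mean with covariance equal to the pooled covariance matrix divided by the group sample size. The helper names are hypothetical, and the inverted Wishart draw needed for PCOV=POSTERIOR is not shown.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L @ L^T == a, for symmetric positive definite a."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_group_mean(xbar, cov, n_t, rng):
    """Draw one mean vector from N(xbar, cov / n_t) (assumed posterior form)."""
    k = len(xbar)
    scaled = [[cov[i][j] / n_t for j in range(k)] for i in range(k)]
    low = cholesky(scaled)
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # independent standard normals
    return [xbar[i] + sum(low[i][p] * z[p] for p in range(i + 1)) for i in range(k)]


rng = random.Random(2024)             # fixed seed so the sketch is reproducible
pooled = [[4.0, 2.0], [2.0, 3.0]]     # hypothetical pooled covariance matrix
m_small = draw_group_mean([10.0, 5.0], pooled, n_t=5, rng=rng)
m_large = draw_group_mean([10.0, 5.0], pooled, n_t=10**12, rng=rng)
```

Under this assumption the drawn mean concentrates on the observed group mean as the group sample size grows, which is why `m_large` is practically equal to `[10.0, 5.0]`.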

Prior Probabilities of Group Membership

The prior probabilities are computed through the drawing of new group sample sizes. When the total sample size $\text{[math]}$ is considered fixed, the group sample sizes $\text{[math]}$ have a multinomial distribution. New multinomial parameters (group sample sizes) can be drawn from their posterior distribution by using a Dirichlet prior with parameters $\text{[math]}$.

After the new sample sizes are drawn from the posterior distribution of $\text{[math]}$, the prior probabilities $\text{[math]}$ are computed proportionally to the drawn sample sizes.

See Schafer (1997, pp. 247–255) for a complete description of the Dirichlet prior.
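
A draw of the prior probabilities can be sketched in Python as follows. The sketch assumes the standard conjugate result: with observed group sizes `n_t` and a Dirichlet prior whose parameters all equal `alpha`, the posterior is Dirichlet with parameters `n_t + alpha`. Independent gamma variates stand in for the drawn group sizes and are then normalized. The function name and the example values are hypothetical.

```python
import random


def draw_prior_probabilities(group_sizes, alpha, rng):
    """Draw group probabilities from Dirichlet(n_t + alpha) via gamma variates.

    Assumption: a common prior parameter alpha for every group.
    """
    gammas = [rng.gammavariate(n_t + alpha, 1.0) for n_t in group_sizes]
    total = sum(gammas)
    # Probabilities are proportional to the drawn (gamma) sizes.
    return [x / total for x in gammas]


rng = random.Random(7)  # fixed seed so the sketch is reproducible
q = draw_prior_probabilities([30, 50, 20], alpha=0.5, rng=rng)
```

The drawn probabilities sum to 1 and scatter around the observed proportions, with less scatter for larger samples.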

Imputation Steps

The discriminant function method uses the following steps in each imputation to impute values for a nominal classification variable $\text{[math]}$ with $\text{[math]}$ responses:

1. Draw a pooled covariance matrix $\text{[math]}$ from its posterior distribution if the PCOV=POSTERIOR option is used.
2. For each group, draw group means $\text{[math]}$ from the observed group mean $\text{[math]}$ and either the observed pooled covariance matrix (PCOV=FIXED) or the drawn pooled covariance matrix $\text{[math]}$ (PCOV=POSTERIOR).
3. For each group, compute or draw the prior probabilities of group membership, $\text{[math]}$, based on the PRIOR= option:
   - PRIOR=EQUAL, $\text{[math]}$: the prior probabilities of group membership are all equal.
   - PRIOR=PROPORTIONAL, $\text{[math]}$: the prior probabilities are proportional to the group sample sizes.
   - PRIOR=JEFFREYS=$\text{[math]}$: a noninformative Dirichlet prior with $\text{[math]}$ is used.
   - PRIOR=RIDGE=$\text{[math]}$: a ridge prior is used with $\text{[math]}$ for $\text{[math]}$ and $\text{[math]}$ for $\text{[math]}$.
4. With the group means $\text{[math]}$, the pooled covariance matrix $\text{[math]}$, and the prior probabilities of group membership $\text{[math]}$, derive the linear discriminant function and compute the posterior probabilities of an observation belonging to each group:

   $\text{[math]}$

   where $\text{[math]}$ is the generalized squared distance from $\text{[math]}$ to group $\text{[math]}$.
5. Draw a random uniform variate $\text{[math]}$ between 0 and 1 for each observation with a missing group value. Using the posterior probabilities $\text{[math]}$, impute $\text{[math]}$ if the value of $\text{[math]}$ is less than $\text{[math]}$, $\text{[math]}$ if the value is greater than or equal to $\text{[math]}$ but less than $\text{[math]}$, and so on.
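
The last two steps can be sketched in Python as follows. The sketch assumes that the generalized squared distance is the Mahalanobis distance under the pooled covariance matrix minus twice the log of the prior probability, and that the posterior probability of a group is proportional to `exp(-0.5 * distance)`. All function names and example values are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, k))
        x[i] = (m[i][k] - s) / m[i][i]
    return x


def posterior_probabilities(x, means, cov, priors):
    """Posterior group probabilities from assumed generalized squared distances."""
    d2 = []
    for m_t, q_t in zip(means, priors):
        diff = [xi - mi for xi, mi in zip(x, m_t)]
        w = solve(cov, diff)                        # cov^{-1} @ diff
        maha = sum(d * wi for d, wi in zip(diff, w))
        d2.append(maha - 2.0 * math.log(q_t))
    low = min(d2)                                   # shift for numerical stability
    weights = [math.exp(-0.5 * (d - low)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]


def impute_group(probs, u):
    """Return the 1-based group whose cumulative probability first exceeds u."""
    cumulative = 0.0
    for t, p in enumerate(probs, start=1):
        cumulative += p
        if u < cumulative:
            return t
    return len(probs)  # guard against rounding when u is very close to 1


means = [[0.0, 0.0], [4.0, 0.0]]   # hypothetical drawn group means
cov = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical pooled covariance matrix
probs = posterior_probabilities([2.0, 0.0], means, cov, priors=[0.5, 0.5])
label = impute_group(probs, random.Random(1).random())
```

In this example the observation is equidistant from both group means and the priors are equal, so each group has posterior probability 0.5 and the uniform draw alone decides the imputed value.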
