Polyserial Correlation

Polyserial correlation measures the correlation between two continuous variables with a bivariate normal distribution, where one variable is observed directly, and the other is unobserved. Information about the unobserved variable is obtained through an observed ordinal variable that is derived from the unobserved variable by classifying its values into a finite set of discrete, ordered values (Olsson, Drasgow, and Dorans 1982).

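Let $X$ be the observed continuous variable from a normal distribution with mean $\mu$ and variance $\sigma^2$, let $Y$ be the unobserved continuous variable, and let $\rho$ be the Pearson correlation between $X$ and $Y$. Furthermore, assume that an observed ordinal variable $D$ is derived from $Y$ as follows:

$$
D = \begin{cases}
d_{(1)} & \text{if } Y < \tau_1 \\
d_{(k)} & \text{if } \tau_{k-1} \le Y < \tau_k, \quad k = 2, 3, \ldots, K-1 \\
d_{(K)} & \text{if } Y \ge \tau_{K-1}
\end{cases}
$$

where $d_{(1)} < d_{(2)} < \cdots < d_{(K)}$ are ordered observed values, and $\tau_1 < \tau_2 < \cdots < \tau_{K-1}$ are ordered unknown threshold values.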
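The classification rule can be illustrated with a minimal Python sketch. The function name `classify` and the example thresholds and category values are hypothetical, chosen only for illustration; in practice the thresholds are unknown and must be estimated.

```python
from bisect import bisect_right

def classify(y, thresholds, values):
    """Map a latent value y to an ordered category.

    thresholds: sorted cut points (one fewer than the number of categories)
    values:     sorted observed category values
    """
    if len(values) != len(thresholds) + 1:
        raise ValueError("need exactly one more value than thresholds")
    # bisect_right counts the thresholds that are <= y, so a latent value
    # equal to a threshold falls into the upper category, matching the
    # half-open intervals [lower threshold, upper threshold).
    return values[bisect_right(thresholds, y)]

# Hypothetical thresholds and category values for a three-level variable.
example_thresholds = [-0.5, 0.8]
example_values = [1, 2, 3]
print([classify(y, example_thresholds, example_values)
       for y in (-1.2, -0.5, 0.0, 0.8, 2.0)])
# prints [1, 2, 2, 3, 3]
```

Note that a latent value exactly equal to a threshold is assigned to the higher category, because each interval is closed on the left.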

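The likelihood function for the joint distribution $(X, D)$ from a sample of $N$ observations $(x_j, d_j)$ is

$$
L = \prod_{j=1}^{N} f(x_j, d_j) = \prod_{j=1}^{N} f(x_j) \, P(D = d_j \mid x_j)
$$

where $f(x_j)$ is the normal density function with mean $\mu$ and standard deviation $\sigma$ (Drasgow 1986).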

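The conditional distribution of $Y$ given $x_j$ is normal with mean $\rho z_j$ and variance $1 - \rho^2$, where $z_j = (x_j - \mu)/\sigma$ is a standard normal variate. Without loss of generality, assume the variable $Y$ has a standard normal distribution. Then if $d_j = d_{(k)}$, the $k$th ordered value in $D$, the resulting conditional density is

$$
P(D = d_{(k)} \mid x_j) = \begin{cases}
\Phi\left( \dfrac{\tau_1 - \rho z_j}{\sqrt{1 - \rho^2}} \right) & \text{if } k = 1 \\[2ex]
\Phi\left( \dfrac{\tau_k - \rho z_j}{\sqrt{1 - \rho^2}} \right) - \Phi\left( \dfrac{\tau_{k-1} - \rho z_j}{\sqrt{1 - \rho^2}} \right) & \text{if } k = 2, 3, \ldots, K-1 \\[2ex]
1 - \Phi\left( \dfrac{\tau_{K-1} - \rho z_j}{\sqrt{1 - \rho^2}} \right) & \text{if } k = K
\end{cases}
$$

where $\Phi$ is the cumulative normal distribution function.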
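As an illustration, the conditional category probability and the resulting log likelihood can be sketched in Python using only the standard library. The function names, and the convention that categories are coded as integers 1 through K, are assumptions made for this sketch; it is not the CORR procedure's implementation.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal; _STD.cdf is the cumulative normal

def category_prob(k, x, mu, sigma, rho, taus):
    """Probability of ordered category k (1..K) given the observed x.

    taus holds the K - 1 ordered thresholds; the latent variable is
    assumed to be standard normal.
    """
    z = (x - mu) / sigma
    s = math.sqrt(1.0 - rho * rho)
    K = len(taus) + 1
    # The top category has no upper threshold, the bottom one no lower.
    upper = 1.0 if k == K else _STD.cdf((taus[k - 1] - rho * z) / s)
    lower = 0.0 if k == 1 else _STD.cdf((taus[k - 2] - rho * z) / s)
    return upper - lower

def log_likelihood(xs, ks, mu, sigma, rho, taus):
    """Sum over observations of log f(x_j) + log P(category_j | x_j)."""
    f = NormalDist(mu, sigma)
    return sum(
        math.log(f.pdf(x)) + math.log(category_prob(k, x, mu, sigma, rho, taus))
        for x, k in zip(xs, ks)
    )
```

For any fixed observation, the category probabilities sum to one across the K categories, which is a quick sanity check on the threshold bookkeeping.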

Cox (1972) derives the maximum likelihood estimates for all parameters $\mu$, $\sigma$, $\rho$ and $\tau_1$, ..., $\tau_{K-1}$. The maximum likelihood estimates for $\mu$ and $\sigma^2$ can be derived explicitly. The maximum likelihood estimate for $\mu$ is the sample mean $\bar{x}$ and the maximum likelihood estimate for $\sigma^2$ is the sample variance

$$
\hat{\sigma}^2 = \frac{1}{N} \sum_{j=1}^{N} (x_j - \bar{x})^2
$$

The maximum likelihood estimates for the remaining parameters, including the polyserial correlation $\rho$ and thresholds $\tau_1$, ..., $\tau_{K-1}$, can be computed by an iterative process, as described by Cox (1972). The asymptotic standard error of the maximum likelihood estimate of $\rho$ can also be computed after this process.
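To give a feel for the iterative step, the following Python sketch uses a simplified two-step approach rather than the full joint maximization: the mean and variance are fixed at their explicit estimates, the thresholds are fixed at inverse-normal transforms of the cumulative category proportions, and only the correlation is found numerically by golden-section search. This is a swapped-in simplification for illustration, not the procedure described by Cox and not the CORR procedure's algorithm. The function names are hypothetical, categories are assumed to be coded 1 through K with every category observed at least once, and the search assumes the log likelihood has a single peak in the correlation.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def conditional_loglik(rho, zs, ks, taus):
    """Sum of log category probabilities given standardized z values.

    The normal density terms for x do not involve rho, so they are omitted.
    """
    s = math.sqrt(1.0 - rho * rho)
    K = len(taus) + 1
    total = 0.0
    for z, k in zip(zs, ks):
        upper = 1.0 if k == K else STD.cdf((taus[k - 1] - rho * z) / s)
        lower = 0.0 if k == 1 else STD.cdf((taus[k - 2] - rho * z) / s)
        total += math.log(max(upper - lower, 1e-300))  # guard against log(0)
    return total

def two_step_polyserial(xs, ks, K, tol=1e-6):
    """Return (rho_hat, thresholds) from the simplified two-step sketch."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)  # divisor N
    zs = [(x - mu) / sigma for x in xs]

    # Step 1: thresholds from cumulative category proportions.
    counts = [0] * K
    for k in ks:
        counts[k - 1] += 1
    cum, taus = 0, []
    for c in counts[:-1]:
        cum += c
        taus.append(STD.inv_cdf(cum / n))

    # Step 2: golden-section search for the maximizing rho in (-1, 1).
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = -0.999, 0.999
    c, d = b - g * (b - a), a + g * (b - a)
    fc = conditional_loglik(c, zs, ks, taus)
    fd = conditional_loglik(d, zs, ks, taus)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = conditional_loglik(c, zs, ks, taus)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = conditional_loglik(d, zs, ks, taus)
    return (a + b) / 2.0, taus
```

Because the thresholds are held fixed here, the result generally differs slightly from the full maximum likelihood estimate, in which the correlation and thresholds are updated together.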

For a vector of parameters, the information matrix is the negative of the Hessian matrix (the matrix of second partial derivatives of the log likelihood function), and is used in the computation of the maximum likelihood estimates of these parameters. The CORR procedure uses the observed information matrix (the information matrix evaluated at the current parameter estimates) in the computation. After the maximum likelihood estimates are derived, the asymptotic covariance matrix for these parameter estimates is computed as the inverse of the observed information matrix (the information matrix evaluated at the maximum likelihood estimates).

Probability Values

The CORR procedure provides two tests of the null hypothesis that the polyserial correlation is zero: the Wald test and the likelihood ratio (LR) test.

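Given the maximum likelihood estimate of the polyserial correlation, $\hat{\rho}$, and its asymptotic standard error, $\mathrm{StdErr}(\hat{\rho})$, the Wald chi-square test statistic is computed as

$$
\chi^2 = \left( \frac{\hat{\rho}}{\mathrm{StdErr}(\hat{\rho})} \right)^2
$$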

The Wald statistic has an asymptotic chi-square distribution with one degree of freedom.
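A minimal Python sketch of the Wald computation follows. The function name `wald_test` is hypothetical; the p-value uses the identity that, for a chi-square variable with one degree of freedom, the upper tail probability at w equals erfc(sqrt(w / 2)).

```python
import math

def wald_test(rho_hat, std_err):
    """Return the Wald chi-square statistic and its one-df p-value."""
    if std_err <= 0:
        raise ValueError("standard error must be positive")
    stat = (rho_hat / std_err) ** 2
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical estimate and standard error, for illustration only.
stat, p = wald_test(0.3, 0.1)
print(round(stat, 6), round(p, 4))
```

With an estimate three standard errors from zero, the statistic is 9 and the p-value is about 0.0027.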

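For the LR test, the maximum likelihood function assuming zero polyserial correlation is also needed. If $\rho = 0$, the likelihood function is reduced to

$$
L = \prod_{j=1}^{N} f(x_j, d_j) = \prod_{j=1}^{N} f(x_j) \prod_{j=1}^{N} P(D = d_j)
$$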

In this case, the maximum likelihood estimates for all parameters can be derived explicitly. The maximum likelihood estimate for $\mu$ is the sample mean $\bar{x}$ and the maximum likelihood estimate for $\sigma^2$ is the sample variance

$$
\hat{\sigma}^2 = \frac{1}{N} \sum_{j=1}^{N} (x_j - \bar{x})^2
$$

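In addition, the maximum likelihood estimate for the threshold $\tau_k$, $k = 1, \ldots, K-1$, is

$$
\hat{\tau}_k = \Phi^{-1} \left( \frac{\sum_{g=1}^{k} n_g}{N} \right)
$$

where $n_g$ is the number of observations in the $g$th ordered group of the ordinal variable $D$, and $N = \sum_{g=1}^{K} n_g$ is the total number of observations.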
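The explicit threshold estimates under zero correlation can be sketched in Python as the inverse cumulative normal of the cumulative category proportions. The function name `null_thresholds` is hypothetical, and the sketch assumes the cumulative proportions are strictly between 0 and 1 (for example, the lowest category is not empty).

```python
from statistics import NormalDist

def null_thresholds(counts):
    """Threshold estimates from the group counts of the ordered categories.

    counts[g] is the number of observations in the (g + 1)th ordered group.
    Returns one threshold fewer than the number of groups.
    """
    n = sum(counts)
    std = NormalDist()
    taus, cum = [], 0
    for c in counts[:-1]:  # the last group has no upper threshold
        cum += c
        taus.append(std.inv_cdf(cum / n))
    return taus

# Hypothetical counts: 25%, 50%, 25% of the sample in three groups.
print([round(t, 4) for t in null_thresholds([25, 50, 25])])
# prints [-0.6745, 0.6745]
```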

The LR test statistic is computed as

$$
\chi^2 = -2 \log \left( \frac{L_0}{L_1} \right)
$$

where $L_1$ is the likelihood function with the maximum likelihood estimates for all parameters, and $L_0$ is the likelihood function with the maximum likelihood estimates for all parameters except the polyserial correlation, which is set to zero. The LR statistic also has an asymptotic chi-square distribution with one degree of freedom.
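In practice the LR statistic is usually computed from log likelihoods rather than from the likelihood ratio itself, which avoids numerical underflow. The Python sketch below assumes the two maximized log likelihoods are already available; the function name `lr_test` and the example values are hypothetical.

```python
import math

def lr_test(loglik_null, loglik_full):
    """Return the LR chi-square statistic and its one-df p-value.

    loglik_null: maximized log likelihood with the correlation set to zero
    loglik_full: maximized log likelihood with all parameters estimated
    """
    stat = -2.0 * (loglik_null - loglik_full)
    # The full model cannot fit worse than the restricted one; clip tiny
    # negative values that arise from numerical error.
    stat = max(stat, 0.0)
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log likelihood values, for illustration only.
stat, p = lr_test(-105.0, -100.0)
print(stat, round(p, 4))
```

A difference of 5 in log likelihood gives a statistic of 10 and a p-value of about 0.0016.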