The CORR Procedure

Partial Correlation

A partial correlation measures the strength of a relationship between two variables while controlling for the effect of other variables. The Pearson partial correlation between two variables, after controlling for the variables in the PARTIAL statement, is equivalent to the Pearson correlation between the residuals of the two variables after regression on the controlling variables.

Let $\mb{y} = ( y_1, y_2, \ldots , y_ v)$ be the set of variables to correlate and $\mb{z} = ( z_1, z_2, \ldots , z_ p)$ be the set of controlling variables. The population Pearson partial correlation between the $i$th and the $j$th variables of $\mb{y}$ given $\mb{z}$ is the correlation between errors $(y_ i-\mr{E}(y_ i))$ and $(y_ j-\mr{E}(y_ j))$, where

$\mr{E}(y_ i) = {\alpha }_ i + \mb{z} {\bbeta }_ i \, \, \, \, \, \, \mr{and} \, \, \, \, \, \, \mr{E}(y_ j) = {\alpha }_ j + \mb{z} {\bbeta }_ j$

are the regression models for variables $y_ i$ and $y_ j$, respectively, given the set of controlling variables $\mb{z}$.

For a given sample of observations, a sample Pearson partial correlation between $y_ i$ and $y_ j$ given $\mb{z}$ is derived from the residuals $y_ i - \hat{y}_ i$ and $y_ j - \hat{y}_ j$, where

$\hat{y}_ i = \hat{\alpha }_ i + \mb{z} \hat{\bbeta }_ i \, \, \, \, \, \, \mr{and} \, \, \, \, \, \, \hat{y}_ j = \hat{\alpha }_ j + \mb{z} \hat{\bbeta }_ j$

are fitted values from regression models for variables $y_ i$ and $y_ j$ given $\mb{z}$.
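The residual definition can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single controlling variable, made-up data, and hypothetical helper names (`pearson`, `residuals`); it is not how PROC CORR performs the computation.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def residuals(y, z):
    """Residuals of the least-squares fit y = alpha + beta * z."""
    my, mz = mean(y), mean(z)
    beta = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
            / sum((zi - mz) ** 2 for zi in z))
    alpha = my - beta * mz
    return [yi - (alpha + beta * zi) for yi, zi in zip(y, z)]

# Hypothetical data: one controlling variable z and two variables to correlate.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y1 = [3.0, 2.0, 6.0, 5.0, 9.0, 7.0]
y2 = [1.0, 1.5, 3.5, 3.0, 5.5, 5.0]

# Sample partial correlation: correlate the two residual series.
partial_r = pearson(residuals(y1, z), residuals(y2, z))
print(partial_r)
```

With several controlling variables the same idea applies, but each regression becomes a multiple regression on all of $\mb{z}$.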

The partial corrected sums of squares and crossproducts (CSSCP) of $\mb{y}$ given $\mb{z}$ are the corrected sums of squares and crossproducts of the residuals $\mb{y}-\hat{\mb{y}}$. Using these partial corrected sums of squares and crossproducts, you can calculate the partial covariances and partial correlations.

PROC CORR derives the partial corrected sums of squares and crossproducts matrix by applying the Cholesky decomposition algorithm to the CSSCP matrix. For Pearson partial correlations, let $S$ be the partitioned CSSCP matrix between two sets of variables, $\mb{z}$ and $\mb{y}$:

$\begin{eqnarray*} \Strong{S} & = & \left[ \begin{array}{rr} \Strong{S}_{zz} & \Strong{S}_{zy} \\ \Strong{S}_{zy}' & \Strong{S}_{yy} \\ \end{array} \right] \end{eqnarray*}$

PROC CORR calculates $S_{yy.z}$, the partial CSSCP matrix of $\mb{y}$ after controlling for $\mb{z}$, by applying the Cholesky decomposition algorithm sequentially on the rows associated with $\mb{z}$, the variables being partialled out.

After applying the Cholesky decomposition algorithm to each row associated with variables $\mb{z}$, PROC CORR checks all higher-numbered diagonal elements associated with $\mb{z}$ for singularity. A variable is considered singular if the value of the corresponding diagonal element is less than $\varepsilon$ times the original unpartialled corrected sum of squares of that variable. You can specify the singularity criterion $\varepsilon$ by using the SINGULAR= option. For Pearson partial correlations, this means that a controlling variable in $\mb{z}$ is considered singular if the $R^2$ for predicting this variable from the variables that are already partialled out exceeds $1-\varepsilon$. When this happens, PROC CORR excludes the variable from the analysis. Similarly, a variable in $\mb{y}$ is considered singular if the $R^2$ for predicting this variable from the controlling variables exceeds $1-\varepsilon$. When this happens, its associated diagonal element and all higher-numbered elements in this row or column are set to zero.
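The singularity test can be illustrated with a small Python sketch. The helper names (`css`, `cross`, `residual_css`, `is_singular`), the data, and the value `eps = 1e-8` are all assumptions made for illustration, and only one variable is partialled out. Because $R^2 = 1 - (\text{partialled diagonal})/(\text{original corrected sum of squares})$, the diagonal test and the $R^2 > 1-\varepsilon$ test describe the same condition.

```python
def css(v):
    """Corrected sum of squares of one variable."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v)

def cross(u, v):
    """Corrected crossproduct of two variables."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def residual_css(target, control):
    """Diagonal element for target after partialling out one control variable."""
    return css(target) - cross(target, control) ** 2 / css(control)

def is_singular(partialled_diag, original_css, eps):
    """Diagonal element below eps times the unpartialled corrected sum of squares."""
    return partialled_diag < eps * original_css

eps = 1e-8  # illustrative singularity criterion, chosen for this sketch

z1 = [1.0, 2.0, 3.0, 4.0, 5.0]
z2 = [3.0, 5.0, 7.0, 9.0, 11.0]   # exactly 2*z1 + 1, so fully predicted by z1
z3 = [2.0, 1.0, 4.0, 3.0, 6.0]    # related to z1 but not collinear with it

z2_singular = is_singular(residual_css(z2, z1), css(z2), eps)
z3_singular = is_singular(residual_css(z3, z1), css(z3), eps)
print(z2_singular, z3_singular)
```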

After the Cholesky decomposition algorithm is applied to all rows associated with $\mb{z}$, the resulting matrix has the form

$\begin{eqnarray*} T = \left[ \begin{array}{rr} \Strong{T}_{zz} & \Strong{T}_{zy} \\ 0 & \Strong{S}_{yy.z} \\ \end{array} \right] \end{eqnarray*}$

where $T_{zz}$ is an upper triangular matrix with $T'_{zz}T_{zz} = S_{zz}$, $T'_{zz}T_{zy} = S_{zy}$, and $S_{yy.z} = S_{yy}- T'_{zy} T_{zy}$.

If $S_{zz}$ is positive definite, then $T_{zy} = {T'_{zz}}^{-1} S_{zy}$ and the partial CSSCP matrix $S_{yy.z}$ is identical to the matrix derived from the formula

$S_{yy.z}= S_{yy}-S'_{zy} S^{-1}_{zz}S_{zy}$
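The equivalence between the Cholesky-based result and the direct formula can be checked numerically. The following Python sketch assumes a positive definite $S_{zz}$, exactly two controlling variables (so that the inverse can be written out by hand), and made-up data. The function names are hypothetical, and the sweep is a textbook outer-product Cholesky step, not PROC CORR's internal routine.

```python
import math

def csscp(cols):
    """Corrected sums of squares and crossproducts for a list of data columns."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    return [[sum((a[k] - means[i]) * (b[k] - means[j]) for k in range(n))
             for j, b in enumerate(cols)]
            for i, a in enumerate(cols)]

def partial_csscp_cholesky(S, p):
    """Apply Cholesky steps to the first p rows (the z block); return S_yy.z."""
    m = len(S)
    A = [row[:] for row in S]
    for k in range(p):
        pivot = math.sqrt(A[k][k])          # assumes a positive pivot
        for j in range(k, m):
            A[k][j] /= pivot                # row k of the triangular factor
        for i in range(k + 1, m):
            for j in range(i, m):
                A[i][j] -= A[k][i] * A[k][j]
                A[j][i] = A[i][j]           # keep the trailing block symmetric
    return [row[p:] for row in A[p:]]

def partial_csscp_direct(S):
    """S_yy - S_zy' S_zz^{-1} S_zy for exactly two controlling variables."""
    a, b, c, d = S[0][0], S[0][1], S[1][0], S[1][1]
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    v = len(S) - 2
    return [[S[2 + i][2 + j]
             - sum(S[r][2 + i] * inv[r][s] * S[s][2 + j]
                   for r in range(2) for s in range(2))
             for j in range(v)]
            for i in range(v)]

# Hypothetical data: columns ordered z1, z2, y1, y2.
z1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y1 = [3.0, 2.0, 6.0, 5.0, 9.0, 7.0]
y2 = [1.0, 1.5, 3.5, 3.0, 5.5, 5.0]

S = csscp([z1, z2, y1, y2])
via_cholesky = partial_csscp_cholesky(S, 2)
via_formula = partial_csscp_direct(S)
print(via_cholesky)
print(via_formula)
```

The two printed matrices should agree up to floating-point rounding.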

The partial variance-covariance matrix is calculated with the variance divisor (VARDEF= option). PROC CORR then uses the standard Pearson correlation formula on the partial variance-covariance matrix to calculate the Pearson partial correlation matrix.
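As a rough illustration of this last step, the Python sketch below converts a partial CSSCP matrix into covariances and then correlations. The matrix values and the function name are made up, and the divisor is passed in as a plain number rather than derived from the VARDEF= option. The sketch also shows that the choice of divisor scales the covariances but cancels in the correlations.

```python
import math

def partial_corr_from_csscp(S_part, divisor):
    """Divide a partial CSSCP matrix by a variance divisor, then standardize."""
    cov = [[s / divisor for s in row] for row in S_part]
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]

# Hypothetical partial CSSCP matrix for two variables.
S_part = [[4.0, 1.2],
          [1.2, 9.0]]

r_a = partial_corr_from_csscp(S_part, 7.0)    # one possible divisor
r_b = partial_corr_from_csscp(S_part, 10.0)   # a different divisor
print(r_a[0][1], r_b[0][1])                   # both approximately 0.2
```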

When a correlation matrix is positive definite, the resulting partial correlation between variables $x$ and $y$ after adjusting for a single variable $z$ is identical to that obtained from the first-order partial correlation formula

$r_{xy.z}=\frac{r_{xy}-r_{xz}r_{yz}}{\sqrt {(1-r^{2}_{xz})(1-r^{2}_{yz})}}$

where $r_{xy}$, $r_{xz}$, and $r_{yz}$ are the appropriate correlations.
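The first-order formula translates directly into code. The following Python function is a minimal sketch; the function name and the input correlations are invented for illustration.

```python
import math

def first_order_partial(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations: r_xy = 0.6, r_xz = 0.5, r_yz = 0.4.
r = first_order_partial(0.6, 0.5, 0.4)
print(r)  # approximately 0.504
```

If $z$ is uncorrelated with both $x$ and $y$, the formula returns $r_{xy}$ unchanged.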

The formula for higher-order partial correlations is a straightforward extension of the preceding first-order formula. For example, when the correlation matrix is positive definite, the partial correlation between $x$ and $y$ controlling for both $z_1$ and $z_2$ is identical to the second-order partial correlation formula

$r_{xy.z_1z_2} = \frac{r_{xy.z_1}-r_{xz_2.z_1}r_{yz_2.z_1}}{\sqrt {(1-r^2_{xz_2.z_1})(1-r^2_{yz_2.z_1})}}$

where $r_{xy.z_1}$, $r_{xz_2.z_1}$, and $r_{yz_2.z_1}$ are first-order partial correlations among variables $x$, $y$, and $z_2$ given $z_1$.
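The second-order formula can be sketched by applying the first-order formula twice. In the Python code below, the correlation matrix is a made-up positive definite example with the variables indexed in the order x, y, z_1, z_2, and the function names are hypothetical.

```python
import math

def partial_given(R, a, b, c):
    """First-order partial correlation of variables a and b given c."""
    return ((R[a][b] - R[a][c] * R[b][c])
            / math.sqrt((1 - R[a][c] ** 2) * (1 - R[b][c] ** 2)))

def second_order_partial(R, x, y, z1, z2):
    """Partial correlation of x and y given z1 and z2, built from first-order terms."""
    r_xy = partial_given(R, x, y, z1)
    r_xz2 = partial_given(R, x, z2, z1)
    r_yz2 = partial_given(R, y, z2, z1)
    return (r_xy - r_xz2 * r_yz2) / math.sqrt((1 - r_xz2 ** 2) * (1 - r_yz2 ** 2))

# Hypothetical correlation matrix; index 0 = x, 1 = y, 2 = z_1, 3 = z_2.
R = [[1.0, 0.6, 0.5, 0.3],
     [0.6, 1.0, 0.4, 0.2],
     [0.5, 0.4, 1.0, 0.1],
     [0.3, 0.2, 0.1, 1.0]]

r_12 = second_order_partial(R, 0, 1, 2, 3)   # partial out z_1 first, then z_2
r_21 = second_order_partial(R, 0, 1, 3, 2)   # partial out z_2 first, then z_1
print(r_12, r_21)
```

The order in which the two controlling variables are partialled out should not change the result beyond rounding error.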

To derive the corresponding Spearman partial rank-order correlations and Kendall partial tau-b correlations, PROC CORR applies the Cholesky decomposition algorithm to the Spearman rank-order correlation matrix and Kendall’s tau-b correlation matrix and uses the correlation formula. That is, the Spearman partial correlation is equivalent to the Pearson correlation between the residuals of the linear regression of the ranks of the two variables on the ranks of the partialled variables. Thus, if a PARTIAL statement is specified with the CORR=SPEARMAN option, the residuals of the ranks of the two variables are displayed in the plot. The partial tau-b correlations range from –1 to 1. However, the sampling distribution of this partial tau-b is unknown; therefore, the probability values are not available.
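For a single controlling variable, the Spearman case can be sketched in Python by ranking each variable, computing Pearson correlations of the ranks, and applying the first-order partial correlation formula to those rank correlations. The data and helper names below are invented for illustration, ties receive average ranks, and the sketch does not reproduce PROC CORR's Cholesky-based computation.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1        # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def spearman_partial(x, y, z):
    """Spearman partial correlation of x and y given one controlling variable z."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data.
x = [3.1, 1.2, 4.8, 2.2, 6.0, 5.1, 7.3, 6.6]
y = [2.0, 2.5, 3.9, 1.1, 5.2, 6.8, 6.1, 7.7]
z = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 8.0]

rho_partial = spearman_partial(x, y, z)
print(rho_partial)
```

Because only ranks enter the calculation, a strictly increasing transformation of any one variable leaves the result unchanged.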

Probability Values

Probability values for the Pearson and Spearman partial correlations are computed by treating

$\frac{(n-k-2)^{1/2}r}{(1-r^{2})^{1/2}}$

as coming from a $t$ distribution with $(n-k-2)$ degrees of freedom, where $r$ is the partial correlation, $n$ is the number of observations, and $k$ is the number of variables being partialled out.
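The statistic and its two-sided probability value can be sketched in Python as follows. The function names and input values are hypothetical. The $t$ tail probability is approximated here by Simpson's rule applied to the $t$ density, which is merely a standard-library stand-in for a proper $t$ distribution function and is not meant to describe how PROC CORR evaluates it.

```python
import math

def partial_corr_t(r, n, k):
    """t statistic and degrees of freedom for a partial correlation r."""
    df = n - k - 2
    t = math.sqrt(df) * r / math.sqrt(1 - r * r)
    return t, df

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density on [0, |t|] by Simpson's rule."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central = total * h / 3            # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * central)

# Hypothetical example: r = 0.5 from n = 30 observations with k = 2 partialled out.
t_stat, df = partial_corr_t(0.5, 30, 2)
p_value = two_sided_p(t_stat, df)
print(t_stat, df, p_value)
```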