The HPCORR Procedure

Pearson Product-Moment Correlation

The Pearson product-moment correlation is a parametric measure of association for two variables. It measures both the strength and the direction of a linear relationship. If one variable x is an exact linear function of another variable y, a positive relationship exists if the correlation is 1, and a negative relationship exists if the correlation is $-1$. If there is no linear predictability between the two variables, the correlation is 0. If the two variables are jointly normal with a correlation of 0, the two variables are independent. Correlation does not imply causality because, in some cases, an underlying causal relationship might not exist.

The formula for the population Pearson product-moment correlation, denoted ${\rho }_{xy}$ , is

${\rho }_{xy}=\frac{\mathrm{Cov}(x,y)}{\sqrt {\mathrm{V}(x)\, \mathrm{V}(y)}} = \frac{\mathrm{E}\left(\, (x - \mathrm{E}(x)) (y - \mathrm{E}(y))\, \right)}{\sqrt {\mathrm{E}\left((x-\mathrm{E}(x))^{2}\right)\, \mathrm{E}\left((y-\mathrm{E}(y))^{2}\right)}}$

The sample correlation, such as a Pearson product-moment correlation or weighted product-moment correlation, estimates the population correlation. The formula for the sample Pearson product-moment correlation is as follows, where $\bar{x}$ is the sample mean of x and $\bar{y}$ is the sample mean of y:

$r_{xy}=\frac{\sum _ i ( \, (x_ i-\bar{x})(y_ i-\bar{y})\, )}{\sqrt {\sum _{i}(x_ i-\bar{x})^{2} \, \sum _{i}(y_ i-\bar{y})^2}}$
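The sample formula can be checked by hand on small data. The following is a minimal Python sketch of that formula, not PROC HPCORR's implementation; the function name `pearson_r` and the sample data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson product-moment correlation (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the sample means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_pos = pearson_r(x, [2 * v + 1 for v in x])    # exact increasing linear function
r_neg = pearson_r(x, [-3 * v + 7 for v in x])   # exact decreasing linear function
r_mid = pearson_r([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```

Here `r_pos` and `r_neg` illustrate the exact linear cases described earlier (correlations of 1 and $-1$), and `r_mid` is a case with an imperfect linear relationship.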

The formula for a weighted Pearson product-moment correlation is as follows, where $w_ i$ is the weight, $\bar{x}_ w$ is the weighted mean of x, and $\bar{y}_ w$ is the weighted mean of y:

$r_{xy}=\frac{\sum _ i \, w_ i(x_ i-\bar{x}_ w)(y_ i-\bar{y}_ w)}{\sqrt {\sum _ i w_ i(x_ i-\bar{x}_ w)^2 \, \sum _ i w_ i(y_ i-\bar{y}_ w)^2}}$
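As a hedged illustration of the weighted formula, the sketch below assumes non-negative weights and takes the weighted mean to be $\sum_i w_i x_i / \sum_i w_i$. The function name `weighted_pearson_r` is hypothetical, and the sketch does not reproduce how PROC HPCORR treats special weight values.

```python
import math


def weighted_pearson_r(xs, ys, ws):
    """Weighted Pearson correlation (illustrative; assumes weights >= 0)."""
    if not (len(xs) == len(ys) == len(ws)):
        raise ValueError("xs, ys, and ws must have the same length")
    if any(w < 0 for w in ws):
        raise ValueError("this sketch assumes non-negative weights")
    w_sum = sum(ws)
    if w_sum == 0:
        raise ValueError("at least one weight must be positive")
    # Weighted means of x and y.
    x_bar_w = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar_w = sum(w * y for w, y in zip(ws, ys)) / w_sum
    # Weighted sums of cross-products and squared deviations.
    sxy = sum(w * (x - x_bar_w) * (y - y_bar_w) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar_w) ** 2 for w, x in zip(ws, xs))
    syy = sum(w * (y - y_bar_w) ** 2 for w, y in zip(ws, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Equal weights: the weighted formula reduces to the unweighted one.
r_equal = weighted_pearson_r([1.0, 2.0, 3.0], [1.0, 3.0, 2.0], [1.0, 1.0, 1.0])
# A zero weight removes the fourth point's contribution in this sketch.
r_zero_w = weighted_pearson_r(
    [1.0, 2.0, 3.0, 10.0], [1.0, 3.0, 2.0, -50.0], [1.0, 1.0, 1.0, 0.0]
)
```

With equal weights every $w_i$ cancels from the numerator and denominator, which is why `r_equal` matches the unweighted sample correlation of the same data.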

Probability Values

Probability values for the Pearson correlation are computed by treating the following statistic as if it came from a t distribution with $n-2$ degrees of freedom, where r is the sample correlation:

$t \, = \, {(n-2)}^{1/2} \, {\left(\frac{r^{2}}{1-r^{2}}\right)}^{1/2}$
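The statistic is straightforward to evaluate directly. The sketch below computes only t; it assumes n is the number of observations used to compute r, and the function name `pearson_t_statistic` is hypothetical. Converting t into a probability value requires a t-distribution tail function, which the Python standard library does not provide, so that step is omitted here.

```python
import math


def pearson_t_statistic(r, n):
    """t statistic for a sample correlation r (illustrative helper).

    Assumes n is the number of observations behind r.
    """
    if n <= 2:
        raise ValueError("need n > 2 so that the degrees of freedom n - 2 are positive")
    if not -1.0 < r < 1.0:
        raise ValueError("the statistic is unbounded when |r| = 1")
    r2 = r * r
    # t = (n - 2)^(1/2) * (r^2 / (1 - r^2))^(1/2); non-negative by construction.
    return math.sqrt(n - 2) * math.sqrt(r2 / (1.0 - r2))


t_value = pearson_t_statistic(0.6, 18)   # sqrt(16) * sqrt(0.36 / 0.64), close to 3.0
t_zero = pearson_t_statistic(0.0, 10)    # a zero correlation gives t = 0
```

Because the formula uses $r^{2}$, the statistic is the same for r and $-r$.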

The partial variance-covariance matrix is calculated with the variance divisor (specified in the VARDEF= option). PROC HPCORR then uses the standard Pearson correlation formula on the partial variance-covariance matrix to calculate the Pearson partial correlation matrix.

When a correlation matrix is positive definite, the resulting partial correlation between variables x and y after adjusting for a single variable z is identical to that obtained from the following first-order partial correlation formula, where $r_{xy}$ , $r_{xz}$ , and $r_{yz}$ are the appropriate correlations:

$r_{xy.z}=\frac{r_{xy}-r_{xz}r_{yz}}{\sqrt {(1-r^{2}_{xz})(1-r^{2}_{yz})}}$
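A small numeric check of the first-order formula, written as a Python sketch. The function name `first_order_partial` and the input correlations are illustrative assumptions; the three values were chosen so that the implied 3 x 3 correlation matrix is positive definite.

```python
import math


def first_order_partial(r_xy, r_xz, r_yz):
    """Partial correlation of x and y after adjusting for z (illustrative helper)."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# x and y are each strongly related to z, so most of r_xy is explained by z.
r_xy_given_z = first_order_partial(0.5, 0.6, 0.8)
# If z is uncorrelated with both x and y, adjusting for it changes nothing.
r_unchanged = first_order_partial(0.5, 0.0, 0.0)
```

In the first call the numerator is 0.5 minus 0.48, so the partial correlation is much smaller than the raw correlation of 0.5.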

The formula for higher-order partial correlations is a straightforward extension of the preceding first-order formula. For example, when the correlation matrix is positive definite, the partial correlation between x and y that controls for both $z_1$ and $z_2$ is identical to the following second-order partial correlation formula, where $r_{xy.z_1}$ , $r_{xz_2.z_1}$ , and $r_{yz_2.z_1}$ are first-order partial correlations among variables x, y, and $z_2$ given $z_1$:

$r_{xy.z_1z_2} = \frac{r_{xy.z_1}-r_{xz_2.z_1}r_{yz_2.z_1}}{\sqrt {(1-r^2_{xz_2.z_1})(1-r^2_{yz_2.z_1})}}$
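The second-order formula has the same shape as the first-order one, applied to first-order partial correlations instead of raw correlations. The sketch below builds it by reusing a first-order helper. It illustrates the recursion only and is not how PROC HPCORR computes partial correlations (the procedure works from the partial variance-covariance matrix). The function names, the index convention, and the example matrix `R` are assumptions.

```python
import math


def first_order_partial(r_ab, r_ac, r_bc):
    """Partial correlation of a and b after adjusting for c (illustrative helper)."""
    denom = math.sqrt((1.0 - r_ac ** 2) * (1.0 - r_bc ** 2))
    if denom == 0.0:
        raise ValueError("undefined when a variable is perfectly correlated with c")
    return (r_ab - r_ac * r_bc) / denom


def partial_r_second_order(r, x, y, z1, z2):
    """Partial correlation of x and y controlling for z1 and z2.

    r is a symmetric correlation matrix (list of lists); x, y, z1, z2 are
    row/column indices into it.
    """
    # First-order partials among x, y, and z2, each adjusted for z1.
    r_xy_z1 = first_order_partial(r[x][y], r[x][z1], r[y][z1])
    r_xz2_z1 = first_order_partial(r[x][z2], r[x][z1], r[z2][z1])
    r_yz2_z1 = first_order_partial(r[y][z2], r[y][z1], r[z2][z1])
    # Apply the same formula once more to the first-order partials.
    return first_order_partial(r_xy_z1, r_xz2_z1, r_yz2_z1)


# Hypothetical correlation matrix for (x, y, z1, z2); it is strictly
# diagonally dominant and therefore positive definite.
R = [
    [1.00, 0.40, 0.30, 0.20],
    [0.40, 1.00, 0.40, 0.10],
    [0.30, 0.40, 1.00, 0.25],
    [0.20, 0.10, 0.25, 1.00],
]
r_xy_z1z2 = partial_r_second_order(R, 0, 1, 2, 3)
# Adjusting for z2 first and z1 second should give the same value.
r_xy_z2z1 = partial_r_second_order(R, 0, 1, 3, 2)
```

The same pattern extends to higher orders: each additional control variable applies the first-order formula to partial correlations of the next lower order.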