The CORR Procedure

Pearson Product-Moment Correlation


The Pearson product-moment correlation is a parametric measure of association for two variables. It measures both the strength and the direction of a linear relationship. If one variable X is an exact linear function of another variable Y, the correlation is 1 when the relationship is positive and –1 when it is negative. If there is no linear predictability between the two variables, the correlation is 0. If the two variables are jointly normal with a correlation of 0, they are independent. However, correlation does not imply causality: two variables can be correlated even when no causal relationship exists between them.

The scatter plot matrix in Figure 2.4 displays the relationship between two numeric random variables in various situations.

Figure 2.4: Correlations between Two Variables

The scatter plot matrix shows a positive correlation between variables Y1 and X1, a negative correlation between Y1 and X2, and no clear correlation between Y2 and X1. The plot also shows no clear linear correlation between Y2 and X2, even though Y2 is dependent on X2.
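The three situations described above can be reproduced with a small numeric sketch. The Python code below is an illustration only, not the CORR procedure's implementation; the helper name `pearson` and the data values are hypothetical, chosen so that one response is an increasing line, one is a decreasing line, and one depends on x only through its square.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, symmetric about zero
x = [-3, -2, -1, 0, 1, 2, 3]

r_pos = pearson(x, [2 * v + 1 for v in x])   # exact increasing line
r_neg = pearson(x, [-2 * v + 1 for v in x])  # exact decreasing line
r_sq = pearson(x, [v * v for v in x])        # y is a function of x, but not a linear one

print(r_pos, r_neg, r_sq)
```

In the last case the response is completely determined by x, yet the correlation is essentially 0, because the positive and negative cross-products cancel around the mean.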

The formula for the population Pearson product-moment correlation, denoted ${\rho }_{xy}$, is

\[ {\rho }_{xy}=\frac{\mathrm{Cov}(x,y)}{\sqrt {\mathrm{V}(x)\, \mathrm{V}(y)}} = \frac{\mathrm{E}(\, (x - \mathrm{E}(x)) (y - \mathrm{E}(y))\, )}{\sqrt {\mathrm{E}(\, (x-\mathrm{E}(x))^{2}\, )\, \mathrm{E}(\, (y-\mathrm{E}(y))^{2}\, )}} \]

The sample correlation, such as a Pearson product-moment correlation or weighted product-moment correlation, estimates the population correlation. The formula for the sample Pearson product-moment correlation is

\[ r_{xy}=\frac{\sum _ i ( \, (x_ i-\bar{x})(y_ i-\bar{y})\, )}{\sqrt {\sum _{i}(x_ i-\bar{x})^{2} \, \sum _{i}(y_ i-\bar{y})^2}} \]

where $\bar{x}$ is the sample mean of x and $\bar{y}$ is the sample mean of y. The formula for a weighted Pearson product-moment correlation is

\[ r_{xy}=\frac{\sum _ i \, w_ i(x_ i-\bar{x}_ w)(y_ i-\bar{y}_ w)}{\sqrt {\sum _ i w_ i(x_ i-\bar{x}_ w)^2 \, \sum _ i w_ i(y_ i-\bar{y}_ w)^2}} \]

where $w_ i$ is the weight, $\bar{x}_ w$ is the weighted mean of x, and $\bar{y}_ w$ is the weighted mean of y.
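As a minimal sketch of the weighted formula, the following Python function evaluates it term by term. It assumes that the weighted mean is the sum of $w_ i x_ i$ divided by the sum of the weights and that the weights are nonnegative with a positive total; the function name `weighted_pearson` and the data are hypothetical and do not represent the procedure's internal code.

```python
import math

def weighted_pearson(x, y, w):
    """Weighted Pearson correlation; w holds one nonnegative weight per (x, y) pair."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r_equal = weighted_pearson(x, y, [1, 1, 1, 1, 1])     # equal weights
r_weighted = weighted_pearson(x, y, [1, 1, 1, 1, 5])  # last pair counts five times as much

print(r_equal, r_weighted)
```

With equal weights the weighted formula reduces to the unweighted sample correlation, and with integer weights it matches the result of repeating each observation that many times.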

Probability Values

Probability values for the Pearson correlation are computed by treating

\[ t \, = \, {(n-2)}^{1/2} \, {\left(\frac{r^{2}}{1-r^{2}}\right)}^{1/2} \]

as coming from a t distribution with $(n-2)$ degrees of freedom, where r is the sample correlation and n is the number of observations used to compute it.
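The statistic can be evaluated directly from r and n, as in the following Python sketch. The function name `pearson_t_statistic` and the input values are hypothetical. The code stops at the statistic and its degrees of freedom, because the Python standard library has no t distribution function; a probability value would require the tail area of that distribution from a statistics library.

```python
import math

def pearson_t_statistic(r, n):
    """Return (t, df) for a sample correlation r computed from n observations."""
    if n <= 2:
        raise ValueError("more than two observations are required")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    # t = (n-2)^(1/2) * (r^2 / (1 - r^2))^(1/2), which is always nonnegative
    t = math.sqrt(df) * math.sqrt(r * r / (1 - r * r))
    return t, df

# Hypothetical values: r = 0.8 from n = 5 observations
t, df = pearson_t_statistic(0.8, 5)
print(t, df)
```

Because the formula uses $r^{2}$, the statistic does not depend on the sign of r, so a correlation of –0.8 produces the same value as 0.8.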