The CORR Procedure

Confidence and Prediction Ellipses

When the relationship between two variables is nonlinear or when outliers are present, the correlation coefficient might misrepresent the strength of the relationship. Plotting the data enables you to verify that the relationship is linear and to identify potential outliers.

The partial correlation between two variables, after controlling for variables in the PARTIAL statement, is the correlation between the residuals of the linear regression of the two variables on the partialled variables. Thus, if a PARTIAL statement is also specified, the residuals of the analysis variables are displayed in the scatter plot matrix and scatter plots.
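The residual-based definition can be checked numerically. The following is a minimal sketch in plain Python, not the procedure's implementation: it assumes a single partialled variable `z`, uses made-up data, and the helper names (`pearson`, `residuals`) are hypothetical. It compares the correlation of the residuals with the familiar closed-form expression for a first-order partial correlation.

```python
import math


def mean(v):
    return sum(v) / len(v)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, z):
    """Residuals from the simple linear regression of y on z (with intercept)."""
    mz, my = mean(z), mean(y)
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    intercept = my - slope * mz
    return [b - (intercept + slope * a) for a, b in zip(z, y)]


# Made-up illustrative data: x and y are analysis variables, z is partialled out.
x = [2.1, 3.4, 1.9, 5.6, 4.8, 6.1, 3.3, 7.2]
y = [1.0, 2.7, 1.5, 4.9, 3.6, 5.8, 2.2, 6.4]
z = [1.0, 2.0, 1.5, 4.0, 3.5, 5.0, 2.5, 6.0]

# Partial correlation as the correlation between the two sets of residuals.
r_resid = pearson(residuals(x, z), residuals(y, z))

# The same quantity from the first-order partial correlation formula.
r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
r_formula = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(r_resid, r_formula)
```

The two printed values should agree up to floating-point rounding. With more than one partialled variable, the residuals would come from a multiple regression instead.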

The CORR procedure optionally provides two types of ellipses for each pair of variables in a scatter plot. One is a confidence ellipse for the population mean, and the other is a prediction ellipse for a new observation. Both assume a bivariate normal distribution.

Let $\bar{\mb {Z}}$ and $\mb {S}$ be the sample mean and sample covariance matrix of a random sample of size $n$ from a bivariate normal distribution with mean $\bmu $ and covariance matrix $\bSigma $. The variable $\bar{\mb {Z}}-\bmu $ is distributed as a bivariate normal variate with mean zero and covariance $(1/n) \bSigma $, and it is independent of $\mb {S}$. Using Hotelling’s $T^2$ statistic, which is defined as

\[  T^2 = n (\bar{\mb {Z}}-\bmu )' {\bS }^{-1} (\bar{\mb {Z}}-\bmu )  \]

a $100(1-\alpha )\% $ confidence ellipse for $\bmu $ is computed from the equation

\[  \frac{n}{n-1} (\bar{\mb {Z}}-\bmu )' {\bS }^{-1} (\bar{\mb {Z}}-\bmu ) = \frac{2}{n-2} F_{2,n-2}(1-\alpha )  \]

where $F_{2,n-2}(1-\alpha )$ is the $(1-\alpha )$ critical value of an $F$ distribution with degrees of freedom $2$ and $n-2$.
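As an illustration of the confidence ellipse equation, the sketch below tests whether a candidate mean lies inside the ellipse. It is a plain-Python approximation under stated assumptions, not the procedure's code: the function names are hypothetical, the data are made up, the sample covariance uses the divisor $n-1$, and the $F$ critical value comes from a closed form that is valid only when the numerator degrees of freedom equal 2.

```python
def f_quantile_df2(p, m):
    """p-quantile of an F distribution with 2 and m degrees of freedom.

    Closed form that holds only when the numerator degrees of freedom are 2.
    """
    return (m / 2.0) * ((1.0 - p) ** (-2.0 / m) - 1.0)


def mean_and_cov(points):
    """Sample mean and sample covariance (divisor n - 1) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def quad_form(d, cov):
    """d' S^{-1} d for the symmetric 2x2 matrix S = [[sxx, sxy], [sxy, syy]]."""
    dx, dy = d
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def in_confidence_ellipse(mu, points, alpha=0.05):
    """True if mu lies on or inside the 100(1 - alpha)% confidence ellipse."""
    n = len(points)
    zbar, cov = mean_and_cov(points)
    d = (zbar[0] - mu[0], zbar[1] - mu[1])
    lhs = n / (n - 1) * quad_form(d, cov)
    rhs = 2.0 / (n - 2) * f_quantile_df2(1.0 - alpha, n - 2)
    return lhs <= rhs


# Made-up illustrative sample of (x, y) pairs.
sample = [(2.1, 1.0), (3.4, 2.7), (1.9, 1.5), (5.6, 4.9),
          (4.8, 3.6), (6.1, 5.8), (3.3, 2.2), (7.2, 6.4)]
zbar, _ = mean_and_cov(sample)

print(in_confidence_ellipse(zbar, sample))            # the center itself
print(in_confidence_ellipse((100.0, -100.0), sample))  # a distant point
```

The sample mean always satisfies the inequality because the left-hand side is zero there; points far from the center fail it.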

A prediction ellipse is a region for predicting a new observation in the population. It also approximates a region that contains a specified percentage of the population.

Denote a new observation as the bivariate random variable $\bZ _\mr {new}$. The variable

\[  \mb {Z}_\mr {new} - \bar{\mb {Z}} = (\mb {Z}_\mr {new}-\bmu ) - (\bar{\mb {Z}}-\bmu )  \]

is distributed as a bivariate normal variate with mean zero (the zero vector) and covariance $(1+1/n) \bSigma $, and it is independent of $\mb {S}$. A $100(1-\alpha )\% $ prediction ellipse is then given by the equation

\[  \frac{n}{n-1} (\mb {Z}_\mr {new}-\bar{\mb {Z}})' \mb {S}^{-1} (\mb {Z}_\mr {new}-\bar{\mb {Z}}) = \frac{2(n+1)}{n-2} F_{2,n-2}(1-\alpha )  \]
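To draw such an ellipse, one option is to generate points on its boundary. The sketch below is a plain-Python illustration under stated assumptions rather than the procedure's algorithm: it takes the quadratic form in the deviation of a candidate point from the sample mean, maps the unit circle through a Cholesky factor of the sample covariance matrix, and uses a closed-form $F$ quantile that is valid only for numerator degrees of freedom 2. The function names and the data are hypothetical.

```python
import math


def f_quantile_df2(p, m):
    """p-quantile of F(2, m); closed form valid only for numerator df = 2."""
    return (m / 2.0) * ((1.0 - p) ** (-2.0 / m) - 1.0)


def mahalanobis_sq(z, center, cov):
    """(z - center)' S^{-1} (z - center) for S = [[sxx, sxy], [sxy, syy]]."""
    dx, dy = z[0] - center[0], z[1] - center[1]
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def prediction_ellipse(points, alpha=0.05, n_boundary=60):
    """Center, covariance, and boundary points of a prediction ellipse."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)

    # Value c of the quadratic form (z - zbar)' S^{-1} (z - zbar) on the boundary.
    f_crit = f_quantile_df2(1.0 - alpha, n - 2)
    c = (n - 1) / n * 2.0 * (n + 1) / (n - 2) * f_crit

    # Lower-triangular Cholesky factor L of S, so that S = L L'.
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 ** 2)

    # Map the unit circle through sqrt(c) * L and shift to the sample mean.
    scale = math.sqrt(c)
    boundary = []
    for k in range(n_boundary):
        t = 2.0 * math.pi * k / n_boundary
        u, v = math.cos(t), math.sin(t)
        boundary.append((mx + scale * l11 * u,
                         my + scale * (l21 * u + l22 * v)))
    return (mx, my), (sxx, sxy, syy), boundary


# Made-up illustrative sample of (x, y) pairs.
sample = [(2.1, 1.0), (3.4, 2.7), (1.9, 1.5), (5.6, 4.9),
          (4.8, 3.6), (6.1, 5.8), (3.3, 2.2), (7.2, 6.4)]
center, cov, boundary = prediction_ellipse(sample)
print(center, boundary[0])
```

Replacing the factor $2(n+1)/(n-2)$ with $2/(n-2)$ in the computation of `c` would give boundary points for the confidence ellipse instead, which shares the same center and axis directions.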

The family of ellipses generated by different critical values of the $F$ distribution has a common center (the sample mean) and common major and minor axis directions.

The shape of an ellipse depends on the aspect ratio of the plot. The ellipse indicates the correlation between the two variables if the variables are standardized (by dividing the variables by their respective standard deviations). In this situation, the ratio between the major and minor axis lengths is

\[  \sqrt {\frac{1+|r|}{1-|r|}}  \]

In particular, if $r=0$, the ratio is 1, which corresponds to a circular confidence contour and indicates that the variables are uncorrelated. A larger value of the ratio indicates a larger positive or negative correlation between the variables.
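The axis ratio can be verified with a short calculation. For standardized variables the covariance matrix is the correlation matrix, whose eigenvalues are $1+|r|$ and $1-|r|$, and the axis lengths are proportional to the square roots of those eigenvalues. The plain-Python sketch below (hypothetical function names, illustrative values of $r$) compares the closed-form ratio with the ratio obtained from the eigenvalues.

```python
import math


def axis_ratio(r):
    """Major-to-minor axis ratio of the ellipse for standardized variables."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.sqrt((1.0 + abs(r)) / (1.0 - abs(r)))


def axis_ratio_from_eigenvalues(r):
    """Same ratio, computed from the eigenvalues of [[1, r], [r, 1]]."""
    a, b, c = 1.0, r, 1.0  # symmetric 2x2 matrix [[a, b], [b, c]]
    half_trace = (a + c) / 2.0
    disc = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    lam_max, lam_min = half_trace + disc, half_trace - disc
    return math.sqrt(lam_max / lam_min)


for r in (0.0, 0.3, -0.6, 0.9):
    print(r, axis_ratio(r), axis_ratio_from_eigenvalues(r))
```

The ratio depends only on $|r|$, so correlations of equal magnitude and opposite sign give ellipses of the same elongation; only the orientation of the major axis differs.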