The CORR Procedure

Example 2.2 Computing Correlations between Two Sets of Variables

The following statements create the data set Setosa, which contains four measurements on each iris specimen from Fisher’s iris data (1936): sepal length, sepal width, petal length, and petal width. The data set has been altered to contain some missing values.

*------------------- Data on Iris Setosa --------------------*
| The data set contains 50 iris specimens from the species   |
| Iris Setosa with the following four measurements:          |
| SepalLength (sepal length)                                 |
| SepalWidth  (sepal width)                                  |
| PetalLength (petal length)                                 |
| PetalWidth  (petal width)                                  |
| Certain values were changed to missing for the analysis.   |
*------------------------------------------------------------*;
data Setosa;
  input SepalLength SepalWidth PetalLength PetalWidth @@;
  label sepallength='Sepal Length in mm.'
        sepalwidth='Sepal Width in mm.'
        petallength='Petal Length in mm.'
        petalwidth='Petal Width in mm.';
  datalines;
50 33 14 02  46 34 14 03  46 36 .  02
51 33 17 05  55 35 13 02  48 31 16 02
52 34 14 02  49 36 14 01  44 32 13 02
50 35 16 06  44 30 13 02  47 32 16 02
48 30 14 03  51 38 16 02  48 34 19 02
50 30 16 02  50 32 12 02  43 30 11 .
58 40 12 02  51 38 19 04  49 30 14 02
51 35 14 02  50 34 16 04  46 32 14 02
57 44 15 04  50 36 14 02  54 34 15 04
52 41 15 .   55 42 14 02  49 31 15 02
54 39 17 04  50 34 15 02  44 29 14 02
47 32 13 02  46 31 15 02  51 34 15 02
50 35 13 03  49 31 15 01  54 37 15 02
54 39 13 04  51 35 14 03  48 34 16 02
48 30 14 01  45 23 13 03  57 38 17 03
51 38 15 03  54 34 17 02  51 37 15 04
52 35 15 02  53 37 15 02
;

The following statements request a correlation analysis between two sets of variables, the sepal measurements (length and width) and the petal measurements (length and width):

ods graphics on;
title 'Fisher (1936) Iris Setosa Data';
proc corr data=Setosa sscp cov plots=matrix;
   var  sepallength sepalwidth;
   with petallength petalwidth;
run;
ods graphics off;

The Simple Statistics table in Output 2.2.1 displays univariate statistics for variables in the VAR and WITH statements.

Output 2.2.1: Simple Statistics

Fisher (1936) Iris Setosa Data

The CORR Procedure

2 With Variables: PetalLength PetalWidth
2 Variables: SepalLength SepalWidth

Simple Statistics
Variable       N      Mean  Std Dev        Sum   Minimum   Maximum  Label
PetalLength   49  14.71429  1.62019  721.00000  11.00000  19.00000  Petal Length in mm.
PetalWidth    48   2.52083  1.03121  121.00000   1.00000   6.00000  Petal Width in mm.
SepalLength   50  50.06000  3.52490       2503  43.00000  58.00000  Sepal Length in mm.
SepalWidth    50  34.28000  3.79064       1714  23.00000  44.00000  Sepal Width in mm.


When the WITH statement is specified together with the VAR statement, the CORR procedure produces rectangular matrices for statistics such as covariances and correlations. The matrix rows correspond to the WITH variables (PetalLength and PetalWidth), while the matrix columns correspond to the VAR variables (SepalLength and SepalWidth). The CORR procedure uses the WITH variable labels to label the matrix rows.

The SSCP option requests a table of the uncorrected sum-of-squares and crossproducts matrix, and the COV option requests a table of the covariance matrix. The SSCP and COV options also produce a table of the Pearson correlations.

The sum-of-squares and crossproducts statistics for each pair of variables are computed by using observations with nonmissing row and column variable values. The Sums of Squares and Crossproducts table in Output 2.2.2 displays the crossproduct, sum of squares for the row variable, and sum of squares for the column variable for each pair of variables.

Output 2.2.2: Sums of Squares and Crossproducts

Sums of Squares and Crossproducts
SSCP / Row Var SS / Col Var SS

                                   SepalLength   SepalWidth
PetalLength          SSCP          36214.00000  24756.00000
Petal Length in mm.  Row Var SS    10735.00000  10735.00000
                     Col Var SS    123793.0000   58164.0000

PetalWidth           SSCP           6113.00000   4191.00000
Petal Width in mm.   Row Var SS      355.00000    355.00000
                     Col Var SS    121356.0000   56879.0000
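
As a rough cross-check of how a single cell in Output 2.2.2 is built, the following sketch recomputes the three statistics for the PetalLength and SepalLength pair, using only the observations in which both variables are nonmissing. It uses PROC SQL rather than PROC CORR and is an illustration only, not part of the original example; the column aliases N, SSCP, RowVarSS, and ColVarSS are names chosen here.

```sas
/* Sketch: recompute one cell of the SSCP table by hand.            */
/* Aliases N, SSCP, RowVarSS, ColVarSS are illustrative names only. */
proc sql;
   select count(*)                       as N,
          sum(PetalLength * SepalLength) as SSCP,
          sum(PetalLength ** 2)          as RowVarSS,
          sum(SepalLength ** 2)          as ColVarSS
   from Setosa
   /* pairwise deletion: keep rows where both variables are present */
   where PetalLength is not missing
     and SepalLength is not missing;
quit;
```

If the pairwise rule described above holds, the query should use 49 observations and agree with the PetalLength by SepalLength cell (36214, 10735, and 123793).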


The variances are computed by using observations with nonmissing row and column variable values. The Variances and Covariances table in Output 2.2.3 displays the covariance, variance for the row variable, variance for the column variable, and associated degrees of freedom for each pair of variables.

Output 2.2.3: Variances and Covariances

Variances and Covariances
Covariance / Row Var Variance / Col Var Variance / DF

                                         SepalLength   SepalWidth
PetalLength          Covariance          1.270833333  1.363095238
Petal Length in mm.  Row Var Variance    2.625000000  2.625000000
                     Col Var Variance    12.33333333  14.60544218
                     DF                           48           48

PetalWidth           Covariance          0.911347518  1.048315603
Petal Width in mm.   Row Var Variance    1.063386525  1.063386525
                     Col Var Variance    11.80141844  13.62721631
                     DF                           47           47
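
The entries in Output 2.2.3 can be traced back to the sums in Output 2.2.2. The worked computation below is a sketch for the PetalLength and SepalLength pair, assuming the usual sample formulas with n - 1 degrees of freedom. The symbols x (PetalLength), y (SepalLength), and n = 49 are notation introduced here. The paired sum of SepalLength, 2457, is derived rather than reported: it is the total of 2503 in Output 2.2.1 minus the one SepalLength value (46) whose PetalLength is missing.

```latex
\begin{align*}
s_{xy} &= \frac{\sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right)}{n-1}
        = \frac{36214 - \frac{721 \times 2457}{49}}{48}
        = \frac{36214 - 36153}{48} \approx 1.270833 \\[4pt]
s_x^2  &= \frac{\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2}{n-1}
        = \frac{10735 - \frac{721^2}{49}}{48}
        = \frac{10735 - 10609}{48} = 2.625 \\[4pt]
s_y^2  &= \frac{\sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2}{n-1}
        = \frac{123793 - \frac{2457^2}{49}}{48}
        = \frac{123793 - 123201}{48} \approx 12.333333 \\[4pt]
r_{xy} &= \frac{s_{xy}}{\sqrt{s_x^2 \, s_y^2}}
        = \frac{1.270833}{\sqrt{2.625 \times 12.333333}} \approx 0.22335
\end{align*}
```

The first three values match the covariance, row variance, and column variance reported with 48 degrees of freedom for this pair, and the last matches the Pearson correlation that PROC CORR reports for the same pair.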


When there are missing values in the analysis variables, the Pearson Correlation Coefficients table in Output 2.2.4 displays the correlation, the $p$-value under the null hypothesis of zero correlation, and the number of observations for each pair of variables. All four correlations are positive but weak (between 0.22 and 0.28), and none is significant at the 0.05 level. The correlations between PetalWidth and SepalLength and between PetalWidth and SepalWidth come closest, with $p$-values of 0.0775 and 0.0582, respectively.

Output 2.2.4: Pearson Correlation Coefficients

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

                                    SepalLength  SepalWidth
PetalLength          Correlation        0.22335     0.22014
Petal Length in mm.  Prob > |r|          0.1229      0.1285
                     N                       49          49

PetalWidth           Correlation        0.25726     0.27539
Petal Width in mm.   Prob > |r|          0.0775      0.0582
                     N                       48          48
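
Because the observation counts in Output 2.2.4 differ by pair (49 and 48), the analysis above uses all available observations for each pair. As a contrast, the sketch below adds the NOMISS option of the PROC CORR statement, which excludes any observation that has a missing value for an analysis variable, so that every statistic rests on the same set of complete observations. This variant is not part of the original example, the title text is chosen here for illustration, and its output is not shown.

```sas
/* Sketch: same analysis restricted to complete observations.     */
/* NOMISS drops any observation with a missing VAR or WITH value. */
title 'Fisher (1936) Iris Setosa Data, Complete Cases Only';
proc corr data=Setosa nomiss;
   var  sepallength sepalwidth;
   with petallength petalwidth;
run;
```

In the Setosa data set, three different observations have a missing petal measurement, so this run would be expected to use 47 observations for all four correlations.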


When ODS Graphics is enabled, the PLOTS= option displays a scatter plot matrix by default. Output 2.2.5 displays a rectangular scatter plot matrix for the two sets of variables: the VAR variables SepalLength and SepalWidth are listed across the top of the matrix, and the WITH variables PetalLength and PetalWidth are listed down the side of the matrix. Consistent with the correlations in Output 2.2.4, the plot for PetalWidth and SepalLength and the plot for PetalWidth and SepalWidth show slight positive associations.

Output 2.2.5: Rectangular Matrix Plot


Note that this graphical display is requested by enabling ODS Graphics and by specifying the PLOTS= option. For more information about ODS Graphics, see Chapter 21: Statistical Graphics Using ODS in SAS/STAT User's Guide.