Usage Note 24561: Display a correlation matrix showing only the lower triangular portion
You can display a correlation matrix showing only the lower triangular portion by using PROC DISTANCE, a DATA step, or the methods discussed in a post on the Graphically Speaking blog.
Because PROC DISTANCE computes distances (or correlations) among the observations (rows) rather than among the variables (columns), you must transpose the data before running the procedure. The transposed data set has one column for each original observation, so this approach is better suited to smaller data sets (fewer than 30,000 observations).
The following example generates some example data and uses PROC CORR to compute and display the complete correlation matrix.
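/* Generate 10 observations of five random normal variables. */
data a;
   do rep=1 to 10;
      y1=rannor(1324);
      y2=rannor(1324);
      y3=rannor(1324);
      y4=rannor(1324);
      y5=rannor(1324);
      output;
   end;
run;
/* Display the full matrix and save it in the data set fullcorr. */
proc corr data=a noprob outp=fullcorr;
   var y1-y5;
run;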
   |       y1 |       y2 |       y3 |       y4 |       y5 |
y1 |  1.00000 |  0.62985 |  0.14778 |  0.65209 | -0.18110 |
y2 |  0.62985 |  1.00000 | -0.04293 |  0.61048 | -0.10540 |
y3 |  0.14778 | -0.04293 |  1.00000 | -0.05108 | -0.56446 |
y4 |  0.65209 |  0.61048 | -0.05108 |  1.00000 | -0.08904 |
y5 | -0.18110 | -0.10540 | -0.56446 | -0.08904 |  1.00000 |
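The fullcorr data set saved by PROC CORR holds more than the correlations. As a quick check, and assuming the default Pearson output, a sketch like the following lists its contents. The rows are identified by the _TYPE_ variable (MEAN, STD, N, and CORR), and the CORR rows carry the variable name in _NAME_.

```sas
/* List the output data set from PROC CORR, including the */
/* _TYPE_ and _NAME_ identifier variables.                */
proc print data=fullcorr noobs;
   var _type_ _name_ y1-y5;
run;
```

Only the rows with _TYPE_='CORR' form the correlation matrix itself.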
PROC TRANSPOSE prepares the data, and PROC DISTANCE computes the lower triangular correlation matrix and saves it in an output data set. Because none of the steps names its input or output data set, PROC TRANSPOSE and PROC DISTANCE write to default output data sets, and PROC DISTANCE and PROC PRINT each read the most recently created data set. Note that the columns of the transposed data set are named COL1, COL2, COL3, up to COLn, where n is the number of observations in the input data set. In PROC DISTANCE, the METHOD=CORR option requests correlations. The VAR statement indicates that the variables being analyzed are interval-level variables and uses the wildcard COL: to select all variables with names beginning with COL. The ID statement uses the _NAME_ variable from PROC TRANSPOSE to restore the original variable names for the columns of the correlation matrix.
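/* Turn the variables y1-y5 into rows; observations become COL1-COLn. */
proc transpose data=a;
   var y1-y5;
run;
/* Correlate the rows of the transposed data (the original variables). */
proc distance method=corr;
   var interval(col:);
   id _name_;
run;
/* Print the lower triangular matrix saved by PROC DISTANCE. */
proc print noobs;
   var y1-y5;
run;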
      y1 |       y2 |       y3 |       y4 |       y5 |
 1.00000 |        . |        . |        . |        . |
 0.62985 |  1.00000 |        . |        . |        . |
 0.14778 | -0.04293 |  1.00000 |        . |        . |
 0.65209 |  0.61048 | -0.05108 |  1.00000 |        . |
-0.18110 | -0.10540 | -0.56446 | -0.08904 |        1 |
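The steps above depend on default data set names and on each step reading the most recently created data set, which can break if other steps are inserted between them. A minimal sketch of the same approach with every data set named explicitly follows. The names a_t and lowcorr are illustrative and do not come from the original note, and SHAPE=TRIANGLE is written out on the assumption that it matches the procedure's default behavior.

```sas
/* a_t and lowcorr are illustrative data set names. */
proc transpose data=a out=a_t;
   var y1-y5;
run;

/* SHAPE=TRIANGLE asks for the lower triangle only;        */
/* it is assumed here to be the default for PROC DISTANCE. */
proc distance data=a_t method=corr shape=triangle out=lowcorr;
   var interval(col:);
   id _name_;
run;

proc print data=lowcorr noobs;
   var y1-y5;
run;
```

Naming the data sets makes it easier to reuse the triangular matrix in later steps without depending on the order in which the steps run.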
Alternatively, the following DATA step produces the same result from the full correlation matrix that the OUTP= option in PROC CORR saved earlier. Because this approach never transposes the data, it can be used with a data set of any size. The WHERE statement keeps only the observations that contain correlations (_TYPE_='CORR'), and the ARRAY statement uses _NUMERIC_ to select all numeric variables in the data set. Because the WHERE statement filters observations before the DATA step reads them, _N_ counts only the correlation rows, so the DO loop sets every value whose column position exceeds the row number (the upper triangle) to missing.
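data lower_tri;
   set fullcorr;
   /* Keep only the rows that hold correlations. */
   where _type_='CORR';
   /* The numeric variables at this point are y1-y5. */
   array _corrs (*) _numeric_;
   /* Set the entries above the diagonal to missing. */
   do _i=1 to dim(_corrs);
      if (_i > _n_) then _corrs[_i]=.;
   end;
   drop _i;
run;
proc print data=lower_tri noobs;
   var y1-y5;
run;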
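The printed matrix shows a period for each missing value in the upper triangle. If blanks are preferred, one option is the MISSING= system option, as in this sketch. It assumes the lower_tri data set created above and also prints _NAME_ so that each row is labeled.

```sas
/* Print missing numeric values as blanks instead of periods. */
options missing=' ';

proc print data=lower_tri noobs;
   var _name_ y1-y5;
run;

/* Restore the default missing-value character. */
options missing='.';
```

Because MISSING= is a system option, it affects all later output until it is reset, which is why the sketch restores the period at the end.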
Type: | Usage Note |
Priority: | low |
Topic: | SAS Reference ==> Procedures ==> DISTANCE; Analytics ==> Transformations; SAS Reference ==> Procedures ==> CORR; Analytics ==> Descriptive Statistics; Analytics ==> Spatial Analysis; Analytics ==> Nonparametric Analysis |
Date Modified: | 2019-05-06 15:42:03 |
Date Created: | 2007-04-26 14:33:14 |