Example 9.1 Correlation

The following statements define two modules: one computes the correlation coefficients between the numeric variables in a data matrix, and the other standardizes the values in each column. For more efficient computations, use the built-in CORR function and the STD function.

proc iml;
   /* Module to compute correlations  */
start corr;
   n = nrow(x);                      /* number of observations */
   sum = x[+,] ;                        /* compute column sums */
   xpx = t(x)*x-t(sum)*sum/n;   /* corrected (centered) sscp */
   s = diag(1/sqrt(vecdiag(xpx)));           /* scaling matrix */
   corr = s*xpx*s;                       /* correlation matrix */
   print "Correlation Matrix",,corr[rowname=nm colname=nm] ;
finish corr;

   /* Module to standardize data */
start std;
   n = nrow(x);          /* set here so std can run without corr */
   mean = x[+,] /n;                       /* means for columns */
   x = x-repeat(mean,n,1);            /* center x to mean zero */
   ss = x[##,] ;                 /* sum of squares for columns */
   std = sqrt(ss/(n-1));         /* standard deviation estimate*/
   x = x*diag(1/std);                  /* scaling to std dev 1 */
   print ,"Standardized Data",,X[colname=nm] ;
finish std;

   /* Sample run */
x = { 1 2 3,
      3 2 1,
      4 2 1,
      0 4 1,
     24 1 0,
      1 3 8};
nm={age weight height};
run corr;
run std;

The results are shown in Output 9.1.1.

Output 9.1.1: Correlation Coefficients and Standardized Values

Correlation Matrix

corr
             AGE    WEIGHT    HEIGHT
AGE            1 -0.717102 -0.436558
WEIGHT -0.717102         1 0.3508232
HEIGHT -0.436558 0.3508232         1

Standardized Data

x
      AGE    WEIGHT    HEIGHT
-0.490116 -0.322749 0.2264554
-0.272287 -0.322749 -0.452911
-0.163372 -0.322749 -0.452911
 -0.59903 1.6137431 -0.452911
2.0149206 -1.290994 -0.792594
-0.490116 0.6454972  1.924871
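
As a cross-check on the two modules, the same quantities can be sketched with the built-in functions that the introduction recommends. This is a minimal sketch and not part of the original example: it assumes a SAS/IML release in which the CORR and STD functions are available and in which elementwise operators accept a matrix combined with a row vector. The names r and z are illustrative only.

```sas
proc iml;
x = { 1 2 3,
      3 2 1,
      4 2 1,
      0 4 1,
     24 1 0,
      1 3 8};
nm = {age weight height};

/* assumption: CORR returns Pearson correlations between columns */
r = corr(x);

/* x[:,] is the row vector of column means (subscript reduction); */
/* STD is assumed to return the sample standard deviation (n-1)   */
z = (x - x[:,]) / std(x);

print "Correlation Matrix",, r[rowname=nm colname=nm];
print ,"Standardized Data",, z[colname=nm];
quit;
```

Because the std module also divides by n-1, z should agree with the standardized data in Output 9.1.1, and r with the correlation matrix, up to display formatting. Unlike the std module, this sketch leaves x unchanged and stores the standardized values in a separate matrix.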