General Statistics Examples


Example 10.1 Correlation Computation

The following statements show how you can define a module that standardizes the columns of a matrix of numeric data. For a more robust implementation, see the STANDARD function.

proc iml;
/* Standardize data: Assume no column has 0 variance */
start stdMat(x);
   mean = mean(x);                        /* means for columns */
   cx = x - mean;                     /* center x to mean zero */
   std = std(x);                /* standard deviation estimate */
   y = cx / std;                       /* scaling to std dev 1 */
   return( y );
finish stdMat;

x = { 1 2 3,
      3 2 1,
      4 2 1,
      0 4 1,
     24 1 0,
      1 3 8};
nm = {age weight height};
std = stdMat(x);
print std[colname=nm label="Standardized Data"];

Output 10.1.1: Standardized Variables

Standardized Data
      AGE     WEIGHT     HEIGHT
-0.490116  -0.322749  0.2264554
-0.272287  -0.322749  -0.452911
-0.163372  -0.322749  -0.452911
 -0.59903  1.6137431  -0.452911
2.0149206  -1.290994  -0.792594
-0.490116  0.6454972   1.924871



The columns shown in Output 10.1.1 have zero mean and unit variance.
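
As a quick check of that claim, the following sketch continues the same PROC IML session and applies the built-in MEAN and STD functions to the standardized matrix. The names colMean and colStd are introduced here for illustration only and are not part of the original example. The means should be zero up to floating-point rounding, and the standard deviations should be 1.

```sas
/* Sketch: verify the standardized columns (same PROC IML session) */
colMean = mean(std);       /* column means of the matrix named std */
colStd  = std(std);        /* STD function applied to that matrix  */
print colMean[colname=nm], colStd[colname=nm];
```

Because std is both a matrix name and a function name here, std(std) calls the function on the matrix, just as the stdMat module calls STD after assigning a variable of the same name.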

In a similar way, you can define a module that returns the correlation matrix of numeric data. The following module computes the correlation matrix according to a formula that you might see in a statistics textbook. For a more efficient implementation that supports missing values, use the built-in CORR function.

/* Compute correlations: Assume no missing values  */
start corrMat(x);
   n = nrow(x);                      /* number of observations */
   sum = x[+,];                         /* compute column sums */
   xpx = x`*x - sum`*sum/n;           /* compute sscp matrix   */
   s = diag(1/sqrt(vecdiag(xpx)));           /* scaling matrix */
   corr = s*xpx*s;                       /* correlation matrix */
   return( corr );
finish corrMat;

corr = corrMat(x);
print corr[rowname=nm colname=nm label="Correlation Matrix"];

Output 10.1.2: A Correlation Matrix

Correlation Matrix
              AGE     WEIGHT     HEIGHT
AGE             1  -0.717102  -0.436558
WEIGHT  -0.717102          1  0.3508232
HEIGHT  -0.436558  0.3508232          1
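
One way to gain confidence in the module is to compare its result with the built-in CORR function. The following is a minimal sketch that assumes the same PROC IML session, with x and corr already defined as above. The names builtin and maxDiff are illustrative and do not appear in the original example.

```sas
/* Sketch: compare the module result with the built-in CORR function */
builtin = corr(x);                   /* Pearson correlations of columns */
maxDiff = max(abs(corr - builtin));  /* largest elementwise difference  */
print maxDiff;
```

If the two computations agree, maxDiff should be on the order of floating-point rounding error.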



There are many equivalent ways to compute a correlation matrix. If you have already written and debugged the STDMAT function, you might want to call that function during the computation of the correlation matrix. The following function is an alternative way to compute the correlation matrix of data:

/* Another way to compute correlations: Assume no missing values */
start corrMat2(x);
   y = StdMat(x);                       /* standardize columns */
   corr = (y`*y)/(nrow(x)-1);            /* correlation matrix */
   return( corr );
finish corrMat2;

c = corrMat2(x);
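
This alternative works because each column of y has zero mean and unit sample variance, so y`*y/(n-1) is the sample covariance matrix of the standardized data, which equals the correlation matrix of x. The following sketch, which assumes the same PROC IML session and uses the illustrative name maxDiff2, prints the result and checks it against the CORRMAT module:

```sas
/* Sketch: display the corrMat2 result and compare it with corrMat */
print c[rowname=nm colname=nm label="Correlation Matrix (corrMat2)"];
maxDiff2 = max(abs(c - corrMat(x)));   /* expected to be near zero */
print maxDiff2;
```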