The following statements define two modules: one computes the correlation coefficients between numeric variables, and the other standardizes the columns of a data matrix. For more efficient computations, use the built-in CORR function for correlations and the built-in MEAN and STD functions for standardization.
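In matrix terms, the computations in this example can be summarized as follows. The notation is introduced here for illustration only and does not appear in the code: $X$ is the $n \times p$ data matrix, $t$ is the row vector of its column sums, $C$ is the corrected sums of squares and crossproducts (SSCP) matrix, and $R$ is the correlation matrix.

$$
C = X^\top X - \frac{1}{n}\, t^\top t, \qquad
D = \operatorname{diag}\!\left(C_{11}^{-1/2}, \dots, C_{pp}^{-1/2}\right), \qquad
R = D\,C\,D
$$

Standardization centers each column at its mean and scales it by its sample standard deviation, which uses the divisor $n-1$:

$$
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_{ij}-\bar{x}_j\right)^2}, \qquad
z_{ij} = \frac{x_{ij}-\bar{x}_j}{s_j}
$$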
proc iml;
/* Module to compute correlations */
start corr;
n = nrow(x); /* number of observations */
sum = x[+,] ; /* compute column sums */
xpx = t(x)*x-t(sum)*sum/n; /* compute sscp matrix */
s = diag(1/sqrt(vecdiag(xpx))); /* scaling matrix */
corr = s*xpx*s; /* correlation matrix */
print "Correlation Matrix",,corr[rowname=nm colname=nm] ;
finish corr;
/* Module to standardize data */
start std;
n = nrow(x); /* number of observations */
mean = x[+,] /n; /* means for columns */
x = x-repeat(mean,n,1); /* center x to mean zero */
ss = x[##,] ; /* sum of squares for columns */
std = sqrt(ss/(n-1)); /* standard deviation estimate */
x = x*diag(1/std); /* scaling to std dev 1 */
print ,"Standardized Data",,X[colname=nm] ;
finish std;
/* Sample run */
x = { 1 2 3,
3 2 1,
4 2 1,
0 4 1,
24 1 0,
1 3 8};
nm={age weight height};
run corr;
run std;
The results are shown in Output 9.1.1.
Output 9.1.1: Correlation Coefficients and Standardized Values
Correlation Matrix

| corr | AGE | WEIGHT | HEIGHT |
|---|---|---|---|
| AGE | 1 | -0.717102 | -0.436558 |
| WEIGHT | -0.717102 | 1 | 0.3508232 |
| HEIGHT | -0.436558 | 0.3508232 | 1 |

Standardized Data (x)

| AGE | WEIGHT | HEIGHT |
|---|---|---|
| -0.490116 | -0.322749 | 0.2264554 |
| -0.272287 | -0.322749 | -0.452911 |
| -0.163372 | -0.322749 | -0.452911 |
| -0.59903 | 1.6137431 | -0.452911 |
| 2.0149206 | -1.290994 | -0.792594 |
| -0.490116 | 0.6454972 | 1.924871 |
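For comparison, the same quantities can be obtained with built-in functions instead of user-defined modules. The following is a minimal sketch, assuming a SAS/IML release that provides the CORR, MEAN, and STD functions; the names `corrMat` and `z` are illustrative and are not part of the original example. The STD function returns the sample standard deviation of each column, so the standardized values are formed by subtracting the column means and dividing by the column standard deviations.

```sas
proc iml;
/* Same sample data as in the module-based example */
x = { 1 2 3,
      3 2 1,
      4 2 1,
      0 4 1,
     24 1 0,
      1 3 8};
nm = {age weight height};

/* Pearson correlations between the columns of x */
corrMat = corr(x);

/* Center each column by its mean and scale by its sample std dev */
/* (illustrative name z; the original module overwrites x instead) */
z = (x - mean(x)) / std(x);

print corrMat[rowname=nm colname=nm],
      z[colname=nm];
quit;
```

Because both approaches use the divisor n-1 for the standard deviation, the printed values should agree with Output 9.1.1 up to rounding. Unlike the STD module, this version leaves `x` unchanged and stores the standardized values in a separate matrix.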