SAS Institute. The Power to Know

Base SAS(R) 9.2 Procedures Guide: Statistical Procedures

The CORR Procedure

Example 2.4 Applications of Fisher’s z Transformation

This example illustrates some applications of Fisher’s $z$ transformation. For details, see the section Fisher’s z Transformation.

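The following statements simulate independent samples of variables X and Y from a bivariate normal distribution. The first batch of 150 observations is sampled using a nominal correlation of 0.3, the second batch of 150 observations using a nominal correlation of 0.25, and the third batch of 100 observations using a nominal correlation of 0.3. Because the coefficients in the DATA step give Y a variance of $0.9$ rather than $1$, the population correlations that the statements actually imply are slightly larger: $0.3/\sqrt{0.9} \approx 0.316$ for Batches 1 and 3, and $0.25/\sqrt{0.9} \approx 0.264$ for Batch 2.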

   
   data Sim (drop=i);
      do i=1 to 400;
         X = rannor(135791);
         /* Batch is 1 for i<=150, 2 for 151<=i<=300, and 3 for i>300 */
         Batch = 1 + (i>150) + (i>300);
         /* Y is a linear function of X plus independent normal noise */
         if Batch = 1 then Y = 0.3*X + 0.9*rannor(246791);
         if Batch = 2 then Y = 0.25*X + sqrt(.8375)*rannor(246791);
         if Batch = 3 then Y = 0.3*X + 0.9*rannor(246791);
         output;
      end;
   run;

This data set will be used to illustrate the following applications of Fisher’s $z$ transformation:

testing whether a population correlation is equal to a given value

testing for equality of two population correlations

combining correlation estimates from different samples

Testing Whether a Population Correlation Is Equal to a Given Value $\rho _0$

You can use the following statements to test the null hypothesis $H_0\colon \rho = 0.5$ against a two-sided alternative $H_1\colon \rho \neq 0.5$. The test is requested with the option FISHER(RHO0=0.5).

   
   title 'Analysis for Batch 1';
   proc corr data=Sim (where=(Batch=1)) fisher(rho0=.5);
      var X Y;
   run;

Output 2.4.1 displays the results based on Fisher’s transformation. The null hypothesis is rejected since the $p$-value is less than $0.0001$.

Output 2.4.1 Fisher’s Test for $H_0: \rho = \rho _0$

Analysis for Batch 1

The CORR Procedure

Pearson Correlation Statistics (Fisher's z Transformation)

| Variable | With Variable | N | Sample Correlation | Fisher's z | Bias Adjustment | Correlation Estimate | 95% Confidence Limit (Lower) | 95% Confidence Limit (Upper) | H0:Rho=Rho0: Rho0 | H0:Rho=Rho0: p Value |
|---|---|---|---|---|---|---|---|---|---|---|
| X | Y | 150 | 0.22081 | 0.22451 | 0.0007410 | 0.22011 | 0.062034 | 0.367409 | 0.50000 | <.0001 |
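
The calculation behind this test can be sketched outside SAS. The following Python snippet is an illustrative cross-check, not the procedure's implementation: it applies Fisher's transformation to the sample correlation and sample size reported in Output 2.4.1 and computes a two-sided normal $p$-value. The function name `fisher_z_test` and the `bias_adjust` switch are hypothetical, and the bias term $\rho_0/(2(n-1))$ subtracted under the null hypothesis is an assumption about how the statistic is adjusted; setting `bias_adjust=False` gives the unadjusted statistic.

```python
import math
from statistics import NormalDist


def fisher_z_test(r, n, rho0, bias_adjust=True):
    """Two-sided test of H0: rho = rho0 based on Fisher's z (illustrative sketch)."""
    z = math.atanh(r)          # Fisher's z of the sample correlation
    zeta0 = math.atanh(rho0)   # Fisher's z of the hypothesized correlation
    # Assumed bias term under H0; not taken from the text of this example.
    bias = rho0 / (2 * (n - 1)) if bias_adjust else 0.0
    # z is treated as normal with variance 1/(n-3).
    stat = (z - zeta0 - bias) * math.sqrt(n - 3)
    p_value = 2 * NormalDist().cdf(-abs(stat))
    return z, stat, p_value


# Sample correlation and sample size from Output 2.4.1, with rho0 = 0.5.
z_obs, stat, p_value = fisher_z_test(r=0.22081, n=150, rho0=0.5)
_, stat_raw, p_raw = fisher_z_test(r=0.22081, n=150, rho0=0.5, bias_adjust=False)

print(f"Fisher's z = {z_obs:.5f}")
print(f"adjusted statistic = {stat:.4f}, p-value = {p_value:.2e}")
print(f"unadjusted statistic = {stat_raw:.4f}, p-value = {p_raw:.2e}")
```

With or without the assumed bias term, the $p$-value falls below $0.0001$, in line with Output 2.4.1.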

Testing for Equality of Two Population Correlations

You can use the following statements to test for equality of two population correlations, $\rho _1$ and $\rho _2$. Here, the null hypothesis $H_0\colon \rho _1 = \rho _2$ is tested against the alternative $H_1\colon \rho _1 \neq \rho _2$.

   
   ods output FisherPearsonCorr=SimCorr;
   title 'Testing Equality of Population Correlations';
   proc corr data=Sim (where=(Batch=1 or Batch=2)) fisher;
      var X Y;
      by Batch;
   run;

The ODS OUTPUT statement saves the "FisherPearsonCorr" table produced by the CORR procedure into the output data set SimCorr, which contains Fisher’s $z$ statistics for both batches.

The following statements display the output data set SimCorr (shown in Output 2.4.2):

   
   proc print data=SimCorr;   
   run;

Output 2.4.2 Fisher’s Correlation Statistics

| Obs | Batch | Var | WithVar | NObs | Corr | ZVal | BiasAdj | CorrEst | Lcl | Ucl | pValue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | X | Y | 150 | 0.22081 | 0.22451 | 0.0007410 | 0.22011 | 0.062034 | 0.367409 | 0.0065 |
| 2 | 2 | X | Y | 150 | 0.33694 | 0.35064 | 0.00113 | 0.33594 | 0.185676 | 0.470853 | <.0001 |


The $p$-value for testing $H_0$ is derived by treating the difference $z_1 - z_2$ as a normal random variable that, under $H_0$, has mean zero and variance $1/(n_1-3) + 1/(n_2-3)$, where $z_1$ and $z_2$ are Fisher’s $z$ transformations of the sample correlations $r_1$ and $r_2$, respectively, and where $n_1$ and $n_2$ are the corresponding sample sizes.
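
Written out, the statistic described in the preceding paragraph and its two-sided $p$-value take the following form. The symbol $Z$ for the standardized difference and $\Phi$ for the standard normal distribution function are introduced here for illustration only:

  \[ Z = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1-3} + \dfrac{1}{n_2-3}}}, \qquad p = 2\left(1 - \Phi (|Z|)\right) \]

With the values in Output 2.4.2 ($z_1 = 0.22451$, $z_2 = 0.35064$, $n_1 = n_2 = 150$), the variance is $2/147 \approx 0.013605$ and

  \[ Z \approx \frac{0.22451 - 0.35064}{\sqrt{0.013605}} \approx -1.081 \]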

The following statements compute the $p$-value in Output 2.4.3:

   
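   data SimTest (drop=Batch);
      /* One-to-one merge: Batch 1 supplies n1 and z1, Batch 2 supplies n2 and z2 */
      merge SimCorr (where=(Batch=1) keep=Nobs ZVal Batch
                     rename=(Nobs=n1 ZVal=z1))
            SimCorr (where=(Batch=2) keep=Nobs ZVal Batch
                     rename=(Nobs=n2 ZVal=z2));
      /* Variance of z1 - z2 under the normal approximation */
      variance = 1/(n1-3) + 1/(n2-3);
      z = (z1 - z2) / sqrt( variance );
      /* Two-sided p-value: double the smaller tail probability */
      pval = probnorm(z);
      if (pval > 0.5) then pval = 1 - pval;
      pval = 2*pval;
   run;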
   
   proc print data=SimTest noobs;
   run;

Output 2.4.3 Test of Equality of Observed Correlations

| n1 | z1 | n2 | z2 | variance | z | pval |
|---|---|---|---|---|---|---|
| 150 | 0.22451 | 150 | 0.35064 | 0.013605 | -1.08135 | 0.27954 |


In Output 2.4.3, the $p$-value of 0.2795 does not provide evidence to reject the null hypothesis that $\rho _1=\rho _2$. The sample sizes $n_1=150$ and $n_2=150$ are not large enough to detect the difference $\rho _1-\rho _2=0.05$ at a significance level of $\alpha =0.05$.
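
To make "not large enough" concrete, the approximate power of this test can be computed from the same normal approximation. The Python sketch below is illustrative only: the function names `two_sample_power` and `n_per_batch` are hypothetical, the calculation uses the nominal correlations $0.3$ and $0.25$ from the simulation description, and it ignores the small bias of Fisher's $z$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sample_power(rho1, rho2, n1, n2, alpha=0.05):
    """Approximate power of the two-sided Fisher z test of H0: rho1 = rho2."""
    delta = math.atanh(rho1) - math.atanh(rho2)   # true difference on the z scale
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of z1 - z2
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)      # two-sided critical value
    shift = delta / se
    # Probability that the standardized difference falls in either rejection region.
    return STD_NORMAL.cdf(shift - crit) + STD_NORMAL.cdf(-shift - crit)


def n_per_batch(rho1, rho2, power=0.80, alpha=0.05):
    """Approximate equal sample size per batch for the requested power.

    Uses the usual one-tail approximation, which neglects the far rejection region.
    """
    delta = math.atanh(rho1) - math.atanh(rho2)
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((crit + z_power) / delta) ** 2 + 3)


power_150 = two_sample_power(0.3, 0.25, 150, 150)
n_needed = n_per_batch(0.3, 0.25)

print(f"approximate power with n1 = n2 = 150: {power_150:.3f}")
print(f"approximate n per batch for 80% power: {n_needed}")
```

Under these assumptions the power with 150 observations per batch is well under 10%, and several thousand observations per batch would be needed to reach 80% power.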

Combining Correlation Estimates from Different Samples

Assume that sample correlations $r_1$ and $r_2$ are computed from two independent samples of $n_1$ and $n_2$ observations, respectively. A combined correlation estimate is given by $\bar{r} = \tanh (\bar{z})$, where $\bar{z}$ is the weighted average of the $z$ transformations of $r_1$ and $r_2$:

  \[ \bar{z} = \frac{(n_1-3) z_1 + (n_2 -3) z_2}{n_1+n_2-6} \]    

The following statements compute a combined estimate of $\rho $ by using Batch 1 and Batch 3:

   
   ods output FisherPearsonCorr=SimCorr2;
   proc corr data=Sim (where=(Batch=1 or Batch=3)) fisher;
      var X Y;
      by Batch;
   run;
   
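   data SimComb (drop=Batch);
      /* One-to-one merge: Batch 1 supplies n1 and z1, Batch 3 supplies n2 and z2 */
      merge SimCorr2 (where=(Batch=1) keep=Nobs ZVal Batch
                      rename=(Nobs=n1 ZVal=z1))
            SimCorr2 (where=(Batch=3) keep=Nobs ZVal Batch
                      rename=(Nobs=n2 ZVal=z2));
      /* Weighted average of the z values and its back-transformed correlation */
      z = ((n1-3)*z1 + (n2-3)*z2) / (n1+n2-6);
      corr = tanh(z);
      /* Variance of the weighted average and 95% limits on the z scale */
      var = 1/(n1+n2-6);
      zlcl = z - probit(0.975)*sqrt(var);
      zucl = z + probit(0.975)*sqrt(var);
      /* Back-transform the limits to the correlation scale */
      lcl= tanh(zlcl);
      ucl= tanh(zucl);
      /* Two-sided p-value for the hypothesis that the correlation is zero */
      pval= probnorm( z/sqrt(var));
      if (pval > .5)  then pval= 1 - pval;
      pval= 2*pval;
   run;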
   
   proc print data=SimComb;
   run;

Output 2.4.4 displays the combined estimate of $\rho $. The correlation estimate from the combined samples is $\bar{r}=0.2264$, and the $95\% $ confidence interval, based on the variance of $\bar{z}$, is $(0.10453,0.34156)$. Note that this interval contains the population correlation $0.3$. The $p$-value of $0.0003$ in the table is for the two-sided test of $H_0\colon \rho = 0$ based on $\bar{z}$.

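Output 2.4.4 Combined Correlation Estimate

| Obs | n1 | z1 | n2 | z2 | z | corr | var | zlcl | zucl | lcl | ucl | pval |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 150 | 0.22451 | 100 | 0.23929 | 0.23039 | 0.22640 | .004098361 | 0.10491 | 0.35586 | 0.10453 | 0.34156 | .000319748 |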
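
The variance reported as var in Output 2.4.4 follows from treating $z_1$ and $z_2$ as independent normal variables with variances $1/(n_1-3)$ and $1/(n_2-3)$, the same approximation that is used for the equality test earlier in this example. A short derivation, with $w_1$ and $w_2$ introduced here as shorthand for the weights:

  \[ \bar{z} = w_1 z_1 + w_2 z_2, \qquad w_i = \frac{n_i-3}{n_1+n_2-6} \]

  \[ \mathrm{Var}(\bar{z}) = \frac{w_1^2}{n_1-3} + \frac{w_2^2}{n_2-3} = \frac{(n_1-3) + (n_2-3)}{(n_1+n_2-6)^2} = \frac{1}{n_1+n_2-6} \]

With $n_1 = 150$ and $n_2 = 100$, this gives $1/244 \approx 0.004098$. The confidence limits are then obtained by applying $\tanh $ to $\bar{z} \pm 1.96 \sqrt{\mathrm{Var}(\bar{z})}$.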