
Usage Note 24298: Compute power or type 2 error of a test of two proportions or the sample size needed to achieve the specified power

This information is updated from David Schlotzhauer (1996), "Comparing Two Proportions: The Chi-Square Test, Power, and Sample Size," Observations: The Technical Journal for SAS Software Users 5(4): 59-62.

A test for comparing proportions from two independent samples can be performed by using the CHISQ option in the FREQ procedure (see note 1). Just think of your data as a 2x2 ("two-by-two") table. For example, suppose you ask 50 men and 50 women if they voted on election day. You find that 12 men voted (12/50 = 24%) and 18 women voted (18/50 = 36%), a difference of 12%. You want to test whether there is a significant difference in the probabilities of men and women voting in the population from which you sampled. Here is how this data can be arranged in a 2x2 table:

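     Table of gender by vote

     gender     vote

     Frequency
     Percent
     Row Pct
     Col Pct          no       yes     Total
     ---------------------------------------
     female           32        18        50
                   32.00     18.00     50.00
                   64.00     36.00
                   45.71     60.00
     ---------------------------------------
     male             38        12        50
                   38.00     12.00     50.00
                   76.00     24.00
                   54.29     40.00
     ---------------------------------------
     Total            70        30       100
                   70.00     30.00    100.00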

The following SAS statements created the table above. Note the use of the WEIGHT statement to enter the cell counts of the table. If you have raw data (one observation per person that contains values for GENDER and VOTE) instead of cell counts, then simply omit the WEIGHT statement.

     data vote;
       input gender $ vote $ count;
       datalines;
     male   yes 12
     male   no  38
     female yes 18
     female no  32
     ;

     proc freq data=vote;
       weight count;
       table gender*vote;
     run;
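The WEIGHT statement is needed only because the data above are cell counts. The following is a minimal sketch of the raw-data case. It assumes a hypothetical data set, VOTE_RAW, that holds one observation per person. The DATA step expands the same four cell counts into 100 individual records, and PROC FREQ then builds the same table without a WEIGHT statement.

```sas
/* Hypothetical raw-data version of the voting example:   */
/* expand each cell count into one observation per person */
data vote_raw;
  input gender $ vote $ count;
  do person = 1 to count;
    output;               /* one record per respondent */
  end;
  drop person count;      /* keep only GENDER and VOTE */
  datalines;
male   yes 12
male   no  38
female yes 18
female no  32
;

/* No WEIGHT statement: each observation counts once */
proc freq data=vote_raw;
  table gender*vote;
run;
```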

The Row Pct values in the vote=yes column are the proportions of men and women who voted in your sample. Pearson's chi-square statistic can be used to test the hypothesis that the corresponding population proportions are equal. This test can be requested with the CHISQ option, which you specify in the TABLE statement of PROC FREQ:

       table gender*vote / chisq;

The following table of statistics is produced by the CHISQ option.

     Statistics for Table of gender by vote

     Statistic                       DF      Value      Prob
     -------------------------------------------------------
     Chi-Square                       1     1.7143    0.1904
     Likelihood Ratio Chi-Square      1     1.7230    0.1893
     Continuity Adj. Chi-Square       1     1.1905    0.2752
     Mantel-Haenszel Chi-Square       1     1.6971    0.1927
     Phi Coefficient                       -0.1309
     Contingency Coefficient                0.1298
     Cramer's V                            -0.1309

The first statistic, labeled Chi-Square, is Pearson's chi-square statistic, which has a value of 1.714 in this table. Large values of the chi-square statistic are evidence that the population proportions differ. The value in the Prob column (the p-value) enables you to assess whether the chi-square value is large. For this table, Prob=0.190 means that, if there is really no difference in the population proportions, the probability of obtaining a chi-square value at least as large as 1.714 is 0.190.
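The following sketch shows where the Chi-Square and Prob values come from. It computes Pearson's statistic directly from the four cell counts with the standard 2x2 shortcut: n(ad − bc)² divided by the product of the four marginal totals. It then takes the upper-tail probability from the PROBCHI function with 1 degree of freedom. The data set name CHISQ_CHECK and the variable names are illustrative only.

```sas
/* Illustrative check of Pearson's chi-square for the 2x2 table */
data chisq_check;
  n11 = 32;  n12 = 18;   /* female: no, yes */
  n21 = 38;  n22 = 12;   /* male:   no, yes */
  n = n11 + n12 + n21 + n22;

  /* 2x2 shortcut: n*(ad - bc)**2 / (row1*row2*col1*col2) */
  chisq = n * (n11*n22 - n12*n21)**2
          / ((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22));

  /* p-value: upper tail of the chi-square distribution with 1 df */
  prob = 1 - probchi(chisq, 1);

  put chisq= 8.4 prob= 8.4;
run;
```

The log should show values that agree with the Chi-Square row of the table above (1.7143 and 0.1904).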

Based on this information, should you reject the hypothesis of equality? Should you accept it? Can any conclusion be drawn about the equality of the population proportions? First, consider that there are two types of errors that you can make based on your test. You make a type 1 error by rejecting equality when the population proportions really are equal. The chance of a type 1 error is denoted by alpha (α); it is also called the size or significance level of the test. You make a type 2 error by accepting equality when the population proportions are not equal. The chance of a type 2 error is denoted by beta (β), and its value depends on how large the difference between the population proportions is. Discussion of the basic concepts that underlie statistical hypothesis testing can be found in many statistical theory texts, such as Lindgren (1976).

If you reject equality in the voting example on the basis of this result, you are in effect using a significance level of .19; that is, your chance of making a type 1 error (α) is .19. If this chance were smaller (say, .05 or less), most analysts would conclude that a value of Pearson's statistic as large as 1.714 indicates a real difference in the population proportions. They would then reject the hypothesis of equality and accept the .05 chance of a type 1 error. However, in this case, the .19 chance indicates that statistic values this large occur reasonably often when the population proportions are equal. Because you might have merely observed such a case with this data, you cannot conclude that the population proportions differ.

But should you conclude that men and women really vote in equal proportions? There was, after all, a difference of 12% in the observed proportions. Just as you needed to know the chance of a type 1 error when deciding whether to reject equality, you now need to know the probability of making a type 2 error if you accept the hypothesis of equality. The power of this test is the probability of rejecting equality when the population proportions differ by a given amount. Beta, the probability of a type 2 error, is the probability of accepting equality given this amount of difference and is simply 1–power.

The %POWER2x2 macro provides power and beta of the Pearson chi-square statistic when computed for 2x2 tables such as this voting example. Some details of the power computations that are used by the macro are given in note 2.

Power depends on sample size, the significance level of the test, and the unknown population proportions. For each of these, supply values at which you are interested in obtaining power. It's a good idea to compute power for several settings close to what you expect the true proportions to be. For this example, assume that the population proportions really differ by the 12% observed. Setting the significance level of the test (chance of a type 1 error) at .05 and both sample sizes at 50 will provide the power of the test that was performed above.

     %power2x2(p1=.36, p2=.24, n1=50, n2=50)

This program generates the following output:

     Power for comparing two independent proportions
     p1=.36, p2=.24, level=.05

          N      Power       Beta
        100    0.25630    0.74370

Beginning with SAS 9, the power for this test can also be computed by using PROC POWER. The TWOSAMPLEFREQ statement provides power calculations for tests that compare two proportions. The TEST=PCHI option focuses the calculations on the Pearson chi-square test. Calculations for the likelihood ratio chi-square test and Fisher's exact test are also available. The following statements compute the power of the Pearson test for the voting example:

     proc power;
       twosamplefreq test=pchi
                     groupproportions=(.36 .24)
                     npergroup=50
                     power=.;
     run;

These statements produce the following output:

     The POWER Procedure
     Pearson Chi-square Test for Two Proportions

     Fixed Scenario Elements
     Distribution                  Asymptotic normal
     Method                        Normal approximation
     Null Proportion Difference    0
     Group 1 Proportion            0.36
     Group 2 Proportion            0.24
     Sample Size Per Group         50
     Number of Sides               2
     Alpha                         0.05

     Computed Power
     Power
     0.256
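The likelihood ratio chi-square test and Fisher's exact test are also covered by TWOSAMPLEFREQ. The following minimal sketch assumes the same proportions and 50 subjects per group, and requests the power of each test by changing only the TEST= option. The ODS OUTPUT data set names are arbitrary choices for this example.

```sas
/* Power of the likelihood ratio chi-square test */
ods output Output=power_lrchi;
proc power;
  twosamplefreq test=lrchi
                groupproportions=(.36 .24)
                npergroup=50
                power=.;
run;

/* Power of Fisher's exact test for the same scenario */
ods output Output=power_fisher;
proc power;
  twosamplefreq test=fisher
                groupproportions=(.36 .24)
                npergroup=50
                power=.;
run;
```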

Based on the test, if the population proportions really differ by 12%, then your chance of incorrectly accepting equality is about .74. Stated another way, your chance of detecting a 12% difference is only about .26. Of course, your chance of detecting larger differences is higher. But for the sample size, significance level, and difference in population proportions that are assumed above, the risk of a type 2 error is large and probably unacceptable. The data that you've collected leaves you in a gray zone: you can neither accept nor reject the hypothesis of equality without incurring an unacceptably large risk of error.
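You can check directly that larger differences are easier to detect. The following sketch keeps 50 subjects per group and adds two hypothetical scenarios, (.40, .20) and (.45, .15), chosen here only for illustration. Both keep the 30% overall turnout of the original example. PROC POWER computes power for each pair of proportions listed, and the computed power should rise steadily as the difference grows.

```sas
ods output Output=power_by_diff;
proc power;
  twosamplefreq test=pchi
                /* 12-, 20-, and 30-point differences around 30% turnout */
                groupproportions=(.36 .24) (.40 .20) (.45 .15)
                npergroup=50
                power=.;
run;
```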

Actually, the problem is that the data provides insufficient evidence to accept or reject equality. If you had gotten the same proportions from samples of 1000 men and 1000 women, then there would not have been this ambiguity. Power depends on sample size: as sample size increases, power goes up and beta goes down. Rather than discovering inconclusive results after the expense and effort of data collection, you can avoid them by first selecting a sample size that yields adequate power and an acceptable beta.
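The point about 1000 men and 1000 women can be made concrete. For equal group sizes, Pearson's statistic for a 2x2 table equals the square of the pooled two-proportion z statistic. So when the observed proportions are held fixed, the statistic grows in direct proportion to the sample size. The following sketch recomputes it for 50 and for 1000 per group; the data set and variable names are illustrative.

```sas
data sample_size_effect;
  p_women = 0.36;                  /* observed proportion of women who voted  */
  p_men   = 0.24;                  /* observed proportion of men who voted    */
  pbar    = (p_women + p_men) / 2; /* pooled proportion for equal group sizes */
  do n_per_group = 50, 1000;
    /* Pearson chi-square = squared pooled z statistic */
    chisq = (p_women - p_men)**2 / (pbar * (1 - pbar) * (2 / n_per_group));
    prob  = 1 - probchi(chisq, 1);
    put n_per_group= chisq= 8.4 prob= pvalue6.4;
    output;
  end;
run;
```

With 50 per group, this step reproduces the value 1.7143. With 1000 per group, the statistic is 20 times larger (about 34.29), and the p-value is far below .0001.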

Suppose at the next election you plan to do another study of gender and voting and you would like to pick a sample size to avoid inconclusive results. Assuming equal-sized samples, you'd like to examine power and beta for total sample sizes that range from 10 to 1000. To compute the power for rejecting the null hypothesis of equal probabilities, you must specify the expected voting probabilities for men and women at which to calculate the power and the alpha level that you are willing to accept. Suppose you can tolerate a .05 probability of a type 1 error. From your previous study, a 30% overall voter turnout can be expected. The %POWER2x2 macro enables you to find a sample size that will detect a 12% difference in population proportions with reasonably high probability. Again, because the population proportions are unknown, you might want to try several reasonable settings of the difference.

     %power2x2(p1=.36, p2=.24, nmin=10, nmax=1000)

The following output is generated:

     Power for comparing two independent proportions
     p1=.36, p2=.24, level=.05

          N      Power       Beta
         10    0.06778    0.93222
         50    0.15025    0.84975
        100    0.25630    0.74370
        150    0.35978    0.64022
        200    0.45656    0.54344
        250    0.54429    0.45571
        300    0.62192    0.37808
        350    0.68927    0.31073
        400    0.74678    0.25322
        450    0.79520    0.20480
        500    0.83550    0.16450
        550    0.86870    0.13130
        600    0.89580    0.10420
        650    0.91775    0.08225
        700    0.93539    0.06461
        750    0.94948    0.05052
        800    0.96066    0.03934
        850    0.96949    0.03051
        900    0.97643    0.02357
        950    0.98185    0.01815
       1000    0.98607    0.01393

Again, PROC POWER can be used to explore the power for a range of sample sizes:

     proc power;
       twosamplefreq test=pchi
                     groupproportions=(.36 .24)
                     ntotal=10, 50 to 1000 by 50
                     power=.;
     run;

These statements produce the following output:

     The POWER Procedure
     Pearson Chi-square Test for Two Proportions

     Fixed Scenario Elements
     Distribution                  Asymptotic normal
     Method                        Normal approximation
     Null Proportion Difference    0
     Group 1 Proportion            0.36
     Group 2 Proportion            0.24
     Number of Sides               2
     Alpha                         0.05
     Group 1 Weight                1
     Group 2 Weight                1

     Computed Power
     Index    N Total    Power
         1         10    0.068
         2         50    0.150
         3        100    0.256
         4        150    0.360
         5        200    0.457
         6        250    0.544
         7        300    0.622
         8        350    0.689
         9        400    0.747
        10        450    0.795
        11        500    0.836
        12        550    0.869
        13        600    0.896
        14        650    0.918
        15        700    0.935
        16        750    0.949
        17        800    0.961
        18        850    0.969
        19        900    0.976
        20        950    0.982
        21       1000    0.986
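PROC POWER reports power but not beta. The following minimal sketch captures those results and adds beta = 1 − power, so that the listing lines up with the Beta column of the macro output. It assumes that the ODS table named Output holds the computed power in a variable named Power. The data set name POWER_BY_N is arbitrary.

```sas
/* Capture the computed-power table as a data set */
ods output Output=power_by_n;
proc power;
  twosamplefreq test=pchi
                groupproportions=(.36 .24)
                ntotal=10, 50 to 1000 by 50
                power=.;
run;

/* The type 2 error probability is the complement of power */
data power_by_n;
  set power_by_n;
  beta = 1 - power;
run;

proc print data=power_by_n noobs;
run;
```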

Unfortunately, these results show that it will take a lot more data to avoid inconclusive results. If you want a .90 chance of detecting a difference of 12% between the population proportions, then you'll need a sample of slightly more than 300 men and 300 women. A total of 600 gives power of .896, and a total of 650 gives .918. At that size, the chance of making a type 2 error falls to about .10.
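Instead of scanning a table, you can ask PROC POWER to solve for the sample size directly. Give the target power and leave the per-group size missing. The following sketch assumes the same proportions and the default alpha of .05. The reported per-group size should be a little over 300, which is consistent with the table above.

```sas
ods output Output=n_for_90;
proc power;
  twosamplefreq test=pchi
                groupproportions=(.36 .24)
                power=.90        /* target power                     */
                npergroup=.;     /* solve for the size of each group */
run;
```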

The sample sizes that are shown are estimates based on your guesses of the voting probabilities. For a fixed difference, as the probabilities move closer to 0 or 1, the variance of the difference in proportions decreases, so the power for a given sample size increases. The sample size estimates are also affected by the difference that you want to detect and the type 1 error rate that you choose. If you decide that you want to detect a smaller difference in the population proportions, then more data will be required. If you decide that you are willing to accept only a .01 type 1 error probability, instead of .05, then you'll again need more data. Experiment with different settings and see how the required sample size changes.
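You can explore these effects in one run, because PROC POWER crosses every value that you list. In the following sketch, the second pair of proportions (a 10-point difference) is a hypothetical alternative chosen for illustration. The output should show that both the smaller difference and the stricter .01 level raise the required sample size.

```sas
ods output Output=n_scenarios;
proc power;
  twosamplefreq test=pchi
                groupproportions=(.36 .24) (.35 .25)  /* 12- and 10-point differences */
                alpha=.05 .01                         /* two type 1 error rates       */
                power=.90
                npergroup=.;
run;
```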

Agresti, A. 1990, 2002. Categorical Data Analysis. New York: John Wiley & Sons, Inc.

Berry, J. J., and G. I. Hurtado. 1994. "Comparing Non-independent Proportions." Observations: The Technical Journal for SAS Software Users.

Lindgren, B. W. 1976. Statistical Theory. 3d ed. New York: Macmillan Publishing Co.


1 Methods for testing proportions in non-independent samples can be found in Berry and Hurtado (1994).

2 Power and beta are computed by the macro using the statements below. p1 and p2 are the group success probabilities under the alternative hypothesis, and diff is their difference. n1 and n2 are the sample sizes for the two groups, and level is the significance level (α) of the test. ph0 is the overall success probability under the null hypothesis of no difference. stdh0 is the standard error of the difference under the null hypothesis, and stdha is its standard error under the alternative hypothesis. The PROBNORM function returns the probability that a standard normal variate is less than or equal to its argument. The PROBIT function is the inverse of PROBNORM; it returns the standard normal value that is associated with the given probability. Beta is 1 minus power.
  power=1-probnorm(-probit(level/2)*stdh0/stdha-diff/stdha) +
          probnorm( probit(level/2)*stdh0/stdha-diff/stdha);
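The statements above can be wrapped in a self-contained DATA step. This is a sketch, not the macro itself. It assumes equal groups of ntotal/2, takes ph0 as the sample-size-weighted average of p1 and p2, and defines diff as p1 − p2 (the formula gives the same power for either sign). Under these assumptions, it should reproduce the macro's Power column for the voting example, for instance .25630 at a total of 100 and .89580 at 600.

```sas
/* Sketch of the note-2 power formula for a two-sided Pearson test */
data power2x2_sketch;
  p1 = .36;  p2 = .24;  level = .05;
  do ntotal = 10, 50 to 1000 by 50;
    n1 = ntotal / 2;  n2 = ntotal / 2;              /* equal groups (assumption)  */
    diff  = p1 - p2;
    ph0   = (n1*p1 + n2*p2) / (n1 + n2);             /* pooled probability, H0     */
    stdh0 = sqrt(ph0*(1 - ph0)*(1/n1 + 1/n2));       /* standard error under H0    */
    stdha = sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2);   /* standard error under HA    */
    power = 1 - probnorm(-probit(level/2)*stdh0/stdha - diff/stdha)
              + probnorm( probit(level/2)*stdh0/stdha - diff/stdha);
    beta  = 1 - power;
    output;
  end;
  keep ntotal power beta;
run;

proc print data=power2x2_sketch noobs;
  format power beta 8.5;
run;
```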

