You can test the equality of two proportions obtained from independent samples using the Pearson chi-square test. Specify the CHISQ option in the TABLES statement of PROC FREQ to compute this test. This is equivalent to the well-known Z test for comparing two independent proportions.
Suppose you want to compare the proportions responding Yes to a question in independent samples of 100 men and 100 women. The number of men responding Yes is 30 and the number of women responding Yes is 45. The data can be arranged in a 2 × 2 table:
| Gender | Response: Yes | Response: No | Total |
| Men | 30 | 70 | 100 |
| Women | 45 | 55 | 100 |
It is assumed that each man in the sample of men had the same probability of responding yes. The same is assumed for the sample of women, though of course, the two probabilities may differ.
To conduct the test, begin by entering the data in a DATA step. In the statements below, there is one line of data giving the number of men responding Yes and the total sample size for men. A second line contains the same information for the sample of women. The programming statements create the Response variable with values Yes and No, create the Count variable computing the count for the No level as total minus number responding Yes, and output an observation for each Response level. The PROC PRINT step shows the resulting data set which has one observation for each cell of the table.
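data YesNo;
   input Gender $ NumYes Total;
   /* Write one observation per table cell: the Yes count, then the No count */
   Response="Yes"; Count=NumYes;       output;
   Response="No "; Count=Total-NumYes; output;
   datalines;
Men 30 100
Women 45 100
;
/* List the four cell counts */
proc print noobs;
   var Gender Response Count;
run;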
You can now use PROC FREQ to display the data in table form and compute the chi-square test comparing the proportions. The WEIGHT statement names the variable that contains the cell counts of the table (Count in this example). In the TABLE statement, the first variable defines the rows of the table and the second variable defines the columns. The ORDER=DATA option, while not necessary to display the table and conduct a proper analysis, ensures that the levels defining the rows and columns appear in the order in which they are encountered when reading the data set. So, Men appears before Women and Yes appears before No. The CHISQ option requests the chi-square test.
proc freq order=data;
   weight Count;
   table Gender * Response / chisq;
run;
The results show the table, which includes the row, column, and total percentages represented by each cell count. The table of statistics includes the Pearson chi-square test (labeled "Chi-Square"). The small p-value for the test indicates that the null hypothesis of equal proportions can be rejected. You conclude that the proportions are unequal (p = 0.0285).
Note that, in general, the Pearson test is a test of the independence of the row and column variables (Gender and Response in this example). But this hypothesis can be shown to be equivalent to the hypothesis of equal row (or column) proportions.
For the chi-square test to be valid, the expected cell counts must not be too small. The usual rule of thumb is that all expected cell counts should be at least 5, though this may be a little too stringent. When some cell counts are too small, you can use Fisher's exact test, which is also provided by the CHISQ option for 2 × 2 tables. The Fisher test, while more conservative, also shows a significant difference between the proportions (p = 0.0405).
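The rule of thumb is normally applied to the expected cell counts under the null hypothesis, and you can ask PROC FREQ to display them. The following step is a minimal sketch that assumes the YesNo data set created earlier. It adds the EXPECTED option to the TABLE statement and suppresses the percentages so that the observed and expected counts are easier to compare.

```sas
/* Sketch: show expected counts next to the observed counts.   */
/* Assumes the YesNo data set from the DATA step above.         */
proc freq data=YesNo order=data;
   weight Count;
   /* EXPECTED prints the expected frequency for each cell;     */
   /* NOROW, NOCOL, and NOPERCENT hide the percentage lines.    */
   table Gender * Response / chisq expected norow nocol nopercent;
run;
```

With equal samples of 100 and 75 Yes responses out of 200 overall, the expected counts are 37.5 (Yes) and 62.5 (No) in each row, so the rule of thumb is comfortably met in this example.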
Notice that the Fisher test results include one-sided tests as well as the two-sided test that corresponds to the Pearson chi-square test. See this usage note for interpretation of the one-sided Fisher test results. As noted above, the Pearson chi-square statistic is equivalent to the Z test statistic, which is also commonly used to test the equality of independent proportions. In fact, chi-square = Z². Consequently, the p-value for the two-sided Z test is the same as for the chi-square test. And since the p-value for a two-sided test is double the one-sided p-value when the test statistic's distribution is symmetric (as is the normal distribution), you can get the one-sided p-value by simply halving the two-sided p-value, provided that the observed difference is in the direction stated by the alternative hypothesis. So, if your alternative hypothesis was that the proportion of men responding Yes is less than the proportion of women responding Yes, then the p-value for this one-sided test would be 0.0285/2 = 0.0143.
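You can check the relationship between the Z test and the chi-square test by hand. The following DATA step is a minimal sketch rather than anything that PROC FREQ produces: it computes the pooled two-sample Z statistic from the counts in this example, squares it, and obtains the p-values with the PROBNORM and PROBCHI functions. The variable names (p1, p2, pooled, and so on) are chosen here only for illustration.

```sas
/* Sketch: pooled Z test for two independent proportions.       */
data _null_;
   x1 = 30; n1 = 100;                  /* men: Yes count, sample size   */
   x2 = 45; n2 = 100;                  /* women: Yes count, sample size */
   p1 = x1 / n1;
   p2 = x2 / n2;
   pooled = (x1 + x2) / (n1 + n2);     /* common proportion under H0    */
   z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1/n1 + 1/n2));
   chisq = z**2;                       /* equals the Pearson chi-square */
   p_two   = 2 * (1 - probnorm(abs(z)));   /* two-sided Z test          */
   p_chisq = 1 - probchi(chisq, 1);        /* chi-square test, 1 DF     */
   p_lower = probnorm(z);                  /* one-sided, H1: p1 < p2    */
   put z= chisq= p_two= p_chisq= p_lower=;
run;
```

The values written to the log should show chisq equal to 4.8 and the two-sided p-values agreeing with each other and with the Pearson test reported by PROC FREQ. Because the observed proportion for men is the smaller one, p_lower is half of the two-sided p-value.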
This usage note discusses assessing power and sample size needs and gives an example.
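As a rough sketch of what such an assessment might look like, the following PROC POWER step solves for the sample size per group needed by the Pearson chi-square test. The planning proportions of 0.30 and 0.45 are borrowed from the example above purely for illustration, and the target power of 0.80 is an assumption; substitute values that suit your own study. The default significance level and a two-sided test are used.

```sas
/* Sketch: sample size for comparing two independent proportions. */
/* The proportions and target power are illustrative assumptions. */
proc power;
   twosamplefreq test=pchi
      groupproportions = (0.30 0.45)
      npergroup        = .        /* missing value: solve for this */
      power            = 0.80;
run;
```

To compute power for a fixed sample size instead, specify a value for NPERGROUP= and set POWER= to a missing value.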
You can compare more than two proportions in the same way as above: simply add a line in the DATA step for each proportion. The chi-square test will now have more than one degree of freedom, and a small p-value tells you that at least two of the proportions differ. See this usage note for how to do multiple comparisons to see which proportions differ.
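For instance, a three-sample comparison might be set up as follows. This is only a sketch: the third sample and its counts are invented here for illustration and are not part of the example above, and the data set name YesNo3 and the variable name Group are likewise hypothetical.

```sas
/* Sketch: comparing three independent proportions.              */
/* The "Other" line is made-up data, added only for illustration. */
data YesNo3;
   input Group $ NumYes Total;
   Response="Yes"; Count=NumYes;       output;
   Response="No "; Count=Total-NumYes; output;
   datalines;
Men 30 100
Women 45 100
Other 38 100
;
proc freq data=YesNo3 order=data;
   weight Count;
   table Group * Response / chisq;   /* Pearson test now has 2 DF */
run;
```

Note that for tables larger than 2 × 2, the CHISQ option alone does not produce Fisher's exact test.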
| Product Family | Product | System | SAS Release Reported | SAS Release Fixed* |
| SAS System | SAS/STAT | All | n/a | |
| Type: | Usage Note |
| Priority: | low |
| Topic: | SAS Reference ==> Procedures ==> FREQ; Analytics ==> Exact Methods; Analytics ==> Descriptive Statistics; Analytics ==> Categorical Data Analysis; Analytics ==> Power and Sample Size |
| Date Modified: | 2010-05-07 16:42:07 |
| Date Created: | 2002-12-16 10:56:39 |



