You can test the equality of two proportions obtained from independent samples using the Pearson chi-square test. Specify the CHISQ option in the TABLES statement of PROC FREQ to compute this test. This is equivalent to the well-known Z test for comparing two independent proportions.
Suppose you want to compare the proportions responding Yes to a question in independent samples of 100 men and 100 women. The observed number responding Yes is 30 among the men and 45 among the women. The data can be arranged in a 2×2 table:
Gender | Response: Yes | Response: No | Total
Men | 30 | 70 | 100
Women | 45 | 55 | 100
It is assumed that each man in the sample has the same probability of responding Yes. The same is assumed for the sample of women, though the two probabilities may differ.
To conduct the test, begin by entering the data in a DATA step. In the statements below, there is one line of data giving the number of men responding Yes and the total sample size for men. A second line contains the same information for the sample of women. The programming statements create the Response variable with values Yes and No, create the Count variable computing the count for the No level as total minus number responding Yes, and output an observation for each Response level. The PROC PRINT step shows the resulting data set which has one observation for each cell of the table.
data YesNo;
   input Gender $ NumYes Total;
   Response="Yes"; Count=NumYes;       output;
   Response="No "; Count=Total-NumYes; output;
   datalines;
Men    30 100
Women  45 100
;
proc print noobs;
   var Gender Response Count;
run;
You can now use PROC FREQ to display the data in table form and compute the chi-square test comparing the proportions. The WEIGHT statement names the variable that contains the cell counts of the table. In the TABLE statement, the first variable defines the rows of the table and the second variable defines the columns. The ORDER=DATA option, while not needed to display the table or to conduct a proper analysis, ensures that the levels defining the rows and columns appear in the order in which they are encountered when reading the data set. So, Men appears before Women and Yes appears before No. The CHISQ option requests the chi-square test.
proc freq order=data;
   weight Count;
   table Gender * Response / chisq riskdiff;
run;
In the results, a table of statistics includes the Pearson chi-square test (labeled "Chi-Square"). The small p-value for the test (p=0.0285) indicates that the null hypothesis of equal proportions can be rejected at the 0.05 level, so the data provide evidence that the proportions differ.
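As a check on the reported p-value, the Pearson statistic for a 2×2 table can be worked by hand from the cell counts given earlier. The symbols a, b, c, d (the four cells, read row by row) and n (the grand total) are notation introduced here for illustration and do not appear in the PROC FREQ output.

```latex
\chi^2 = \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}
       = \frac{200\,(30\cdot 55 - 70\cdot 45)^2}{100\cdot 100\cdot 75\cdot 125}
       = \frac{200\,(-1500)^2}{93{,}750{,}000}
       = 4.8
```

With 1 degree of freedom, a chi-square value of 4.8 corresponds to the p-value of 0.0285 quoted above.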
Note that the Pearson test is a test of the independence of the row and column variables — Gender and Response in this example. But this hypothesis can be shown to be equivalent to the hypothesis of equal row (or column) proportions.
If you are interested in the difference in the probability of responding Yes between men and women, the RISKDIFF option provides an estimate as well as a confidence interval. Since Yes is in Column 1, the "Column 1 Risk Estimates" table provides the desired estimates. The estimated difference in probabilities (Men in Row 1 - Women in Row 2) is -0.15 with 95% confidence limits (-0.2826, -0.0174). Since these limits do not include zero as a plausible value of the population difference in proportions, the difference is significant at the 0.05 level.
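The reported limits are consistent with the usual large-sample (Wald) interval for a difference of two independent proportions. The worked arithmetic below is a sketch under that assumption; the symbols p1, p2, n1, and n2 are notation introduced here, and 1.96 is the standard normal quantile for a 95% interval.

```latex
(\hat p_1 - \hat p_2) \pm 1.96\,
\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}
= -0.15 \pm 1.96\,\sqrt{\frac{0.30\cdot 0.70}{100} + \frac{0.45\cdot 0.55}{100}}
= -0.15 \pm 1.96\,(0.06764)
= (-0.2826,\; -0.0174)
```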
When the table is more complex, involving more variables, a logistic model is often fit to the data. In such models, the odds ratio is often used for comparisons. But differences in probabilities can also be estimated as described in this note.
For the chi-square test to be valid, the cell counts must not be too small. The usual rule of thumb is that all cell counts should be at least 5, though this may be a little too stringent. When some cell counts are too small, you can use Fisher's exact test which is also provided by the CHISQ option. The Fisher test, while more conservative, also shows a significant difference between the proportions (p=0.0405).
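The rule of thumb is often stated in terms of expected rather than observed cell counts. As a minimal sketch, assuming the YesNo data set created earlier, the EXPECTED option in the TABLE statement adds the expected count under independence to each cell so that you can check it directly.

```sas
/* Sketch: display expected cell counts alongside the observed counts. */
/* Assumes the YesNo data set from the DATA step shown earlier.        */
proc freq data=YesNo order=data;
   weight Count;
   table Gender * Response / chisq expected;
run;
```

For this table, each expected count is the row total times the column total divided by 200, giving 37.5 for the Yes cells and 62.5 for the No cells, all well above 5.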
An exact confidence interval of the difference in probabilities can also be obtained by using the RISKDIFF(CL=EXACT) option and the EXACT statement.
proc freq order=data;
   weight Count;
   table Gender * Response / riskdiff(cl=exact);
   exact riskdiff;
run;
Note that the exact method can require considerable time and memory. It should only be used when the large-sample confidence interval (above) is questionable due to small cell counts. There is no reason to question the validity of the large-sample interval in this table, and this is confirmed by the exact confidence interval being very similar to the large-sample interval.
Notice that the Fisher test results include one-sided tests as well as the two-sided test that corresponds to the Pearson chi-square test. See this note for interpretation of the one-sided Fisher test results.
As noted above, the Pearson chi-square statistic is equivalent to the Z test statistic, which is also commonly used to test the equality of independent proportions. In fact, chi-square = Z². Consequently, the p-value for the two-sided Z test is the same as for the chi-square test. And since the p-value for a two-sided test is double the one-sided p-value when the test statistic's distribution is symmetric (as is the normal distribution), you can get the one-sided p-value by simply halving the two-sided p-value, provided the observed difference lies in the direction of the alternative. So, if your alternative hypothesis was that the proportion of men responding Yes is less than the proportion of women responding Yes (as observed here, 0.30 versus 0.45), then the p-value for this one-sided test would be 0.0285/2, or approximately 0.014.
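The relationship between the Z statistic and the Pearson chi-square can be checked with a short DATA step. This is a sketch only: the variable names (x1, n1, pooled, and so on) are chosen here for illustration, and the pooled-variance form of the Z statistic is assumed.

```sas
/* Sketch: Z test for two independent proportions, computed by hand. */
/* Variable names are illustrative, not part of any SAS procedure.   */
data _null_;
   x1 = 30; n1 = 100;                  /* men:   number Yes, sample size */
   x2 = 45; n2 = 100;                  /* women: number Yes, sample size */
   p1 = x1/n1;
   p2 = x2/n2;
   pooled = (x1 + x2)/(n1 + n2);       /* pooled proportion under H0 */
   z = (p1 - p2)/sqrt(pooled*(1 - pooled)*(1/n1 + 1/n2));
   chisq   = z**2;                     /* equals the Pearson chi-square */
   p_two   = 2*(1 - probnorm(abs(z))); /* two-sided p-value */
   p_lower = probnorm(z);              /* one-sided p-value for H1: p1 < p2 */
   put z= chisq= p_two= p_lower=;
run;
```

The values written to the log should show Z of about -2.19 and a chi-square of 4.8, matching the Pearson statistic from PROC FREQ.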
This note discusses assessing power and sample size needs and gives an example.
You can compare more than two proportions in the same way as above: simply add a line in the DATA step for each proportion. The chi-square test will now have more than one degree of freedom, and a small p-value tells you that at least two of the proportions differ. See this note for how to do multiple comparisons to see which proportions differ.
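As a sketch of the multi-group case, the statements below add a third data line. The group labels A, B, and C, the data set name YesNo3, and the counts for the third group are hypothetical, invented purely to illustrate the layout.

```sas
/* Hypothetical data: three groups; the counts are for illustration only. */
data YesNo3;
   input Group $ NumYes Total;
   Response="Yes"; Count=NumYes;       output;
   Response="No "; Count=Total-NumYes; output;
   datalines;
A 30 100
B 45 100
C 40 100
;
proc freq data=YesNo3 order=data;
   weight Count;
   table Group * Response / chisq;
run;
```

With three groups and two response levels, the chi-square test has (3-1)(2-1) = 2 degrees of freedom.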
Product Family | Product | System | SAS Release Reported | SAS Release Fixed*
SAS System | SAS/STAT | All | n/a |
Type: | Usage Note |
Priority: | low |
Topic: | SAS Reference ==> Procedures ==> FREQ; Analytics ==> Exact Methods; Analytics ==> Descriptive Statistics; Analytics ==> Categorical Data Analysis; Analytics ==> Power and Sample Size |
Date Modified: | 2016-06-01 16:05:26 |
Date Created: | 2002-12-16 10:56:39 |