The FREQ Procedure

Example 35.3 Chi-Square Goodness-of-Fit Tests

This example examines whether the children’s hair color (from Example 35.1) has a specified multinomial distribution for the two geographical regions. The hypothesized distribution of hair color is 30% fair, 12% red, 30% medium, 25% dark, and 3% black.

In order to test the hypothesis for each region, the data are first sorted by Region. Then the FREQ procedure uses a BY statement to produce a separate table for each BY group (Region). The option ORDER=DATA orders the variable values (hair color) in the frequency table by their order in the input data set. The TABLES statement requests a frequency table for hair color, and the option NOCUM suppresses the display of the cumulative frequencies and percentages.

The CHISQ option requests a chi-square goodness-of-fit test for the frequency table of Hair. The TESTP= option specifies the hypothesized (or test) percentages for the chi-square test; the number of percentages listed must equal the number of table levels, and the percentages must sum to 100%. The TESTP= percentages are listed in the same order as the corresponding variable levels appear in the frequency table.

The PLOTS= option requests a deviation plot, which is associated with the CHISQ option and displays the relative deviations from the test frequencies. The TYPE=DOT plot-option requests a dot plot instead of the default type, which is a bar chart. The ONLY plot-option requests that PROC FREQ produce only the deviation plot. By default, PROC FREQ produces all plots associated with the requested analyses. A frequency plot is associated with a one-way table request but is not produced in this example because ONLY is specified with the DEVIATIONPLOT request. Note that ODS Graphics must be enabled before requesting plots. These statements produce Output 35.3.1 through Output 35.3.4.

   proc sort data=Color;
      by Region;
   run;
   
   ods graphics on;
   proc freq data=Color order=data;
      tables Hair / nocum chisq testp=(30 12 30 25 3)
                    plots(only)=deviationplot(type=dot);
      weight Count;
      by Region;
      title 'Hair Color of European Children';
   run;
   ods graphics off;
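
As a variation on the plot request above, the following sketch asks explicitly for both the frequency plot and the deviation plot, and leaves the deviation plot at its default type (a bar chart). It assumes the same Color data set from Example 35.1, already sorted by Region; the exact set of plots produced when ONLY is omitted may depend on the SAS release, which is why both plot requests are named here.

```sas
ods graphics on;
proc freq data=Color order=data;
   /* Request both plots by name instead of relying on defaults */
   tables Hair / nocum chisq testp=(30 12 30 25 3)
                 plots=(freqplot deviationplot);
   weight Count;
   by Region;
run;
ods graphics off;
```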

Output 35.3.1 Frequency Table and Chi-Square Test for Region 1
Hair Color of European Children

The FREQ Procedure

Hair Color
Hair      Frequency    Percent    Test Percent
fair             76      30.89           30.00
red              19       7.72           12.00
medium           83      33.74           30.00
dark             65      26.42           25.00
black             3       1.22            3.00

Chi-Square Test
for Specified Proportions
Chi-Square        7.7602
DF                     4
Pr > ChiSq        0.1008


Output 35.3.1 shows the frequency table and chi-square test for Region 1. The frequency table lists the variable values (hair color) in the order in which they appear in the data set. The "Test Percent" column lists the hypothesized percentages for the chi-square test. Always check that you have ordered the TESTP= percentages to correctly match the order of the variable levels.
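
One way to check the level order before supplying TESTP= values is to run the same table request without the test and read the order of the rows. This is a minimal sketch, assuming the Color data set from Example 35.1 has already been sorted by Region.

```sas
/* List the Hair levels in data order for each region;   */
/* the TESTP= percentages must follow this same order.   */
proc freq data=Color order=data;
   tables Hair / nocum;
   weight Count;
   by Region;
run;
```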

Output 35.3.2 shows the deviation plot for Region 1, which displays the relative deviations from the hypothesized values. The relative deviation for a level is the difference between the observed and hypothesized (test) percentage divided by the test percentage. You can suppress the chi-square p-value that is displayed by default in the deviation plot by specifying the NOSTATS plot-option.
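
The relative deviations described above can be recomputed by hand from the percentages in Output 35.3.1. The following DATA step is an illustrative sketch only: the data set name RelDev and the variable names are hypothetical, and the values are typed in from the Region 1 frequency table rather than read from PROC FREQ output.

```sas
data RelDev;
   input Hair $ Percent TestPercent;
   /* (observed percent - test percent) / test percent */
   RelDeviation = (Percent - TestPercent) / TestPercent;
   datalines;
fair   30.89 30
red     7.72 12
medium 33.74 30
dark   26.42 25
black   1.22  3
;
run;

proc print data=RelDev noobs;
run;
```

A negative value (as for red and black in Region 1) means the observed percentage falls below the hypothesized percentage.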

Output 35.3.2 Deviation Plot for Region 1

Output 35.3.3 and Output 35.3.4 show the results for Region 2. PROC FREQ computes a chi-square statistic for each region. The chi-square statistic is significant at the 0.05 level for Region 2 (p = 0.0003) but not for Region 1 (p = 0.1008). This indicates a significant departure from the hypothesized percentages in Region 2.
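
To see where these statistics come from, the following sketch recomputes the Pearson goodness-of-fit statistic for each region as the sum over levels of (observed - expected)**2 / expected, where the expected frequency is the region total times the test percentage. It is not how PROC FREQ is implemented; the data set name ChiSqCheck and the variable names are hypothetical, and the frequencies are typed in from Output 35.3.1 and Output 35.3.3.

```sas
data ChiSqCheck;
   array observed[5] o1-o5;
   /* Hypothesized percentages: fair, red, medium, dark, black */
   array pct[5] _temporary_ (30 12 30 25 3);
   input Region o1-o5;
   n = sum(of o1-o5);                 /* total frequency in the region */
   ChiSq = 0;
   do i = 1 to 5;
      expected = n * pct[i] / 100;    /* expected frequency under H0 */
      ChiSq = ChiSq + (observed[i] - expected)**2 / expected;
   end;
   DF = 4;                            /* number of levels minus 1 */
   PValue = 1 - probchi(ChiSq, DF);   /* upper-tail probability */
   keep Region ChiSq DF PValue;
   datalines;
1  76 19  83  65  3
2 152 94 134 117 19
;
run;

proc print data=ChiSqCheck noobs;
   format ChiSq 8.4 PValue pvalue6.4;
run;
```

The printed ChiSq values should agree with the chi-square statistics reported in the two outputs (7.7602 for Region 1 and 21.3824 for Region 2).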

Output 35.3.3 Frequency Table and Chi-Square Test for Region 2
Hair Color of European Children

The FREQ Procedure

Geographic Region=2

Hair Color
Hair      Frequency    Percent    Test Percent
fair            152      29.46           30.00
red              94      18.22           12.00
medium          134      25.97           30.00
dark            117      22.67           25.00
black            19       3.68            3.00

Chi-Square Test
for Specified Proportions
Chi-Square       21.3824
DF                     4
Pr > ChiSq        0.0003

Output 35.3.4 Deviation Plot for Region 2
