
Producing Charts to Summarize Variables

Customizing Frequency Charts


Changing the Number of Ranges

You can change the appearance of the charts in the following ways:

Action                                                 Option
-----------------------------------------------------  -----------------
specify midpoints that define the range of values      MIDPOINTS= option
that each bar, block, or section represents.

specify the number of bars on the chart and let        LEVELS= option
PROC CHART compute the midpoints.

specify a variable that contains discrete numeric      DISCRETE option
values. PROC CHART will produce a bar chart with a
bar for each distinct value.

Note:   Most examples in this section use vertical bar charts. However, unless documented otherwise, you can use any of the options in the PIE, BLOCK, or HBAR statements.


Specifying Midpoints for a Numeric Variable

You can specify midpoints for a continuous numeric variable by using the MIDPOINTS= option in the VBAR statement. The form of this option is

VBAR variable / MIDPOINTS=midpoints-list;
where midpoints-list is a list of the numbers to use as midpoints.

For example, to specify the traditional grading ranges with midpoints from 55 to 95, use the following option:

midpoints=55 65 75 85 95

Or, you can abbreviate the list of midpoints:

midpoints=55 to 95 by 10

The corresponding ranges are as follows:

  50 to 59
  60 to 69
  70 to 79
  80 to 89
  90 to 99

The following program uses the MIDPOINTS= option to create a bar chart for ExamGrade1:

options pagesize=40 linesize=80 pageno=1 nodate;

proc chart data=grades;
   vbar Examgrade1 / midpoints=55 to 95 by 10;
   title 'Assigning Grades for First Chemistry Exam';
run;

The MIDPOINTS= option forces PROC CHART to center the five bars around the traditional midpoints for exam grades.

The following output shows the bar chart:

Specifying the Midpoints for a Vertical Bar Chart

                   Assigning Grades for First Chemistry Exam                   1

    Frequency

    16 +       *****
       |       *****
    15 +       *****                   *****
       |       *****                   *****
    14 +       *****                   *****
       |       *****                   *****
    13 +       *****                   *****
       |       *****                   *****
    12 +       *****                   *****
       |       *****                   *****
    11 +       *****                   *****
       |       *****                   *****
    10 +       *****                   *****       *****
       |       *****                   *****       *****
     9 +       *****                   *****       *****
       |       *****                   *****       *****
     8 +       *****       *****       *****       *****
       |       *****       *****       *****       *****
     7 +       *****       *****       *****       *****
       |       *****       *****       *****       *****
     6 +       *****       *****       *****       *****
       |       *****       *****       *****       *****
     5 +       *****       *****       *****       *****
       |       *****       *****       *****       *****
     4 +       *****       *****       *****       *****
       |       *****       *****       *****       *****
     3 +       *****       *****       *****       *****
       |       *****       *****       *****       *****
     2 +       *****       *****       *****       *****
       |       *****       *****       *****       *****
     1 +       *****       *****       *****       *****       *****
       |       *****       *****       *****       *****       *****
       --------------------------------------------------------------------
                 55          65          75          85          95

                                ExamGrade1 Midpoint

A traditional method of assigning grades assumes that the data is normally distributed. However, the bars do not form a normal (bell-shaped) curve. If grades are assigned based on these midpoints and the traditional pass/fail boundary of 60, then a substantial portion of the class fails the exam, because more observations fall in the bar around the midpoint of 55 than in any other bar.
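
The note at the start of this section says that these options also work in other chart statements. As a minimal sketch, assuming the same GRADES data set and ExamGrade1 variable, the MIDPOINTS= option can be moved unchanged to an HBAR statement. The title text here is illustrative only:

```sas
options pagesize=40 linesize=80 pageno=1 nodate;

/* Same midpoints as the vertical bar chart, drawn as horizontal bars. */
proc chart data=grades;
   hbar Examgrade1 / midpoints=55 to 95 by 10;
   title 'Exam Grades as a Horizontal Bar Chart';
run;
```

By default, a horizontal bar chart also prints the frequency, cumulative frequency, percent, and cumulative percent beside each bar, which makes the exact count for each range easier to read.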

Specifying the Number of Midpoints in a Chart

You can specify the number of midpoints in the chart rather than the values of the midpoints by using the LEVELS= option. The procedure selects the midpoints.

The form of the option is

VBAR variable / LEVELS=number-of-midpoints;
where number-of-midpoints specifies the number of midpoints.

The following program uses the LEVELS= option to create a bar chart with five bars (see footnote 1):

options pagesize=40 linesize=80 pageno=1 nodate;

proc chart data=grades;
   vbar Examgrade1 / levels=5;
   title 'Assigning Grades for First Chemistry Exam';
run;

The LEVELS= option forces PROC CHART to compute only five midpoints.

The following output shows the bar chart:

Specifying Five Midpoints for a Vertical Bar Chart

                   Assigning Grades for First Chemistry Exam                   1

    Frequency

       |                                           *****
    20 +                                           *****
       |                                           *****
       |                                           *****
       |                                           *****
       |                                           *****
    15 +                                           *****
       |                                           *****
       |                   *****                   *****
       |                   *****       *****       *****
       |                   *****       *****       *****
    10 +                   *****       *****       *****
       |                   *****       *****       *****
       |                   *****       *****       *****
       |                   *****       *****       *****
       |                   *****       *****       *****
     5 +                   *****       *****       *****
       |                   *****       *****       *****
       |       *****       *****       *****       *****
       |       *****       *****       *****       *****
       |       *****       *****       *****       *****       *****
       --------------------------------------------------------------------
                37.5        52.5        67.5        82.5        97.5

                                ExamGrade1 Midpoint

Assigning grades based on these midpoints places three students in the lowest range.
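
The LEVELS= option is not limited to the VBAR statement. The following sketch, which assumes the same GRADES data set, asks for five computed midpoints in a PIE statement instead. The title text is illustrative only, and the midpoints that PROC CHART selects depend on the data:

```sas
options pagesize=40 linesize=80 pageno=1 nodate;

/* Let PROC CHART choose five midpoints and draw one pie section for each. */
proc chart data=grades;
   pie Examgrade1 / levels=5;
   title 'Distribution of Grades for First Chemistry Exam';
run;
```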

Charting Every Value

By default, PROC CHART assumes that all numeric variables are continuous and automatically chooses intervals for them unless you use MIDPOINTS= or LEVELS=. You can specify that a numeric variable is discrete rather than continuous by using the DISCRETE option. PROC CHART will create a frequency chart with bars for each distinct value of the discrete numeric variable.

The following program uses the DISCRETE option to create a bar chart with a bar for each value of ExamGrade1:

options pagesize=40 linesize=80 pageno=1 nodate;

proc chart data=grades;
   vbar Examgrade1 / discrete;
   title 'Grades for First Chemistry Exam';
run;

The following output shows the bar chart:

Specifying a Bar for Each Exam Grade

                        Grades for First Chemistry Exam                        1

     Frequency

     6 +                                                       **
       |                                                       **
       |                                                       **
       |                                                       **
       |                                                       **
     5 +                                                       ** ** **
       |                                                       ** ** **
       |                                                       ** ** **
       |                                                       ** ** **
       |                                                       ** ** **
     4 +                      **       **                      ** ** **
       |                      **       **                      ** ** **
       |                      **       **                      ** ** **
       |                      **       **                      ** ** **
       |                      **       **                      ** ** **
     3 +                      **       **                   ** ** ** **
       |                      **       **                   ** ** ** **
       |                      **       **                   ** ** ** **
       |                      **       **                   ** ** ** **
       |                      **       **                   ** ** ** **
     2 +    ** **    **    ** **       ** **    **       ** ** ** ** **
       |    ** **    **    ** **       ** **    **       ** ** ** ** **
       |    ** **    **    ** **       ** **    **       ** ** ** ** **
       |    ** **    **    ** **       ** **    **       ** ** ** ** **
       |    ** **    **    ** **       ** **    **       ** ** ** ** **
     1 + ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
       | ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
       | ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
       | ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
       | ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
       --------------------------------------------------------------------
         39 41 46 48 49 51 55 58 59 62 63 64 67 70 71 73 75 77 79 85 89 98

                                    ExamGrade1

The chart shows that in most cases only one or two students earned a given grade. However, clusters of three or more students earned grades of 58, 63, 77, 79, 85, and 89. The mode for this exam (most frequently earned exam grade) is 79.

Note:   PROC CHART does not proportionally space the values of a discrete numeric variable on the horizontal axis.
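
When a discrete chart has many narrow bars, the exact counts can be hard to read from the axis. One way to cross-check them, offered here as a sketch rather than as part of the original example, is to tabulate the same variable with PROC FREQ. The GRADES data set is assumed as before, and the title text is illustrative:

```sas
/* Print one row for each distinct value of ExamGrade1 with its frequency. */
proc freq data=grades;
   tables Examgrade1;
   title 'Frequency of Each Grade on First Chemistry Exam';
run;
```

The Frequency column of the resulting table should match the bar heights in the DISCRETE chart.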


Charting the Frequency of a Character Variable

You can create charts of a character variable as well as a numeric variable. For instance, to compare enrollment among sections, you can use PROC CHART to create a chart that shows the number of students in each section.

You create a frequency chart of a character variable in the same way that you create a frequency chart of a numeric variable. The main difference is how PROC CHART selects the midpoints. By default, PROC CHART uses each value of a character variable as a midpoint, as if the DISCRETE option were in effect. You can limit the selection of midpoints to a subset of the variable's values, but if you do not define a format for the chart variable, then a single bar, block, or section represents a single value of the variable.
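
As a minimal sketch of this default behavior, the following step charts the character variable Section with no options. It assumes that Section in the GRADES data set holds values such as 'Mon', 'Wed', and 'Fri'; the title text is illustrative only:

```sas
options pagesize=40 linesize=80 pageno=1 nodate;

/* No MIDPOINTS= or DISCRETE option: each value of Section gets its own bar. */
proc chart data=grades;
   vbar Section;
   title 'Enrollment by Section (Default Midpoints)';
run;
```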


Specifying Midpoints for a Character Variable

By default, the midpoints that PROC CHART uses for character variables are in alphabetical order. However, you can easily rearrange the order of the midpoints with the MIDPOINTS= option. When you use the MIDPOINTS= option for character variables, you must enclose the value of each midpoint in single or double quotation marks, and the values must correspond to values in the data set. For example,

midpoints='Mon' 'Wed' 'Fri'

uses the three days the class sections meet as midpoints.

The following program uses the MIDPOINTS= option to create a bar chart that shows the number of students enrolled in each section:

options pagesize=40 linesize=80 pageno=1 nodate;

proc chart data=grades;
   vbar Section / midpoints='Mon' 'Wed' 'Fri';
   title 'Enrollment for an Introductory Chemistry Course';
run;

The MIDPOINTS= option alters the chart so that the days of the week appear in chronological rather than alphabetical order.

The following output shows the bar chart:

Ordering Character Midpoints Chronologically

                Enrollment for an Introductory Chemistry Course                1

                Frequency

                   |       *****       *****
                   |       *****       *****       *****
                15 +       *****       *****       *****
                   |       *****       *****       *****
                   |       *****       *****       *****
                   |       *****       *****       *****
                   |       *****       *****       *****
                10 +       *****       *****       *****
                   |       *****       *****       *****
                   |       *****       *****       *****
                   |       *****       *****       *****
                   |       *****       *****       *****
                 5 +       *****       *****       *****
                   |       *****       *****       *****
                   |       *****       *****       *****
                   |       *****       *****       *****
                   |       *****       *****       *****
                   --------------------------------------------
                            Mon         Wed         Fri

                                      Section

The chart shows that the Monday and Wednesday sections have the same number of students; the Friday section has one fewer student.

Creating Subgroups within a Range

You can show how a subgroup contributes to each bar or block by using the SUBGROUP= option in the BLOCK statement, HBAR statement, or VBAR statement. For example, you can use the SUBGROUP= option to explore patterns within a population, such as gender differences.

The SUBGROUP= option defines a variable called the subgroup variable. PROC CHART uses the first character of each value to fill in the portion of the bar or block that corresponds to that value, unless more than one value begins with the same first character. In that case, PROC CHART uses the letters A, B, C, and so on to fill in the bars or blocks.

If you assign a format to the variable, then PROC CHART uses the first character of the formatted value. The characters that PROC CHART uses in the chart and the values that they represent are shown in a legend at the bottom of the chart.

PROC CHART orders the subgroup symbols as A through Z, and as 0 through 9, with the characters in ascending order. PROC CHART calculates the height of a bar or block for each subgroup individually and rounds the percentage of the total bar up or down. So the total height of the bar might be greater or less than the height of the same bar without the SUBGROUP= option.

The following program uses Gender as the subgroup variable to show how many members of each section are male and how many are female:

options pagesize=40 linesize=80 pageno=1 nodate;

proc chart data=grades;
   vbar Section / midpoints='Mon' 'Wed' 'Fri'
                  subgroup=Gender;
   title 'Enrollment for an Introductory Chemistry Course';
run;

The following output shows the bar chart:

Using Gender to Form Subgroups

                Enrollment for an Introductory Chemistry Course                1

                Frequency

                   |       MMMMM       MMMMM
                   |       MMMMM       MMMMM       MMMMM
                15 +       MMMMM       MMMMM       MMMMM
                   |       MMMMM       MMMMM       MMMMM
                   |       MMMMM       MMMMM       MMMMM
                   |       MMMMM       MMMMM       MMMMM
                   |       MMMMM       MMMMM       MMMMM
                10 +       MMMMM       MMMMM       MMMMM
                   |       FFFFF       MMMMM       MMMMM
                   |       FFFFF       MMMMM       FFFFF
                   |       FFFFF       FFFFF       FFFFF
                   |       FFFFF       FFFFF       FFFFF
                 5 +       FFFFF       FFFFF       FFFFF
                   |       FFFFF       FFFFF       FFFFF
                   |       FFFFF       FFFFF       FFFFF
                   |       FFFFF       FFFFF       FFFFF
                   |       FFFFF       FFFFF       FFFFF
                   --------------------------------------------
                            Mon         Wed         Fri

                                      Section


                        Symbol Gender     Symbol Gender

                           F   F             M   M

PROC CHART fills each bar in the chart with the characters that represent the value of the variable Gender. The portion of the bar that is filled with Fs represents the number of observations that correspond to females; the portion that is filled with Ms represents the number of observations that correspond to males. Because the value of Gender contains a single character (F or M), the symbol that PROC CHART uses as the fill character is identical to the value of the variable.
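
The earlier remark about formats can be sketched as follows. The format name $gendfmt and its labels are hypothetical and are not part of the original example; the GRADES data set and its variables are assumed as before:

```sas
/* Hypothetical format: the name $gendfmt and the labels are illustrative. */
proc format;
   value $gendfmt 'F'='Female'
                  'M'='Male';
run;

proc chart data=grades;
   vbar Section / midpoints='Mon' 'Wed' 'Fri'
                  subgroup=Gender;
   /* PROC CHART takes the fill character from the formatted value. */
   format Gender $gendfmt.;
   title 'Enrollment for an Introductory Chemistry Course';
run;
```

Because 'Female' and 'Male' begin with different letters, the fill characters would be expected to remain F and M. If two formatted values began with the same letter, PROC CHART would fall back to A, B, C, and so on, as described above.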

Charting Mean Values

PROC CHART enables you to specify what the bars or sections in the chart represent. By default, each bar, block, or section represents the frequency of the chart variable. You can also identify a variable whose values determine the sizes of the bars, blocks, or sections in the chart.

You define a variable called the sumvar variable by using the SUMVAR= option. With the SUMVAR= option, you can also use the TYPE= option to specify whether the sum or the mean of the Sumvar variable determines the size of the bars, blocks, or sections. The available types are

SUM

sums the values of the Sumvar variable in each range. Then PROC CHART uses the sums to determine the size of each bar, block, or section. SUM is the default type.

MEAN

determines the mean value of the Sumvar variable in each range. Then PROC CHART uses the means to determine the size of each bar, block, or section.

The following program creates a bar chart grouped by gender to compare the mean value of all grades in each section:

options pagesize=40 linesize=80 pageno=1 nodate;

proc chart data=grades;
   vbar Section / midpoints='Mon' 'Wed' 'Fri' group=Gender
                  sumvar=Examgrade1 type=mean;
   title 'Mean Exam Grade for Introductory Chemistry Sections';
run;

The SUMVAR= option specifies that the values of ExamGrade1 determine the size of the bars. The TYPE=MEAN option specifies that each bar represents the mean grade for its group rather than the sum.

The following output shows the bar chart:

Using the SUMVAR= Option to Compare Mean Values

              Mean Exam Grade for Introductory Chemistry Sections              1

ExamGrade1 Mean

   |                         *****
80 +                         *****
   |                         *****                            *****
   |                         *****                  *****     *****
   |                         *****                  *****     *****
60 +     *****     *****     *****        *****     *****     *****
   |     *****     *****     *****        *****     *****     *****
   |     *****     *****     *****        *****     *****     *****
   |     *****     *****     *****        *****     *****     *****
40 +     *****     *****     *****        *****     *****     *****
   |     *****     *****     *****        *****     *****     *****
   |     *****     *****     *****        *****     *****     *****
   |     *****     *****     *****        *****     *****     *****
20 +     *****     *****     *****        *****     *****     *****
   |     *****     *****     *****        *****     *****     *****
   |     *****     *****     *****        *****     *****     *****
   |     *****     *****     *****        *****     *****     *****
   -----------------------------------------------------------------------------
          Mon       Wed       Fri          Mon       Wed       Fri       Section

         |---------- F ----------|        |---------- M ----------|      Gender

The chart shows that the females in the Friday section achieved the highest mean grade, followed by the males in the same section.
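
The bar heights show the means only approximately. As a sketch of one way to see the exact values, PROC MEANS can compute the same statistic for each combination of Gender and Section. This step is not part of the original example; it assumes the same GRADES data set, and the title text is illustrative:

```sas
/* Mean of ExamGrade1 for every Gender and Section combination. */
proc means data=grades mean maxdec=1;
   class Gender Section;
   var Examgrade1;
   title 'Mean Exam Grade by Gender and Section';
run;
```

Note that PROC MEANS lists the Section values in sorted order by default, so the rows appear as Fri, Mon, Wed rather than in the chronological order that the MIDPOINTS= option imposes on the chart.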

Creating a Three-Dimensional Chart

Complicated relationships such as the ones charted with the GROUP= option might be easier to understand if you present them as three-dimensional block charts. The following program uses the BLOCK statement to create a block chart for the numeric variable ExamGrade1:

options linesize=120 pagesize=40 pageno=1 nodate;
proc chart data=grades;
   block Section / midpoints='Mon' 'Wed' 'Fri'
                   sumvar=Examgrade1 type=mean
                   group=Gender;
   format Examgrade1 4.1;
   title 'Mean Exam Grade for Introductory Chemistry Sections';
run;

The FORMAT statement specifies the number of decimals that PROC CHART uses to report the mean value of ExamGrade1 beneath each block.

Note:   If the line size or page size is not sufficient to display all the bars, then PROC CHART produces a horizontal bar chart.

The following output shows the block chart:

Using a Block Chart to Compare Group Means

                                  Mean Exam Grade for Introductory Chemistry Sections                                  1

                                    Mean of ExamGrade1 by Section grouped by Gender


                                                                             ___
                                                               ___          /_ /|
                                                 ___          /_ /|        |**| |
                                                /_ /|        |**| |        |**| |
                                               |**| |        |**| |        |**| |
                                               |**| |        |**| |        |**| |
                                              -|**| |--------|**| |---___ -|**| |-------
                                             / |**| |      / |**| |  /_ /| |**| |      /
                                            /  |**| |     /  |**| | |**| | |**| |     /
                                      M   ___  |**| |   ___  |**| | |**| | |**| |    /
                                         /_ /| |**|/   /_ /| |**|/  |**| | |**|/    /
                                        |**| |        |**| |        |**| |         /
                                        |**| | 60.3   |**| | 69.8   |**| | 75.3   /
              Gender                   /|**| |-------/|**| |-------/|**| |-------/
                                      / |**| |      / |**| |      / |**| |      /
                                     /  |**| |     /  |**| |     /  |**| |     /
                               F    /   |**| |    /   |**| |    /   |**| |    /
                                   /    |**|/    /    |**|/    /    |**|/    /
                                  /             /             /             /
                                 /      60.7   /      61.4   /      83.6   /
                                /-------------/-------------/-------------/

                                     Mon           Wed           Fri

                                                 Section

The value that is shown beneath each block is the mean of ExamGrade1 for that combination of Section and Gender. You can easily see that both females and males in the Friday section earned higher grades than their counterparts in the other sections.

FOOTNOTE 1:   You can use SAS to normalize the data before the chart is created.
