Producing Charts to Summarize Variables
Changing the Number of Ranges
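You can change the appearance of the charts in the following ways: specify the midpoints with the MIDPOINTS= option, specify the number of midpoints with the LEVELS= option, or chart every distinct value with the DISCRETE option.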
Note: Most examples in this section use vertical bar charts. However, unless documented otherwise, you can use any of the options in the PIE, BLOCK, or HBAR statements.
You can specify midpoints for a continuous numeric variable by using the MIDPOINTS= option in the VBAR statement. The form of this option is
VBAR variable / MIDPOINTS=midpoints-list;
For example, to specify the traditional grading ranges with midpoints from 55 to 95, use the following option:
midpoints=55 65 75 85 95
Or, you can abbreviate the list of midpoints:
midpoints=55 to 95 by 10
The corresponding ranges are as follows:
50 to 59
60 to 69
70 to 79
80 to 89
90 to 99
The following program uses the MIDPOINTS= option to create a bar chart for ExamGrade1:
options pagesize=40 linesize=80 pageno=1 nodate;

proc chart data=grades;
   vbar Examgrade1 / midpoints=55 to 95 by 10;
   title 'Assigning Grades for First Chemistry Exam';
run;
The MIDPOINTS= option forces PROC CHART to center the five bars around the traditional midpoints for exam grades.
The following output shows the bar chart:
Specifying the Midpoints for a Vertical Bar Chart
[Output: vertical bar chart titled "Assigning Grades for First Chemistry Exam". Vertical axis: Frequency (1 to 16). Horizontal axis: ExamGrade1 Midpoint, with bars at 55, 65, 75, 85, and 95.]
A traditional method to assign grades assumes the data is normally distributed. However, the bars do not appear as a normal (bell-shaped) curve. If grades are assigned based on these midpoints and the traditional pass/fail boundary of 60, then a substantial portion of the class will fail the exam because more observations fall in the bar around the midpoint of 55 than in any other bar.
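As the note at the start of this section says, the MIDPOINTS= option is also available in the HBAR statement. The following is a minimal sketch, assuming the same grades data set and ExamGrade1 variable that the preceding program uses. By default, a horizontal bar chart prints frequency statistics beside each bar, so it can be a convenient way to read the exact count in each range.

```sas
/* Sketch: same midpoints as the VBAR example, drawn horizontally. */
/* Assumes the grades data set and ExamGrade1 variable from above. */
proc chart data=grades;
   hbar ExamGrade1 / midpoints=55 to 95 by 10;
   title 'Assigning Grades for First Chemistry Exam';
run;
```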
You can specify the number of midpoints in the chart rather than the values of the midpoints by using the LEVELS= option. The procedure selects the midpoints.
VBAR variable / LEVELS=number-of-midpoints;
The following program uses the LEVELS= option to create a bar chart with five bars (footnote 1):
options pagesize=40 linesize=80 pageno=1 nodate;

proc chart data=grades;
   vbar Examgrade1 / levels=5;
   title 'Assigning Grades for First Chemistry Exam';
run;
The LEVELS= option forces PROC CHART to compute only five midpoints.
The following output shows the bar chart:
Specifying Five Midpoints for a Vertical Bar Chart
[Output: vertical bar chart titled "Assigning Grades for First Chemistry Exam". Vertical axis: Frequency (tick marks at 5, 10, 15, and 20). Horizontal axis: ExamGrade1 Midpoint, with bars at 37.5, 52.5, 67.5, 82.5, and 97.5.]
Assigning grades for these midpoints results in three students with exam grades in the lowest range.
By default, PROC CHART assumes that all numeric variables are continuous and automatically chooses intervals for them unless you use MIDPOINTS= or LEVELS=. You can specify that a numeric variable is discrete rather than continuous by using the DISCRETE option. PROC CHART will create a frequency chart with bars for each distinct value of the discrete numeric variable.
The following program uses the DISCRETE option to create a bar chart with a bar for each value of ExamGrade1:
options pagesize=40 linesize=80 pageno=1 nodate;

proc chart data=grades;
   vbar Examgrade1 / discrete;
   title 'Grades for First Chemistry Exam';
run;
The following output shows the bar chart:
Specifying a Bar for Each Exam Grade
[Output: vertical bar chart titled "Grades for First Chemistry Exam". Vertical axis: Frequency (1 to 6). Horizontal axis: ExamGrade1, with one bar for each of the values 39, 41, 46, 48, 49, 51, 55, 58, 59, 62, 63, 64, 67, 70, 71, 73, 75, 77, 79, 85, 89, and 98.]
The chart shows that in most cases only one or two students earned a given grade. However, clusters of three or more students earned grades of 58, 63, 77, 79, 85, and 89. The mode for this exam (most frequently earned exam grade) is 79.
Note: PROC CHART does not proportionally space the values of a discrete numeric variable on the horizontal axis.
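If you want the exact count for each distinct grade rather than reading it from the bars, a frequency table can serve as a cross-check. The following sketch uses PROC FREQ, which is a separate procedure from PROC CHART. It assumes the same grades data set and ExamGrade1 variable; the ORDER=FREQ option lists the most frequent grade first.

```sas
/* Sketch: tabulate each distinct exam grade, most frequent first. */
/* Assumes the grades data set used by the PROC CHART examples.    */
proc freq data=grades order=freq;
   tables ExamGrade1;
   title 'Frequency of Each Exam Grade';
run;
```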
Charting the Frequency of a Character Variable
You can create charts of a character variable as well as a numeric variable. For instance, to compare enrollment among sections, you can have PROC CHART create a chart that shows the number of students in each section.
Creating a frequency chart of a character variable works the same way as creating a frequency chart of a numeric variable. The main difference is how PROC CHART selects the midpoints. By default, PROC CHART uses each value of a character variable as a midpoint, as if the DISCRETE option were in effect. You can limit the selection of midpoints to a subset of the variable's values, but if you do not define a format for the chart variable, then a single bar, block, or section represents a single value of the variable.
By default, the midpoints that PROC CHART uses for character variables are in alphabetical order. However, you can easily rearrange the order of the midpoints with the MIDPOINTS= option. When you use the MIDPOINTS= option for character variables, you must enclose the value of each midpoint in single or double quotation marks, and the values must correspond to values in the data set. For example,
midpoints='Mon' 'Wed' 'Fri'
uses the three days the class sections meet as midpoints.
The following program uses the MIDPOINTS= option to create a bar chart that shows the number of students enrolled in each section:
options pagesize=40 linesize=80 pageno=1 nodate;

proc chart data=grades;
   vbar Section / midpoints='Mon' 'Wed' 'Fri';
   title 'Enrollment for an Introductory Chemistry Course';
run;
The MIDPOINTS= option alters the chart so that the days of the week appear in chronological rather than alphabetical order.
The following output shows the bar chart:
Ordering Character Midpoints Chronologically
[Output: vertical bar chart titled "Enrollment for an Introductory Chemistry Course". Vertical axis: Frequency (tick marks at 5, 10, and 15). Horizontal axis: Section, with bars for Mon, Wed, and Fri.]
The chart shows that the Monday and Wednesday sections have the same number of students; the Friday section has one fewer student.
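As mentioned earlier, you can limit the midpoints to a subset of the variable's values. The following is a minimal sketch of that idea: it lists only two of the three section values in the MIDPOINTS= option. It assumes the same grades data set and Section variable; the title is illustrative, and the output is not reproduced here.

```sas
/* Sketch: chart only the Monday and Friday sections.            */
/* Each quoted value must match a value of Section in the data.  */
proc chart data=grades;
   vbar Section / midpoints='Mon' 'Fri';
   title 'Enrollment for the Monday and Friday Sections';
run;
```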
You can show how a subgroup contributes to each bar or block by using the SUBGROUP= option in the BLOCK statement, HBAR statement, or VBAR statement. For example, you can use the SUBGROUP= option to explore patterns within a population, such as gender differences.
The SUBGROUP= option defines a variable called the subgroup variable. PROC CHART uses the first character of each value to fill in the portion of the bar or block that corresponds to that value, unless more than one value begins with the same first character. In that case, PROC CHART uses the letters A, B, C, and so on to fill in the bars or blocks.
If you assign a format to the variable, then PROC CHART uses the first character of the formatted value. The characters that PROC CHART uses in the chart and the values that they represent are shown in a legend at the bottom of the chart.
PROC CHART orders the subgroup symbols as A through Z, and as 0 through 9, with the characters in ascending order. PROC CHART calculates the height of a bar or block for each subgroup individually and rounds the percentage of the total bar up or down. So the total height of the bar might be greater or less than the height of the same bar without the SUBGROUP= option.
The following program uses Gender as the subgroup variable to show how many members of each section are male and how many are female:
options pagesize=40 linesize=80 pageno=1 nodate;

proc chart data=grades;
   vbar Section / midpoints='Mon' 'Wed' 'Fri'
                  subgroup=Gender;
   title 'Enrollment for an Introductory Chemistry Course';
run;
The following output shows the bar chart:
Using Gender to Form Subgroups
[Output: vertical bar chart titled "Enrollment for an Introductory Chemistry Course". Vertical axis: Frequency (tick marks at 5, 10, and 15). Horizontal axis: Section, with bars for Mon, Wed, and Fri. The lower portion of each bar is filled with F characters and the upper portion with M characters. Legend: symbol F represents Gender F; symbol M represents Gender M.]
PROC CHART fills each bar in the chart with the characters that represent the value of the variable Gender. The portion of the bar that is filled with Fs represents the number of observations that correspond to females; the portion that is filled with Ms represents the number of observations that correspond to males. Because the value of Gender contains a single character (F or M), the symbol that PROC CHART uses as the fill character is identical to the value of the variable.
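To see the exact number of females and males in each section, you can cross-check the subgroup chart against a two-way frequency table. The following sketch uses PROC FREQ rather than PROC CHART and assumes the same grades data set with the Section and Gender variables. The NOROW, NOCOL, and NOPERCENT options suppress the percentages so that only the counts remain.

```sas
/* Sketch: count students by section and gender.            */
/* Assumes the grades data set used by the chart examples.  */
proc freq data=grades;
   tables Section*Gender / norow nocol nopercent;
   title 'Enrollment by Section and Gender';
run;
```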
Charting Mean Values
PROC CHART enables you to specify what the bars or sections in the chart represent. By default, each bar, block, or section represents the frequency of the chart variable. You can also identify a variable whose values determine the sizes of the bars, blocks, or sections in the chart.
You define a variable called the Sumvar variable by using the SUMVAR= option. With the SUMVAR= option, you can also use the TYPE= option to specify whether the sum of the Sumvar variable or the mean of the Sumvar variable determines the size of the bars or sections. The available types are
SUM: sums the values of the Sumvar variable in each range. Then PROC CHART uses the sums to determine the size of each bar, block, or section. SUM is the default type.
MEAN: determines the mean value of the Sumvar variable in each range. Then PROC CHART uses the means to determine the size of each bar, block, or section.
The following program creates a bar chart grouped by gender to compare the mean value of all grades in each section:
options pagesize=40 linesize=80 pageno=1 nodate;

proc chart data=grades;
   vbar Section / midpoints='Mon' 'Wed' 'Fri'
                  group=Gender
                  sumvar=Examgrade1 type=mean;
   title 'Mean Exam Grade for Introductory Chemistry Sections';
run;
The GROUP= option produces a separate set of bars for each value of Gender. The SUMVAR= option specifies that the values of ExamGrade1 determine the size of the bars. The TYPE=MEAN option specifies that each bar represents the mean grade for its group.
The following output shows the bar chart:
Using the SUMVAR= Option to Compare Mean Values
[Output: vertical bar chart titled "Mean Exam Grade for Introductory Chemistry Sections". Vertical axis: ExamGrade1 Mean (tick marks at 20, 40, 60, and 80). Horizontal axis: Section (Mon, Wed, Fri), repeated for each Gender group (F, then M).]
The chart shows that the females in the Friday section achieved the highest mean grade, followed by the males in the same section.
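Because a character-based bar chart shows the means only approximately, you might want the numeric values as well. The following sketch uses PROC MEANS, a separate procedure from PROC CHART, to compute the mean of ExamGrade1 for each combination of Gender and Section. It assumes the same grades data set; MAXDEC=1 limits the display to one decimal place.

```sas
/* Sketch: mean exam grade for each Gender and Section combination. */
/* Assumes the grades data set used by the chart examples.          */
proc means data=grades mean maxdec=1;
   class Gender Section;
   var ExamGrade1;
   title 'Mean Exam Grade by Gender and Section';
run;
```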
Creating a Three-Dimensional Chart
Complicated relationships such as the ones charted with the GROUP= option might be easier to understand if you present them as three-dimensional block charts. The following program uses the BLOCK statement to create a block chart for the numeric variable ExamGrade1:
options linesize=120 pagesize=40 pageno=1 nodate;

proc chart data=grades;
   block Section / midpoints='Mon' 'Wed' 'Fri'
                   sumvar=Examgrade1 type=mean
                   group=Gender;
   format Examgrade1 4.1;
   title 'Mean Exam Grade for Introductory Chemistry Sections';
run;
The FORMAT statement specifies the number of decimals that PROC CHART uses to report the mean value of ExamGrade1 beneath each block.
Note: If the line size or page size is not sufficient to display all the bars, then PROC CHART produces a horizontal bar chart.
The following output shows the block chart:
Using a Block Chart to Compare Group Means
[Output: block chart titled "Mean Exam Grade for Introductory Chemistry Sections", with the heading "Mean of ExamGrade1 by Section grouped by Gender". The means printed beneath the blocks are as follows:]
Gender    Mon     Wed     Fri
M         60.3    69.8    75.3
F         60.7    61.4    83.6
The value that is shown beneath each block is the mean of ExamGrade1 for that combination of Section and Gender. You can easily see that both females and males in the Friday section earned higher grades than their counterparts in the other sections.
FOOTNOTE 1: You can use SAS to normalize the data before the chart is created.
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.