The TABULATE Procedure
Crosstabulation tables (also called contingency tables and stub-and-banner reports) show combined frequency distributions for two or more variables. This table shows frequency counts for females and males within each of four job classes. The table also shows the percentage that each frequency count represents of the row total, the column total, and the overall total.
Program
options nodate pageno=1 linesize=80 pagesize=60;
proc tabulate data=jobclass format=8.2;
   class gender occupation;
   table (occupation='Job Class' all='All Jobs')
         *(n='Number of employees'*f=9.
           pctn<gender all>='Percent of row total'
           pctn<occupation all>='Percent of column total'
           pctn='Percent of total'),
         gender='Gender' all='All Employees'
         / rts=50;
   format gender gendfmt. occupation occupfmt.;
   title 'Gender Distribution';
   title2 'within Job Classes';
run;
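The program reads a data set named JOBCLASS and applies the formats GENDFMT. and OCCUPFMT., none of which are defined in this section. As a minimal sketch for reproducing the report, the following steps rebuild an equivalent data set from the eight cell counts that appear in the output. The numeric codes (1 and 2 for Gender, 1 through 4 for Occupation), the helper variables count and i, and both format definitions are assumptions made for illustration. The original data set might be coded differently.

```sas
/* Assumed coding: gender 1=Female, 2=Male; occupation 1-4 in table order. */
data jobclass;
   input gender occupation count;
   /* Write one observation per employee so that N counts observations. */
   do i = 1 to count;
      output;
   end;
   drop i count;
   datalines;
1 1 16
2 1 18
1 2 20
2 2 15
1 3 14
2 3 14
1 4 11
2 4 15
;
run;

/* Hypothetical format definitions that yield the labels in the report. */
proc format;
   value gendfmt  1='Female'
                  2='Male';
   value occupfmt 1='Technical'
                  2='Manager/Supervisor'
                  3='Clerical'
                  4='Administrative';
run;
```

Run these steps before the PROC TABULATE step. Numeric codes are used in this sketch because PROC TABULATE orders class values by their unformatted values by default, which reproduces the row and column order of the report.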
Output
                              Gender Distribution                              1
                               within Job Classes

--------------------------------------------------------------------------------
|                                                |      Gender       |         |
|                                                |-------------------|   All   |
|                                                | Female  |  Male   |Employees|
|------------------------------------------------+---------+---------+---------|
|Job Class              |                        |         |         |         |
|-----------------------+------------------------|         |         |         |
|Technical              |Number of employees     |       16|       18|       34|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    47.06|    52.94|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |    26.23|    29.03|    27.64|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |    13.01|    14.63|    27.64|
|-----------------------+------------------------+---------+---------+---------|
|Manager/Supervisor     |Number of employees     |       20|       15|       35|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    57.14|    42.86|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |    32.79|    24.19|    28.46|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |    16.26|    12.20|    28.46|
|-----------------------+------------------------+---------+---------+---------|
|Clerical               |Number of employees     |       14|       14|       28|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    50.00|    50.00|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |    22.95|    22.58|    22.76|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |    11.38|    11.38|    22.76|
|-----------------------+------------------------+---------+---------+---------|
|Administrative         |Number of employees     |       11|       15|       26|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    42.31|    57.69|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |    18.03|    24.19|    21.14|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |     8.94|    12.20|    21.14|
|-----------------------+------------------------+---------+---------+---------|
|All Jobs               |Number of employees     |       61|       62|      123|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    49.59|    50.41|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |   100.00|   100.00|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |    49.59|    50.41|   100.00|
--------------------------------------------------------------------------------
A Closer Look
The part of the TABLE statement that defines the rows of the table uses the PCTN statistic to calculate three different percentages.
In all calculations of PCTN, the numerator is N, the frequency count for one cell of the table. The denominator for each occurrence of PCTN is determined by the denominator definition. The denominator definition appears in angle brackets after the keyword PCTN. It is a list of one or more expressions. The list tells PROC TABULATE which frequency counts to sum for the denominator.
Taking a close look at the structure of the table helps you understand how PROC TABULATE uses the denominator definitions. The following simplified version of the TABLE statement clarifies the basic structure of the table:
table occupation='Job Class' all='All Jobs', gender='Gender' all='All Employees';
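To see only this structure, the simplified TABLE statement can be placed in a complete step. The following is a sketch that assumes the JOBCLASS data set and the GENDFMT. and OCCUPFMT. formats from the program above are available in the session.

```sas
proc tabulate data=jobclass;
   class gender occupation;
   /* No statistic is requested, so each cell shows the frequency count N. */
   table occupation='Job Class' all='All Jobs',
         gender='Gender' all='All Employees';
   format gender gendfmt. occupation occupfmt.;
run;
```

Because the table contains only class variables and no statistic keyword, PROC TABULATE reports N in every cell, which makes the layout of rows and columns easy to inspect before the percentages are added.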
The table is a concatenation of four subtables. In this report, each subtable is a crossing of one class variable in the row dimension and one class variable in the column dimension. Each crossing establishes one or more categories. A category is a combination of unique values of class variables, such as female, technical or all, clerical. The following table describes each subtable.
Class variables contributing to the subtable | Description of frequency counts | Number of categories
---|---|---
Occupation and Gender | number of females in each job or number of males in each job | 8
All and Gender | number of females or number of males | 2
Occupation and All | number of people in each job | 4
All and All | number of people in all jobs | 1
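One way to check the category counts in this table outside PROC TABULATE is to let PROC SUMMARY produce every combination of the two class variables. The sketch below assumes the JOBCLASS data set and the two formats from the program above; the output data set name subtable_counts is hypothetical.

```sas
proc summary data=jobclass;
   class occupation gender;
   /* With no VAR statement, the output holds only the counts in _FREQ_. */
   output out=subtable_counts;
run;

proc print data=subtable_counts noobs;
   format gender gendfmt. occupation occupfmt.;
run;
```

In the printed data set, the _TYPE_ variable identifies the subtable. Observations with _TYPE_=3 should correspond to Occupation and Gender (8 categories), _TYPE_=2 to Occupation and All (4), _TYPE_=1 to All and Gender (2), and _TYPE_=0 to All and All (1). The _FREQ_ variable holds the frequency count for each category.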
The following figure highlights these subtables and the frequency counts for each category.
Illustration of the Four Subtables
The following fragment of the TABLE statement contains the denominator definitions for this report. Each denominator definition appears in angle brackets immediately after the PCTN keyword.
table (occupation='Job Class' all='All Jobs')
      *(n='Number of employees'*f=9.
        pctn<gender all>='Percent of row total'
        pctn<occupation all>='Percent of column total'
        pctn='Percent of total'),
Each use of PCTN nests a row of statistics within each value of Occupation and All. Each denominator definition tells PROC TABULATE which frequency counts to sum for the denominators in that row. This section explains how PROC TABULATE interprets these denominator definitions.
The part of the TABLE statement that calculates the row percentages and that labels the row is
pctn<gender all>='Percent of row total'
Consider how PROC TABULATE interprets this denominator definition for each subtable.
Subtable 1: Occupation and Gender
PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Gender within the same value of Occupation.
For example, the denominator for the category female, technical is the sum of all frequency counts for all categories in this subtable for which the value of Occupation is technical. There are two such categories: female, technical and male, technical. The corresponding frequency counts are 16 and 18. Therefore, the denominator for this category is 16+18, or 34.
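The arithmetic in this example can be reproduced with a short DATA step. This is only an illustrative sketch: the variable names are hypothetical, and the two counts are copied from the report rather than read from the data set.

```sas
data _null_;
   /* Frequency counts for Occupation = Technical, copied from the report. */
   n_female = 16;
   n_male   = 18;
   /* Denominator: sum over all values of Gender within Technical. */
   denominator = n_female + n_male;
   row_pct = 100 * n_female / denominator;
   put denominator= row_pct= 8.2;
run;
```

The log should show a denominator of 34 and a row percentage of 47.06, which agrees with the Percent of row total value for females in the Technical job class.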
Subtable 2: All and Gender
PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Gender in the subtable.
For example, the denominator for the category all, female is the sum of the frequency counts for all, female and all, male. The corresponding frequency counts are 61 and 62. Therefore, the denominator for cells in this subtable is 61+62, or 123.
Subtable 3: Occupation and All
PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. The variable All does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.
For example, the denominator for the category clerical, all is the frequency count for that category, 28.
Note: In these table cells, because the numerator and the denominator are the same, the row percentages in this subtable are all 100.
Subtable 4: All and All
PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. The variable All does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.
There is only one category in this subtable: all, all. The denominator for this category is 123.
Note: In this table cell, because the numerator and denominator are the same, the row percentage in this subtable is 100.
The part of the TABLE statement that calculates the column percentages and labels the row is
pctn<occupation all>='Percent of column total'
Consider how PROC TABULATE interprets this denominator definition for each subtable.
Subtable 1: Occupation and Gender
PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Occupation within the same value of Gender.
For example, the denominator for the category manager/supervisor, male is the sum of all frequency counts for all categories in this subtable for which the value of Gender is male. There are four such categories: technical, male; manager/supervisor, male; clerical, male; and administrative, male. The corresponding frequency counts are 18, 15, 14, and 15. Therefore, the denominator for this category is 18+15+14+15, or 62.
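The same denominator serves every male cell in this subtable, which a small DATA step can illustrate. The sketch below uses hypothetical names (n_male, j, col_pct) and copies the four male counts from the report in row order.

```sas
data _null_;
   /* Male counts for Technical, Manager/Supervisor, Clerical, Administrative. */
   array n_male{4} _temporary_ (18 15 14 15);
   /* Denominator: sum over all values of Occupation within Gender = male. */
   denominator = 0;
   do j = 1 to 4;
      denominator = denominator + n_male{j};
   end;
   put denominator=;
   /* Column percentage for each job class, using the shared denominator. */
   do j = 1 to 4;
      col_pct = 100 * n_male{j} / denominator;
      put j= col_pct= 8.2;
   end;
run;
```

The log should show a denominator of 62 and column percentages of 29.03, 24.19, 22.58, and 24.19, matching the Male column of the Percent of column total rows.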
Subtable 2: All and Gender
PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. Because the variable All does contribute to this subtable, PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count for All as the denominator.
For example, the denominator for the category all, female is the frequency count for that category, 61.
Note: In these table cells, because the numerator and denominator are the same, the column percentages in this subtable are all 100.
Subtable 3: Occupation and All
PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Occupation in the subtable.
For example, the denominator for the category technical, all is the sum of the frequency counts for technical, all; manager/supervisor, all; clerical, all; and administrative, all. The corresponding frequency counts are 34, 35, 28, and 26. Therefore, the denominator for this category is 34+35+28+26, or 123.
Subtable 4: All and All
PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. Because the variable All does contribute to this subtable, PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.
There is only one category in this subtable: all, all. The frequency count for this category is 123.
Note: In this calculation, because the numerator and denominator are the same, the column percentage in this subtable is 100.
The part of the TABLE statement that calculates the total percentages and labels the row is
pctn='Percent of total'
If you do not specify a denominator definition, then PROC TABULATE obtains the denominator for a cell by totaling all the frequency counts in the subtable. The following table summarizes the process for all subtables in this example.
Class variables contributing to the subtable | Frequency counts | Total
---|---|---
Occupation and Gender | 16, 18, 20, 15, 14, 14, 11, 15 | 123
Occupation and All | 34, 35, 28, 26 | 123
All and Gender | 61, 62 | 123
All and All | 123 | 123
Consequently, the denominator for total percentages is always 123.
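As an independent cross-check of all three percentages, the same crossing can be requested from PROC FREQ, whose default crosstabulation cell lists the frequency, the percentage of the total, the row percentage, and the column percentage. This sketch is not part of the original example and assumes the JOBCLASS data set and the two formats from the program above.

```sas
proc freq data=jobclass;
   /* Rows are job classes and columns are genders, as in the report. */
   tables occupation*gender;
   format gender gendfmt. occupation occupfmt.;
run;
```

For the category female, technical, the PROC FREQ cell should show 16, 13.01 (16 out of 123), 47.06, and 26.23, which correspond to the Number of employees, Percent of total, Percent of row total, and Percent of column total rows of the PROC TABULATE report.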
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.