TABULATE Procedure

Example 13: Using Denominator Definitions to Display Basic Frequency Counts and Percentages

Features:
TABLE statement:
  • ALL class variable
  • denominator definitions (angle bracket operators)
  • N statistic
  • PCTN statistic
Other features:
  • FORMAT procedure

Details

Crosstabulation tables (also called contingency tables or stub-and-banner reports) show combined frequency distributions for two or more variables. This table shows frequency counts for females and males within each of four job classes. The table also shows the percentage that each frequency count represents of the following:
  • the total women and men in that job class (row percentage)
  • the total for that gender in all job classes (column percentage)
  • the total for all employees

Program

data jobclass;
   input Gender Occupation @@;
   datalines;
1 1  1 1  1 1  1 1  1 1  1 1  1 1
1 2  1 2  1 2  1 2  1 2  1 2  1 2
1 3  1 3  1 3  1 3  1 3  1 3  1 3
1 1  1 1  1 1  1 2  1 2  1 2  1 2
1 2  1 2  1 3  1 3  1 4  1 4  1 4
1 4  1 4  1 4  1 1  1 1  1 1  1 1
1 1  1 2  1 2  1 2  1 2  1 2  1 2
1 2  1 3  1 3  1 3  1 3  1 4  1 4
1 4  1 4  1 4  1 1  1 3  2 1  2 1
2 1  2 1  2 1  2 1  2 1  2 2  2 2
2 2  2 2  2 2  2 3  2 3  2 3  2 4
2 4  2 4  2 4  2 4  2 4  2 1  2 3
2 3  2 3  2 3  2 3  2 4  2 4  2 4
2 4  2 4  2 1  2 1  2 1  2 1  2 1
2 2  2 2  2 2  2 2  2 2  2 2  2 2
2 3  2 3  2 4  2 4  2 4  2 1  2 1
2 1  2 1  2 1  2 2  2 2  2 2  2 3
2 3  2 3  2 3  2 4
;
proc format;
   value gendfmt 1='Female'
                 2='Male'
             other='*** Data Entry Error ***';
   value occupfmt 1='Technical'
                  2='Manager/Supervisor'
                  3='Clerical'
                  4='Administrative'
              other='*** Data Entry Error ***';
run;
proc tabulate data=jobclass format=8.2;
   class gender occupation;
   table (occupation='Job Class' all='All Jobs')
            *(n='Number of employees'*f=9.
              pctn<gender all>='Percent of row total'
              pctn<occupation all>='Percent of column total'
              pctn='Percent of total'),
         gender='Gender' all='All Employees' / rts=50;
   format gender gendfmt. occupation occupfmt.;
   title 'Gender Distribution';
   title2 'within Job Classes';
run;

Program Description

Create the JOBCLASS data set. JOBCLASS contains encoded information about the gender and job class of employees at a fictitious company.
data jobclass;
   input Gender Occupation @@;
   datalines;
1 1  1 1  1 1  1 1  1 1  1 1  1 1
1 2  1 2  1 2  1 2  1 2  1 2  1 2
1 3  1 3  1 3  1 3  1 3  1 3  1 3
1 1  1 1  1 1  1 2  1 2  1 2  1 2
1 2  1 2  1 3  1 3  1 4  1 4  1 4
1 4  1 4  1 4  1 1  1 1  1 1  1 1
1 1  1 2  1 2  1 2  1 2  1 2  1 2
1 2  1 3  1 3  1 3  1 3  1 4  1 4
1 4  1 4  1 4  1 1  1 3  2 1  2 1
2 1  2 1  2 1  2 1  2 1  2 2  2 2
2 2  2 2  2 2  2 3  2 3  2 3  2 4
2 4  2 4  2 4  2 4  2 4  2 1  2 3
2 3  2 3  2 3  2 3  2 4  2 4  2 4
2 4  2 4  2 1  2 1  2 1  2 1  2 1
2 2  2 2  2 2  2 2  2 2  2 2  2 2
2 3  2 3  2 4  2 4  2 4  2 1  2 1
2 1  2 1  2 1  2 2  2 2  2 2  2 3
2 3  2 3  2 3  2 4
;
Create the GENDFMT. and OCCUPFMT. formats. PROC FORMAT creates formats for the variables Gender and Occupation.
proc format;
   value gendfmt 1='Female'
                 2='Male'
             other='*** Data Entry Error ***';
   value occupfmt 1='Technical'
                  2='Manager/Supervisor'
                  3='Clerical'
                  4='Administrative'
              other='*** Data Entry Error ***';
run;
Create the report and specify the table options. The FORMAT= option specifies the 8.2 format as the default format for the value in each table cell.
proc tabulate data=jobclass format=8.2;
Specify subgroups for the analysis. The CLASS statement identifies Gender and Occupation as class variables.
   class gender occupation;
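Define the table rows. The row dimension creates a row for each formatted value of Occupation and for all jobs. The asterisk nests the statistics in parentheses (N and three uses of PCTN) within each value of Occupation and All. Text in quotation marks supplies the label for the corresponding row or statistic. The F= format modifier applies the 9. format to the frequency counts.
   table (occupation='Job Class' all='All Jobs')
            *(n='Number of employees'*f=9.
              pctn<gender all>='Percent of row total'
              pctn<occupation all>='Percent of column total'
              pctn='Percent of total'),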
Define the table columns and specify the amount of space for row headings. The column dimension creates a column for each formatted value of Gender and for all employees. Text in quotation marks supplies the heading for the corresponding column. The RTS= option provides 50 characters per line for row headings.
    gender='Gender' all='All Employees'/ rts=50;
Format the output. The FORMAT statement assigns formats to the variables Gender and Occupation.
   format gender gendfmt. occupation occupfmt.;
Specify the titles.
   title 'Gender Distribution';
   title2 'within Job Classes';
run;

Output

Gender Distribution within Job Classes

Details

Overview

The part of the TABLE statement that defines the rows of the table uses the PCTN statistic to calculate three different percentages.
In all calculations of PCTN, the numerator is N, the frequency count for one cell of the table. The denominator for each occurrence of PCTN is determined by the denominator definition. The denominator definition appears in angle brackets after the keyword PCTN. It is a list of one or more expressions. The list tells PROC TABULATE which frequency counts to sum for the denominator.

Analyzing the Structure of the Table

Taking a close look at the structure of the table helps you understand how PROC TABULATE uses the denominator definitions. The following simplified version of the TABLE statement clarifies the basic structure of the table:
table occupation='Job Class' all='All Jobs',
      gender='Gender' all='All Employees';
The table is a concatenation of four subtables. In this report, each subtable is a crossing of one class variable in the row dimension and one class variable in the column dimension. Each crossing establishes one or more categories. A category is a combination of unique values of class variables, such as "female, technical" or "all, clerical".
The following table describes each subtable.
Contents of Subtables

Class Variables Contributing
to the Subtable               Description of Frequency Counts                               Number of Categories
----------------------------  ------------------------------------------------------------  --------------------
Occupation and Gender         Number of females in each job or number of males in each job  8
All and Gender                Number of females or number of males                          2
Occupation and All            Number of people in each job                                  4
All and All                   Number of people in all jobs                                  1
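
The frequency counts behind the four subtables can also be inspected outside PROC TABULATE. The following is a minimal sketch, not part of the original example. It assumes that the JOBCLASS data set from the program above already exists in the session, and the output data set name SUBTABLE_COUNTS is hypothetical. PROC SUMMARY writes one observation per category, with the automatic variable _TYPE_ identifying which class variables contribute and _FREQ_ holding the frequency count.

```sas
/* Sketch: count the observations in every category of every subtable. */
/* Assumes WORK.JOBCLASS was created by the DATA step shown earlier.   */
proc summary data=jobclass;
   class gender occupation;
   output out=subtable_counts;   /* hypothetical data set name */
run;

/* _TYPE_=0: All and All          (1 category)   */
/* _TYPE_=1: Occupation and All   (4 categories) */
/* _TYPE_=2: All and Gender       (2 categories) */
/* _TYPE_=3: Occupation and Gender (8 categories) */
proc print data=subtable_counts noobs;
   var _type_ gender occupation _freq_;
run;
```

The formats GENDFMT. and OCCUPFMT. are deliberately not applied in this listing. A class variable that does not contribute to a category is set to missing in the output data set, and the OTHER= range of those formats would label a missing value as a data entry error.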
The following figure highlights these subtables and the frequency counts for each category.
Illustration of the Four Subtables

Interpreting Denominator Definitions

The following fragment of the TABLE statement contains the denominator definitions for this report. Each denominator definition is the list in angle brackets that follows the PCTN keyword.
table (occupation='Job Class' all='All Jobs')
            *(n='Number of employees'*f=9.
              pctn<gender all>='Percent of row total'
              pctn<occupation all>='Percent of column total'
              pctn='Percent of total'),
Each use of PCTN nests a row of statistics within each value of Occupation and All. Each denominator definition tells PROC TABULATE which frequency counts to sum for the denominators in that row. This section explains how PROC TABULATE interprets these denominator definitions.

Row Percentages

The part of the TABLE statement that calculates the row percentages and that labels the row is
   pctn<gender all>='Percent of row total'
Consider how PROC TABULATE interprets this denominator definition for each subtable.
Subtable 1: Occupation and Gender
PROC TABULATE looks at the first element in the denominator definition, Gender, and asks whether Gender contributes to the subtable. Because Gender does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Gender within the same value of Occupation.
For example, the denominator for the category female, technical is the sum of all frequency counts for all categories in this subtable for which the value of Occupation is technical. There are two such categories: female, technical and male, technical. The corresponding frequency counts are 16 and 18. Therefore, the denominator for this category is 16+18, or 34.
Subtable 2: All and Gender
PROC TABULATE looks at the first element in the denominator definition, Gender, and asks whether Gender contributes to the subtable. Because Gender does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Gender in the subtable.
For example, the denominator for the category all, female is the sum of the frequency counts for all, female and all, male. The corresponding frequency counts are 61 and 62. Therefore, the denominator for cells in this subtable is 61+62, or 123.
Subtable 3: Occupation and All
PROC TABULATE looks at the first element in the denominator definition, Gender, and asks whether Gender contributes to the subtable. Because Gender does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. The variable All does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.
For example, the denominator for the category clerical, all is the frequency count for that category, 28.
Note: In these table cells, because the numerator and the denominator are the same, the row percentages in this subtable are all 100.
Subtable 4: All and All
PROC TABULATE looks at the first element in the denominator definition, Gender, and asks whether Gender contributes to the subtable. Because Gender does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. The variable All does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.
There is only one category in this subtable: all, all. The denominator for this category is 123.
Note: In this table cell, because the numerator and denominator are the same, the row percentage in this subtable is 100.
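
One way to cross-check the row percentages for the Occupation and Gender subtable is a two-way PROC FREQ table. This is a sketch rather than part of the original example, and it assumes that the JOBCLASS data set and the GENDFMT. and OCCUPFMT. formats from the program above already exist. The NOCOL and NOPERCENT options suppress the column and overall percentages, so each cell shows only the frequency and the row percentage.

```sas
/* Sketch: row percentages (percent of each job class total).     */
/* Assumes WORK.JOBCLASS and the two formats were created earlier. */
proc freq data=jobclass;
   tables occupation*gender / nocol nopercent;
   format gender gendfmt. occupation occupfmt.;
run;
```

For the category female, technical, the row percentage should be 100*16/34, or about 47.06, which uses the same denominator of 34 that is derived above.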

Column Percentages

The part of the TABLE statement that calculates the column percentages and labels the row is
   pctn<occupation all>='Percent of column total'
Consider how PROC TABULATE interprets this denominator definition for each subtable.
Subtable 1: Occupation and Gender
PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks whether Occupation contributes to the subtable. Because Occupation does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Occupation within the same value of Gender.
For example, the denominator for the category manager/supervisor, male is the sum of all frequency counts for all categories in this subtable for which the value of Gender is male. There are four such categories: technical, male; manager/supervisor, male; clerical, male; and administrative, male. The corresponding frequency counts are 18, 15, 14, and 15. Therefore, the denominator for this category is 18+15+14+15, or 62.
Subtable 2: All and Gender
PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks whether Occupation contributes to the subtable. Because Occupation does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. Because the variable All does contribute to this subtable, PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count for All as the denominator.
For example, the denominator for the category all, female is the frequency count for that category, 61.
Note: In these table cells, because the numerator and denominator are the same, the column percentages in this subtable are all 100.
Subtable 3: Occupation and All
PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks whether Occupation contributes to the subtable. Because Occupation does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Occupation in the subtable.
For example, the denominator for the category technical, all is the sum of the frequency counts for technical, all; manager/supervisor, all; clerical, all; and administrative, all. The corresponding frequency counts are 34, 35, 28, and 26. Therefore, the denominator for this category is 34+35+28+26, or 123.
Subtable 4: All and All
PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks whether Occupation contributes to the subtable. Because Occupation does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. Because the variable All does contribute to this subtable, PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.
There is only one category in this subtable: all, all. The frequency count for this category is 123.
Note: In this calculation, because the numerator and denominator are the same, the column percentage in this subtable is 100.
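
The column denominators for the Occupation and Gender subtable can be made explicit with PROC SQL. The following sketch is not part of the original example. It assumes that the JOBCLASS data set and the two formats from the program above already exist, and the names GENDER_TOTALS, DENOM, N, and COL_PCT are hypothetical. It first sums the frequency counts over all occurrences of Occupation within each value of Gender, and then divides each cell count by that sum.

```sas
proc sql;
   /* Column denominator: total count for each gender across all jobs */
   create table gender_totals as
      select gender, count(*) as denom
      from jobclass
      group by gender;

   /* Column percentage for each Occupation-by-Gender category */
   select c.occupation format=occupfmt.,
          c.gender format=gendfmt.,
          c.n,
          t.denom,
          100 * c.n / t.denom as col_pct format=8.2
   from (select occupation, gender, count(*) as n
         from jobclass
         group by occupation, gender) as c,
        gender_totals as t
   where c.gender = t.gender
   order by c.occupation, c.gender;
quit;
```

For the category manager/supervisor, male, the denominator is 62 and the column percentage is 100*15/62, or about 24.19.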

Total Percentages

The part of the TABLE statement that calculates the total percentages and labels the row is
   pctn='Percent of total'
If you do not specify a denominator definition, then PROC TABULATE obtains the denominator for a cell by totaling all the frequency counts in the subtable. The following table summarizes the process for all subtables in this example.
Denominators for Total Percentages

Class Variables Contributing
to the Subtable               Frequency Counts                Total
----------------------------  ------------------------------  -----
Occupation and Gender         16, 18, 20, 15, 14, 14, 11, 15  123
Occupation and All            34, 35, 28, 26                  123
Gender and All                61, 62                          123
All and All                   123                             123
Consequently, the denominator for total percentages is always 123.
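
As a cross-check on all three denominator definitions, PROC TABULATE also provides the ROWPCTN, COLPCTN, and REPPCTN statistics, which take their denominators from the row total, the column total, and the report total. The following sketch is not part of the original example. It assumes that the JOBCLASS data set and the formats from the program above already exist, and it is expected, though not verified here, to reproduce the same percentages as the denominator definitions.

```sas
/* Sketch: same report layout, with ROWPCTN, COLPCTN, and REPPCTN  */
/* in place of PCTN with explicit denominator definitions.         */
/* Assumes WORK.JOBCLASS and the two formats were created earlier. */
proc tabulate data=jobclass format=8.2;
   class gender occupation;
   table (occupation='Job Class' all='All Jobs')
            *(n='Number of employees'*f=9.
              rowpctn='Percent of row total'
              colpctn='Percent of column total'
              reppctn='Percent of total'),
         gender='Gender' all='All Employees' / rts=50;
   format gender gendfmt. occupation occupfmt.;
   title 'Gender Distribution';
   title2 'within Job Classes';
run;
```

Denominator definitions remain the more general tool, because they let you name exactly which class variables to sum over. The keyword statistics are a shorthand for the common row, column, and report cases.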