Creating Summary Tables with the TABULATE Procedure |
Required Statements for the TABULATE Procedure |
The TABULATE procedure requires three statements, usually in the following order:
Note that there can be multiple CLASS statements, VAR statements and TABLE statements.
Begin with the PROC TABULATE Statement |
The TABULATE procedure begins with a PROC TABULATE statement. Many options are available with the PROC TABULATE statement; however, most of the examples in this section use only two options, the DATA= option and the FORMAT= option. The PROC TABULATE statement that follows is used for all of the examples in this section:
proc tabulate data=year_sales format=comma10.;
You can direct PROC TABULATE to use a specific SAS data set with the DATA= option. If you omit the DATA= option in the current job or session, then the TABULATE procedure uses the SAS data set that was created most recently.
You can specify a default format for PROC TABULATE to apply to the value in each cell in the table with the FORMAT= option. You can specify any valid SAS numeric format or user-defined format.
Specify Class Variables with the CLASS Statement |
Use the CLASS statement to specify which variables are class variables. Class variables (that is, classification variables) contain values that are used to form categories. In summary tables, the categories are used as the column, row, and page headings. The categories are crossed to obtain descriptive statistics. See Crossing Variables for an example of crossing categories (variable values).
Class variables can be either character or numeric. The default statistic for class variables is N, which is the frequency or number of observations in the data set for which there are nonmissing variable values.
The following CLASS statement specifies the variables SalesRep and Type as class variables:
class SalesRep Type;
For important information about how PROC TABULATE behaves when class variables that have missing values are listed in a CLASS statement but are not used in a TABLE statement, see Identifying Missing Values for Class Variables.
Specify Analysis Variables with the VAR Statement |
Use the VAR statement to specify which variables are analysis variables. Analysis variables contain numeric values for which you want to compute statistics. The default statistic for analysis variables is SUM.
The following VAR statement specifies the variable AmountSold as an analysis variable:
var AmountSold;
Define the Table Structure with the TABLE Statement |
Use the TABLE statement to define the structure of the table that you want PROC TABULATE to produce. A TABLE statement consists of one to three dimension expressions, separated by commas. Dimension expressions define the columns, rows, and pages of a summary table. Options can follow dimension expressions. You must specify at least one TABLE statement, because there is no default table in a PROC TABULATE step. Here are three variations of the syntax for a basic TABLE statement:
TABLE column-expression; TABLE row-expression, column-expression; TABLE page-expression, row-expression, column-expression;
Here is an example of a basic TABLE statement with three dimension expressions:
table SalesRep, Type, AmountSold;
This TABLE statement defines a three-dimensional summary table that places the values of the variable AmountSold in the column dimension, the values of the variable Type in the row dimension, and the values of the variable SalesRep in the page dimension.
Here are restrictions on the TABLE statement:
Every variable that is used in a dimension expression in a TABLE statement must appear in either a CLASS statement or a VAR statement, but not both.
All analysis variables must be in the same dimension and cannot be crossed. Therefore, only one dimension of any TABLE statement can contain analysis variables.
Identifying Missing Values for Class Variables |
You can identify missing values for class variables with the MISSING option. By default, if an observation contains a missing value for any class variable, that observation will be excluded from all tables even if the variable does not appear in the TABLE statement for one or more tables. Therefore, it is helpful to run your program at least once with the MISSING option to identify missing values.
The MISSING option creates a separate category in the summary table for missing values. It can be used with the PROC TABULATE statement or the CLASS statement. If you specify the MISSING option in the PROC TABULATE statement, the procedure considers missing values as valid levels for all class variables:
proc tabulate data=year_sales format=comma10. missing; class SalesRep; class Month Quarter; var AmountSold;
Because the MISSING option is in the PROC TABULATE statement in this example, observations with missing values for SalesRep, Month, or Quarter will display in the summary table.
If you specify the MISSING option in a CLASS statement, PROC TABULATE considers missing values as valid levels for the class variable(s) that are specified in that CLASS statement:
proc tabulate data=year_sales format=comma10.; class SalesRep; class Month Quarter / missing; var AmountSold;
Because the MISSING option is in the second CLASS statement, observations with missing values for Month or Quarter will display in the summary table, but observations with a missing value for SalesRep will not display.
If you have class variables with missing values in your data set, then you must decide whether or not the observations with the missing values should be omitted from every table. If the observations should not be omitted, then you can fill in the missing values where appropriate or continue to run the PROC TABULATE step with the MISSING option. For other options for handling missing values, see "Handling Missing Data" in PROC TABULATE by Example. For general information about missing values, see "Missing Values" in SAS Language Reference: Concepts.
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.