GROUP BY Clause

Specifies how to group the data for summarizing.

See: Creating a View from a Query's Result
Joining Two Tables and Calculating a New Value

Syntax

GROUP BY group-by-item <, …, group-by-item>

Required Argument

group-by-item

is one of the following:

integer

is a positive integer that equates to a column's position.

column-name

is the name of a column or a column alias. See column-name.

sql-expression

Details

  • You can specify more than one group-by-item to get more detailed reports. Both the grouping of multiple items and the BY statement of a PROC step are evaluated in similar ways. If more than one group-by-item is specified, then the first one determines the major grouping.
  • Integers can be substituted for column names (that is, SELECT object-items) in the GROUP BY clause. For example, if the group-by-item is 2, then the results are grouped by the values in the second column of the SELECT clause list. Using integers can shorten your coding and enable you to group by the value of an unnamed expression in the SELECT list. Note that if you use a floating-point value (for example, 2.3), then PROC SQL ignores the decimal portion.
  • The data does not have to be sorted in the order of the group-by values because PROC SQL handles sorting automatically. You can use the ORDER BY clause to specify the order in which rows are displayed in the result table.
  • If you specify a GROUP BY clause in a query that does not contain a summary function, then your clause is transformed into an ORDER BY clause and a message to that effect is written to the SAS log.
  • You can group the output by the values that are returned by an expression. For example, if X is a numeric variable, then the output of the following is grouped by the integer portion of values of X:
    select x, sum(y)
    from table1
    group by int(x);
    Similarly, if Y is a character variable, then the output of the following is grouped by the second character of values of Y:
    select sum(x), y
    from table1
    group by substring(y from 2 for 1);
    Note that an expression that contains only numeric literals (and functions of numeric literals) or only character literals (and functions of character literals) is ignored.
    An expression in a GROUP BY clause cannot be a summary function. For example, the following GROUP BY clause is not valid:
       group by sum(x)