BY Statement

Processes data and orders output according to the BY group.
Used by: GAREABAR, GBARLINE, GCHART, GCONTOUR, GMAP, GPLOT, GRADAR, GREDUCE, GTILE, G3D, G3GRID
Type: DATA step statement

Syntax

BY <DESCENDING> variable-1
<...<DESCENDING> variable-n>
<NOTSORTED> ;

Required Argument

variable
specifies the variable that the procedure uses to form BY groups. You can specify more than one variable. By default, the procedure expects observations in the data set to be sorted in ascending order by all the variables that you specify or to be indexed appropriately.

Optional Arguments

DESCENDING
indicates that the data set is sorted in descending order by the specified variable. The option affects only the variable that immediately follows the option name, and must be repeated before every variable that is not sorted in ascending order. For example, this BY statement indicates that observations in the input data set are arranged in descending order of VAR1 values and ascending order of VAR2 values:
by descending var1 var2;
This BY statement indicates that the input data set is sorted in descending order of both VAR1 and VAR2 values:
by descending var1 descending var2;
NOTSORTED
specifies that observations with the same BY value are grouped together, but are not necessarily sorted in alphabetical or numeric order. The observations can be grouped in another way (for example, in chronological order).
NOTSORTED can appear anywhere in the BY statement and affects all variables specified in the statement. NOTSORTED overrides DESCENDING if both appear in the same BY statement.
The requirement for ordering or indexing observations according to the values of BY variables is suspended when you use the NOTSORTED option. In fact, the procedure does not use an index if you specify NOTSORTED. For NOTSORTED, the procedure defines a BY group as a set of contiguous observations that have the same values for all BY variables. If observations with the same value for the BY variables are not contiguous, the procedure treats each new value that it encounters as the first observation in a new BY group. The procedure creates a graph for that value, even if it is only one observation.
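As a minimal sketch of NOTSORTED, the following assumes a hypothetical data set named MONTHLY whose observations are grouped chronologically by MONTH rather than alphabetically. The data set name, variable names, and values are illustrative only.

```sas
/* Hypothetical data: observations are grouped by month in calendar order, */
/* not in alphabetical order (Feb would otherwise sort before Jan).        */
data monthly;
   input month $ product $ units;
   datalines;
Jan A 5
Jan B 7
Feb A 6
Feb B 9
Mar A 4
Mar B 8
;
run;

/* NOTSORTED tells the procedure to treat each run of contiguous */
/* observations with the same MONTH value as one BY group.       */
proc gchart data=monthly;
   by month notsorted;
   vbar product / sumvar=units;
run;
quit;
```

Because the groups are contiguous, this should produce one chart per month in the order Jan, Feb, Mar. If a second run of Jan observations appeared later in the data set, it would start a new BY group.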

Details

Description: BY Statement

The BY statement divides the observations from an input data set into groups for processing. Each set of contiguous observations with the same value for a specified variable is called a BY group. A variable that defines BY groups is called a BY variable and is the variable that is specified in the BY statement. When you use a BY statement, the graphics procedure performs the following operations:
  • processes each group of observations independently
  • generates a separate graph or output for each BY group
  • automatically adds a heading called a BY line to each graph identifying the BY group represented in the graph
  • adds BY statement information below the Description field of the catalog entry
By default, the procedure expects the observations in the input data set to be sorted in ascending order of the BY variable values.
Note: The BY statement in SAS/GRAPH is essentially the same as the BY statement in Base SAS. However, the effect on the output is different when it is used with SAS/GRAPH procedures.

Preparing Data for BY-Group Processing

Unless you specify the NOTSORTED option, observations in the input data set must be in ascending numeric or alphabetic order. To prepare the data set, either sort it with the SORT procedure using the same BY statement that you plan to use in the target SAS/GRAPH procedure or create an appropriate index on the BY variables.
If the procedure encounters an observation that is out of the proper order, it issues an error message.
If you need to group data in some other order, you can still use BY-group processing. To do so, process the data so that observations are arranged in contiguous groups that have the same BY-variable values and specify the NOTSORTED option in the BY statement.
For an example of sorting the input data set, see Using BY-group Processing to Generate a Series of Charts.
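The sort-then-plot pattern described in this section can be sketched as follows. The SALES data set, its variables (SITE, MODEL_A, SALES), and its values are hypothetical and exist only to make the sketch self-contained.

```sas
/* Hypothetical input data, deliberately not in SITE order */
data sales;
   input site $ model_a sales;
   datalines;
Paris 10 120
London 12 150
Paris 14 160
London 9 110
;
run;

/* Sort with the same BY statement that the graphics procedure uses */
proc sort data=sales;
   by site;
run;

/* One plot is generated for each value of SITE */
proc gplot data=sales;
   by site;
   plot sales*model_a;
run;
quit;
```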

Controlling BY Lines

Understanding Default Behavior

By default, the BY statement prints a BY line above each graph that contains the variable name followed by an equal sign and the variable value. For example, if you specify BY SITE in the procedure, the default heading when the value of SITE is London would be SITE=London.

Suppressing the BY Line

To suppress the entire BY line, use the NOBYLINE option in an OPTIONS statement or specify HBY=0 in the GOPTIONS statement. For an example, see Using BY-group Processing to Generate a Series of Charts.
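A minimal sketch of suppressing the BY line, assuming a hypothetical data set SALES that is already sorted by SITE and contains the numeric variables SALES and MODEL_A:

```sas
/* Suppress the BY line with the NOBYLINE system option. */
/* Alternatively, specify: goptions hby=0;               */
options nobyline;

proc gplot data=sales;   /* SALES is an assumed, pre-sorted data set */
   by site;
   plot sales*model_a;
run;
quit;

/* Restore the default so that later steps display BY lines again */
options byline;
```

NOBYLINE is a system option and stays in effect until it is reset, which is why the sketch restores BYLINE afterward.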

Suppressing the Name of the BY Variable

To suppress the variable name and the equal sign in the heading and leave only the BY value, use the LABEL statement to assign a null label ("00"X) to the BY variable. For example, this statement assigns a null label to the SITE variable:
label site="00"x;

Controlling the Appearance of the BY Line

To control the color, font, and height of the BY lines, use the following graphics options in a GOPTIONS statement:
CBY=BY-line-color
specifies the color for BY lines.
FBY=font
specifies the font for BY lines.
HBY=n<units>
specifies the height for BY lines.
For a description of each option, see the Graphics Options and Device Parameters Dictionary.
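For illustration, the three options can be combined in one GOPTIONS statement. The color, font, and height values below are arbitrary choices, and the fonts that are available depend on your installation and device.

```sas
/* Illustrative values only: blue BY lines in the SWISS font at 4 percent height */
goptions cby=blue fby=swiss hby=4pct;
```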

Naming the Catalog Entries

The catalog entries generated with BY-group processing always use incremental naming. This means that the first entry created by the procedure uses the base name and subsequent entries increment that name. The base name is either the default entry name for the procedure (for example, GPLOT) or the name specified with the NAME= option in the action statement. Incrementing the base name automatically appends a number to each subsequent entry (for example, GPLOT1, GPLOT2, and so on). See Specifying the Catalog Name and Entry Name for Your GRSEGs. For an example of incremented catalog names, see Combining Graphs and Reports in a Web Page.
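A short sketch of supplying a base name with the NAME= option, assuming a hypothetical data set SALES that is already sorted by SITE. The entry name SITEPLT is an arbitrary illustration.

```sas
proc gplot data=sales;   /* SALES is an assumed, pre-sorted data set */
   by site;
   /* SITEPLT becomes the base name; following the rule described above, */
   /* entries for later BY groups would be SITEPLT1, SITEPLT2, and so on */
   plot sales*model_a / name="siteplt";
run;
quit;
```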

Using the BY Statement

Overview

This section describes the following:
  • the effect of BY-group processing on the GCHART, GMAP, and GPLOT procedures
  • the interaction between BY-group and RUN-group processing
  • the requirements for using BY-group processing with the Annotate facility
  • how to include BY information in axis labels, reference labels and tick mark values, donut labels, legend labels and values, and titles, notes, and footnotes
  • how patterns and symbols are assigned to BY-groups
  • the effect of using BY-group processing with the ODS HTML statement
For additional information about any of these topics, refer to the appropriate chapter.

Using the BY Statement with the GCHART Procedure

When you use BY-group processing with the GCHART procedure, you can do the following tasks:
  • With the BLOCK, HBAR, and VBAR statements, you can use the PATTERNID=BY option to assign patterns according to BY groups. With PATTERNID=BY, each BY group uses a different PATTERN definition, but all bars or blocks within a BY group use the same pattern. For further information, see Example: PATTERN and SYMBOL Definitions with BY Groups in the GCHART Procedure.
  • With the BLOCK statement, you can use the BLOCKMAX= option to produce the same block-height scaling in all block charts in a BY group.
  • With the HBAR or VBAR statement, you can use the RAXIS= option to produce the same response axis scaling in all horizontal or vertical bar charts in a BY group.
  • With the DONUT statement, you can use the LABEL= option to substitute a BY variable value or name for #BYVAL or #BYVAR in a text string to display in the donut hole label.
With the PIE and STAR statements, the effect of a BY statement is similar to that of the GROUP= option. The exception is that the GROUP= option enables you to put more than one graph on a single page while the BY statement does not. Do not use a BY variable as the group variable in STAR or PIE statements.
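The PATTERNID=BY and RAXIS= options described above can be sketched together as follows. The data set, the variable names, and the axis range are hypothetical; choose a range that covers the response values in your own data.

```sas
/* Hypothetical data, already in SITE order */
data sales;
   input site $ model $ sales;
   datalines;
London A 150
London B 110
Paris A 120
Paris B 160
;
run;

proc gchart data=sales;
   by site;
   /* PATTERNID=BY: each BY group uses a different PATTERN definition */
   /* RAXIS=: the same response axis scaling is used for every chart  */
   vbar model / sumvar=sales
                patternid=by
                raxis=0 to 200 by 50;
run;
quit;
```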

Using the BY Statement with the GMAP Procedure

By default, BY-group processing affects both the map data set and the response data set. This means that you get separate, individual output for each map area common to both data sets. For example, suppose the map data set REGION contains six states and the response data set contains the same six states. If you specify BY STATE in the GMAP procedure, the resulting output is six graphs with one state on each graph.
If you use the ALL option in the PROC GMAP statement and you also use the BY statement, you get one output for each map area in the response data set, but that output displays all the map areas in the map data set. Only one map area per output contains response data information; the others are empty. For example, suppose you create a block map using the data sets REGION and SALES, specify BY STATE, and include the ALL option in the PROC GMAP statement. The resulting output is six graphs with six states on each graph. One state per graph has a block; the remaining five are empty. The UNIFORM option applies colors and heights uniformly across all BY-groups.

Using the BY Statement with the GPLOT Procedure

You can use the UNIFORM option in the PROC GPLOT statement to produce the same axis scaling for all graphs in a BY group. By default, the range of the axes can vary from graph to graph, but UNIFORM forces the scaling to be the same for all graphs generated by the procedure.
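A minimal sketch of the UNIFORM option, assuming a hypothetical data set SALES that is already sorted by SITE:

```sas
/* UNIFORM requests the same axis scaling on every graph in the series */
proc gplot data=sales uniform;   /* SALES is an assumed, pre-sorted data set */
   by site;
   plot sales*model_a;
run;
quit;
```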

Using the BY Statement with RUN Groups

If you use the BY statement with a procedure that processes data and supports RUN-group processing (the GCHART, GMAP, and GPLOT procedures), then each time you submit an action statement or a RUN statement, you get a separate graph for each value of the BY variable. For example, each of these two RUN-groups produces a separate plot for every value of the BY variable SITE:
/* first RUN group */
proc gplot data=sales;
   title1 "Sales Summary";
   by site;
   plot sales*model_a;
run;

/* second RUN group */
   plot sales*model_b;
run;
quit;
The BY statement stays in effect for every subsequent RUN group until you submit another BY statement or exit the procedure. Variables in subsequent BY statements replace any previous BY variables.
You can also turn off BY-group processing by submitting a null BY statement (BY;) in a RUN group. Do this with care, however, because the null BY statement turns off BY-group processing and the RUN group still generates a graph, now a single graph for the entire data set.
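As a sketch of the null BY statement, the second RUN group below turns off BY-group processing. It assumes the same hypothetical SALES data set, sorted by SITE, with the variables MODEL_A and MODEL_B.

```sas
proc gplot data=sales;   /* SALES is an assumed, pre-sorted data set */
   by site;
   plot sales*model_a;   /* one plot for each value of SITE */
run;

   by;                   /* null BY statement turns off BY-group processing */
   plot sales*model_b;   /* one plot for all observations */
run;
quit;
```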
For more information, see RUN-Group Processing.

Using the BY Statement with the Annotate Facility

If a procedure that is using BY-group processing also specifies annotation with the ANNOTATE= option in the PROC statement, the same annotation is applied to every graph generated by the procedure.
If you specify annotation with the ANNOTATE= option in the action statements for a procedure, the BY-group processing is applied to the Annotate data set. In this way, you can customize the annotation for the output from each BY group by including the BY variable in the Annotate data set and by using each BY-variable value as a condition for the annotation to be applied to the output for that value.
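The following sketch shows one way to build an Annotate data set that carries the BY variable. It assumes a hypothetical SALES data set sorted by SITE with the values London and Paris; the label text and coordinates are arbitrary.

```sas
/* Hypothetical Annotate data set: one label for each value of SITE. */
/* The observations are in SITE order to match the input data set.   */
data anno;
   length site $8 function $8 text $20;
   xsys="3"; ysys="3";       /* percentage of the graphics output area */
   x=50; y=10;
   position="5";             /* center the text on the (x,y) point */
   function="label";
   site="London"; text="London office"; output;
   site="Paris";  text="Paris office";  output;
run;

proc gplot data=sales;       /* SALES is an assumed, pre-sorted data set */
   by site;
   /* ANNOTATE= in the action statement: BY-group processing applies */
   plot sales*model_a / annotate=anno;
run;
quit;
```

Specifying ANNOTATE=ANNO in the PROC GPLOT statement instead would apply both labels to every graph.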

Using the BY Statement with AXIS, LEGEND, TITLE, FOOTNOTE, and NOTE Statements

AXIS and LEGEND statements can automatically include the BY variable name or BY variable value in the text that they produce for labels, reference labels, values for major tick marks, and legend labels and values. In addition, TITLE, FOOTNOTE, and NOTE statements can automatically include the BY lines in the text that they produce. To insert BY line information into the text strings used by these statements, use the appropriate #BYVAR, #BYVAL, and #BYLINE substitution options. For an example, see Using BY-group Processing to Generate a Series of Charts.
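A minimal sketch of the #BYVAL substitution in a TITLE statement, assuming a hypothetical SALES data set that is already sorted by SITE. The default BY line is suppressed here so that the BY value appears only in the title.

```sas
options nobyline;            /* avoid repeating the BY value in a BY line */

proc gplot data=sales;       /* SALES is an assumed, pre-sorted data set */
   by site;
   /* #BYVAL(site) is replaced with the current value of SITE */
   title1 "Sales Summary for #byval(site)";
   plot sales*model_a;
run;
quit;

options byline;              /* restore the default */
```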

Using the BY Statement with PATTERN and SYMBOL Definitions

By default, when using a BY statement, the graph for each BY group uses the same patterns or symbols in their defined order. For example, if the BY variable contains four values and there are two response levels for each BY value, the PATTERN1 and PATTERN2 or SYMBOL1 and SYMBOL2 statements are used for each graph. Each BY-group starts over with PATTERN1 or SYMBOL1. The UNIFORM option in the GMAP procedure changes this behavior.

Example: PATTERN and SYMBOL Definitions with BY Groups in the GCHART Procedure

The GCHART procedure, when used with SYMBOL or PATTERN definitions, assigns the symbols or patterns in order to each BY group. For example, if the BY variable REGION has four values—East, North, South, and West—the patterns are assigned to the BY-groups in this order:
  1. PATTERN1 is assigned to East.
  2. PATTERN2 is assigned to North.
  3. PATTERN3 is assigned to South.
  4. PATTERN4 is assigned to West.
If you create sets of graphs from several data sets containing the variable REGION, and if you want the same pattern assigned to the same region each time, you must make sure that REGION always has the same four values. Otherwise, the patterns might not be the same across graphs. For example, if the value North is missing from the data, the patterns are assigned as follows:
  1. PATTERN1 is assigned to East.
  2. PATTERN2 is assigned to South.
  3. PATTERN3 is assigned to West.
In this case, South is assigned pattern 2 instead of pattern 3 and West is assigned pattern 3 instead of pattern 4. To avoid this, include the value North for the variable REGION, but assign it a missing value for all other variables.
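One way to add the placeholder observation is sketched below. The data set names (REGSALES, NORTH, REGSALES_ALL) and the variables PRODUCT and SALES are hypothetical; the sketch covers only the data preparation, not the charting step.

```sas
/* Hypothetical data set in which the value North is absent */
data regsales;
   length region $8 product $8;
   input region $ product $ sales;
   datalines;
East A 10
South A 12
West A 9
;
run;

/* Placeholder observation: REGION is North, all other variables are missing */
data north;
   length region $8 product $8;
   region="North";
   product=" ";
   sales=.;
run;

/* Append the placeholder and restore ascending REGION order */
data regsales_all;
   set regsales north;
run;

proc sort data=regsales_all;
   by region;
run;
```

The resulting data set has the BY values East, North, South, and West in that order, so it can be used with BY REGION in the same way as a data set that contains real North observations.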