BY-Group Processing in the DATA Step

How the DATA Step Identifies BY Groups

In the DATA step, SAS identifies the beginning and end of each BY group by creating two temporary variables for each BY variable: FIRST.variable and LAST.variable. These temporary variables are available for DATA step programming but are not added to the output data set. Their values indicate whether an observation is

the first one in a BY group
the last one in a BY group
neither the first nor the last one in a BY group
both first and last, as is the case when there is only one observation in a BY group

You can take actions conditionally, based on whether you are processing the first or the last observation in a BY group.
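As a minimal sketch of such conditional processing, the following step computes one total per BY group. The data set names (sales, region_totals) and the variables Region, Amount, and Total are hypothetical and chosen only for illustration.

```sas
/* Hypothetical input data, already in Region order */
data sales;
   input Region $ Amount;
   datalines;
East 100
East 250
West 75
;

data region_totals;
   set sales;
   by Region;
   if first.Region then Total = 0;   /* reset at the start of each BY group */
   Total + Amount;                   /* sum statement: Total is retained across observations */
   if last.Region then output;       /* write one observation per BY group */
   keep Region Total;
run;
```

With this input, region_totals contains two observations: East with Total=350 and West with Total=75.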

How SAS Determines FIRST.VARIABLE and LAST.VARIABLE

When an observation is the first in a BY group, SAS sets the value of FIRST.variable to 1 for the variable whose value changed, as well as for all of the variables that follow it in the BY statement. For all other observations in the BY group, the value of FIRST.variable is 0. Likewise, if the observation is the last in a BY group, SAS sets the value of LAST.variable to 1 for the variable whose value changes on the next observation, as well as for all of the variables that follow it in the BY statement. For all other observations in the BY group, the value of LAST.variable is 0. For the last observation in a data set, the values of all LAST.variable variables are set to 1.
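One consequence of this rule can be sketched as follows: because FIRST.City is set to 1 whenever State changes, counting FIRST.City flags gives the number of cities within each state, even when the same city name appears in adjacent states. The data set names (places, city_counts), the variable NCities, and the sample values are assumptions made for illustration.

```sas
/* Hypothetical input data, already in State City order */
data places;
   input State $ City $;
   datalines;
FL Jupiter
FL Miami
FL Miami
OH Miami
;

data city_counts;
   set places;
   by State City;
   if first.State then NCities = 0;   /* new state: restart the count */
   if first.City then NCities + 1;    /* also 1 when only State changed */
   if last.State then output;         /* one observation per state */
   keep State NCities;
run;
```

For the fourth observation, the value of City is the same as on the third, but FIRST.City is 1 because State changed. The result is NCities=2 for FL and NCities=1 for OH.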

Grouping Observations by State, City, Zip Code, and Street

This example shows how SAS uses FIRST.variable and LAST.variable to flag the beginning and end of BY groups in a data set that contains four variables: State, City, ZipCode, and Street. The BY statement names three of these variables, so six temporary variables are created within the program data vector. These variables can be used during the DATA step, but they do not become variables in the new data set.

In the figure that follows, observations in the SAS data set are arranged in an order that can be used with this BY statement:

by State City ZipCode;

SAS creates the following temporary variables: FIRST.State, LAST.State, FIRST.City, LAST.City, FIRST.ZipCode, and LAST.ZipCode.

Observations in Four BY Groups				Corresponding FIRST. and LAST. Values
State	City	ZipCode	Street	FIRST.State	LAST.State	FIRST.City	LAST.City	FIRST.ZipCode	LAST.ZipCode
AZ	Tucson	85730	Glen Pl	1	1	1	1	1	1
FL	Miami	33133	Rice St	1	0	1	0	1	0
FL	Miami	33133	Tom Ave	0	0	0	0	0	0
FL	Miami	33133	Surrey Dr	0	0	0	0	0	1
FL	Miami	33146	Nervia St	0	0	0	0	1	0
FL	Miami	33146	Corsica St	0	1	0	1	0	1
OH	Miami	45056	Myrtle St	1	1	1	1	1	1
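Because the temporary variables are not written to the output data set, one way to inspect them is to copy them into ordinary variables. The following is a sketch under that assumption. The data set names (addresses, flags) and the names of the copied variables are hypothetical, and Street is omitted because it does not appear in the BY statement.

```sas
/* The seven observations from the table, without Street */
data addresses;
   input State $ City $ ZipCode $;
   datalines;
AZ Tucson 85730
FL Miami 33133
FL Miami 33133
FL Miami 33133
FL Miami 33146
FL Miami 33146
OH Miami 45056
;

data flags;
   set addresses;
   by State City ZipCode;
   /* copy each temporary flag into a variable that is kept */
   FirstState = first.State;
   LastState  = last.State;
   FirstCity  = first.City;
   LastCity   = last.City;
   FirstZip   = first.ZipCode;
   LastZip    = last.ZipCode;
run;

proc print data=flags noobs;
run;
```

The printed flag columns should correspond to the FIRST. and LAST. columns of the table.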

Grouping Observations by City, State, Zip Code, and Street

This example shows how SAS uses FIRST.variable and LAST.variable to flag the beginning and end of BY groups in a data set that contains four variables: City, State, ZipCode, and Street. The BY statement names three of these variables, so six temporary variables are created within the program data vector. These variables can be used during the DATA step, but they do not become variables in the new data set.

In the figure that follows, observations in the SAS data set are arranged in an order that can be used with this BY statement:

by City State ZipCode;

SAS creates the following temporary variables: FIRST.City, LAST.City, FIRST.State, LAST.State, FIRST.ZipCode, and LAST.ZipCode.

Observations in Four BY Groups				Corresponding FIRST. and LAST. Values
City	State	ZipCode	Street	FIRST.City	LAST.City	FIRST.State	LAST.State	FIRST.ZipCode	LAST.ZipCode
Miami	FL	33133	Rice St	1	0	1	0	1	0
Miami	FL	33133	Tom Ave	0	0	0	0	0	0
Miami	FL	33133	Surrey Dr	0	0	0	0	0	1
Miami	FL	33146	Nervia St	0	0	0	0	1	0
Miami	FL	33146	Corsica St	0	0	0	1	0	1
Miami	OH	45056	Myrtle St	0	1	1	1	1	1
Tucson	AZ	85730	Glen Pl	1	1	1	1	1	1
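If the observations are not already arranged in an order that matches the BY statement, they can be sorted first. The sketch below assumes a hypothetical data set named addresses that is entered in State order, sorts it with PROC SORT into a hypothetical data set named by_city, and then writes some of the flags to the log. Street is again omitted.

```sas
/* Observations entered in State order */
data addresses;
   input State $ City $ ZipCode $;
   datalines;
AZ Tucson 85730
FL Miami 33133
FL Miami 33133
FL Miami 33133
FL Miami 33146
FL Miami 33146
OH Miami 45056
;

/* Rearrange the observations to match the BY statement below */
proc sort data=addresses out=by_city;
   by City State ZipCode;
run;

data _null_;
   set by_city;
   by City State ZipCode;
   put City= State= ZipCode= first.City= last.City= first.State= last.State=;
run;
```

After sorting, the Miami observations come first and the Tucson observation comes last, as in the table.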

Grouping Observations: Another Example

The value of FIRST.variable can be affected by a change in the value of a variable that precedes it in the BY statement, even if the current value of the variable itself remains the same.

In this example, the values of FIRST.variable and LAST.variable depend on the order of the variables in the BY statement, and not just on the values of the BY variables. For observation 3, the value of FIRST.Y is set to 1 because BLUEBERRY is a new value for Y. This change in Y causes FIRST.Z to be set to 1 as well, even though the value of Z did not change.

options pageno=1 nodate linesize=80 pagesize=60; 

data testfile;
   input x $ y $ 9-17 z $ 19-26;
   datalines;
Apple   Banana    Coconut
Apple   Banana    Coconut
Apple   Blueberry Coconut
Apricot Blueberry Citron
;

data _null_;
   set testfile;
   by x y z;
   if _N_=1 then put 'Grouped by X Y Z';
   put _N_= x= first.x= last.x= first.y= last.y= first.z= last.z= ;
run; 

data _null_;
   set testfile;
   by y x z;
   if _N_=1 then put 'Grouped by Y X Z';
   put _N_= x= first.x= last.x= first.y= last.y= first.z= last.z= ;
run;

Partial SAS Log Showing the Results of Processing with BY Variables

Grouped by X Y Z
_N_=1 x=Apple FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0
_N_=2 x=Apple FIRST.x=0 LAST.x=0 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1
_N_=3 x=Apple FIRST.x=0 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1
_N_=4 x=Apricot FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1

Grouped by Y X Z
_N_=1 x=Apple FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0
_N_=2 x=Apple FIRST.x=0 LAST.x=1 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1
_N_=3 x=Apple FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=1
_N_=4 x=Apricot FIRST.x=1 LAST.x=1 FIRST.y=0 LAST.y=1 FIRST.z=1 LAST.z=1
