
BY-Group Processing in the DATA Step

How the DATA Step Identifies BY Groups


Processing Observations in a BY Group

In the DATA step, SAS identifies the beginning and end of each BY group by creating two temporary variables for each BY variable: FIRST.variable and LAST.variable. These temporary variables are available for DATA step programming but are not added to the output data set. Their values indicate whether an observation is the first in a BY group, the last in a BY group, both, or neither.

You can take actions conditionally, based on whether you are processing the first or the last observation in a BY group.
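As an illustration of conditional processing, the following sketch accumulates a total and a count for each BY group and writes one observation per group. The data set names (sales, region_totals) and the variables Region, Amount, Total, and Count are hypothetical and are not part of the original example.

```sas
/* Hypothetical sample data, already ordered by Region */
data sales;
   input Region $ Amount;
   datalines;
East 100
East 250
West 75
West 125
West 300
;

data region_totals;
   set sales;
   by Region;
   /* Reset the accumulators on the first observation of each BY group */
   if first.Region then do;
      Total = 0;
      Count = 0;
   end;
   /* Sum statements: Total and Count are retained across observations */
   Total + Amount;
   Count + 1;
   /* Write one observation per BY group, on its last observation */
   if last.Region then output;
   keep Region Total Count;
run;

proc print data=region_totals noobs;
run;
```

With this sample data, the East group should total 350 over 2 observations and the West group 500 over 3 observations.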

How SAS Determines FIRST.VARIABLE and LAST.VARIABLE

When an observation is the first in a BY group, SAS sets the value of FIRST.variable to 1 for the variable whose value changed, as well as for all of the variables that follow it in the BY statement. For all other observations in the BY group, the value of FIRST.variable is 0. Likewise, if the observation is the last in a BY group, SAS sets the value of LAST.variable to 1 for the variable whose value changes on the next observation, as well as for all of the variables that follow it in the BY statement. For all other observations in the BY group, the value of LAST.variable is 0. For the last observation in a data set, all LAST.variable variables are set to 1.
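One way to see this rule in action is to recompute the FIRST. flags by hand and compare them with the values that SAS assigns. The sketch below does this with the LAG function. The data set pairs and the variables A, B, prevA, prevB, manualFirstA, and manualFirstB are hypothetical names chosen for the illustration.

```sas
/* Hypothetical data, ordered by A and then by B */
data pairs;
   input A $ B $;
   datalines;
p x
p x
p y
q y
;

data _null_;
   set pairs;
   by A B;
   /* Call LAG unconditionally so that each queue is updated on every observation */
   prevA = lag(A);
   prevB = lag(B);
   /* A change in A starts a new group for A and for every BY variable after it */
   manualFirstA = (_N_ = 1) or (A ne prevA);
   manualFirstB = manualFirstA or (B ne prevB);
   put _N_= A= B= first.A= manualFirstA= first.B= manualFirstB=;
run;
```

On the fourth observation the value of B is unchanged, but because A changes, both FIRST.B and the hand-computed flag should be 1.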


Grouping Observations by State, City, Zip Code, and Street

This example shows how SAS uses FIRST.variable and LAST.variable to flag the beginning and end of the BY groups formed by three BY variables: State, City, and ZipCode. Street appears in the data but is not named in the BY statement. Six temporary variables are created within the program data vector. These variables can be used during the DATA step, but they do not become variables in the new data set.

In the figure that follows, observations in the SAS data set are arranged in an order that can be used with this BY statement:

by State City ZipCode;

SAS creates the following temporary variables: FIRST.State, LAST.State, FIRST.City, LAST.City, FIRST.ZipCode, and LAST.ZipCode.

Observations in Four BY Groups and Corresponding FIRST. and LAST. Values

State  City    ZipCode  Street      FIRST.State  LAST.State  FIRST.City  LAST.City  FIRST.ZipCode  LAST.ZipCode
AZ     Tucson  85730    Glen Pl     1            1           1           1          1              1
FL     Miami   33133    Rice St     1            0           1           0          1              0
FL     Miami   33133    Tom Ave     0            0           0           0          0              0
FL     Miami   33133    Surrey Dr   0            0           0           0          0              1
FL     Miami   33146    Nervia St   0            0           0           0          1              0
FL     Miami   33146    Corsica St  0            1           0           1          0              1
OH     Miami   45056    Myrtle St   1            1           1           1          1              1
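The flag values for this ordering can be checked with a short DATA step that writes them to the log. This is a minimal sketch: the data set name addresses and the column positions in the INPUT statement are assumptions made for the illustration.

```sas
/* Hypothetical data set name; column input keeps the blank inside Street */
data addresses;
   input State $ 1-2 City $ 4-9 ZipCode $ 11-15 Street $ 17-26;
   datalines;
AZ Tucson 85730 Glen Pl
FL Miami  33133 Rice St
FL Miami  33133 Tom Ave
FL Miami  33133 Surrey Dr
FL Miami  33146 Nervia St
FL Miami  33146 Corsica St
OH Miami  45056 Myrtle St
;

/* Write the six temporary variables to the log for each observation */
data _null_;
   set addresses;
   by State City ZipCode;
   put State= City= ZipCode= Street=
       first.State= last.State=
       first.City= last.City=
       first.ZipCode= last.ZipCode=;
run;
```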


Grouping Observations by City, State, Zip Code, and Street

This example shows how SAS uses FIRST.variable and LAST.variable to flag the beginning and end of the BY groups formed by three BY variables: City, State, and ZipCode. Street appears in the data but is not named in the BY statement. Six temporary variables are created within the program data vector. These variables can be used during the DATA step, but they do not become variables in the new data set.

In the figure that follows, observations in the SAS data set are arranged in an order that can be used with this BY statement:

by City State ZipCode;

SAS creates the following temporary variables: FIRST.City, LAST.City, FIRST.State, LAST.State, FIRST.ZipCode, and LAST.ZipCode.

Observations in Four BY Groups and Corresponding FIRST. and LAST. Values

City    State  ZipCode  Street      FIRST.City  LAST.City  FIRST.State  LAST.State  FIRST.ZipCode  LAST.ZipCode
Miami   FL     33133    Rice St     1           0          1            0           1              0
Miami   FL     33133    Tom Ave     0           0          0            0           0              0
Miami   FL     33133    Surrey Dr   0           0          0            0           0              1
Miami   FL     33146    Nervia St   0           0          0            0           1              0
Miami   FL     33146    Corsica St  0           0          0            1           0              1
Miami   OH     45056    Myrtle St   0           1          1            1           1              1
Tucson  AZ     85730    Glen Pl     1           1          1            1           1              1
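When the data is not already in this order, it can be rearranged with PROC SORT before the DATA step runs. The sketch below sorts the same seven observations by City, State, and ZipCode, then uses FIRST.City, FIRST.ZipCode, and LAST.City to count the ZipCode groups within each city. The data set names (addresses, by_city, zip_counts) and the variable ZipGroups are hypothetical.

```sas
/* Hypothetical data set, entered in State, City, ZipCode order */
data addresses;
   input State $ 1-2 City $ 4-9 ZipCode $ 11-15 Street $ 17-26;
   datalines;
AZ Tucson 85730 Glen Pl
FL Miami  33133 Rice St
FL Miami  33133 Tom Ave
FL Miami  33133 Surrey Dr
FL Miami  33146 Nervia St
FL Miami  33146 Corsica St
OH Miami  45056 Myrtle St
;

/* Rearrange the observations to match the new BY statement */
proc sort data=addresses out=by_city;
   by City State ZipCode;
run;

data zip_counts;
   set by_city;
   by City State ZipCode;
   /* Start a new count at the first observation of each city */
   if first.City then ZipGroups = 0;
   /* FIRST.ZipCode is 1 whenever City, State, or ZipCode changes */
   if first.ZipCode then ZipGroups + 1;
   /* Write one observation per city */
   if last.City then output;
   keep City ZipGroups;
run;

proc print data=zip_counts noobs;
run;
```

For this data, Miami should show three ZipCode groups (two in FL and one in OH) and Tucson one, which matches the number of rows with FIRST.ZipCode equal to 1 in the table above.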


Grouping Observations: Another Example

The value of FIRST.variable can be affected by a change in the value of a BY variable that precedes it in the BY statement, even if the variable's own value remains the same.

In this example, the values of FIRST.variable and LAST.variable depend on the sort order (the order of the variables in the BY statement), and not just on the value of each BY variable. For observation 3, the value of FIRST.Y is set to 1 because BLUEBERRY is a new value for Y. This change in Y causes FIRST.Z to be set to 1 as well, even though the value of Z did not change.

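options pageno=1 nodate linesize=80 pagesize=60;

/* Four observations, in an order that is valid for both BY statements. */
/* The third row is inferred from the text and the log: x is apple,     */
/* BLUEBERRY is a new value for y, and z does not change.               */
data testfile;
   input x $ y $ 9-17 z $ 19-26;
   datalines;
apple   banana    coconut
apple   banana    coconut
apple   blueberry coconut
apricot blueberry citron
;

/* X is the first BY variable */
data _null_;
   set testfile;
   by x y z;
   if _N_=1 then put 'Grouped by X Y Z';
   put _N_= x= first.x= last.x= first.y= last.y= first.z= last.z= ;
run;

/* Y is the first BY variable */
data _null_;
   set testfile;
   by y x z;
   if _N_=1 then put 'Grouped by Y X Z';
   put _N_= x= first.x= last.x= first.y= last.y= first.z= last.z= ;
run;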

Partial SAS Log Showing the Results of Processing with BY Variables

Grouped by X Y Z
_N_=1 x=apple FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0
_N_=2 x=apple FIRST.x=0 LAST.x=0 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1
_N_=3 x=apple FIRST.x=0 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1
_N_=4 x=apricot FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1

Grouped by Y X Z
_N_=1 x=apple FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0
_N_=2 x=apple FIRST.x=0 LAST.x=1 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1
_N_=3 x=apple FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=1
_N_=4 x=apricot FIRST.x=1 LAST.x=1 FIRST.y=0 LAST.y=1 FIRST.z=1 LAST.z=1
