How the DATA Step Identifies BY Groups

Processing Observations in a BY Group

In the DATA step, SAS identifies the beginning and end of each BY group by creating two temporary variables for each BY variable: FIRST.variable and LAST.variable. These temporary variables are available for DATA step programming but are not added to the output data set. Their values indicate whether an observation is one of the following positions:
  • the first one in a BY group
  • the last one in a BY group
  • neither the first nor the last one in a BY group
  • both first and last, as is the case when there is only one observation in a BY group
You can take actions conditionally, based on whether you are processing the first or the last observation in a BY group.

Using a Name Literal as the BY-Group Variable

When you designate a name literal as the BY variable in BY-group processing and you want to refer to the corresponding FIRST. or LAST. temporary variables, you must include the FIRST. or LAST. portion of the two-level variable name within quotation marks. Here is an example:
data sedanTypes; 
   set cars; 
   by 'Sedan Types'n; 								
   if 'first.Sedan Types'n then type=1;   
run;  													
For more information about BY-Group Processing and how SAS creates the temporary variables, FIRST and LAST, see How SAS Determines FIRST.variable and LAST.variable and How SAS Identifies the Beginning and End of a BY Group in SAS Statements: Reference..

How SAS Determines FIRST.variable and LAST.variable

When an observation is the first in a BY group, SAS sets the value of FIRST.variable to 1 for the variable whose value changed, as well as for all of the variables that follow in the BY statement. For all other observations in the BY group, the value of FIRST.variable is 0. Likewise, if the observation is the last in a BY group, SAS sets the value of LAST.variable to 1 for the variable whose value changes on the next observation, as well as for all of the variables that follow in the BY statement. For all other observations in the BY group, the value of LAST.variable is 0. For the last observation in a data set, the value of all LAST.variable variables are set to 1.
Note: See SAS Name Literals for more information about SAS name literals.

Example 1: Grouping Observations by State, City, ZIP Code, and Street

This example shows how SAS uses the FIRST.variable and LAST.variable to flag the beginning and end of four BY groups: State, City, ZipCode, and Street. Six temporary variables are created within the program data vector. These variables can be used during the DATA step, but they do not become variables in the new data set.
In the figure that follows, observations in the SAS data set are arranged in an order that can be used with this BY statement:
by State City ZipCode;
SAS creates the following temporary variables: FIRST.State, LAST.State, FIRST.City, LAST.City, FIRST.ZipCode, and LAST.ZipCode.
options pageno=1 nodate linesize=80 pagesize=60; 
data testfile;
   input State $ ZipCode $ City $ Street $ 19-33; 
   datalines; 
AZ 85730 Tucson   Gleeson Place
FL 33133 Miami    Rice Street
FL 33133 Miami    Thomas Avenue
FL 33133 Miami    Surrey Drive 
FL 33146 Miami    Nervia Street 
FL 33146 Miami    Corsica Street
OH 45056 Miami    Myrtle Street
; 
data test2;
   set testfile;
   by State City ZipCode;
   put _N_= state= first.state= last.state= first.city= last.city=
      first.zipcode= last.zipcode= ;
run; 
NOTE: PROCEDURE PRINTTO used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      
79   options pageno=1 nodate linesize=80 pagesize=60;
80   data testfile;
81      input State $ ZipCode $ City $ Street $ 19-33;
82      datalines;
NOTE: The data set WORK.TESTFILE has 7 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
      
90   ;
91   data test2;
92      set testfile;
93      by State City ZipCode;
94      put _N_= state= first.state= last.state= first.city= last.city=
95         first.zipcode= last.zipcode= ;
96   run;
_N_=1 State=AZ FIRST.State=1 LAST.State=1 FIRST.City=1 LAST.City=1
FIRST.ZipCode=1 LAST.ZipCode=1
_N_=2 State=FL FIRST.State=1 LAST.State=0 FIRST.City=1 LAST.City=0
FIRST.ZipCode=1 LAST.ZipCode=0
_N_=3 State=FL FIRST.State=0 LAST.State=0 FIRST.City=0 LAST.City=0
FIRST.ZipCode=0 LAST.ZipCode=0
_N_=4 State=FL FIRST.State=0 LAST.State=0 FIRST.City=0 LAST.City=0
FIRST.ZipCode=0 LAST.ZipCode=1
_N_=5 State=FL FIRST.State=0 LAST.State=0 FIRST.City=0 LAST.City=0
FIRST.ZipCode=1 LAST.ZipCode=0
_N_=6 State=FL FIRST.State=0 LAST.State=1 FIRST.City=0 LAST.City=1
FIRST.ZipCode=0 LAST.ZipCode=1
_N_=7 State=OH FIRST.State=1 LAST.State=1 FIRST.City=1 LAST.City=1
FIRST.ZipCode=1 LAST.ZipCode=1
NOTE: There were 7 observations read from the data set WORK.TESTFILE.
NOTE: The data set WORK.TEST2 has 7 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
      
97   proc printto; run;
BY Groups for State, City, and Zipcode
Observations in Four BY Groups
Corresponding FIRST. and LAST. Values
State
City
ZipCode
Street
FIRST. State
LAST. State
FIRST. City
LAST. City
FIRST. ZipCode
LAST. ZipCode
AZ
Tucson
85730
Glen Pl
1
1
1
1
1
1
FL
Miami
33133
Rice St
1
0
1
0
1
0
FL
Miami
33133
Tom Ave
0
0
0
0
0
0
FL
Miami
33133
Surrey Dr
0
0
0
0
0
1
FL
Miami
33146
Nervia St
0
0
0
0
1
0
FL
Miami
33146
Corsica St
0
1
0
1
0
1
OH
Miami
45056
Myrtle St
1
1
1
1
1
1

Example 2: Grouping Observations by City, State, ZIP Code, and Street

This example shows how SAS uses the FIRST.variable and LAST.variable to flag the beginning and end of four BY groups: City, State, ZipCode, and Street. Six temporary variables are created within the program data vector. These variables can be used during the DATA step, but they do not become variables in the new data set.
In the figure that follows, observations in the SAS data set are arranged in an order that can be used with this BY statement:
by City State ZipCode;
SAS creates the following temporary variables: FIRST.City, LAST.City, FIRST.State, LAST.State, FIRST.ZipCode, and LAST.ZipCode.
Grouping Observations by City, State, ZIP Code, and Street
Observations in Four BY Groups
Corresponding FIRST. and LAST. Values
City
State
ZipCode
Street
FIRST. City
LAST. City
FIRST. State
LAST. State
FIRST. ZipCode
LAST. ZipCode
Miami
FL
33133
Rice St
1
0
1
0
1
0
Miami
FL
33133
Tom Ave
0
0
0
0
0
0
Miami
FL
33133
Surrey Dr
0
0
0
0
0
1
Miami
FL
33146
Nervia St
0
0
0
0
1
0
Miami
FL
33146
Corsica St
0
0
0
1
0
1
Miami
OH
45056
Myrtle St
0
1
1
1
1
1
Tucson
AZ
85730
Glen Pl
1
1
1
1
1
1

Example 3: A Change Affecting FIRST.variable

The value of FIRST.variable can be affected by a change in a previous value, even if the current value of the variable remains the same.
In this example, the value of FIRST.variable and LAST.variable are dependent on sort order, and not just by the value of the BY variable. For observation 3, the value of FIRST.Y is set to 1 because BLUEBERRY is a new value for Y. This change in Y causes FIRST.Z to be set to 1 as well, even though the value of Z did not change.
options pageno=1 nodate linesize=80 pagesize=60; 

data testfile;
   input x $ y $ 9-17 z $ 19-26; 
   datalines; 
apple   banana    coconut
apple   banana    coconut 
apple   blueberry citron
apricot blueberry citron
; 

data _null_;
   set testfile;
   by x y z;
   if _N_=1 then put 'Grouped by X Y Z';
   put _N_= x= first.x= last.x= first.y= last.y= first.z= last.z= ;
run; 

data _null_;
   set testfile;
   by y x z;
   if _N_=1 then put 'Grouped by Y X Z';
   put _N_= x= first.x= last.x= first.y= last.y= first.z= last.z= ;
run;
Partial SAS Log Showing the Results of Processing with BY Variables
Grouped by X Y Z
_N_=1 x=Apple FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0
_N_=2 x=Apple FIRST.x=0 LAST.x=0 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1
_N_=3 x=Apple FIRST.x=0 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1
_N_=4 x=Apricot FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1

Grouped by Y X Z
_N_=1 x=Apple FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0
_N_=2 x=Apple FIRST.x=0 LAST.x=1 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1
_N_=3 x=Apple FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=1
_N_=4 x=Apricot FIRST.x=1 LAST.x=1 FIRST.y=0 LAST.y=1 FIRST.z=1
LAST.z=1