RETAIN Statement

Causes a variable that is created by an INPUT or assignment statement to retain its value from one iteration of the DATA step to the next.
Valid in: DATA step
Category: Information
Type: Declarative

Syntax

RETAIN <element-list(s) <initial-value(s) | (initial-value) | (initial-value-list)>>;

Without Arguments

If you do not specify an argument, the RETAIN statement causes the values of all variables that are created with INPUT or assignment statements to be retained from one iteration of the DATA step to the next.

Arguments

element-list
specifies variable names, variable lists, or array names whose values you want retained.
Tips: If you specify _ALL_, _CHAR_, or _NUMERIC_, only the variables that are defined before the RETAIN statement are affected.

If a variable name is specified only in the RETAIN statement and you do not specify an initial value, the variable is not written to the data set, and a note stating that the variable is uninitialized is written to the SAS log. If you specify an initial value, the variable is written to the data set.

initial-value
specifies an initial value, numeric or character, for one or more of the preceding elements.
Tip: If you omit initial-value, the initial value is missing. Initial-value is assigned to all the elements that precede it in the list. All members of a variable list, therefore, are given the same initial value.
See: (initial-value) and (initial-value-list)
(initial-value)
specifies an initial value, numeric or character, for a single preceding element or for the first in a list of preceding elements.
(initial-value-list)
specifies an initial value, numeric or character, for individual elements in the preceding list. SAS matches the first value in the list with the first variable in the list of elements, the second value with the second variable, and so on.
Initial values are enclosed in parentheses. To specify one or more initial values directly, use the following format:
(initial-value(s))
To specify an iteration factor and nested sublists for the initial values, use the following format:
<constant-iter-value*> <(>constant value | constant-sublist<)>
Restriction: If you specify both an initial-value-list and an element-list, then element-list must be listed before initial-value-list in the RETAIN statement.
Tips: You can separate initial values by blank spaces or commas.

You can also use a shorthand notation, such as (1:4), for specifying a range of sequential integers. The increment is always +1.

You can assign initial values to both variables and temporary data elements.

If there are more variables than initial values, the remaining variables are assigned an initial value of missing and SAS issues a warning message.
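
The iteration-factor format described above can be illustrated with a minimal sketch. The variable names Q1 through Q6 are hypothetical, and the expansion noted in the comments is an assumption based on the format as stated here, not verified output.

```sas
data _null_;
      /* Q1-Q6 are hypothetical names used only for illustration. */
      /* Per the format above, 3*0 3*1 is assumed to expand to    */
      /* the six values 0 0 0 1 1 1, matched in order to Q1-Q6.   */
   retain q1-q6 (3*0 3*1);
   put q1= q2= q3= q4= q5= q6=;
run;
```

An iteration factor keeps a long initial-value list short when many elements share the same starting value.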

Details

Default DATA Step Behavior

Without a RETAIN statement, SAS automatically sets variables that are assigned values by an INPUT or assignment statement to missing before each iteration of the DATA step.
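
As a minimal sketch of this default behavior, the following step contrasts a retained variable with one that is not retained. The names X, KEPT, and NOTKEPT are hypothetical and chosen only for illustration.

```sas
data _null_;
   input x;
      /* KEPT is retained and starts at 0, so it accumulates. */
   retain kept 0;
   kept = kept + x;
      /* NOTKEPT is reset to missing before each iteration,   */
      /* so adding X to it always yields a missing value.     */
   notkept = notkept + x;
   put _n_= x= kept= notkept=;
   datalines;
1
2
3
;
```

In the log, KEPT takes the values 1, 3, and 6 across the three iterations, while NOTKEPT remains missing in each one.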

Assigning Initial Values

Use a RETAIN statement to specify initial values for individual variables, a list of variables, or members of an array. If a value appears in a RETAIN statement, variables that appear before it in the list are set to that value initially. (If you assign different initial values to the same variable by naming it more than once in a RETAIN statement, SAS uses the last value.) You can also use RETAIN to assign an initial value other than the default value of 0 to a variable whose value is assigned by a sum statement.
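
The last point can be sketched as follows, assuming hypothetical variables TOTAL and AMOUNT. The sum statement would normally start TOTAL at 0; here RETAIN starts it at 100 instead.

```sas
data _null_;
      /* Override the sum statement's default initial value of 0. */
   retain total 100;
   input amount;
      /* Sum statement: adds AMOUNT to the retained TOTAL. */
   total + amount;
   put _n_= amount= total=;
   datalines;
10
20
;
```

TOTAL is 110 after the first iteration and 130 after the second.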

Redundancy

It is redundant to name any of these items in a RETAIN statement, because their values are automatically retained from one iteration of the DATA step to the next:
  • variables that are read with a SET, MERGE, MODIFY, or UPDATE statement
  • a variable whose value is assigned in a sum statement
  • the automatic variables _N_, _ERROR_, _I_, _CMD_, and _MSG_
  • variables that are created by the END= or IN= option in the SET, MERGE, MODIFY, or UPDATE statement or by options that create variables in the FILE and INFILE statements
  • data elements that are specified in a temporary array
  • array elements that are initialized in the ARRAY statement
  • elements of an array for which any or all elements are assigned initial values in the ARRAY statement
You can, however, use a RETAIN statement to assign an initial value to any of the previous items, with the exception of _N_ and _ERROR_.

Comparisons

The RETAIN statement specifies variables whose values are not set to missing at the beginning of each iteration of the DATA step. The KEEP statement specifies variables that are to be included in any data set that is being created.
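
A short sketch may make the distinction concrete. The data set name WORK.RUNNING and the variables AMOUNT and BALANCE are hypothetical.

```sas
data work.running;
   input amount;
      /* RETAIN controls whether BALANCE is reset between iterations. */
   retain balance 0;
   balance = balance + amount;
      /* KEEP controls which variables are written to WORK.RUNNING.   */
   keep balance;
   datalines;
5
7
;
```

WORK.RUNNING contains two observations with BALANCE values of 5 and 12. AMOUNT is used in the step but is not written to the data set, because it is not named in the KEEP statement.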

Examples

Example 1: Basic Usage

  • This RETAIN statement retains the values of variables MONTH1 through MONTH5 from one iteration of the DATA step to the next:
    retain month1-month5;
  • This RETAIN statement retains the values of nine variables and sets their initial values:
    retain month1-month5 1 year 0 a b c 'XYZ';
    The values of MONTH1 through MONTH5 are set initially to 1; YEAR is set to 0; variables A, B, and C are each set to the character value XYZ.
  • This RETAIN statement assigns the initial value 1 to the variable MONTH1 only:
    retain month1-month5 (1);
    Variables MONTH2 through MONTH5 are set to missing initially.
  • This RETAIN statement retains the values of all variables that are defined earlier in the DATA step but not the values that are defined afterwards:
    retain _all_;
  • All of these statements assign initial values of 1 through 4 to VAR1 through VAR4:
    • retain var1-var4 (1 2 3 4);
    • retain var1-var4 (1,2,3,4);
    • retain var1-var4(1:4);

Example 2: Overview of the RETAIN Operation

This example shows how to use variable names and array names as elements in the RETAIN statement and shows assignment of initial values with and without parentheses:
data _null_;
   array City{3} $ City1-City3;
   array cp{3} Citypop1-Citypop3;
   retain Year Taxyear 1999 City ' ' 
          cp (10000,50000,100000);
   file file-specification print;
   put 'Values at beginning of DATA step:' 
       / @3 _all_ /;
   input Gain;
   do i=1 to 3;
      cp{i}=cp{i}+Gain;
   end;
   put 'Values after adding Gain to city populations:'
       / @3 _all_;
   datalines;
5000
10000
;
Here are the initial values assigned by RETAIN:
  • Year and Taxyear are assigned the initial value 1999.
  • City1, City2, and City3 are assigned missing values.
  • Citypop1 is assigned the value 10000.
  • Citypop2 is assigned 50000.
  • Citypop3 is assigned 100000.
Here are the lines written by the PUT statements:
Values at beginning of DATA step:
  City1=  City2=  City3=  Citypop1=10000 
  Citypop2=50000 Citypop3=100000
Year=1999 Taxyear=1999 Gain=. i=. 
_ERROR_=0 _N_=1
Values after adding Gain to city populations:
  City1=  City2=  City3=  Citypop1=15000 
  Citypop2=55000 Citypop3=105000
Year=1999 Taxyear=1999 Gain=5000 i=4 
_ERROR_=0 _N_=1
Values at beginning of DATA step:
  City1=  City2=  City3=  Citypop1=15000 
  Citypop2=55000 Citypop3=105000
Year=1999 Taxyear=1999 Gain=. i=. 
_ERROR_=0 _N_=2
Values after adding Gain to city populations:
  City1=  City2=  City3=  Citypop1=25000 
  Citypop2=65000 Citypop3=115000
Year=1999 Taxyear=1999 Gain=10000 i=4 
_ERROR_=0 _N_=2
Values at beginning of DATA step:
  City1=  City2=  City3=  Citypop1=25000 
  Citypop2=65000 Citypop3=115000
Year=1999 Taxyear=1999 Gain=. i=. 
_ERROR_=0 _N_=3
The first PUT statement is executed three times, whereas the second PUT statement is executed only twice. The DATA step ceases execution when the INPUT statement executes for the third time and reaches the end of the file.

Example 3: Selecting One Value from a Series of Observations

In this example, the data set CLASS.ALLSCORES contains several observations for each value of the identification variable ID. Different observations for a particular ID value might have different values of the variable GRADE. This example creates a new data set, CLASS.BESTSCORES, which contains one observation for each ID value. In that observation, the variable HIGHEST holds the highest GRADE value of all observations for that ID in ALLSCORES.
libname class 'SAS-library';
proc sort data=class.allscores;
   by id;
run;
data class.bestscores;
   drop grade;
   set class.allscores;
   by id;
      /* Prevents HIGHEST from being reset*/
      /* to missing for each iteration.   */
   retain highest;
      /* Sets HIGHEST to missing for each */
      /* different ID value.              */
   if first.id then highest=.;
      /* Compares HIGHEST to GRADE in     */
      /* current iteration and resets     */
      /* value if GRADE is higher.        */
   highest=max(highest,grade);
   if last.id then output;
run;
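
To try the same logic without a permanent library, the following self-contained sketch substitutes a small inline data set. The names WORK.ALLSCORES and WORK.BESTSCORES and the sample values are hypothetical stand-ins, not data from the original example.

```sas
   /* Hypothetical sample data, already in ID order. */
data work.allscores;
   input id grade;
   datalines;
1 85
1 92
2 70
2 64
2 88
;

data work.bestscores;
   drop grade;
   set work.allscores;
   by id;
      /* Keep HIGHEST across the observations of one ID. */
   retain highest;
   if first.id then highest=.;
      /* MAX ignores missing arguments, so the first GRADE */
      /* in each BY group replaces the missing HIGHEST.    */
   highest=max(highest,grade);
   if last.id then output;
run;

proc print data=work.bestscores noobs;
run;
```

With this sample data, WORK.BESTSCORES contains two observations: HIGHEST is 92 for ID 1 and 88 for ID 2.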