
Finding Shortcuts in Programming

Performing the Same Action for a Series of Variables


Using a Series of IF-THEN statements

In the data set MYLIB.ATTRACTIONS, the variables Museums, Galleries, and Other contain missing values when the tour does not feature that kind of attraction. To change the missing values to 0, you can write a series of IF-THEN statements with assignment statements, as the following program illustrates:

   /* same action for different variables */
data changes;
   set mylib.attractions;
   if Museums = . then Museums = 0;
   if Galleries = . then Galleries = 0;
   if Other = . then Other = 0;
run;

The pattern of action is the same in the three IF-THEN statements; only the variable name is different. To make the program easier to read, you can write SAS statements that perform the same action several times, changing only the variable that is affected. This technique, called array processing, consists of the following three steps:

  1. grouping variables into arrays

  2. repeating the action

  3. selecting the current variable to be acted upon


Grouping Variables into Arrays

In DATA step programming you can put variables into a temporary group called an array. To define an array, use an ARRAY statement. A simple ARRAY statement has the following form:

ARRAY array-name{number-of-variables} variable-1 < . . . variable-n>;

The array-name is a SAS name that you choose to identify the group of variables. The number-of-variables, enclosed in braces, tells SAS how many variables you are grouping, and variable-1< . . . variable-n> lists their names.

Note: If you have worked with arrays in other programming languages, note that arrays in SAS are different from those in many other languages. In SAS, an array is simply a convenient way of temporarily identifying a group of variables by assigning an alias to them. It is not a permanent data structure; it exists only for the duration of the DATA step. The array-name identifies the array and distinguishes it from any other arrays in the same DATA step; it is not a variable.
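To see that an array name is not stored as a variable, here is a minimal sketch. It uses a small hypothetical data set named DEMO with made-up values rather than MYLIB.ATTRACTIONS. The DATA step defines the array, but PROC CONTENTS should list only the three grouped variables.

```sas
/* Hypothetical one-observation data set; the values are illustrative only */
data demo;
   Museums = 4;
   Galleries = 3;
   Other = 0;
   /* The array groups the variables only while this DATA step runs */
   array changelist{3} Museums Galleries Other;
run;

/* Lists Museums, Galleries, and Other; there is no variable named CHANGELIST */
proc contents data=demo;
run;
```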

The following ARRAY statement lists the three variables Museums, Galleries, and Other:

array changelist{3} Museums Galleries Other;

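This statement tells SAS to make a group named CHANGELIST for the duration of this DATA step and to put three variables in that group: Museums, Galleries, and Other.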

In addition, by listing a variable in an ARRAY statement, you assign the variable an extra name of the form array-name{position}, where position is the position of the variable in the list (1, 2, or 3 in this case). The position can be a number or the name of a variable whose value is the number. This additional name is called an array reference, and the position is called the subscript. The previous ARRAY statement assigns the array reference CHANGELIST{1} to Museums, CHANGELIST{2} to Galleries, and CHANGELIST{3} to Other. From that point on in the DATA step, you can refer to each variable by either its original name or its array reference. For example, the names Museums and CHANGELIST{1} are equivalent.
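The following sketch shows both kinds of subscript in one DATA _NULL_ step, which produces no data set and writes its results to the SAS log. The values, and the names Position and Selected, are hypothetical and are used only for illustration. Assigning a value through CHANGELIST{3} changes Other. A variable whose value is 2 then selects Galleries.

```sas
data _null_;
   /* Listing variables that do not exist yet creates them as numeric */
   array changelist{3} Museums Galleries Other;
   Museums = 4;
   Galleries = 3;
   Other = .;

   /* An assignment through the array reference changes the variable Other */
   changelist{3} = 0;
   put Other=;                       /* writes Other=0 to the log */

   /* A variable can supply the subscript: Position=2 refers to Galleries */
   Position = 2;
   Selected = changelist{Position};
   put Selected=;                    /* writes Selected=3 to the log */
run;
```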


Repeating the Action

To tell SAS to perform the same action several times, use an iterative DO loop of the following form:

DO index-variable=1 TO number-of-variables-in-array;
   ...SAS statements...
END;

An iterative DO loop begins with an iterative DO statement, contains other SAS statements, and ends with an END statement. The loop is processed repeatedly (iterated) according to the directions in the iterative DO statement. The iterative DO statement contains an index-variable whose name you choose and whose value changes in each iteration of the loop. In array processing, you usually want the loop to execute as many times as there are variables in the array; therefore, you specify that the values of index-variable are 1 TO number-of-variables-in-array. By default, SAS increases the value of index-variable by 1 before each new iteration of the loop. When the value becomes greater than number-of-variables-in-array, SAS stops processing the loop. By default, SAS adds the index variable to the data set that is being created.

An iterative DO loop that processes three times and has an index variable named Count looks like this:

do Count = 1 to 3;
   /* SAS statements */
end;

The first time the loop is processed, the value of Count is 1; the second time, the value is 2; and the third time, the value is 3. At the end of the third iteration, SAS increases Count to 4, which exceeds the specified range of 1 TO 3, so SAS stops processing the loop.
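You can observe this sequence of values by writing Count to the SAS log inside and after the loop. The following minimal sketch does this in a DATA _NULL_ step. It should write Count=1, Count=2, and Count=3 from inside the loop, followed by Count=4 after the loop ends.

```sas
data _null_;
   do Count = 1 to 3;
      put 'Inside the loop: ' Count=;
   end;
   /* Count has already been increased past the upper bound of 3 */
   put 'After the loop: ' Count=;
run;
```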


Selecting the Current Variable

Now that you have grouped the variables and you know how many times the loop will be processed, you must tell SAS which variable in the array to use in each iteration of the loop. Recall that variables in an array can be identified by their array references, and that the subscript of the reference can be a variable name as well as a number. Therefore, you can write programming statements in which the index variable of the DO loop is the subscript of the array reference:

array-name {index-variable}

When the value of the index variable changes, the subscript of the array reference (and, therefore, the variable that is referenced) also changes.
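This sketch combines the two ideas. The hypothetical values and the helper variable Current are used only for illustration. In each iteration, CHANGELIST{Count} refers to a different variable. The log should show Count=1 Current=4, then Count=2 Current=3, and then Count=3 Current=. (a missing value).

```sas
data _null_;
   array changelist{3} Museums Galleries Other;
   Museums = 4;
   Galleries = 3;
   Other = .;
   do Count = 1 to 3;
      /* The current value of Count selects Museums, Galleries, or Other */
      Current = changelist{Count};
      put Count= Current=;
   end;
run;
```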

The following statement uses the index variable Count as the subscript of array references:

if changelist{Count} = . then changelist{Count} = 0;

You can place this statement inside an iterative DO loop. When the value of Count is 1, SAS reads the array reference as CHANGELIST{1} and processes the IF-THEN statement on CHANGELIST{1}, that is, Museums. When Count is 2, SAS processes the statement on CHANGELIST{2}, that is, Galleries; when Count is 3, it processes the statement on CHANGELIST{3}, that is, Other. The complete iterative DO loop with array references looks like this:

do Count = 1 to 3;
   if changelist{Count} = . then changelist{Count} = 0;
end;

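These statements tell SAS to do the following:

  1. perform the actions in the loop three times

  2. replace the array subscript Count with the current value of Count in each iteration

  3. locate the variable that has that array reference and process the IF-THEN statement on it

  4. replace a missing value with 0 when the condition is true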

The following DATA step uses the ARRAY statement and iterative DO loop:

options pagesize=60 linesize=80 pageno=1 nodate;
data changes;
   set mylib.attractions;
   array changelist{3} Museums Galleries Other;
   do Count = 1 to 3;
      if changelist{Count} = . then changelist{Count} = 0;
   end;
run;

proc print data=changes;
   title 'Tour Attractions';
run;

The following output displays the results:

Using an Array and an Iterative DO Loop to Produce a Data Set

                                Tour Attractions                               1

                                                   Tour        Years
  Obs   City        Museums   Galleries   Other    Guide    Experience   Count

   1    Rome           4          3         0     D'Amico        2         4  
   2    Paris          5          0         1     Lucas          5         4  
   3    London         3          2         0     Wilson         3         4  
   4    New York       5          1         2     Lucas          5         4  
   5    Madrid         0          0         5     Torres         4         4  
   6    Amsterdam      3          3         0                    .         4  

The data set CHANGES shows that the missing values for the variables Museums, Galleries, and Other are now zero. In addition, the data set contains the variable Count with the value 4 (the value that caused processing of the loop to cease in each observation). To exclude Count from the data set, use a DROP= data set option:

options pagesize=60 linesize=80 pageno=1 nodate;
data changes2 (drop=Count);
   set mylib.attractions;
   array changelist{3} Museums Galleries Other;
   do Count = 1 to 3;
      if changelist{Count} = . then changelist{Count} = 0;
   end;
run;

proc print data=changes2;
   title 'Tour Attractions';
run;

The following output displays the results:

Dropping the Index Variable from a Data Set

                                Tour Attractions                               1

                                                         Tour         Years
   Obs    City         Museums    Galleries    Other     Guide     Experience

    1     Rome            4           3          0      D'Amico         2    
    2     Paris           5           0          1      Lucas           5    
    3     London          3           2          0      Wilson          3    
    4     New York        5           1          2      Lucas           5    
    5     Madrid          0           0          5      Torres          4    
    6     Amsterdam       3           3          0                      .    
