
Starting with Raw Data: Beyond the Basics

Creating Multiple Observations from a Single Record


Using the Double Trailing @ Line-Hold Specifier

Sometimes you may need to create multiple observations from a single record of raw data. One way to tell SAS how to read such a record is to use the other line-hold specifier, the double trailing at-sign (@@ or "double trailing @"). The double trailing @ not only prevents SAS from reading a new record into the input buffer when a new INPUT statement is encountered, but it also prevents the record from being released when the program returns to the top of the DATA step. (Remember that the trailing @ does not hold a record in the input buffer across iterations of the DATA step.)

For example, this DATA step uses the double trailing @ in the INPUT statement:

data body_fat;
   input Gender $ PercentFat @@;
   datalines; 
m 13.3 f 22    
m 22   f 23.2    
m 16   m 12    
;

proc print data=body_fat;
    title 'Results of Body Fat Testing';
run;

The following output shows the resulting data set:

Data Set Created with Double Trailing @

                          Results of Body Fat Testing                          1

                                             Percent
                            Obs    Gender      Fat

                             1       m         13.3 
                             2       f         22.0 
                             3       m         22.0 
                             4       f         23.2 
                             5       m         16.0 
                             6       m         12.0 

Understanding How the Double Trailing @ Affects DATA Step Execution

To understand how the data records in the previous example were read, look at the data lines that were used in the previous DATA step:

m 13.3 f 22    
m 22   f 23.2    
m 16   m 12    

Each record contains the raw data for two observations instead of one. Consider this example in terms of the flow of the DATA step, as explained in Introduction to DATA Step Processing.

Ordinarily, when SAS reaches the end of the DATA step, it returns to the top of the program and begins the next iteration, executing until there are no more records to read. Each time it returns to the top of the DATA step and executes the INPUT statement, it automatically reads a new record into the input buffer. Without a line-hold specifier, therefore, the second set of data values in each record would never be read:

m 13.3 f 22
m 22   f 23.2
m 16   m 12
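As a minimal sketch of that default behavior, the following step reads the same data lines with no line-hold specifier. The data set name body_fat_first_only and the title text are hypothetical names chosen for this illustration.

```sas
/* Same data lines as before, but the INPUT statement has no     */
/* line-hold specifier, so each iteration reads a new record.    */
data body_fat_first_only;   /* hypothetical data set name */
   input Gender $ PercentFat;
   datalines;
m 13.3 f 22
m 22   f 23.2
m 16   m 12
;
run;

proc print data=body_fat_first_only;
   title 'First Pair of Values from Each Record';
run;
```

This step should produce three observations (m 13.3, m 22, and m 16), because only the first pair of values in each record is read. A single trailing @ in place of the double trailing @ is expected to give the same three observations, because the trailing @ does not hold the record across iterations of the DATA step.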

To allow the second set of data values in each record to be read, the double trailing @ tells SAS to hold the record in the input buffer. Each record is held in the input buffer until the end of the record is reached. The program does not automatically place the next record into the input buffer each time the INPUT statement is executed, and the current record is not automatically released when the program returns to the top of the DATA step. As a result, the pointer location is maintained on the current record, which enables the program to read each value in that record. Each time the DATA step completes an iteration, an observation is written to the data set.

The next five figures demonstrate what happens in the input buffer when a double trailing @ appears in the INPUT statement, as in this example:

input Gender $ PercentFat @@;

The first figure shows that all values in the program data vector are set to missing. The INPUT statement reads the first record into the input buffer. The program begins to read values from the current pointer location, which is the beginning of the input buffer.

[Figure: First Iteration: First Record Is Read]

The following figure shows that the value m is written to the program data vector. When the pointer reaches the blank space that follows 13.3, the complete value for the variable PercentFat has been read. The pointer stops in the next column, and the value 13.3 is written to the program data vector.

[Figure: First Observation Is Created]

There are no other variables in the INPUT statement and no more statements in the DATA step, so three actions take place:
  1. The first observation is written to the data set.

  2. The DATA step begins its next iteration.

  3. The values in the program data vector are set to missing.
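One way to observe the reset to missing is to write the program data vector to the log at the top of each iteration, before the INPUT statement executes. The following is a sketch only: the PUT statement, its label text, and the LENGTH statement are additions for illustration and are not part of the original program. The LENGTH statement is there because the PUT statement refers to Gender before the INPUT statement defines it as a character variable.

```sas
data body_fat;
   /* Define the variables first so that PUT can reference them  */
   /* before INPUT does (8 matches the list-input default).      */
   length Gender $ 8 PercentFat 8;

   /* Executes at the top of every iteration, before any values  */
   /* are read from the input buffer.                            */
   put 'Top of iteration: ' _N_= Gender= PercentFat=;

   input Gender $ PercentFat @@;
   datalines;
m 13.3 f 22
m 22   f 23.2
m 16   m 12
;
run;
```

Each log line should show a blank value for Gender and a period (the numeric missing value) for PercentFat, even though the previous iteration read values from the same record. One more line than the number of observations is expected, because the final iteration starts, finds no more data, and ends the step without writing an observation.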

The following figure shows the current position of the pointer. SAS is ready to read the next piece of data in the same record.

[Figure: Second Iteration: First Record Remains in the Input Buffer]

The following figure shows that the INPUT statement reads the next two values from the input buffer and writes them to the program data vector.

[Figure: Second Observation Is Created]

When the DATA step completes the second iteration, the values in the program data vector are written to the data set as the second observation. Then the DATA step begins its third iteration. Values in the program data vector are set to missing, and the INPUT statement executes. The pointer, which is now at column 13 (two columns to the right of the last data value that was read), continues reading. Because this is list input, the pointer scans for the next nonblank character to begin reading the next value. When the pointer reaches the end of the input buffer and fails to find a nonblank character, SAS reads a new record into the input buffer.
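The pointer position described above can be inspected with the COLUMN= option in the INFILE statement, which assigns the current column location of the input pointer to a variable. The following is a sketch under that assumption; the variable name ptr is hypothetical, and the INFILE and PUT statements are additions for illustration.

```sas
data body_fat;
   /* COLUMN= names a variable that holds the current pointer    */
   /* column; ptr is a hypothetical name and is not written to   */
   /* the output data set.                                       */
   infile datalines column=ptr;
   input Gender $ PercentFat @@;

   /* Report the iteration number, the values just read, and     */
   /* where the pointer stopped.                                 */
   put _N_= Gender= PercentFat= ptr=;
   datalines;
m 13.3 f 22
m 22   f 23.2
m 16   m 12
;
run;
```

If the description above holds, the log line for the second iteration should report column 13, and the line for the third iteration should show values taken from the second record.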

The final figure shows that values for the third observation are read from the beginning of the second record.

[Figure: Third Iteration: Second Record Is Read into the Input Buffer]

The process continues until SAS reads all the records. The resulting SAS data set contains six observations instead of three.

Note: Although this program successfully reads all of the data in the input records, SAS writes a message to the log noting that the program had to go to a new line: "NOTE: SAS went to a new line when INPUT statement reached past the end of a line."
