SAS Institute. The Power to Know

Step-by-Step Programming with Base SAS(R) Software


Starting with Raw Data: The Basics

Reading Unaligned Data


Understanding List Input

The simplest form of the INPUT statement uses list input. List input reads data values that are separated by a delimiter character (by default, a blank space). With list input, SAS reads a data value until it encounters a blank space or the end of the input record. SAS then assumes that the value has ended and assigns the data to the appropriate variable in the program data vector. SAS continues to scan the record until it reaches the next nonblank character, which begins the next data value.
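Because SAS keeps scanning until it reaches the next nonblank character, the number of blanks between values should not matter. The following is a minimal sketch of that behavior, not part of the original example set: the data set name club_spacing is made up for illustration, and the records reuse values from the health and fitness club data with irregular spacing added.

```sas
/* Sketch: list input skips runs of blanks between values.        */
/* The data set name club_spacing is hypothetical.                */
data club_spacing;
   input IdNumber Name $ Team $ StartWeight EndWeight;
   datalines;
1023 David red 189 165
1049    Amelia   yellow     145   124
   1219 Alan red 210 192
;

proc print data=club_spacing;
   title 'List Input with Irregular Spacing';
run;
```

All three records are expected to be read the same way, because leading blanks and extra blanks between fields are treated only as separators.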


Program: Basic List Input

This program uses the health and fitness club data from Introduction to DATA Step Processing to illustrate a DATA step that uses list input in an INPUT statement.

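data club1;
   input IdNumber Name $ Team $ StartWeight EndWeight;   /* [3] */
   /* [1] DATALINES statement and the closing semicolon */
   /* [2] data records, including the last one with a missing value */
   datalines;
1023 David red 189 165
1049 Amelia yellow 145 124
1219 Alan red 210 192
1246 Ravi yellow 194 177
1078 Ashley red 127 118
1221 Jim yellow 220 .
;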

proc print data=club1;
   title 'Weight of Club Members';
run;

The following list corresponds to the numbered items in the preceding program:

[1] The DATALINES statement marks the beginning of the data lines. The semicolon that follows the data lines marks the end of the data lines and the end of the DATA step.

[2] Each data value in the raw data record is separated from the next by at least one blank space. The last record contains a missing value, represented by a period, for the value of EndWeight.

[3] The variable names in the INPUT statement are specified in exactly the same order as the fields in the raw data records.

The output that follows shows the resulting data set. The PROC PRINT statement that follows the DATA step produces this listing.

Data Set Created with List Input

                             Weight of Club Members                            1

                      Id                           Start      End
             Obs    Number    Name      Team      Weight    Weight

              1      1023     David     red         189       165 
              2      1049     Amelia    yellow      145       124 
              3      1219     Alan      red         210       192 
              4      1246     Ravi      yellow      194       177 
              5      1078     Ashley    red         127       118 
              6      1221     Jim       yellow      220         . 

Program: When the Data Is Delimited by Characters, Not Blanks

This program also uses the health and fitness club data, but here the data values are delimited by commas instead of blank spaces, the default delimiter.

options pagesize=60 linesize=80 pageno=1 nodate;
data club1;
   infile datalines dlm=',';   /* [2] [3] */
   input IdNumber Name $ Team $ StartWeight EndWeight;
   /* [1] The data records below use commas as delimiters. */
   datalines;
1023,David,red,189,165
1049,Amelia,yellow,145,124
1219,Alan,red,210,192
1246,Ravi,yellow,194,177
1078,Ashley,red,127,118
1221,Jim,yellow,220,.
;
proc print data=club1;
   title 'Weight of Club Members';
run;

The following list corresponds to the numbered items in the preceding program:

[1] These data values are separated by commas instead of blanks.

[2] List input, by default, scans the input records, looking for blank spaces to delimit each data value. The DLM= option enables list input to recognize a character, here a comma, as the delimiter.

[3] This example requires the DLM= option, which is available only in the INFILE statement. Usually the INFILE statement is used only when the input data resides in an external file. The DATALINES specification, however, enables you to take advantage of INFILE statement options when you are reading data records from the job stream.
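For comparison, the more usual case mentioned in item [3], an INFILE statement that points to an external file, might look like the sketch below. The fileref clubfile and the data set name club_from_file are hypothetical. To keep the sketch self-contained, it first writes a few of the comma-delimited records to a temporary file; in practice the file would already exist.

```sas
/* Hypothetical fileref; TEMP requests a temporary external file. */
filename clubfile temp;

/* Write a few comma-delimited records so the sketch can run alone. */
data _null_;
   file clubfile;
   put '1023,David,red,189,165';
   put '1049,Amelia,yellow,145,124';
   put '1221,Jim,yellow,220,.';
run;

/* Read the external file with list input, using a comma delimiter. */
data club_from_file;
   infile clubfile dlm=',';
   input IdNumber Name $ Team $ StartWeight EndWeight;
run;

proc print data=club_from_file;
   title 'Weight of Club Members';
run;
```

The INPUT statement is the same as in the DATALINES version; only the INFILE specification changes.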

Reading Data Delimited by Commas

                             Weight of Club Members                            1

                      Id                           Start      End
             Obs    Number    Name      Team      Weight    Weight

              1      1023     David     red         189       165 
              2      1049     Amelia    yellow      145       124 
              3      1219     Alan      red         210       192 
              4      1246     Ravi      yellow      194       177 
              5      1078     Ashley    red         127       118 
              6      1221     Jim       yellow      220         . 

List Input: Points to Remember

The points to remember when you use list input are:

  • Use list input when each field is separated by at least one blank space or delimiter.

  • Specify the fields in the order in which they appear in the records of raw data.

  • Represent missing values by a placeholder such as a period. (Under the default behavior, a blank field causes the variable names and values to become mismatched.)

  • Character values cannot contain embedded blanks.

  • The default length of character variables is eight bytes. SAS truncates a longer value when it writes the value to the program data vector. (To read a character variable that contains more than eight characters with list input, use a LENGTH statement. See Defining Enough Storage Space for Variables.)

  • Data must be in standard character or numeric format (that is, it can be read without an informat).
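The point about the default length of eight bytes can be illustrated with a LENGTH statement. This is a minimal sketch under stated assumptions: the data set name club_long, the length of 12, and the name Christopher are invented for illustration and do not come from the health and fitness club data.

```sas
/* Sketch: give Name room for more than eight characters.         */
/* club_long, the length 12, and 'Christopher' are made up.       */
data club_long;
   length Name $ 12;   /* placed before INPUT so the length takes effect */
   input IdNumber Name $ Team $ StartWeight EndWeight;
   datalines;
1023 Christopher red 189 165
1049 Amelia yellow 145 124
;

proc print data=club_long;
   title 'Character Values Longer Than Eight Bytes';
run;
```

Without the LENGTH statement, the longer name would be truncated to eight characters. Because Name is now defined first, it also becomes the first variable in the data set, which changes the column order in the listing.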

Note:   List input requires the fewest specifications in the INPUT statement. However, the restrictions that are placed on the data may require that you learn to use other styles of input to read your data. For example, column input, which is discussed in the next section, is less restrictive. This section has introduced only simple list input. See Understanding How to Make List Input More Flexible to learn about modified list input.
