
Starting with Raw Data: The Basics

Reading Data That Is Aligned in Columns


Understanding Column Input

With column input, data values occupy the same fields within each data record. When you use column input in the INPUT statement, list the variable names and specify column positions that identify the location of the corresponding data fields. You can use column input when your raw data is in fixed columns and does not require the use of informats to be read.


Program: Reading Data Aligned in Columns

The following program also uses the health and fitness club data, but now two more data values are missing. The data is aligned in columns, and SAS reads it with column input:

data club1;
   input IdNumber 1-4 Name $ 6-11 Team $ 13-18 StartWeight 20-22
         EndWeight 24-26;
   datalines; 
1023 David  red    189 165 
1049 Amelia yellow 145 
1219 Alan   red    210 192 
1246 Ravi   yellow     177 
1078 Ashley red    127 118 
1221 Jim    yellow 220 
;

proc print data=club1;
   title 'Weight Club Members'; 
run;

The specification that follows each variable name indicates the beginning and ending columns in which the variable value will be found. Note that with column input you are not required to indicate missing values with a placeholder such as a period.
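
For contrast, here is a minimal sketch of the same records read with simple list input. It assumes the fields are separated by blanks, and the data set name club1_list is made up for illustration. With list input, each missing value needs a period as a placeholder; without it, SAS would take the next field on the line as the value.

```sas
/* Sketch only: club1_list is a hypothetical data set name. */
data club1_list;
   /* Simple list input: values are read in order, separated by blanks. */
   input IdNumber Name $ Team $ StartWeight EndWeight;
   datalines;
1023 David red 189 165
1049 Amelia yellow 145 .
1219 Alan red 210 192
1246 Ravi yellow . 177
1078 Ashley red 127 118
1221 Jim yellow 220 .
;

proc print data=club1_list;
   title 'Weight Club Members (List Input)';
run;
```

Column input avoids the placeholders because each value is located by its column positions rather than by its order on the line.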

The following output shows the resulting data set. Missing numeric values occur three times in the data set, and are indicated by periods.

Data Set Created with Column Input

                              Weight Club Members                              1

                      Id                           Start      End
             Obs    Number    Name      Team      Weight    Weight

              1      1023     David     red         189       165 
              2      1049     Amelia    yellow      145         . 
              3      1219     Alan      red         210       192 
              4      1246     Ravi      yellow        .       177 
              5      1078     Ashley    red         127       118 
              6      1221     Jim       yellow      220         . 

Understanding Some Advantages of Column Input over Simple List Input

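Column input has several advantages over simple list input: character values can contain embedded blanks and can be longer than eight bytes, and you can skip fields, read them in any order, or reread part of a value. The following sections illustrate these advantages.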


Reading Embedded Blanks and Creating Longer Variables

This DATA step uses column input to create a new data set named CLUB2. The program still uses the health and fitness club weight data. However, the data has been modified to include members' first and last names. Now the second data field in each record of raw data contains an embedded blank and is 18 bytes long.

data club2;
   input IdNumber 1-4 Name $ 6-23 Team $ 25-30 StartWeight 32-34
         EndWeight 36-38;
   datalines;
1023 David Shaw         red    189 165 
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192 
1246 Ravi Sinha         yellow 194 177 
1078 Ashley McKnight    red    127 118 
1221 Jim Brown          yellow 220 
;  

proc print data=club2;
   title 'Weight Club Members'; 
run;

The following output shows the resulting data set.

Data Set Created with Column Input (Embedded Blanks)

                              Weight Club Members                              1

                  Id                                    Start      End
         Obs    Number    Name               Team      Weight    Weight

          1      1023     David Shaw         red         189       165 
          2      1049     Amelia Serrano     yellow      145       124 
          3      1219     Alan Nance         red         210       192 
          4      1246     Ravi Sinha         yellow      194       177 
          5      1078     Ashley McKnight    red         127       118 
          6      1221     Jim Brown          yellow      220         . 
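
To see why embedded blanks matter, here is a sketch of what simple list input would be expected to do with the first two of these records. The data set name club2_list is hypothetical. Because list input treats every blank as a separator, the first name lands in Name, the last name lands in Team, and SAS then tries to read the team color as a number, which should produce a missing StartWeight and an invalid-data note in the log.

```sas
/* Sketch only: club2_list is a hypothetical data set name.          */
/* List input splits "David Shaw" into two values, so every field    */
/* after the first name is shifted one variable to the right.        */
data club2_list;
   input IdNumber Name $ Team $ StartWeight EndWeight;
   datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
;

proc print data=club2_list;
   title 'Weight Club Members (List Input, Misread)';
run;
```

With column input, the whole of columns 6-23 is assigned to Name, so the blank between first and last name is kept as part of the value.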

Program: Skipping Fields When Reading Data Records

Column input also enables you to skip over fields or to read the fields in any order. This example uses column input to read the same health and fitness club data, but it reads the value for the variable Team first and omits the variable IdNumber altogether.

You can read or reread part of a value when using column input. For example, because the team names begin with different letters, this program saves storage space by reading only the first character in the field that contains the team name. Note the INPUT statement:

data club2;
   input Team $ 25 Name $ 6-23 StartWeight 32-34 EndWeight 36-38;
   datalines; 
1023 David Shaw         red    189 165  
1049 Amelia Serrano     yellow 145 124  
1219 Alan Nance         red    210 192  
1246 Ravi Sinha         yellow 194 177  
1078 Ashley McKnight    red    127 118  
1221 Jim Brown          yellow 220 
;

proc print data=club2;
   title 'Weight Club Members'; 
run;

The following output shows the resulting data set. The variable that contains the identification number is no longer in the data set. Instead, Team is the first variable in the new data set, and it contains only one character to represent the team value.

Data Set Created with Column Input (Skipping Fields)

                              Weight Club Members                              1

                                                  Start      End
               Obs    Team    Name               Weight    Weight

                1      r      David Shaw           189       165 
                2      y      Amelia Serrano       145       124 
                3      r      Alan Nance           210       192 
                4      y      Ravi Sinha           194       177 
                5      r      Ashley McKnight      127       118 
                6      y      Jim Brown            220         . 
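
Rereading can be sketched in the same way. The following example is illustrative only: the data set name club3 and the variable TeamCode are assumptions, not part of the original program. It reads the team field in full and then reads column 25 a second time to keep a one-character code alongside the full name.

```sas
/* Sketch only: club3 and TeamCode are hypothetical names. */
data club3;
   /* Column 25 is read twice: as part of Team (25-30) and alone as TeamCode. */
   input Name $ 6-23 Team $ 25-30 TeamCode $ 25
         StartWeight 32-34 EndWeight 36-38;
   datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1221 Jim Brown          yellow 220
;

proc print data=club3;
   title 'Weight Club Members';
run;
```

Because each specification names its own columns, the order of the variables in the INPUT statement does not have to match the order of the fields in the record.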

Column Input: Points to Remember

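Remember the following rules when you use column input:

Character variables can contain embedded blanks.

Character variables are not limited to the default length of eight bytes.

A blank field is read as a missing value, so no placeholder such as a period is required.

You can read fields in any order, and you can skip fields.

You can reread a field or read only part of it.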
