
Starting with Raw Data: The Basics

Reading Data That Requires Special Instructions


Understanding Formatted Input

Sometimes the INPUT statement requires special instructions to read the data correctly. For example, SAS can read numeric data that is stored in special formats such as binary, packed decimal, or date/time. SAS can also read numeric values that contain special characters such as commas and currency symbols. In these situations, use formatted input. Formatted input combines the features of column input with the ability to read nonstandard numeric or character values. The following program uses formatted input.
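As a minimal sketch of reading date and currency values, the following DATA step uses the MMDDYY10. and DOLLAR9. informats. The data set name ORDERS, the variables OrderDate and Price, and the data line are hypothetical. The @12 column-pointer control, which moves the pointer to column 12, is described later in this section.

```sas
data orders;
   /* MMDDYY10. reads 01/15/2024 as a SAS date value        */
   /* DOLLAR9. removes the dollar sign and comma from Price */
   input OrderDate mmddyy10. @12 Price dollar9.;
   datalines;
01/15/2024 $1,382.50
;
run;
```

OrderDate is stored as a number (days since January 1, 1960), and Price is stored as 1382.5, with no special characters.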


Program: Reading Data That Requires Special Instructions

The data in this program includes numeric values that contain a comma, which is an invalid character for a numeric variable:

data january_sales;
   input Item $ 1-16 Amount comma5.;
   datalines; 
trucks          1,382 
vans            1,235 
sedans          2,391 
;  

proc print data=january_sales;
   title 'January Sales in Thousands'; 
run;

The INPUT statement cannot read the values for the variable Amount as valid numeric values without the additional instructions provided by an informat. The informat COMMA5. enables the INPUT statement to read and store this data as a valid numeric value.

The following figure shows that the informat COMMA5. instructs the program to read five characters of data (the comma counts as part of the length of the data), to remove the comma from the data, and to write the resulting numeric value to the program data vector. Note that the name of an informat always ends in a period (.).
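One way to see that the comma counts toward the informat width is to apply the informat with the INPUT function (not the INPUT statement) to a character constant. This is a small sketch; the variable names Amount and Short are illustrative only.

```sas
data _null_;
   Amount = input('1,382', comma5.);   /* reads all five characters: 1382 */
   Short  = input('1,382', comma4.);   /* reads only the first four, '1,38': 138 */
   put Amount= Short=;                 /* writes both values to the SAS log */
run;
```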

Figure: Reading a Value with an Informat

The following figure shows that the data values are read into the input buffer exactly as they occur in the raw data records, but they are written to the program data vector (and then to the data set as an observation) as valid numeric values without any special characters.

Figure: Input Value Compared to Variable Value
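To compare the raw input value with the stored variable value directly, a sketch like the following reads the same field twice: once as character data and once with the COMMA5. informat. The data set name COMPARE_VALUES and the variable RawAmount are hypothetical, and the @17 column-pointer control (described later in this section) moves the pointer back to column 17 for the second read.

```sas
data compare_values;
   /* RawAmount keeps the text exactly as it appears in the record; */
   /* Amount holds the numeric value with the comma removed         */
   input Item $ 1-16 @17 RawAmount $5. @17 Amount comma5.;
   datalines;
trucks          1,382
vans            1,235
sedans          2,391
;
run;

proc print data=compare_values;
run;
```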

The following output shows the resulting data set. The values of Amount contain only digits; the commas have been removed.

Data Set Created with Column and Formatted Input

                           January Sales in Thousands                          1

                            Obs     Item     Amount

                             1     trucks     1382 
                             2     vans       1235 
                             3     sedans     2391 

In a report, you might want to include the comma in numeric values to improve readability. Just as the informat gives instructions on how to read a value and to remove the comma, a format gives instructions to add characters to variable values in the output. See Writing Output without Creating a Data Set for an example.
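As a brief sketch of that idea, the following program repeats the DATA step from above and adds a FORMAT statement to PROC PRINT. The COMMA5. format affects only how Amount is displayed in the report; the values stored in the data set still contain no commas.

```sas
data january_sales;
   input Item $ 1-16 Amount comma5.;
   datalines;
trucks          1,382
vans            1,235
sedans          2,391
;
run;

proc print data=january_sales;
   format Amount comma5.;   /* displays 1382 as 1,382 */
   title 'January Sales in Thousands';
run;
```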


Understanding How to Control the Position of the Pointer

As the INPUT statement reads data values, it uses an input pointer to keep track of its current position in the input buffer. Column-pointer controls give you additional control over pointer movement and are especially useful with formatted input. A column-pointer control tells SAS where to move the pointer before it reads the next value: either to a specific column or a specified number of columns from the current position. In this example, SAS reads data lines with a combination of column and formatted input:

data january_sales;
   input Item $ 1-16 Amount comma5.;
   datalines; 
trucks          1,382 
vans            1,235
sedans          2,391 
;  

In the next example, SAS reads data lines by using formatted input with a column-pointer control:

data january_sales;
   input Item $10. @17 Amount comma5.;
   datalines;
trucks          1,382  
vans            1,235  
sedans          2,391 
;

The informat $10. reads columns 1 through 10 for the variable Item and leaves the pointer in the next position, column 11. The absolute column-pointer control @17 then moves the pointer to column 17 in the input buffer, which is the correct position to read a value for the variable Amount.
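To observe the pointer position directly, a sketch like the following uses the COLUMN= (COL=) option of the INFILE statement, which names a variable that holds the current column of the input pointer. The names POINTER_CHECK, Pos, AfterItem, and AfterAmount are hypothetical. Because the COL= variable is not written to the data set, its value is copied into ordinary variables; the trailing @ holds the record so that the second INPUT statement continues on the same line.

```sas
data pointer_check;
   infile datalines col=Pos;        /* Pos = current pointer column */
   input Item $10. @;               /* read columns 1-10, hold the record */
   AfterItem = Pos;                 /* pointer is now in column 11 */
   input @17 Amount comma5.;        /* move to column 17, read columns 17-21 */
   AfterAmount = Pos;               /* pointer is now in column 22 */
   datalines;
trucks          1,382
vans            1,235
sedans          2,391
;
run;
```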

In the following program, the relative column-pointer control +6 moves the pointer six columns to the right, from column 11 to column 17, before SAS reads the next data value.

data january_sales;
   input Item $10. +6 Amount comma5.;
   datalines;    
trucks          1,382    
vans            1,235    
sedans          2,391
;

The data in these two programs is aligned in columns. As with column input, you instruct the pointer to move from field to field. With column input you use column specifications; with formatted input you use the length that is specified in the informat together with pointer controls.
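Because @17 and +6 both leave the pointer in column 17 here, the two programs should produce the same data set. A sketch that checks this with PROC COMPARE follows; the data set names SALES_ABSOLUTE and SALES_RELATIVE are hypothetical.

```sas
/* Absolute pointer control: go to column 17 */
data sales_absolute;
   input Item $10. @17 Amount comma5.;
   datalines;
trucks          1,382
vans            1,235
sedans          2,391
;
run;

/* Relative pointer control: move 6 columns from column 11 */
data sales_relative;
   input Item $10. +6 Amount comma5.;
   datalines;
trucks          1,382
vans            1,235
sedans          2,391
;
run;

/* Reports any differences between the two data sets */
proc compare base=sales_absolute compare=sales_relative;
run;
```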


Formatted Input: Points to Remember

Remember the following rules when you use formatted input:
