Starting with Raw Data: The Basics
Understanding Formatted Input
Sometimes the INPUT statement requires special instructions to read the data correctly. For example, SAS can read numeric data that is stored in special forms such as binary, packed decimal, or date/time. SAS can also read numeric values that contain special characters such as commas and currency symbols. In these situations, use formatted input. Formatted input combines the features of column input with the ability to read nonstandard numeric or character values. The following program shows formatted input.
Program: Reading Data That Requires Special Instructions
The data in this program includes numeric values that contain a comma, which is an invalid character for a numeric variable:
data january_sales;
   input Item $ 1-16 Amount comma5.;
   datalines;
trucks          1,382
vans            1,235
sedans          2,391
;

proc print data=january_sales;
   title 'January Sales in Thousands';
run;
The INPUT statement cannot read the values for the variable Amount as valid numeric values without the additional instructions provided by an informat. The informat COMMA5. enables the INPUT statement to read and store this data as a valid numeric value.
The following figure shows that the informat COMMA5. instructs the program to read five characters of data (the comma counts as part of the length of the data), to remove the comma from the data, and to write the resulting numeric value to the program data vector. Note that the name of an informat always ends in a period (.).
Reading a Value with an Informat
The following figure shows that the data values are read into the input buffer exactly as they occur in the raw data records, but they are written to the program data vector (and then to the data set as an observation) as valid numeric values without any special characters.
Input Value Compared to Variable Value
The following output shows the resulting data set. The values for Amount contain only numbers. Note that the commas are removed.
Data Set Created with Column and Formatted Input
                  January Sales in Thousands                  1

                   Obs    Item      Amount

                    1     trucks     1382
                    2     vans       1235
                    3     sedans     2391
In a report, you might want to include the comma in numeric values to improve readability. Just as the informat gives instructions on how to read a value and to remove the comma, a format gives instructions to add characters to variable values in the output. See Writing Output without Creating a Data Set for an example.
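As a minimal sketch of that idea, the program below reuses the january_sales data from this section and adds a FORMAT statement to the PROC PRINT step. It assumes that the COMMA5. format is wide enough for these values (four digits plus one comma); larger amounts would need a wider format.

```sas
data january_sales;
   /* The COMMA5. informat strips the comma while reading. */
   input Item $ 1-16 Amount comma5.;
   datalines;
trucks          1,382
vans            1,235
sedans          2,391
;

proc print data=january_sales;
   /* The COMMA5. format puts the comma back when the value is written. */
   format Amount comma5.;
   title 'January Sales in Thousands';
run;
```

The stored values of Amount are unchanged; the format affects only how they are displayed in the report.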
Understanding How to Control the Position of the Pointer
As the INPUT statement reads data values, it uses an input pointer to keep track of the position of the data in the input buffer. Column-pointer controls provide additional control over pointer movement and are especially useful with formatted input. Column-pointer controls tell how far to advance the pointer before SAS reads the next value. In this example, SAS reads data lines with a combination of column and formatted input:
data january_sales;
   input Item $ 1-16 Amount comma5.;
   datalines;
trucks          1,382
vans            1,235
sedans          2,391
;
In the next example, SAS reads data lines by using formatted input with a column-pointer control:
data january_sales;
   input Item $10. @17 Amount comma5.;
   datalines;
trucks          1,382
vans            1,235
sedans          2,391
;
After SAS reads the first value for the variable Item, the pointer is left in the next position, column 11. The absolute column-pointer control, @17, then directs the pointer to move to column 17 in the input buffer. Now, it is in the correct position to read a value for the variable Amount.
In the following program, the relative column-pointer control, +6, instructs the pointer to move six columns to the right before SAS reads the next data value.
data january_sales;
   input Item $10. +6 Amount comma5.;
   datalines;
trucks          1,382
vans            1,235
sedans          2,391
;
The data in these two programs is aligned in columns. As with column input, you instruct the pointer to move from field to field. With column input you use column specifications; with formatted input you use the length that is specified in the informat together with pointer controls.
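As a further illustration of the absolute column-pointer control, the following sketch reads the same aligned records but picks up Amount before Item. It relies on the fact that @n can move the pointer to any column, including one to the left of the current position. The data set name january_sales_reordered is made up for this example.

```sas
data january_sales_reordered;
   /* @17 moves the pointer to column 17; comma5. reads columns 17-21. */
   /* @1 moves the pointer back to column 1; $10. reads columns 1-10.  */
   input @17 Amount comma5. @1 Item $10.;
   datalines;
trucks          1,382
vans            1,235
sedans          2,391
;

proc print data=january_sales_reordered;
run;
```

Because Amount appears first in the INPUT statement, it is also the first variable in the resulting data set. The values themselves match those produced by the two programs above.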
Formatted Input: Points to Remember
Remember the following rules when you use formatted input:
SAS reads formatted input data until it has read the number of columns that the informat indicates. This method of reading the data is different from list input, which reads until a blank space (or other defined delimiter character) is reached.
You can position the pointer to read the next value by using pointer controls.
You can read data stored in nonstandard form such as packed decimal, or data that contains commas.
You have the flexibility of using informats with all the features of column input, as described in Column Input: Points to Remember.
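The first rule above, that formatted input reads a fixed number of columns while list input stops at a blank, can be sketched with values that contain an embedded blank. The item names and the data set names formatted_read and list_read are hypothetical and are not part of the original example.

```sas
/* Formatted input: $16. reads columns 1-16, blanks included, */
/* so Item receives the complete two-word value.              */
data formatted_read;
   input Item $16. Amount comma5.;
   datalines;
pickup trucks   1,382
cargo vans      1,235
;

/* List input: reading stops at the first blank, so Item receives  */
/* only the first word. The second word is then read for Amount,   */
/* which produces an invalid-data note in the log and a missing    */
/* value for Amount.                                                */
data list_read;
   input Item $ Amount;
   datalines;
pickup trucks   1,382
cargo vans      1,235
;

proc print data=formatted_read;
run;

proc print data=list_read;
run;
```

Comparing the two listings shows why formatted input, or column input, is the safer choice when character values can contain blanks or numeric values contain commas.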
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.