Starting with Raw Data: The Basics
Understanding How to Make List Input More Flexible
While list input is the simplest to code, remember that it places restrictions on your data. By using format modifiers, you can take advantage of the simplicity of list input without the inconvenience of the usual restrictions. For example, you can use modified list input to do the following:
Create character variables that are longer than the default length of eight bytes.
Read numeric data with special characters like commas, dashes, and currency symbols.
Creating Longer Variables and Reading Numeric Data That Contains Special Characters
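By simply modifying list input with the colon format modifier (:), you can read character values that are longer than eight bytes and numeric values that contain special characters.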
To use the colon format modifier with list input, place the colon between the variable name and the informat. As in simple list input, at least one blank (or other defined delimiter character) must separate each value from the next, and character values cannot contain embedded blanks (or other defined delimiter characters). Consider this DATA step:
data january_sales;
   input Item : $12. Amount : comma5.;
   datalines;
Trucks 1,382
Vans 1,235
Sedans 2,391
SportUtility 987
;

proc print data=january_sales;
   title 'January Sales in Thousands';
run;
The $12. informat gives the variable Item a length of 12. The variable Amount requires an informat (in this case, COMMA5.) that removes the commas so that the values are read as valid numbers. Unlike in the previous example, which used formatted input to read the data, the data values do not need to be aligned in columns.
The following output shows the resulting data set.
Data Set Created with Modified List Input (: comma5.)
           January Sales in Thousands

    Obs    Item            Amount

      1    Trucks            1382
      2    Vans              1235
      3    Sedans            2391
      4    SportUtility       987
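The COMMAw.d informat used above removes more than commas: it also strips embedded blanks, dollar signs, percent signs, dashes, and close parentheses. As a minimal sketch, assuming a hypothetical data set FEB_SALES whose amounts carry a leading dollar sign, the same modified list input reads currency values. The data set name, title, and values are illustrative only.

```sas
/* Hypothetical data: amounts written with a dollar sign and a comma. */
/* COMMA6. strips both characters, so Amount is stored as a number.   */
data feb_sales;
   input Item : $12. Amount : comma6.;
   datalines;
Trucks $1,402
Vans $1,180
SportUtility $995
;

proc print data=feb_sales;
   title 'February Sales in Thousands';
run;
```

Choose an informat width at least as wide as the longest raw value (here, six characters for $1,402).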
Reading Character Data That Contains Embedded Blanks
Because list input uses a blank space to determine where one value ends and the next one begins, values normally cannot contain blanks. However, with the ampersand format modifier (&), you can use list input to read data that contains single embedded blanks. The only restriction is that at least two blanks must separate such a value from the next data value in the record.
To use the ampersand format modifier with list input, place the ampersand between the variable name and the informat. The following DATA step uses the ampersand format modifier with list input to create the data set CLUB2. Note that the data is not in fixed columns; therefore, column input is not appropriate.
data club2;
   input IdNumber Name & $18. Team $ StartWeight EndWeight;
   datalines;
1023 David Shaw  red 189 165
1049 Amelia Serrano  yellow 145 124
1219 Alan Nance  red 210 192
1246 Ravi Sinha  yellow 194 177
1078 Ashley McKnight  red 127 118
1221 Jim Brown  yellow 220 .
;

proc print data=club2;
   title 'Weight Club Members';
run;
The character variable Name, with a length of 18, contains members' first and last names separated by one blank space. The data lines must have at least two blank spaces between the value for Name and the value for Team for the INPUT statement to read the data correctly.
The following output shows the resulting data set.
Data Set Created with Modified List Input (& $18.)
                       Weight Club Members

               Id                                  Start       End
    Obs    Number    Name               Team      Weight    Weight

      1      1023    David Shaw         red          189       165
      2      1049    Amelia Serrano     yellow       145       124
      3      1219    Alan Nance         red          210       192
      4      1246    Ravi Sinha         yellow       194       177
      5      1078    Ashley McKnight    red          127       118
      6      1221    Jim Brown          yellow       220         .
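The ampersand and colon modifiers can appear in the same INPUT statement. As a minimal sketch, assuming a hypothetical data set CITY_SALES in which a city name may contain single blanks and the sales figure contains a comma, the ampersand lets City keep its embedded blanks while the colon lets COMMA8. read the number. As with CLUB2, at least two blanks must follow each City value. The data set name, title, and values are illustrative only.

```sas
/* Hypothetical data: two blanks follow each City value, so the  */
/* single blanks inside "Salt Lake City" stay part of the value. */
data city_sales;
   input City & $15. Sales : comma8.;
   datalines;
New York  12,345
Los Angeles  9,876
Salt Lake City  4,050
;

proc print data=city_sales;
   title 'Sales by City';
run;
```

If only one blank separated "Salt Lake City" from "4,050", the INPUT statement would treat the number as part of the City value.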
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.