Starting with Raw Data: The Basics
Understanding Column Input
With column input, data values occupy the same fields within each data record. When you use column input in the INPUT statement, list the variable names and specify column positions that identify the location of the corresponding data fields. You can use column input when your raw data is in fixed columns and does not require the use of informats to be read.
Program: Reading Data Aligned in Columns
The following program also uses the health and fitness club data, but now two more data values are missing. The data is aligned in columns and SAS reads the data with column input:
data club1;
   input IdNumber 1-4 Name $ 6-11 Team $ 13-18 StartWeight 20-22
         EndWeight 24-26;
   datalines;
1023 David  red    189 165
1049 Amelia yellow 145
1219 Alan   red    210 192
1246 Ravi   yellow     177
1078 Ashley red    127 118
1221 Jim    yellow 220
;

proc print data=club1;
   title 'Weight Club Members';
run;
The specification that follows each variable name indicates the beginning and ending columns in which the variable value will be found. Note that with column input you are not required to indicate missing values with a placeholder such as a period.
The following output shows the resulting data set. Missing numeric values occur three times in the data set, and are indicated by periods.
Data Set Created with Column Input
                 Weight Club Members                1

         Id                           Start       End
Obs    Number    Name      Team      Weight    Weight

  1    1023      David     red          189       165
  2    1049      Amelia    yellow       145         .
  3    1219      Alan      red          210       192
  4    1246      Ravi      yellow         .       177
  5    1078      Ashley    red          127       118
  6    1221      Jim       yellow       220         .
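For contrast, the following is a minimal sketch of how the same records might be read with simple list input. It assumes that the fields are separated by blanks, and the data set name CLUB1_LIST is hypothetical. Because list input scans for the next nonblank value rather than reading fixed columns, each missing weight is marked with a period in this version.

```sas
/* Sketch only: CLUB1_LIST is a hypothetical name for this illustration. */
data club1_list;
   /* Simple list input: no column positions, fields separated by blanks */
   input IdNumber Name $ Team $ StartWeight EndWeight;
   datalines;
1023 David red 189 165
1049 Amelia yellow 145 .
1219 Alan red 210 192
1246 Ravi yellow . 177
1078 Ashley red 127 118
1221 Jim yellow 220 .
;

proc print data=club1_list;
   title 'Weight Club Members (List Input)';
run;
```

With column input, the periods are unnecessary because the column positions alone tell SAS which field is blank.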
Understanding Some Advantages of Column Input over Simple List Input
Here are several advantages of using column input:
With column input, character variables can contain embedded blanks.
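Column input also enables the creation of variables that are longer than eight bytes. In the preceding example, the variable Name in the data set CLUB1 contains only the members' first names. By using column input, you can read the first and last names as a single value. These differences between input styles are possible for two reasons: column input uses the columns that you specify to determine the length of character variables, and, unlike list input, it reads data up to the last specified column rather than stopping at the first blank.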
Column input enables you to skip some data fields when reading records of raw data. It also enables you to read the data fields in any order and reread some fields or parts of fields.
Reading Embedded Blanks and Creating Longer Variables
This DATA step uses column input to create a new data set named CLUB2. The program still uses the health and fitness club weight data. However, the data has been modified to include members' first and last names. Now the second data field in each record of raw data contains an embedded blank and is 18 bytes long.
data club2;
   input IdNumber 1-4 Name $ 6-23 Team $ 25-30 StartWeight 32-34
         EndWeight 36-38;
   datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220
;

proc print data=club2;
   title 'Weight Club Members';
run;
The following output shows the resulting data set.
Data Set Created with Column Input (Embedded Blanks)
                     Weight Club Members                    1

         Id                                   Start       End
Obs    Number    Name              Team      Weight    Weight

  1    1023      David Shaw        red          189       165
  2    1049      Amelia Serrano    yellow       145       124
  3    1219      Alan Nance        red          210       192
  4    1246      Ravi Sinha        yellow       194       177
  5    1078      Ashley McKnight   red          127       118
  6    1221      Jim Brown         yellow       220         .
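One way to confirm that column input sets the length of a character variable from its column range is to inspect the variable attributes with PROC CONTENTS. The following is a minimal sketch; the data set name CLUB2_CHECK is hypothetical, and only two records are included to keep the example short.

```sas
/* Sketch only: CLUB2_CHECK is a hypothetical name for this illustration. */
data club2_check;
   /* Name spans 18 columns (6-23); Team spans 6 columns (25-30) */
   input IdNumber 1-4 Name $ 6-23 Team $ 25-30;
   datalines;
1023 David Shaw         red
1078 Ashley McKnight    red
;

/* The Len column of the variable listing shows each variable's length. */
proc contents data=club2_check;
run;
```

In the variable listing, Name should be reported with a length of 18 and Team with a length of 6, matching the number of columns specified for each in the INPUT statement.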
Program: Skipping Fields When Reading Data Records
Column input also enables you to skip over fields or to read the fields in any order. This example uses column input to read the same health and fitness club data, but it reads the value for the variable Team first and omits the variable IdNumber altogether.
You can read or reread part of a value when using column input. For example, because the team names begin with different letters, this program saves storage space by reading only the first character in the field that contains the team name. Note the INPUT statement:
data club2;
   input Team $ 25 Name $ 6-23 StartWeight 32-34 EndWeight 36-38;
   datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220
;

proc print data=club2;
   title 'Weight Club Members';
run;
The following output shows the resulting data set. The variable that contains the identification number is no longer in the data set. Instead, Team is the first variable in the new data set, and it contains only one character to represent the team value.
Data Set Created with Column Input (Skipping Fields)
               Weight Club Members              1

                                  Start       End
Obs    Team    Name              Weight    Weight

  1    r       David Shaw           189       165
  2    y       Amelia Serrano       145       124
  3    r       Alan Nance           210       192
  4    y       Ravi Sinha           194       177
  5    r       Ashley McKnight      127       118
  6    y       Jim Brown            220         .
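Rereading a field works the same way: the same columns can appear in more than one specification in the INPUT statement. The following is a minimal sketch that reads the team name in full and then rereads its first column into a second variable. The data set name CLUB3 and the variable name TeamCode are hypothetical and are not part of the original example.

```sas
/* Sketch only: CLUB3 and TeamCode are hypothetical names. */
data club3;
   /* Team reads columns 25-30; TeamCode rereads column 25 alone */
   input Name $ 6-23 Team $ 25-30 TeamCode $ 25
         StartWeight 32-34 EndWeight 36-38;
   datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1221 Jim Brown          yellow 220
;

proc print data=club3;
   title 'Weight Club Members (Team Read Twice)';
run;
```

Each observation then carries both the full team name and a one-character code taken from the same field.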
Column Input: Points to Remember
Remember the following rules when you use column input:
Character variables can be up to 32,767 bytes (32KB) in length and are not limited to the default length of eight bytes.
A placeholder is not required to indicate a missing data value. A blank field is read as missing and does not cause other values to be read incorrectly.
You can read standard character and numeric data only. Informats are ignored.
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.