Using the INPUT Statement

Once you have referenced the data file that contains your data with an INFILE statement, you need to tell IML the following information about how the data are arranged:

  • the number of variables and their names

  • each variable’s type, either numeric or character

  • the format of each variable’s values

  • the columns that correspond to each variable

In other words, you must tell IML how to read the data.

The INPUT statement describes the arrangement of values in an input record. The INPUT statement reads records from a file specified in the previously executed INFILE statement, reading the values into IML variables.

There are two ways to describe a record’s values in an IML INPUT statement:

  • list (or scanning) input

  • formatted input

The following examples show valid INPUT statements for the class data file. Which one applies depends on how the data are stored.

If the data are stored with a blank or a comma between fields, then list input can be used. For example, the INPUT statement for the class data file might look as follows:

   infile inclass;
   input name $ sex $ age height weight;

These statements tell IML the following:

  • There are five variables: NAME, SEX, AGE, HEIGHT and WEIGHT.

  • Data fields are separated by commas or blanks.

  • NAME and SEX are character variables, as indicated by the dollar sign ($).

  • AGE, HEIGHT, and WEIGHT are numeric variables, the default.

The data must be stored in the same order in which the variables are listed in the INPUT statement. Otherwise, you can use formatted input, which is column specific. Formatted input is the most flexible and can handle any data file. Your INPUT statement for the class data file might look as follows:

   infile inclass;
   input @1 name $char8. @10 sex $char1. @15 age 2.0
      @20 height 4.1 @25 weight 5.1;

These statements tell IML the following:

  • NAME is a character variable; its value begins in column 1 (indicated by @1) and occupies eight columns ($CHAR8.).

  • SEX is a character variable; its value is found in column 10 ($CHAR1.).

  • AGE is a numeric variable; its value is found in columns 15 and 16 and has no decimal places (2.0).

  • HEIGHT is a numeric variable found in columns 20 through 23 with one decimal place implied (4.1).

  • WEIGHT is a numeric variable found in columns 25 through 29 with one decimal place implied (5.1).
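As a rough illustration of how these informats map onto columns, suppose (purely as an assumption for this sketch) that a record in the class file looks like the second line below. The first line is only a column ruler and is not part of the data:

   ----+----1----+----2----+----3
   Alice    f    10    613   970

   proc iml;
   filename inclass 'user.text.class';
   infile inclass;
   /* columns: NAME 1-8, SEX 10, AGE 15-16, HEIGHT 20-23, WEIGHT 25-29 */
   input @1 name $char8. @10 sex $char1. @15 age 2.0
         @20 height 4.1 @25 weight 5.1;
   print name sex age height weight;
   closefile inclass;
   quit;

Under the usual SAS w.d informat rule, a field that contains no decimal point has its last d digits treated as decimals, so the 4.1 informat should read columns 20 through 23 as 61.3, and the 5.1 informat should read columns 25 through 29 as 97.0. If a decimal point is actually present in the field, it takes precedence over the implied one.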

The next sections discuss these two modes of input.

List Input

If your data are recorded with a comma or one or more blanks between data fields, you can use list input to read your data. If you have missing values—that is, unknown values—they must be represented by a period (.) rather than a blank field.

When IML looks for a value, it skips past blanks and tab characters. It then scans for the delimiter that ends the value. The delimiter is a blank, a comma, or the end of the record. When the ampersand (&) format modifier is used, the delimiter is instead two blanks, a comma, or the end of the record.

The general form of the INPUT statement for list input is as follows:

INPUT variable <$> <&> <…variable <$> <&> > ;

where

variable

names the variable to be read by the INPUT statement.

$

indicates that the preceding variable is character.

&

indicates that a character value can have a single embedded blank. Because a blank normally indicates the end of a data value, use the ampersand format modifier to indicate the end of the value with at least two blanks or a comma.
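The effect of the ampersand modifier can be sketched with a small hypothetical file. The fileref NAMEFILE, the file name, and the data values below are assumptions made for illustration only. Each name is followed by at least two blanks:

   Mary Ann  f  12
   Bill      m  13

   proc iml;
   filename namefile 'names.txt';   /* hypothetical file holding the two lines above */
   infile namefile;
   do data;
      input name $ & sex $ age;     /* & lets NAME contain a single embedded blank   */
      print name sex age;
   end;
   closefile namefile;
   quit;

Without the ampersand, the blank inside Mary Ann would end the value, and Ann would be taken as the value of SEX.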

With list input, IML scans the input lines for values. Consider using list input in the following cases:

  • when blanks or commas separate input values

  • when periods rather than blanks represent missing values

List input is the default in several situations. Descriptions of these situations and the behavior of IML follow:

  • If no input format is specified for a variable, IML scans for a number.

  • If a single dollar sign or ampersand format modifier is specified, IML scans for a character value. The ampersand format modifier enables single embedded blanks to occur.

  • If a format is given with width unspecified or zero, IML scans for the first blank or comma.

If the end of a record is encountered before IML finds a value, then the behavior is as described by the record overflow options in the INFILE statement discussed in the section Using the INFILE Statement.

When you read with list input, the order of the variables listed in the INPUT statement must agree with the order of the values in the data file. For example, consider the following data:

   Alice    f    10   61   97
   Beth     f    11   64   105
   Bill     m    12   63   110

You can use list input to read these data by specifying the following INPUT statement:

   input name $ sex $ age height weight;

Note: This statement implies that the variables are stored in the order given. That is, each line of data contains a student’s name, sex, age, height, and weight in that order and separated by at least one blank or by a comma.
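A minimal sketch of list input with a missing value follows. It assumes that the fileref INCLASS points to a file whose records resemble the lines below, where the period in the second record stands for an unknown age (the data values are illustrative, not taken from the actual class file):

   Alice    f    10   61   97
   Beth     f    .    64   105

   proc iml;
   filename inclass 'user.text.class';
   infile inclass missover;
   do data;                                  /* one pass per record          */
      input name $ sex $ age height weight;  /* values are scanned in order  */
      print name sex age height weight;
   end;
   closefile inclass;
   quit;

Because the period occupies the AGE position, the remaining values still line up with HEIGHT and WEIGHT. A blank in that position would instead be skipped, and 64 would be read as the age.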

Formatted Input

The alternative to list input is formatted input. An INPUT statement reading formatted input must have a SAS informat after each variable. An informat gives the data type and field width of an input value. Formatted input can be used with pointer controls and format modifiers. Note, however, that neither pointer controls nor format modifiers are necessary for formatted input.

Pointer Control Features

Pointer controls reset the pointer’s column and line positions and tell the INPUT statement where to go to read the data value. You use pointer controls to specify the columns and lines from which you want to read:

  • Column pointer controls move the pointer to the column you specify.

  • Line pointer controls move the pointer to the next line.

  • Line hold controls keep the pointer on the current input line.

  • Binary file indicator controls indicate that the input line is from a binary file.

Column Pointer Controls

Column pointer controls indicate in which column an input value starts. Column pointer controls begin with either an at sign (@) or a plus sign (+). A complete list follows:

@n

moves the pointer to column n.

@point-variable

moves the pointer to the column given by the current value of point-variable.

@(expression)

moves the pointer to the column given by the value of the expression. The expression must evaluate to a positive integer.

+n

moves the pointer n columns.

+point-variable

moves the pointer the number of columns given by the value of point-variable.

+(expression)

moves the pointer the number of columns given by the value of expression. The value of expression can be positive or negative.

Here are some examples of using column pointer controls:

   Example      Meaning

   @12          go to column 12
   @N           go to the column given by the value of N
   @(N-1)       go to the column given by the value of N-1
   +5           skip 5 spaces
   +N           skip N spaces
   +(N+1)       skip N+1 spaces

In the earlier example that used formatted input, you used several pointer controls. Here are the statements:

   infile inclass;
   input @1 name $char8. @10 sex $char1. @15 age 2.0
         @20 height 4.1 @25 weight 5.1;

The @1 moves the pointer to column 1, the @10 moves it to column 10, and so on. You move the pointer to the column where the data field begins and then supply an informat specifying how many columns the variable occupies. The INPUT statement could also be written as follows:

   input @1 name $char8. +1 sex $char1. +4 age 2. +3 height 4.1
         +1 weight 5.1;

In this form, you move the pointer to column 1 (@1) and read eight columns. The pointer is now at column 9. Now, move the pointer +1 columns to column 10 to read SEX. The $char1. informat says to read a character variable occupying one column. After you read the value for SEX, the pointer is at column 11, so move it to column 15 with +4 and read AGE in columns 15 and 16 (the 2. informat). The pointer is now at column 17, so move +3 columns and read HEIGHT. The same idea applies for reading WEIGHT.
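The same record layout can also be read with a pointer variable and an expression. The following is only a sketch: the variable SEXCOL is a hypothetical name introduced here, and the file is assumed to use the column layout described above.

   proc iml;
   filename inclass 'user.text.class';
   infile inclass;
   sexcol = 10;                       /* column where SEX begins                 */
   input @1 name $char8.
         @sexcol sex $char1.          /* @point-variable: go to column 10        */
         @(sexcol+5) age 2.           /* @(expression): go to column 15          */
         +3 height 4.1                /* pointer is at 17; skip to column 20     */
         +1 weight 5.1;               /* pointer is at 24; skip to column 25     */
   print name sex age height weight;
   closefile inclass;
   quit;

Keeping the starting column in a variable can be convenient when the same layout offset is used in several INPUT statements.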

Line Pointer Control

The line pointer control (/) directs IML to skip to the next line of input. You need it when a record of data spans more than one line: use the slash to move to the next line and continue reading. In the example reading the class data, you do not need to skip a line because each line of data contains all the variables for a student.
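As a sketch of when the slash would be needed, suppose (hypothetically) that each student's data were split across two lines, with the name and sex on the first line and the numeric values on the second. The fileref CLASS2 and the file name are assumptions for this illustration:

   Alice    f
   10   61   97
   Beth     f
   11   64   105

   proc iml;
   filename class2 'class2.txt';      /* hypothetical two-line-per-student file */
   infile class2;
   do data;
      input name $ sex $              /* read from the first line               */
            / age height weight;      /* slash: continue on the next line       */
      print name sex age height weight;
   end;
   closefile class2;
   quit;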

Line Hold Control

The trailing at sign (@), when at the end of an INPUT statement, directs IML to hold the pointer on the current record so that you can read more data with subsequent INPUT statements. You can use it to read several records from a single line of data. Sometimes, when a record is very short—say, 10 columns or so—you can save space in your external file by coding several records on the same line.
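For instance, a single line might hold three short age and height pairs. The sketch below assumes a hypothetical file, referenced by the fileref PAIRS, that contains the one line shown:

   10 61 11 64 12 63

   proc iml;
   filename pairs 'pairs.txt';        /* hypothetical file with the line above  */
   infile pairs;
   do i = 1 to 3;
      input age height @;             /* trailing @ holds the current line      */
      print age height;
   end;
   closefile pairs;
   quit;

Each pass through the loop continues scanning where the previous INPUT statement stopped, rather than moving to a new line.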

Binary File Indicator Controls

When the external file you want to read is a binary file (RECFM=N is specified in the INFILE statement), you must tell IML how to read the values by using the following binary file indicator controls:

>n

start reading the next record at the byte position n in the file.

>point-variable

start reading the next record at the byte position in the file given by point-variable.

>(expression)

start reading the next record at the byte position in the file given by expression.

<n

read the number of bytes indicated by the value of n.

<point-variable

read the number of bytes indicated by the value of point-variable.

<(expression)

read the number of bytes indicated by the value of expression.

Pattern Searching

You can have the input mechanism search for patterns of text by using the at sign (@) with a character operand. IML starts searching at the current position, advances until it finds the pattern, and leaves the pointer at the position immediately after the found pattern in the input record. For example, the following statement searches for the pattern NAME= and then uses list input to read the value after the found pattern:

   input @ 'NAME=' name $;

If the pattern is not found, then the pointer is left past the end of the record, and the rest of the INPUT statement follows the conventions based on the options MISSOVER, STOPOVER, and FLOWOVER described in the section Using the INFILE Statement. If you use pattern searching, you usually specify the MISSOVER option so that you can handle records in which the pattern is not found.

Notice that the MISSOVER feature enables you to search for a variety of items in the same record, even if some of them are not found. For example, the following statements are able to read in the ADDR variable even if NAME= is not found (in which case, NAME is unvalued):

   infile in1 missover;
   input @1 @ "NAME=" name $
         @1 @ "ADDR=" addr &
         @1 @ "PHONE=" phone $;

The pattern operand can use any characters except for the following:

   %  $  [ ]  { }  <  >  -  ?  *  #  @  ^  ` (backquote)

Record Directives

Each INPUT statement goes to a new record except in the following special cases:

  • An at sign (@) at the end of an INPUT statement specifies that the record is to be held for future INPUT statements.

  • Binary files (RECFM=N) always hold their records until the > directive.

As discussed in the syntax of the INPUT statement, the line pointer operator (/) instructs the input mechanism to go immediately to the next record. For binary (RECFM=N) files, the > directive is used instead of the /.

Blanks

The informat determines the way blanks are interpreted. For example, the $CHARw. informat reads blanks as part of the whole value, while the BZw. informat turns blanks into zeros. See SAS Language Reference: Dictionary for more information about informats.

Missing Values

Missing values in formatted input are represented by blanks or a single period for a numeric value and by blanks for a character value.

Matrix Use

Data values are either character or numeric. Input variables always result in scalar (one row by one column) values with type (character or numeric) and length determined by the input format.

End-of-File Condition

End of file is the condition of trying to read a record when there are no more records to read from the file. The consequences of an end-of-file condition are described as follows.

  • All the variables in the INPUT statement that encountered end of file are freed of their values. You can use the NROW or NCOL function to test if this has happened.

  • If end of file occurs inside a DO DATA loop, execution is passed to the statement after the END statement in the loop.
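These two consequences can be sketched as follows. The variable NREC is a hypothetical counter introduced for this illustration, and the fileref INCLASS is assumed to point to the class data file:

   proc iml;
   filename inclass 'user.text.class';
   infile inclass missover;
   nrec = 0;
   do data;
      input name $ sex $ age height weight;
      nrec = nrec + 1;                /* not reached when INPUT hits end of file */
   end;
   print nrec;                        /* number of records actually read         */
   /* the INPUT that met end of file freed its variables */
   if nrow(name) = 0 then print "NAME has no value: end of file was reached";
   closefile inclass;
   quit;

Because execution leaves the loop as soon as the end-of-file condition occurs, the counter is incremented only for records that were read successfully.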

For text files, end of file is encountered first as the end of the last record. The next time input is attempted, the end-of-file condition is raised.

For binary files, end of file can result in the input mechanism returning a record that is shorter than the requested length. In this case IML still attempts to process the record, using the rules described in the section Using the INFILE Statement.

The DO DATA loop provides a convenient way to handle end of file.

For example, to read the class data from the external file USER.TEXT.CLASS into a SAS data set, you need to perform the following steps:

  1. Establish a fileref referencing the data file.

  2. Use an INFILE statement to open the file for input.

  3. Initialize any character variables by setting the length.

  4. Create a new SAS data set with a CREATE statement. List the variables you plan to input in a VAR clause.

  5. Use a DO DATA loop to read the data one line at a time.

  6. Write an INPUT statement telling IML how to read the data.

  7. Use an APPEND statement to add the new data line to the end of the new SAS data set.

  8. End the DO DATA loop.

  9. Close the new data set.

  10. Close the external file with a CLOSEFILE statement.

Your statements should look as follows:

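   filename inclass 'user.text.class';        /* fileref for the external file   */
   infile inclass missover;                   /* open the file for input         */
   name="12345678";                           /* NAME: character, length 8       */
   sex="1";                                   /* SEX: character, length 1        */
   create class  var{name sex age height weight};
   do data;                                   /* loop until end of file          */
      input name $ sex $ age height weight;
      append;                                 /* add the record to CLASS         */
   end;
   close class;
   closefile inclass;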

Note that the APPEND statement is not executed if the INPUT statement reads past the end of file since IML escapes the loop immediately when the condition is encountered.

Differences with the SAS DATA Step

If you are familiar with the SAS DATA step, you will notice that the following features are supported differently or are not supported in IML:

  • The pound sign (#) directive supporting multiple current records is not supported.

  • Grouping parentheses are not supported.

  • The colon (:) format modifier is not supported.

  • The byte operands (< and >) are new features supporting binary files.

  • The ampersand (&) format modifier causes IML to stop reading data if a comma is encountered. Use of the ampersand format modifier is valid with list input only.

  • The RECFM=F option is not supported.