File Access

Using the INPUT Statement

Once you have referenced the file containing your data with an INFILE statement, you must tell IML how to read the data, that is, how the values are arranged in each record: the names of the variables, whether each variable is numeric or character, and the format and columns of each variable's values.

The INPUT statement describes the arrangement of values in an input record. The INPUT statement reads records from a file specified in the previously executed INFILE statement, reading the values into IML variables.

There are two ways to describe a record's values in an IML INPUT statement: list input and formatted input.

Following are several examples of valid INPUT statements for the class data file, depending, of course, on how the data are stored.

If the data are stored with a blank or a comma between fields, then list input can be used. For example, the INPUT statement for the class data file might look as follows:

  
    infile inclass; 
    input name $ sex $ age height weight;
 

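These statements tell IML that there are five variables (NAME, SEX, AGE, HEIGHT, and WEIGHT), that the data fields are separated by blanks or commas, that NAME and SEX are character variables (as indicated by the dollar sign), and that AGE, HEIGHT, and WEIGHT are numeric variables (the default).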

The data must be stored in the same order in which the variables are listed in the INPUT statement. Otherwise, you can use formatted input, which is column specific. Formatted input is the most flexible and can handle any data file. Your INPUT statement for the class data file might look as follows:
  
    infile inclass; 
    input @1 name $char8. @10 sex $char1. @15 age 2.0 
       @20 height 4.1 @25 weight 5.1;
 
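These statements tell IML the following: NAME is a character variable that begins in column 1 and occupies 8 columns; SEX is a character variable in column 10; AGE is a numeric variable that begins in column 15 and occupies 2 columns with no decimal places; HEIGHT is a numeric variable that begins in column 20 and occupies 4 columns with one decimal place; WEIGHT is a numeric variable that begins in column 25 and occupies 5 columns with one decimal place.

The next sections discuss these two modes of input.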

List Input

If your data are recorded with a comma or one or more blanks between data fields, you can use list input to read your data. If you have missing values - that is, unknown values - they must be represented by a period (.) rather than a blank field.

When IML looks for a value, it skips past blanks and tab characters. Then it scans for a delimiter to the value. The delimiter is a blank, a comma, or the end of the record. When the ampersand (&) format modifier is used, IML looks for two blanks, a comma, or the end of the record.

The general form of the INPUT statement for list input is as follows:

INPUT variable <$> <&> <...variable <$> <&>> ;

where


variable
names the variable to be read by the INPUT statement.

$
indicates that the preceding variable is character.

&
indicates that a character value can have a single embedded blank. Because a blank normally indicates the end of a data value, use the ampersand format modifier to indicate the end of the value with at least two blanks or a comma.

With list input, IML scans the input lines for values. Consider using list input when blanks or commas separate the input values and when periods, rather than blanks, represent missing values.

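List input is the default in several situations. If no informat is specified for a variable, IML scans for a number. If only the dollar sign ($) or ampersand (&) format modifier is specified, IML scans for a character value, and the ampersand allows single embedded blanks to occur. If the end of a record is encountered before IML finds a value, then the behavior is as described by the record overflow options in the INFILE statement discussed in the section "Using the INFILE Statement".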

When you read with list input, the order of the variables listed in the INPUT statement must agree with the order of the values in the data file. For example, consider the following data:
  
    Alice    f    10   61   97 
    Beth     f    11   64   105 
    Bill     m    12   63   110
 
You can use list input to read these data by specifying the following INPUT statement:
  
    input name $ sex $ age height weight;
 
Note: This statement implies that the variables are stored in the order given. That is, each line of data contains a student's name, sex, age, height, and weight in that order and separated by at least one blank or by a comma.
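
As a minimal sketch of the ampersand modifier and of missing values in list input, suppose a hypothetical file (referenced here by the assumed fileref INCLASS2, which is not part of the class example) stores full names that contain one embedded blank. The statements below assume an open IML session and a FILENAME statement that has already defined INCLASS2.

```sas
/* Hypothetical data lines in the file referenced by INCLASS2:    */
/*    Alice Smith  f 10 61 97                                     */
/*    Beth Jones   f 11 .  105                                    */
infile inclass2;

/* NAME is read with the & modifier, so a single blank is kept    */
/* inside the value and two blanks (or a comma) end it.           */
/* The period in the second line is read as a missing HEIGHT.     */
input name $ & sex $ age height weight;
```

Each execution of the INPUT statement reads one record. If only a single blank separated the name from SEX, the ampersand would not treat it as the end of the value, so data read this way need at least two blanks or a comma after the name.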

Formatted Input

The alternative to list input is formatted input. An INPUT statement reading formatted input must have a SAS informat after each variable. An informat gives the data type and field width of an input value. Formatted input can be used with pointer controls and format modifiers. Note, however, that neither pointer controls nor format modifiers are necessary for formatted input.

Pointer Control Features



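Pointer controls reset the pointer's column and line positions and tell the INPUT statement where to go to read the data value. You use pointer controls to specify the columns and lines from which you want to read. Column pointer controls move the pointer to a column that you specify, and the line pointer control moves the pointer to the next line.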



Column Pointer Controls



Column pointer controls indicate in which column an input value starts. Column pointer controls begin with either an at sign (@) or a plus sign (+). A complete list follows:
@n
moves the pointer to column n.

@point-variable
moves the pointer to the column given by the current value of point-variable.

@(expression)
moves the pointer to the column given by the value of the expression. The expression must evaluate to a positive integer.

+n
moves the pointer n columns.

+point-variable
moves the pointer the number of columns given by the value of point-variable.

+(expression)
moves the pointer the number of columns given by the value of expression. The value of expression can be positive or negative.

Here are some examples of using column pointer controls:

Example   Meaning
@12       go to column 12
@N        go to the column given by the value of N
@(N-1)    go to the column given by the value of N-1
+5        skip 5 columns
+N        skip N columns
+(N+1)    skip N+1 columns


In the earlier example that used formatted input, you used several pointer controls. Here are the statements:
  
    infile inclass; 
    input @1 name $char8. @10 sex $char1. @15 age 2.0 
          @20 height 4.1 @25 weight 5.1;
 
The @1 moves the pointer to column 1, the @10 moves it to column 10, and so on. You move the pointer to the column where the data field begins and then supply an informat specifying how many columns the variable occupies. The INPUT statement could also be written as follows:
  
    input @1 name $char8. +1 sex $char1. +4 age 2. +3 height 4.1 
          +1 weight 5.1;
 
In this form, you move the pointer to column 1 (@1) and read eight columns. The pointer is now at column 9. Now, move the pointer +1 columns to column 10 to read SEX. The $char1. informat says to read a character variable occupying one column. After you read the value for SEX, the pointer is at column 11, so move it to column 15 with +4 and read AGE in columns 15 and 16 (the 2. informat). The pointer is now at column 17, so move +3 columns and read HEIGHT. The same idea applies for reading WEIGHT.
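
The point-variable and expression forms can be combined with the forms shown above. The following is a sketch only: it assumes the same class data layout and an open IML session, and AGECOL is a hypothetical variable introduced for illustration.

```sas
infile inclass;
agecol = 15;                      /* hypothetical: column where AGE begins        */
input @1 name $char8.             /* @n: NAME starts in column 1                  */
      @agecol age 2.              /* @point-variable: go to column 15             */
      @(agecol+5) height 4.1      /* @(expression): go to column 20               */
      +1 weight 5.1;              /* +n: pointer is at 24 after HEIGHT, so move 1 */
```

SEX is not read here, which shows that column pointer controls let you skip fields you do not need. Keeping the column in a variable can be convenient when the same layout offset is used in several INPUT statements.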

Line Pointer Control



The line pointer control (/) directs IML to skip to the next line of input and continue reading data there. You need a line pointer control when a record of data takes more than one line. In the example reading the class data, you do not need to skip a line because each line of data contains all the variables for a student.
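
For illustration, suppose a hypothetical variant of the class file (assumed fileref INCLASS2, already defined with a FILENAME statement) spreads each student over two lines. A minimal sketch of reading one student with the line pointer control might look as follows:

```sas
/* Hypothetical layout: each student occupies two lines.  */
/*    Alice f                                             */
/*    10 61 97                                            */
infile inclass2;

/* Read NAME and SEX from the first line, then use / to   */
/* move to the next line and read the numeric values.     */
input name $ sex $ / age height weight;
```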

Line Hold Control



The trailing at sign (@), when at the end of an INPUT statement, directs IML to hold the pointer on the current record so that you can read more data with subsequent INPUT statements. You can use it to read several records from a single line of data. Sometimes, when a record is very short - say, 10 columns or so - you can save space in your external file by coding several records on the same line.
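
The following sketch shows the trailing at sign holding a record across two INPUT statements. It assumes the blank-delimited class data and the INCLASS fileref used earlier, within an open IML session.

```sas
infile inclass;

/* The trailing @ holds the pointer on the current record. */
input name $ sex $ @;

/* This statement continues on the same record instead of  */
/* going to a new one, because the record was held.        */
input age height weight;
```

The second INPUT statement has no trailing at sign, so the record is released and the next INPUT statement goes to a new record.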

Binary File Indicator Controls



When the external file you want to read is a binary file (RECFM=N is specified in the INFILE statement), you must tell IML how to read the values by using the following binary file indicator controls:
>n
start reading the next record at the byte position n in the file.

>point-variable
start reading the next record at the byte position in the file given by point-variable.

>(expression)
start reading the next record at the byte position in the file given by expression.

<n
read the number of bytes indicated by the value of n.

<point-variable
read the number of bytes indicated by the value of point-variable.

<(expression)
read the number of bytes indicated by the value of expression.

Pattern Searching

You can have the input mechanism search for patterns of text by using the at sign (@) with a character operand. IML starts searching at the current position, advances until it finds the pattern, and leaves the pointer at the position immediately after the found pattern in the input record. For example, the following statement searches for the pattern NAME= and then uses list input to read the value after the found pattern:
  
    input @ 'NAME=' name $;
 

If the pattern is not found, then the pointer is left past the end of the record, and the rest of the INPUT statement follows the conventions based on the options MISSOVER, STOPOVER, and FLOWOVER described in the section "Using the INFILE Statement". If you use pattern searching, you usually specify the MISSOVER option so that you can handle cases in which the pattern is not found.

Notice that the MISSOVER feature enables you to search for a variety of items in the same record, even if some of them are not found. For example, the following statements are able to read in the ADDR variable even if NAME= is not found (in which case, NAME is unvalued):

  
    infile in1 missover; 
    input @1 @ "NAME=" name $ 
          @1 @ "ADDR=" addr & 
          @1 @ "PHONE=" phone $;
 

The pattern operand can use any characters except for the following:

% $ [ ] { } < > - ? * # @ ^ ` (backquote)

Record Directives

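Each INPUT statement goes to a new record except in two special cases: when the previous INPUT statement ended with a trailing at sign (@), which holds the record for subsequent INPUT statements, and when the file is binary (RECFM=N), in which case the record is held until the > directive is used. As discussed in the syntax of the INPUT statement, the line pointer operator (/) instructs the input mechanism to go immediately to the next record. For binary (RECFM=N) files, the > directive is used instead of the /.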

Blanks

The informat determines the way blanks are interpreted. For example, the $CHARw. informat reads blanks as part of a character value, while the BZw.d informat turns blanks in a numeric field into 0s. See SAS Language Reference: Dictionary for more information about informats.

Missing Values

Missing values in formatted input are represented by blanks or a single period for a numeric value and by blanks for a character value.

Matrix Use

Data values are either character or numeric. Input variables always result in scalar (one row by one column) values with type (character or numeric) and length determined by the input format.

End-of-File Condition

End of file is the condition of trying to read a record when there are no more records to read from the file. The consequences of an end-of-file condition are described as follows.

For text files, end of file is encountered first as the end of the last record. The next time input is attempted, the end-of-file condition is raised.

For binary files, end of file can result in the input mechanism returning a record that is shorter than the requested length. In this case IML still attempts to process the record, using the rules described in the section "Using the INFILE Statement".

The DO DATA loop provides a convenient way to handle end of file.

For example, to read the class data from the external file USER.TEXT.CLASS into a SAS data set, you need to perform the following steps:

  1. Establish a fileref referencing the data file.
  2. Use an INFILE statement to open the file for input.
  3. Initialize any character variables by setting the length.
  4. Create a new SAS data set with a CREATE statement. You want to list the variables you plan to input in a VAR clause.
  5. Use a DO DATA loop to read the data one line at a time.
  6. Write an INPUT statement telling IML how to read the data.
  7. Use an APPEND statement to add the new data line to the end of the new SAS data set.
  8. End the DO DATA loop.
  9. Close the new data set.
  10. Close the external file with a CLOSEFILE statement.

Your statements should look as follows:

  
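    filename inclass 'user.text.class';            /* fileref for the data file     */
    infile inclass missover;                       /* open the file for input       */
    name="12345678";                               /* set character length to 8     */
    sex="1";                                       /* set character length to 1     */
    create class var{name sex age height weight};  /* new data set and variables    */
    do data;                                       /* loop until end of file        */
       input name $ sex $ age height weight;       /* read one line with list input */
       append;                                     /* add the line to CLASS         */
    end;
    close class;                                   /* close the new data set        */
    closefile inclass;                             /* close the external file       */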
 
Note that the APPEND statement is not executed if the INPUT statement reads past the end of file since IML escapes the loop immediately when the condition is encountered.

Differences with the SAS DATA Step

If you are familiar with the SAS DATA step, you will notice that the following features are supported differently or are not supported in IML:
