Starting with Raw Data: Beyond the Basics

Problem Solving: When an Input Record Unexpectedly Does Not Have Enough Values


Understanding the Default Behavior

When a DATA step reads raw data from an external file, problems can occur when SAS encounters the end of an input line before it has read data for all of the variables that are specified in the INPUT statement. This problem can occur when reading variable-length records, records that contain missing values, or both.

The following is an example of an external file that contains variable-length records:

----+----1----+----2

22
333
4444
55555

This DATA step uses the numeric informat 5. to read a single field in each record of raw data and to assign values to the variable TestNumber:

data numbers;
   infile 'your-external-file';
   input TestNumber 5.;
run;

proc print data=numbers;
   title 'Test DATA Step';
run;

The DATA step reads the first value (22). Because the value is shorter than the 5 characters that the informat expects, the DATA step goes to the next record (333) to finish filling the value. The value 333 is entered into the program data vector (PDV) and becomes the value of the TestNumber variable for the first observation. The DATA step then goes to the next record, but encounters the same problem because the value (4444) is shorter than the informat expects. Again, the DATA step goes to the next record, reads the value (55555), and assigns that value to the TestNumber variable for the second observation. As a result, four input records produce only two observations.

The following output shows the results. After this program runs, the SAS log contains a note to indicate the places where SAS went to the next record to search for data values.

Reading Raw Data Past the End of a Line: Default Behavior

                                 Test DATA Step                                1

                                         Test
                                 Obs    Number

                                  1        333
                                  2      55555
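
To try the default behavior without an external file of your own, one option is to write the four records to a temporary file first and then read them back. The following is a minimal sketch, not part of the original example: the fileref RAWTEST is a hypothetical name, and the sketch assumes a host on which the FILE statement writes variable-length records by default.

```sas
/* RAWTEST is a hypothetical fileref; the TEMP device creates a scratch file. */
filename rawtest temp;

/* Write the four variable-length records from the example. */
data _null_;
   file rawtest;
   put '22';
   put '333';
   put '4444';
   put '55555';
run;

/* Read the records back with the default (FLOWOVER) behavior. */
data numbers;
   infile rawtest;
   input TestNumber 5.;
run;

proc print data=numbers;
   title 'Test DATA Step';
run;
```

After a run like this, the log note to look for reads: NOTE: SAS went to a new line when INPUT statement reached past the end of a line.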

Methods of Control: Your Options


Four Options: FLOWOVER, STOPOVER, MISSOVER, and TRUNCOVER

To control how SAS behaves after it attempts to read past the end of a data line, you can use the following options in the INFILE statement:

infile 'your-external-file' flowover;

is the default behavior. The DATA step simply reads the next record into the input buffer, attempting to find values to assign to the rest of the variables in the INPUT statement.

infile 'your-external-file' stopover;

causes the DATA step to stop processing if an INPUT statement reaches the end of the current record without finding values for all variables in the statement. Use this option if you expect all of the data in the external file to conform to a given standard and if you want the DATA step to stop when it encounters a data record that does not conform to the standard.

infile 'your-external-file' missover;

prevents the DATA step from going to the next line if it does not find values in the current record for all of the variables in the INPUT statement. Instead, the DATA step assigns a missing value for all variables that do not have values.

infile 'your-external-file' truncover;

causes the DATA step to assign the raw data value to the variable even if the value is shorter than expected by the INPUT statement. If, when the DATA step encounters the end of an input record, there are variables without values, the variables are assigned missing values for that observation.
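
As an illustration of the STOPOVER option described above, the following sketch writes one full-length record followed by shorter ones and then reads them with STOPOVER. The fileref RAWSTOP and the record order are assumptions made for this illustration; they are not taken from the example file used elsewhere in this section.

```sas
/* RAWSTOP is a hypothetical fileref for a scratch file. */
filename rawstop temp;

/* One record that fills the 5. informat, followed by two that do not. */
data _null_;
   file rawstop;
   put '55555';
   put '22';
   put '4444';
run;

/* STOPOVER treats the first short record as an error and stops the step. */
data numbers;
   infile rawstop stopover;
   input TestNumber 5.;
run;

proc print data=numbers;
   title 'STOPOVER Sketch';
run;
```

Under these assumptions, the DATA step should stop at the second record and report an error in the log, so the data set is expected to hold only the observation that was read before the short record.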

You can also use these options when your data lines are in the program itself, that is, when they follow the DATALINES statement. In the INFILE statement, specify DATALINES instead of a reference to an external file (for example, infile datalines truncover;) to indicate that the data records are in the DATA step itself.

Note: The examples in this section show the use of the MISSOVER and TRUNCOVER options with formatted input. You can also use these options with list input and column input.
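
The following sketch combines the two points above: it reads in-stream data lines with INFILE DATALINES and uses MISSOVER with list input. The data set name SCORES, the variable names, and the data values are hypothetical. In-stream data lines are typically padded with blanks to a fixed length, so this sketch uses a record with a missing trailing value rather than a short formatted field.

```sas
data scores;
   /* DATALINES takes the place of the external file reference. */
   infile datalines missover;
   /* List input: the second record has no value for Score2. */
   input Name $ Score1 Score2;
   datalines;
Ann 90 85
Bob 70
Cy 88 92
;
run;

proc print data=scores;
   title 'MISSOVER with In-Stream Data';
run;
```

With MISSOVER, the step creates three observations and Score2 is missing for Bob. Without the option, SAS would go to the next data line to look for a value for Score2.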


Understanding the MISSOVER Option

The MISSOVER option prevents the DATA step from going to the next line if it does not find values in the current record for all of the variables in the INPUT statement. Instead, the DATA step assigns a missing value for all variables that do not have complete values according to any specified informats. The input file contains the following raw data:

----+----1----+----2

22
333
4444
55555

The following example uses the MISSOVER option:

data numbers;
   infile 'your-external-file' missover;
   input TestNumber 5.;
run;

proc print data=numbers;
   title 'Test DATA Step';
run;

Output from the MISSOVER Option

                                 Test DATA Step                                1

                                         Test
                                 Obs    Number

                                  1          .
                                  2          .
                                  3          .
                                  4      55555

Because the fourth record is the only one whose value fills the 5 characters that the informat expects, it is the only record whose value is assigned to the TestNumber variable. The other three observations receive missing values. This result is probably not the desired outcome for this example, but the MISSOVER option can sometimes be valuable. For an example, see Updating a Data Set.

Note: If there is a blank line at the end of the last record, the DATA step attempts to load another record into the input buffer. Because there are no more records, the MISSOVER option instructs the DATA step to assign missing values to all variables, and an extra observation is added to the data set. To prevent this situation from occurring, make sure that your input data does not have a blank line at the end of the last record.


Understanding the TRUNCOVER Option

The TRUNCOVER option causes the DATA step to assign the raw data value to the variable even if the value is shorter than the length that is expected by the INPUT statement. If, when the DATA step encounters the end of an input record, there are variables without values, the variables are assigned missing values for that observation. The following example demonstrates the use of the TRUNCOVER option with the same raw data:

data numbers;
   infile 'your-external-file' truncover;
   input TestNumber 5.;
run;

proc print data=numbers;
   title 'Test DATA Step';
run;

Output from the TRUNCOVER Option

                                 Test DATA Step                                1

                                         Test
                                 Obs    Number

                                  1         22
                                  2        333
                                  3       4444
                                  4      55555

This result shows that all four values were assigned to the TestNumber variable, even though three of them are shorter than the 5 characters that the informat expects. For another example using the TRUNCOVER option, see Input SAS Data Set for Examples.
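
To compare MISSOVER and TRUNCOVER on the same records in a single run, a sketch along the following lines might help. The fileref RAWCMP and the data set names MISS_NUMBERS and TRUNC_NUMBERS are hypothetical, and the sketch assumes a host on which the FILE statement writes variable-length records by default.

```sas
/* RAWCMP is a hypothetical fileref for a scratch file. */
filename rawcmp temp;

/* Write the same four variable-length records used in this section. */
data _null_;
   file rawcmp;
   put '22';
   put '333';
   put '4444';
   put '55555';
run;

/* MISSOVER: a value shorter than the informat width becomes missing. */
data miss_numbers;
   infile rawcmp missover;
   input TestNumber 5.;
run;

/* TRUNCOVER: a value shorter than the informat width is kept. */
data trunc_numbers;
   infile rawcmp truncover;
   input TestNumber 5.;
run;

proc print data=miss_numbers;
   title 'MISSOVER';
run;

proc print data=trunc_numbers;
   title 'TRUNCOVER';
run;
```

Both data sets should contain four observations. Based on the output shown earlier, only the last value is expected to be nonmissing under MISSOVER, and all four values are expected to be kept under TRUNCOVER.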
