
Diagnosing and Avoiding Errors

Diagnosing Errors


Examples in This Section

This section uses nationwide test results from the Scholastic Aptitude Test (SAT) for university-bound students from 1972 through 1998 (see footnote 1) to show what happens when errors occur.


Diagnosing Syntax Errors

The SAS Supervisor detects syntax errors as it compiles each step. For each error, SAS prints the word ERROR, identifies the error's location, and prints an explanation of the error.

In the following program, the CHART procedure is used to analyze the data. Note that a semicolon in the DATA statement is omitted, and the keyword INFILE is misspelled.

/* omitted semicolon and misspelled keyword */
libname out 'your-data-library';

data out.error1
   infill 'your-input-file';
   input test $ gender $ year SATscore @@;
run;

proc chart data = out.error1;
   hbar test / sumvar=SATscore type=mean group=gender discrete;
run;

The following output shows the result of the two syntax errors:

Diagnosing Syntax Errors

NOTE: Libref OUT was successfully assigned as follows: 
      Engine:        V8 
      Physical Name: 'YOUR-DATA-LIBRARY'
50   data out.error1
51      infill 'YOUR-INPUT-FILE';
52      input test $ gender $ year SATscore @@;
53   run;
ERROR: No CARDS or INFILE statement.
ERROR: Memtype   field is invalid.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set OUT.ERROR1 may be incomplete.  When this step was stopped 
         there were 0 observations and 4 variables.
WARNING: Data set OUT.ERROR1 was not replaced because this step was stopped.
WARNING: The data set WORK.INFILL may be incomplete.  When this step was 
         stopped there were 0 observations and 4 variables.
WARNING: Data set WORK.INFILL was not replaced because this step was stopped.
54   
55   proc chart data=out.error1;
56      hbar test / sumvar=SATscore type=mean group=gender discrete;
57   run;
NOTE: No observations in data set OUT.ERROR1.

As the log indicates, SAS recognizes the keyword DATA and attempts to process the DATA step. Because the DATA statement must end with a semicolon, SAS assumes that INFILL is a data set name and that two data sets are being created: OUT.ERROR1 and WORK.INFILL. Because it considers INFILL the name of a data set, it does not recognize it as part of another statement and, therefore, does not detect the spelling error. Because the quoted string is invalid in a DATA statement, SAS stops processing here and creates no observations for either data set.

SAS attempts to execute the program logically based on the statements that it contains, according to the steps outlined earlier in this section. The second syntax error, the misspelled keyword, is never recognized because SAS considers the DATA statement to be in effect until a semicolon ends the statement. The point to remember is that when multiple errors are made in the same program, not all of them might be detected the first time the program is executed, or they might be flagged differently in a group than if they were made alone. You might find that one correction uncovers another error or at least changes its explanation in the log.

To illustrate this point, the previous program is reexecuted with the semicolon added to the DATA statement. An attempt to correct the misspelled keyword simply introduces a different spelling error, as follows.

/* misspelled keyword */
libname out 'your-data-library';

data out.error2;
   unfile  'your-input-file';
   input test $ gender $ year SATscore @@;
run;

proc chart data = out.error1;
   hbar test / sumvar=SATscore type=mean group=gender discrete;
run;

The following output shows the results:

Correcting Syntax and Finding Different Error Messages

NOTE: Libref OUT was successfully assigned as follows: 
      Engine:        V8 
      Physical Name: YOUR-DATA-LIBRARY
70   data out.error2;
71      unfile 'YOUR-INPUT-FILE'
        ------
        180
ERROR 180-322: Statement is not valid or it is used out of proper order.

72      input test $ gender $ year SATscore @@;
73   run;
ERROR: No CARDS or INFILE statement.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set OUT.ERROR2 may be incomplete.  When this step was stopped 
         there were 0 observations and 4 variables.
74   
75   proc chart data=out.error1;
76      hbar test / sumvar=SATscore type=mean group=gender discrete;
77   run;
NOTE: No observations in data set OUT.ERROR1.

With the semicolon added, SAS now attempts to create only one data set. From that point on, SAS reads the SAS statements as it did before and issues many of the same messages. However, this time SAS considers the UNFILE statement invalid or out of proper order, and it creates no observations for the data set.
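For comparison, a corrected version of the program might look like the following sketch. It applies only the two fixes discussed above: the semicolon that ends the DATA statement and the correctly spelled INFILE keyword. The data set name OUT.SCORES is a hypothetical name chosen for illustration, and the library and file paths remain placeholders.

```sas
/* sketch: both syntax errors corrected */
libname out 'your-data-library';

data out.scores;               /* semicolon ends the DATA statement   */
   infile 'your-input-file';   /* INFILE keyword spelled correctly    */
   input test $ gender $ year SATscore @@;
run;

proc chart data=out.scores;    /* chart the data set created above    */
   hbar test / sumvar=SATscore type=mean group=gender discrete;
run;
```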


Diagnosing Execution-Time Errors

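Several types of errors are detected at execution time. Execution-time errors include illegal arguments to functions, illegal mathematical operations such as division by zero, and errors in opening or reading files.

When the SAS Supervisor encounters an execution-time error, it writes a note, warning, or error message to the log and, depending on the seriousness of the error, either continues or stops processing.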

If the previous program is rerun with the correct spelling for INFILE but with a misspelling of the filename in the INFILE statement, then the error is detected at execution time and the data is not read.

/* misspelled file in the INFILE statement */ 
libname out 'your-data-library';

data out.error3;
   infile 'an-incorrect-filename';
   input test $ gender $ year SATscore @@;
run;

proc chart data = out.error3;
   hbar test / sumvar=SATscore type=mean group=gender discrete;
run;

As the SAS log in the following output indicates, SAS cannot find the file. SAS stops processing because of errors and creates no observations in the data set.

Diagnosing an Error in the INFILE Statement

NOTE: Libref OUT was successfully assigned as follows: 
      Engine:        V8 
      Physical Name: YOUR-DATA-LIBRARY
10   data out.error3;
11      infile 'AN-INCORRECT-FILENAME';
12      input test $ gender $ year SATscore @@;
13   run;
ERROR: Physical file does not exist, AN-INCORRECT-FILENAME
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set OUT.ERROR3 may be incomplete.  When this step was stopped 
         there were 0 observations and 4 variables.
14   
15   proc chart data=out.error3;
16      hbar test / sumvar=SATscore type=mean group=gender discrete;
17   run;
NOTE: No observations in data set OUT.ERROR3.
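One way to catch this kind of problem before the DATA step runs is to test for the file first. The following is a minimal sketch and is not part of the original example. It assumes that the FILEEXIST function is given the same placeholder path that the INFILE statement uses, and it only writes a message to the log.

```sas
/* sketch: check that the external file exists before reading it */
data _null_;
   /* FILEEXIST returns 1 if the external file exists, 0 otherwise */
   if fileexist('your-input-file') then
      put 'NOTE: Input file was found.';
   else
      put 'WARNING: Input file was not found. Check the INFILE path.';
run;
```

Because the step uses DATA _NULL_, it creates no data set; it is only a check that can be run ahead of the step that reads the file.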

Diagnosing Data Errors

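When SAS detects data errors during execution, it continues processing and then prints a note that describes the error, lists the contents of the input line, and lists the values that are stored in the program data vector.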

Note that the values listed in the program data vector include two variables created automatically by SAS:

_N_

counts the number of times the DATA step iterates.

_ERROR_

indicates the occurrence of an error during an execution of the DATA step. The value that is assigned to the variable _ERROR_ is 0 when no error is encountered and 1 when an error is encountered.

These automatic variables are assigned temporarily to each observation and are not stored with the data set.
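The automatic variables can also be referenced in DATA step code. The following sketch, which assumes the list-style INPUT statement from the earlier examples and uses the hypothetical data set name WORK.CHECK, writes a message to the log whenever the current iteration encountered invalid data.

```sas
/* sketch: use _ERROR_ and _N_ to flag problem iterations in the log */
data work.check;
   infile 'your-input-file';
   input test $ gender $ year SATscore @@;
   /* _ERROR_ is 1 when this iteration encountered an error */
   if _error_ = 1 then
      put 'Invalid data found on iteration ' _n_;
run;
```

Neither _N_ nor _ERROR_ appears as a variable in WORK.CHECK, because SAS does not write automatic variables to the output data set.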

The raw data that is shown here is read by a program that uses formats to determine how variable values are printed:

verbal           m 1967 463
 verbal           f 1967 468
 verbal           m 1970 459
 verbal           f 1970 461
 math             m 1967 514
  math             f 1967 467
 math             m 1970 509
 math             f 1970 509

However, the data is not aligned correctly in the columns that are described by the INPUT statement. The sixth data line is shifted two spaces to the right, and the rest of the data lines, except for the first, are shifted one space to the right, as shown by a comparison of the raw data with the following program:

/* data in wrong columns */
libname out 'your-data-library';
proc format;
   value xscore . ='accurate scores unavailable';
run;

data out.error4;
   infile 'your-input-file';
   input test $ 1-8 gender $ 18 year 20-23
         score 25-27;
   format score xscore.;
run;

proc print data = out.error4;
   title 'Viewing Incorrect Output';
run;

The following output shows the results of the SAS program:

Detecting Data Errors with Incorrect Output

                            Viewing Incorrect Output                           1

         Obs     test     gender    year               score

          1     verbal      m       1967                            463
          2     verbal               196                             46
          3     verbal               197                             45
          4     verbal               197                             46
          5     math                 196                             51
          6     math                   .    accurate scores unavailable
          7     math                 197                             50
          8     math                 197                             50

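This program generates output, but it is not the expected output. The first observation appears to be correct, but subsequent observations have the following problems: the values for the variable GENDER are missing; only the first three digits of the value for YEAR are shown, except in the sixth observation, where YEAR is missing; and the third digit of the value for SCORE is missing, except in the sixth observation, which shows the label that the format assigns to a missing value.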

The SAS log in the following output contains an explanation:

Diagnosing Data Errors

NOTE: Libref OUT was successfully assigned as follows: 
      Engine:        V8 
      Physical Name: YOUR-DATA-LIBRARY
10   proc format;
NOTE: Format XSCORE has been output.
11      value xscore . ='accurate scores unavailable';
12   run;
13   
14   data out.error4;
15     infile 'YOUR-INPUT-FILE';
16      input test $ 1-8 gender $ 18 year 20-23
17            score 25-27;
18      format score xscore.;
19   run;
NOTE: The infile 'YOUR-INPUT-FILE' is:
      
      File Name=YOUR-INPUT-FILE,
      Owner Name=userid,Group Name=dev,
      Access Permission=rw-r--r--,
      File Size (bytes)=233

NOTE: Invalid data for year in line 6 20-23.
NOTE: Invalid data for score in line 6 25-27.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
6           math             f 1967 467 29
test=math gender=  year=. score=accurate scores unavailable _ERROR_=1 _N_=6
NOTE: 9 records were read from the infile 
      'YOUR-INPUT-FILE'.
      The minimum record length was 0.
      The maximum record length was 29.
NOTE: SAS went to a new line when INPUT statement reached past the end of a 
      line.
NOTE: The data set OUT.ERROR4 has 8 observations and 4 variables.
20   
21   proc print data=out.error4;
22      title 'Viewing Incorrect Output';
23   run;
NOTE: There were 8 observations read from the data set OUT.ERROR4.

The errors are flagged, starting with the first message that line 6 contains invalid data for the variable YEAR. The rule indicates that input data has been written to the log. SAS lists on the log the values that are stored in the program data vector. The following lines from the log indicate that SAS has encountered an error:

NOTE: Invalid data for year in line 6 20-23.
NOTE: Invalid data for score in line 6 25-27.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
6           math             f 1967 467 29
test=math gender=  year=. score=accurate scores unavailable _ERROR_=1 _N_=6

Missing values are shown for the variables GENDER and YEAR. SCORE is also missing, which is why it is displayed with the label from the XSCORE. format. The NOTEs in the log indicate that the sixth line of input contained the error.

To debug the program, either the raw data can be repositioned or the INPUT statement can be rewritten, remembering that all the data lines except the first were shifted at least one space to the right. In the shifted lines, the variable TEST was unaffected, but the value for GENDER was moved completely out of its designated field; therefore, SAS reads the variable GENDER as a missing value. In the sixth observation, for which the data was shifted right an additional space, the character value for GENDER occupied part of the field for the numeric variable YEAR. When SAS encounters invalid data, it treats the value as a missing value but also notes on the log that the data is invalid. The important point to remember is that SAS can use only the information that you provide to it, not what you intend to provide to it.
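One possible rewrite of the INPUT statement is to switch from column input to list input, which reads values by their order on the line rather than by fixed column positions. The following sketch assumes that the values in every data line are separated by at least one blank and contain no embedded blanks. The data set name WORK.FIXED is hypothetical, and three of the raw data lines (one unshifted, one shifted one space, one shifted two spaces) are supplied inline with DATALINES so that the step is self-contained.

```sas
/* sketch: list input ignores the column shifts in the raw data */
data work.fixed;
   input test $ gender $ year score;
   datalines;
verbal           m 1967 463
 verbal           f 1967 468
  math             f 1967 467
;
run;

proc print data=work.fixed;
   title 'Viewing Output Read with List Input';
run;
```

List input trades the strictness of column positions for tolerance of misaligned lines, so it is a reasonable choice only when the blank-separated assumption holds for the whole file.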


FOOTNOTE 1: See the Appendix for a complete listing of the input data that is used to create the data sets in this section.
