Diagnosing and Avoiding Errors
Examples in This Section
This section uses nationwide test results from the Scholastic Aptitude Test (SAT) for university-bound students from 1972 through 1998 (see footnote 1) to show what happens when errors occur.
Diagnosing Syntax Errors
The SAS Supervisor detects syntax errors as it compiles each step, and then SAS does the following:
In the following program, the CHART procedure is used to analyze the data. Note that a semicolon in the DATA statement is omitted, and the keyword INFILE is misspelled.
/* omitted semicolon and misspelled keyword */
libname out 'your-data-library';
data out.error1
   infill 'your-input-file';
   input test $ gender $ year SATscore @@;
run;

proc chart data = out.error1;
   hbar test / sumvar=SATscore type=mean group=gender discrete;
run;
The following output shows the result of the two syntax errors:
NOTE: Libref OUT was successfully assigned as follows:
      Engine:        V8
      Physical Name: 'YOUR-DATA-LIBRARY'
50   data out.error1
51      infill 'YOUR-INPUT-FILE';
52      input test $ gender $ year SATscore @@;
53   run;

ERROR: No CARDS or INFILE statement.
ERROR: Memtype field is invalid.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set OUT.ERROR1 may be incomplete.  When this step was stopped
         there were 0 observations and 4 variables.
WARNING: Data set OUT.ERROR1 was not replaced because this step was stopped.
WARNING: The data set WORK.INFILL may be incomplete.  When this step was stopped
         there were 0 observations and 4 variables.
WARNING: Data set WORK.INFILL was not replaced because this step was stopped.

54
55   proc chart data=out.error1;
56      hbar test / sumvar=SATscore type=mean group=gender discrete;
57   run;

NOTE: No observations in data set OUT.ERROR1.
As the log indicates, SAS recognizes the keyword DATA and attempts to process the DATA step. Because the DATA statement must end with a semicolon, SAS assumes that INFILL is a data set name and that two data sets are being created: OUT.ERROR1 and WORK.INFILL. Because it considers INFILL the name of a data set, it does not recognize it as part of another statement and, therefore, does not detect the spelling error. Because the quoted string is invalid in a DATA statement, SAS stops processing here and creates no observations for either data set.
SAS attempts to execute the program logically based on the statements that it contains, according to the steps outlined earlier in this section. The second syntax error, the misspelled keyword, is never recognized because SAS considers the DATA statement to be in effect until a semicolon ends the statement. The point to remember is that when multiple errors are made in the same program, not all of them might be detected the first time the program is executed, or they might be flagged differently in a group than if they were made alone. You might find that one correction uncovers another error or at least changes its explanation in the log.
To illustrate this point, the previous program is reexecuted with the semicolon added to the DATA statement. An attempt to correct the misspelled keyword simply introduces a different spelling error, as follows.
/* misspelled keyword */
libname out 'your-data-library';
data out.error2;
   unfile 'your-input-file';
   input test $ gender $ year SATscore @@;
run;

proc chart data = out.error1;
   hbar test / sumvar=SATscore type=mean group=gender discrete;
run;
The following output shows the results:
Correcting Syntax and Finding Different Error Messages
NOTE: Libref OUT was successfully assigned as follows:
      Engine:        V8
      Physical Name: YOUR-DATA-LIBRARY
70   data out.error2;
71      unfile 'YOUR-INPUT-FILE'
        ------
        180
ERROR 180-322: Statement is not valid or it is used out of proper order.

72      input test $ gender $ year SATscore @@;
73   run;

ERROR: No CARDS or INFILE statement.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set OUT.ERROR2 may be incomplete.  When this step was stopped
         there were 0 observations and 4 variables.

74
75   proc chart data=out.error1;
76      hbar test / sumvar=SATscore type=mean group=gender discrete;
77   run;

NOTE: No observations in data set OUT.ERROR1.
With the semicolon added, SAS now attempts to create only one data set. From that point on, SAS reads the SAS statements as it did before and issues many of the same messages. However, this time SAS considers the UNFILE statement invalid or out of proper order, and it creates no observations for the data set.
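For comparison, here is a minimal sketch of the same program with both syntax errors corrected: the DATA statement ends with a semicolon, and the keyword INFILE is spelled correctly. The data set name OUT.FIXED is an assumption chosen for illustration, and the library and file names are the same placeholders used above. With a valid input file, the DATA step would be expected to compile without the errors shown in the preceding logs.

```sas
/* both syntax errors corrected (sketch; OUT.FIXED is an illustrative name) */
libname out 'your-data-library';

data out.fixed;                      /* semicolon ends the DATA statement */
   infile 'your-input-file';         /* keyword INFILE spelled correctly  */
   input test $ gender $ year SATscore @@;
run;

proc chart data=out.fixed;           /* chart the data set created above  */
   hbar test / sumvar=SATscore type=mean group=gender discrete;
run;
```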
Diagnosing Execution-Time Errors
Several types of errors are detected at execution time. Execution-time errors include the following:
an incorrect reference in an INFILE statement (for example, misspelling or otherwise incorrectly stating the external file)
When the SAS Supervisor encounters an execution-time error, it does the following:
prints a note, warning, or error message, depending on the seriousness of the error
in some cases, lists the values that are stored in the program data vector
continues or stops processing, depending on the seriousness of the error
If the previous program is rerun with the correct spelling for INFILE but with a misspelling of the filename in the INFILE statement, then the error is detected at execution time and the data is not read.
/* misspelled file in the INFILE statement */
libname out 'your-data-library';
data out.error3;
   infile 'an-incorrect-filename';
   input test $ gender $ year SATscore @@;
run;

proc chart data = out.error3;
   hbar test / sumvar=SATscore type=mean group=gender discrete;
run;
As the SAS log in the following output indicates, SAS cannot find the file. SAS stops processing because of errors and creates no observations in the data set.
Diagnosing an Error in the INFILE Statement
NOTE: Libref OUT was successfully assigned as follows:
      Engine:        V8
      Physical Name: YOUR-DATA-LIBRARY
10   data out.error3;
11      infile 'AN-INCORRECT-FILENAME';
12      input test $ gender $ year SATscore @@;
13   run;

ERROR: Physical file does not exist, AN-INCORRECT-FILENAME
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set OUT.ERROR3 may be incomplete.  When this step was stopped
         there were 0 observations and 4 variables.

14
15   proc chart data=out.error3;
16      hbar test / sumvar=SATscore type=mean group=gender discrete;
17   run;

NOTE: No observations in data set OUT.ERROR3.
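One way to avoid this kind of execution-time error is to check that the external file exists before the DATA step runs. The following is a sketch only, not part of the original example: the macro name READ_SAT and the macro variable RAWFILE are hypothetical. It relies on the FILEEXIST function, which returns 1 when the named external file exists and 0 otherwise.

```sas
/* Sketch: skip the DATA step when the raw data file cannot be found.   */
/* READ_SAT and RAWFILE are illustrative names, not from the original.  */
libname out 'your-data-library';
%let rawfile = your-input-file;

%macro read_sat;
   %if %sysfunc(fileexist(&rawfile)) %then %do;
      /* file found: read it as in the earlier examples */
      data out.error3;
         infile "&rawfile";
         input test $ gender $ year SATscore @@;
      run;
   %end;
   %else %do;
      /* file not found: write a message instead of running the step */
      %put WARNING: Raw data file &rawfile was not found. DATA step skipped.;
   %end;
%mend read_sat;

%read_sat
```

With this guard, a wrong filename produces a single message in the log and no data set is created, so the empty OUT.ERROR3 shown above never reaches a later procedure.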
Diagnosing Data Errors
When SAS detects data errors during execution, it continues processing and then does the following:
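Note that the values listed in the program data vector include two variables created automatically by SAS: _N_, which counts the iterations of the DATA step, and _ERROR_, which is 1 when an error has occurred during the current iteration and 0 otherwise.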
These automatic variables are assigned temporarily to each observation and are not stored with the data set.
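The automatic variables _N_ and _ERROR_ (which appear as _ERROR_= and _N_= in the SAS log) can also be referenced in DATA step code. The sketch below uses a hypothetical data set WORK.CHECK and three invented data lines, one of which is deliberately not numeric, to write a message for each iteration in which an error is flagged. PROC CONTENTS then lists the variables that were actually stored.

```sas
/* Sketch: WORK.CHECK and the data lines are invented for illustration. */
data work.check;
   input x;                          /* 'abc' is invalid for numeric X  */
   if _error_ then
      put 'Problem detected in iteration ' _n_=;
   datalines;
1
abc
3
;
run;

/* _N_ and _ERROR_ are not stored, so only X should be listed here. */
proc contents data=work.check;
run;
```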
The raw data that is shown here is read by a program that uses formats to determine how variable values are printed:
verbal           m 1967 463
 verbal           f 1967 468
 verbal           m 1970 459
 verbal           f 1970 461
 math             m 1967 514
  math             f 1967 467
 math             m 1970 509
 math             f 1970 509
However, the data is not aligned correctly in the columns that are described by the INPUT statement. The sixth data line is shifted two spaces to the right, and the rest of the data lines, except for the first, are shifted one space to the right, as shown by a comparison of the raw data with the following program:
/* data in wrong columns */
libname out 'your-data-library';

proc format;
   value xscore . ='accurate scores unavailable';
run;

data out.error4;
   infile 'your-input-file';
   input test $ 1-8 gender $ 18 year 20-23
         score 25-27;
   format score xscore.;
run;

proc print data = out.error4;
   title 'Viewing Incorrect Output';
run;
The following output shows the results of the SAS program:
Detecting Data Errors with Incorrect Output
                    Viewing Incorrect Output                     1

Obs    test      gender    year    score

 1     verbal      m       1967    463
 2     verbal               196    46
 3     verbal               197    45
 4     verbal               197    46
 5     math                 196    51
 6     math                   .    accurate scores unavailable
 7     math                 197    50
 8     math                 197    50
This program generates output, but it is not the expected output. The first observation appears to be correct, but subsequent observations have the following problems:
Only the first three digits of the value for the variable YEAR are shown except in the sixth observation where a missing value is indicated.
The third digit of the value for the variable SCORE is missing, again except in the sixth observation, which does show the assigned value for the missing value.
The SAS log in the following output contains an explanation:
NOTE: Libref OUT was successfully assigned as follows:
      Engine:        V8
      Physical Name: YOUR-DATA-LIBRARY
10   proc format;
NOTE: Format XSCORE has been output.
11      value xscore . ='accurate scores unavailable';
12   run;
13
14   data out.error4;
15      infile 'YOUR-INPUT-FILE';
16      input test $ 1-8 gender $ 18 year 20-23
17            score 25-27;
18      format score xscore.;
19   run;

NOTE: The infile 'YOUR-INPUT-FILE' is:
      File Name=YOUR-INPUT-FILE,
      Owner Name=userid,Group Name=dev,
      Access Permission=rw-r--r--,
      File Size (bytes)=233

NOTE: Invalid data for year in line 6 20-23.
NOTE: Invalid data for score in line 6 25-27.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
6           math             f 1967 467 29
test=math gender=  year=. score=accurate scores unavailable _ERROR_=1 _N_=6
NOTE: 9 records were read from the infile 'YOUR-INPUT-FILE'.
      The minimum record length was 0.
      The maximum record length was 29.
NOTE: SAS went to a new line when INPUT statement reached past the end of a
      line.
NOTE: The data set OUT.ERROR4 has 8 observations and 4 variables.

20
21   proc print data=out.error4;
22      title 'Viewing Incorrect Output';
23   run;

NOTE: There were 8 observations read from the data set OUT.ERROR4.
The errors are flagged, starting with the first message that line 6 contains invalid data for the variable YEAR. The rule indicates that input data has been written to the log. SAS lists on the log the values that are stored in the program data vector. The following lines from the log indicate that SAS has encountered an error:
NOTE: Invalid data for year in line 6 20-23.
NOTE: Invalid data for score in line 6 25-27.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
6           math             f 1967 467 29
test=math gender=  year=. score=accurate scores unavailable _ERROR_=1 _N_=6
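Missing values are shown for the variables GENDER and YEAR. The value of SCORE is also missing, which is why it is displayed with the XSCORE. format label 'accurate scores unavailable'. The NOTEs in the log indicate that the sixth line of input contained the error.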
To debug the program, either the raw data can be repositioned or the INPUT statement can be rewritten, remembering that all the data lines were shifted at least one space to the right. The variable TEST was unaffected, but the variable GENDER was completely removed from its designated field; therefore, SAS reads the variable GENDER as a missing value. In the sixth observation, for which the data was shifted right an additional space, the character value for GENDER occupied part of the field for the numeric variable YEAR. When SAS encounters invalid data, it treats the value as a missing value but also notes on the log that the data is invalid. The important point to remember is that SAS can use only the information that you provide to it, not what you intend to provide to it.
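As one illustration of rewriting the INPUT statement, the sketch below switches from column input to list input, which reads blank-delimited values regardless of the columns in which they start. It assumes that no value contains an embedded blank and that no value is left blank in the raw data. The data set name WORK.FIXED4 and the title are hypothetical, and the in-stream data lines reproduce the shifted records as described above, with the exact column positions reconstructed from that description.

```sas
/* Sketch: list input tolerates the shifted columns.                   */
/* WORK.FIXED4 is an illustrative name; the DATALINES spacing is an    */
/* assumed reconstruction of the misaligned raw data.                  */
proc format;
   value xscore . ='accurate scores unavailable';
run;

data work.fixed4;
   input test $ gender $ year score;   /* list input, no column ranges */
   format score xscore.;
   datalines;
verbal           m 1967 463
 verbal           f 1967 468
 verbal           m 1970 459
 verbal           f 1970 461
 math             m 1967 514
  math             f 1967 467
 math             m 1970 509
 math             f 1970 509
;
run;

proc print data=work.fixed4;
   title 'Viewing Output Read with List Input';
run;
```

List input is the simpler repair when the fields are reliably separated by blanks. When values can contain blanks or be absent, realigning the raw data or adjusting the column ranges is likely the safer choice.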
FOOTNOTE 1: See the Appendix for a complete listing of the input data that is used to create the data sets in this section.
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.