Reading, Combining, and Modifying SAS Data Sets |
The Importance of Error Checking |
When reading observations with the SET statement and KEY= option or with the MODIFY statement, error checking is imperative for several reasons. The most important reason is that these tools use nonsequential access methods, and so there is no guarantee that an observation will be located that satisfies the request. Error checking enables you to direct execution to specific code paths, depending on the outcome of the I/O operation. Your program will continue execution for expected conditions and terminate execution when unexpected results occur.
Error-Checking Tools |
Two tools have been created to make error checking easier when you use the MODIFY statement or the SET statement with the KEY= option to process SAS data sets:
_IORC_ is created automatically when you use the MODIFY statement or the SET statement with KEY=. The value of _IORC_ is a numeric return code that indicates the status of the I/O operation from the most recently executed MODIFY or SET statement with KEY=. Checking the value of this variable enables you to detect abnormal I/O conditions and to direct execution down specific code paths instead of having the application terminate abnormally. For example, if the KEY= variable value does match between two observations, you might want to combine them and output an observation. If they don't match, however, you might want only to write a note to the log.
Because the values of the _IORC_ automatic variable are internal and subject to change, the SYSRC macro was created to enable you to test for specific I/O conditions while protecting your code from future changes in _IORC_ values. When you use SYSRC, you can check the value of _IORC_ by specifying one of the mnemonics listed in the following table.
Mnemonic Value | Meaning of Return Code | When Return Code Occurs | |
---|---|---|---|
_DSENMR | The TRANSACTION data set observation does not exist in the MASTER data set. | MODIFY with BY is used and no match occurs. | |
_DSEMTR | Multiple TRANSACTION data set observations with the same BY variable value do not exist in the MASTER data set. | MODIFY with BY is used and consecutive observations with the same BY values do not find a match in the first data set. In this situation, the first observation that fails to find a match returns _DSENMR. The subsequent observations return _DSEMTR. | |
_DSENOM | No matching observation was found in the MASTER data set. | SET or MODIFY with KEY= finds no match. | |
_SENOCHN | The output operation was unsuccessful. | the KEY= option in a MODIFY statement contains duplicate values. | |
_SOK | The I/O operation was successful. | a match is found. |
Example 1: Routing Execution When an Unexpected Condition Occurs |
This example shows how to prevent an unexpected condition from terminating the DATA step. The goal is to update a master data set with new information from a transaction data set. This application assumes that there are no duplicate values for the common variable in either data set.
Note: This program works as expected only if the master and transaction data sets contain no consecutive observations with the same value for the common variable. For an explanation of the behavior of MODIFY with KEY= when duplicates exist, see the MODIFY statement in SAS Language Reference: Dictionary.
The TRANSACTION data set contains three observations: two updates to information in MASTER and a new observation about PartNumber value 6 that needs to be added. MASTER is indexed on PartNumber. There are no duplicate values of PartNumber in MASTER or TRANSACTION. The following shows the MASTER and the TRANSACTION input data sets:
MASTER TRANSACTION OBS PartNumber Quantity OBS PartNumber AddQuantity 1 1 10 1 4 14 2 2 20 2 6 16 3 3 30 3 2 12 4 4 40 5 5 50
The objective is to update the MASTER data set with information from the TRANSACTION data set. The program reads TRANSACTION sequentially. MASTER is read directly, not sequentially, using the MODIFY statement and the KEY= option. Only observations with matching values for PartNumber, which is the KEY= variable, are read from MASTER.
data master; 1 set transaction; 2 modify master key=PartNumber; 3 Quantity = Quantity + AddQuantity; 4 run;
Match observations from the MASTER data set based on the values of PartNumber. | |
Update the information on Quantity by adding the new values from the TRANSACTION data set. |
This program has correctly updated one observation but it stopped when it could not find a match for PartNumber value 6. The following lines are written to the SAS log:
ERROR: No matching observation was found in MASTER data set. PartNumber=6 AddQuantity=16 Quantity=70 _ERROR_=1 _IORC_=1230015 _N_=2 NOTE: The SAS System stopped processing this step because of errors. NOTE: The data set WORK.MASTER has been updated. There were 1 observations rewritten, 0 observations added and 0 observations deleted.
The MASTER file was incorrectly updated. The updated master has five observations. One observation was updated correctly, a new one was not added, and a second update was not made. The following shows the incorrectly updated MASTER data set:
MASTER OBS PartNumber Quantity 1 1 10 2 2 20 3 3 30 4 4 54 5 5 50
The objective is to apply two updates and one addition to MASTER, preventing the DATA step from stopping when it does not find a match in MASTER for the PartNumber value 6 in TRANSACTION. By adding error checking, this DATA step is allowed to complete normally and produce a correctly revised version of MASTER. This program uses the _IORC_ automatic variable and the SYSRC autocall macro in a SELECT group to check the value of the _IORC_ variable and execute the appropriate code based on whether a match is found.
data master; 1 set transaction; 2 modify master key=PartNumber; 3 select(_iorc_); 4 when(%sysrc(_sok)) do; Quantity = Quantity + AddQuantity; replace; end; when(%sysrc(_dsenom)) do; Quantity = AddQuantity; _error_ = 0; output; end; otherwise do; put 'ERROR: Unexpected value for _IORC_= ' _iorc_; put 'Program terminating. DATA step iteration # ' _n_; put _all_; stop; end; end; run;
The DATA step executed without error and observations were appropriately updated and added. The following lines are written to the SAS log:
NOTE: The data set WORK.MASTER has been updated. There were 2 observations rewritten, 1 observations added and 0 observations deleted.
MASTER contains updated quantities for PartNumber values 2 and 4 and a new observation for PartNumber value 6. The following shows the correctly updated MASTER data set:
MASTER OBS PartNumber Quantity 1 1 10 2 2 32 3 3 30 4 4 54 5 5 50 6 6 16
Example 2: Using Error Checking on All Statements That Use KEY= |
This example shows how important it is to use error checking on all statements that use the KEY= option when reading data.
The MASTER and DESCRIPTION data sets are both indexed on PartNumber. The ORDER data set contains values for all parts in a single order. Only ORDER contains the PartNumber value 8. The following shows the MASTER, ORDER, and DESCRIPTION input data sets:
MASTER ORDER OBS PartNumber Quantity OBS PartNumber 1 1 10 1 2 2 2 20 2 4 3 3 30 3 1 4 4 40 4 3 5 5 50 5 8 6 5 7 6 DESCRIPTION OBS PartNumber PartDescription 1 4 Nuts 2 3 Bolts 3 2 Screws 4 6 Washers
The objective is to create a data set that contains the description and number in stock for each part in a single order, except for the parts that are not found in either of the two input data sets, MASTER and DESCRIPTION. A transaction data set contains the part numbers of all parts in a single order. One data set is read to retrieve the description of the part and another is read to retrieve the quantity that is in stock.
The program reads the ORDER data set sequentially and then uses SET with the KEY= option to read the MASTER and DESCRIPTION data sets directly, based on the key value of PartNumber. When a match occurs, an observation is written that contains all the necessary information for each value of PartNumber in ORDER. This first attempt at a solution uses error checking for only one of the two SET statements that use KEY= to read a data set.
data combine; 1 length PartDescription $ 15; set order; 2 set description key=PartNumber; 2 set master key=PartNumber; 2 select(_iorc_); 3 when(%sysrc(_sok)) do; output; end; when(%sysrc(_dsenom)) do; PartDescription = 'No description'; _error_ = 0; output; end; otherwise do; put 'ERROR: Unexpected value for _IORC_= ' _iorc_; put 'Program terminating.'; put _all_; stop; end; end; run;
This program creates an output data set but executes with one error. The following lines are written to the SAS log:
PartNumber=1 PartDescription=Nuts Quantity=10 _ERROR_=1 _IORC_=0 _N_=3 PartNumber=5 PartDescription=No description Quantity=50 _ERROR_=1 _IORC_=0 _N_=6 NOTE: The data set WORK.COMBINE has 7 observations and 3 variables.
The following shows the incorrectly created COMBINE data set. Observation 5 should not be in this data set. PartNumber value 8 does not exist in either MASTER or DESCRIPTION, so no Quantity should be listed for it. Also, observations 3 and 7 contain descriptions from observations 2 and 6, respectively.
COMBINE OBS PartNumber PartDescription Quantity 1 2 Screws 20 2 4 Nuts 40 3 1 Nuts 10 4 3 Bolts 30 5 8 No description 30 6 5 No description 50 7 6 No description 50
To create an accurate output data set, this example performs error checking on both SET statements that use the KEY= option:
data combine(drop=Foundes); 1 length PartDescription $ 15; set order; 2 Foundes = 0; 3 set description key=PartNumber; 4 select(_iorc_); 5 when(%sysrc(_sok)) do; Foundes = 1; end; when(%sysrc(_dsenom)) do; PartDescription = 'No description'; _error_ = 0; end; otherwise do; put 'ERROR: Unexpected value for _IORC_= ' _iorc_; put 'Program terminating. Data set accessed is DESCRIPTION'; put _all_; _error_ = 0; stop; end; end; set master key=PartNumber; 6 select(_iorc_); 7 when(%sysrc(_sok)) do; output; end; when(%sysrc(_dsenom)) do; if not Foundes then do; _error_ = 0; put 'WARNING: PartNumber ' PartNumber 'is not in' ' DESCRIPTION or MASTER.'; end; else do; Quantity = 0; _error_ = 0; output; end; end; otherwise do; put 'ERROR: Unexpected value for _IORC_= ' _iorc_; put 'Program terminating. Data set accessed is MASTER'; put _all_; _error_ = 0; stop; end; end; /* ends the SELECT group */ run;
The DATA step executed without error. Six observations were correctly created and the following message was written to the log:
WARNING: PartNumber 8 is not in DESCRIPTION or MASTER. NOTE: The data set WORK.COMBINE has 6 observations and 3 variables.
The following shows the correctly updated COMBINE data set. Note that COMBINE does not contain an observation with the PartNumber value 8. This value does not occur in either MASTER or DESCRIPTION.
COMBINE OBS PartNumber PartDescription Quantity 1 2 Screws 20 2 4 Nuts 40 3 1 No description 10 4 3 Bolts 30 5 5 No description 50 6 6 Washers 0
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.