Previous Page | Next Page

Modifying SAS Data Sets

Modifying a Master Data Set with Observations from a Transaction Data Set


Understanding the MODIFY Statement

The MODIFY statement replaces data in a master data set with data from a transaction data set, and makes the changes in the original master data set. You can use a BY statement to match observations from the transaction data set with observations in the master data set. The syntax for using the MODIFY statement and the BY statement is

MODIFY master-SAS-data-set transaction-SAS-data-set;
BY by-variable;

The master-SAS-data-set specifies the SAS data set that you want to modify. The transaction-SAS-data-set specifies the SAS data set that provides the values for updating the master data set. The by-variable specifies one or more variables by which you identify corresponding observations.

When you use a BY statement with the MODIFY statement, the DATA step uses dynamic WHERE processing to find observations in the master data set. Neither the master data set nor the transaction data set needs to be sorted. For large data sets, however, sorting the data before you modify it can enhance performance significantly.


Adding New Observations to the Master Data Set

You can use the MODIFY statement to add observations to an existing master data set. If the transaction data set contains an observation that does not match an observation in the master data set, then SAS enables you to write a new observation to the master data set if you use an explicit OUTPUT statement in your program. When you specify an explicit OUTPUT statement, you must also specify a REPLACE statement if you want to replace observations in place. All new observations append to the end of the master data set.


Checking for Program Errors

You can use the _IORC_ automatic variable for error checking in your DATA step program. The _IORC_ automatic variable contains the return code for each I/O operation that the MODIFY statement attempts to perform.

The best way to test the values of _IORC_ is with the mnemonic codes that are provided by the SYSRC autocall macro. Each mnemonic code describes one condition. The mnemonics provide an easy method for testing problems in a DATA step program. The following is a partial list of codes:

_DSENMR

specifies that the transaction data set observation does not exist in the master data set (used only with MODIFY and BY statements). If consecutive observations with different BY values do not find a match in the master data set, then both of them return _DSENMR.

_DSEMTR

specifies that multiple transaction data set observations with a given BY value do not exist in the master data set (used only with MODIFY and BY statements). If consecutive observations with the same BY values do not find a match in the master data set, then the first observation returns _DSENMR and the subsequent observations return _DSEMTR.

_SOK

specifies that the observation was located in the master data set.

For a complete list of mnemonic codes, see the MODIFY statement in SAS Language Reference: Dictionary.


The Program

The program in this section updates values in a master data set with values from a transaction data set. If a transaction does not exist in the master data set, then the program adds the transaction to the master data set.

In this example, a warehouse received a shipment of new items, and the INVENTORY master data set must be modified to reflect the changes. The master data set contains a complete list of the inventory items. The transaction data set contains items that are on the master inventory as well as new inventory items.

The following program creates the ADD_INVENTORY transaction data set, which contains items for updating the master data set. The PartNumber variable contains the part number for the item and corresponds to PartNumber in the INVENTORY data set. The Description variable names the item. The NewStock variable contains the number of each item in the current shipment. The NewPrice variable contains the new price of the item.

The program attempts to update the master data set INVENTORY (see The INVENTORY Data Set) according to the values in the transaction data set ADD_INVENTORY. The program uses the _IORC_ automatic variable to detect errors.

data add_inventory; 1 
   input PartNumber $ Description $ NewStock @16 NewPrice;
   datalines;
K89R seal   6  247.50 
AA11 hammer 55 32.26 
BB22 wrench 21 17.35  
KJ66 cutter 10 24.50  
CC33 socket 7  22.19   
BV1E timer 30  36.50
;

options pagesize=60 linesize=80 pageno=1 nodate;

data inventory;
   modify inventory add_inventory; 2 
   by PartNumber;
   select (_iorc_); 3 
      /* The observation exists in the master data set. */
      when (%sysrc(_sok)) do; 4 
         InStock=InStock+NewStock;
         ReceivedDate=today();
         Price=NewPrice;
         replace; 5 
      end;
      /* The observation does not exist in the master data set. */
      when (%sysrc(_dsenmr)) do; 6 
         InStock=NewStock;
         ReceivedDate=today();
         Price=NewPrice;
         output; 7 
         _error_=0;
      end;
      otherwise do; 8 
           put 'An unexpected I/O error has occurred.'/ 8 
             'Check your data and your program.'; 8 
         _error_=0;
         stop;
      end; 
   end; 
 
proc print data=inventory;
   title 'Tool Warehouse Inventory';
run;

The following list corresponds to the numbered items in the preceding program:

[1] The DATA statement creates the transaction data set ADD_INVENTORY.

[2] The MODIFY statement loads the data from the INVENTORY and ADD_INVENTORY data sets.

[3] The _IORC_ automatic variable is used for error checking. The value of _IORC_ is a numeric return code that indicates the status of the most recent I/O operation.

[4] The SYSRC autocall macro checks to see if the value of _IORC_ is _SOK. If the value is _SOK, then an observation in the transaction data set matches an observation in the master data set.

[5] The REPLACE statement updates the master data set INVENTORY by replacing the observation in the master data set with the observation from the transaction data set.

[6] The SYSRC autocall macro checks to see if the value of _IORC_ is _DSENMR. If the value is _DSENMR, then an observation in the transaction data set does not exist in the master data set.

[7] The OUTPUT statement writes the current observation to the end of the master data set.

[8] If neither condition is met, the PUT statement writes a message to the log.

The following output shows the results:

The Updated INVENTORY Data Set

                            Tool Warehouse Inventory                           1

                  Part                      In      Received
          Obs    Number    Description    Stock         Date     Price

            1     K89R       seal           40     19JAN2001    247.50
            2     M4J7       sander         98     20JUN1998     45.88
            3     LK43       filter        121     19MAY1999     10.99
            4     MN21       brace          43     10AUG1999     27.87
            5     BC85       clamp          80     16AUG1999      9.55
            6     NCF3       valve         198     20MAR1999     24.50
            7     KJ66       cutter         16     19JAN2001     24.50
            8     UYN7       rod           211     09SEP1999     11.55
            9     JD03       switch        383     09JAN2000     13.99
           10     BV1E       timer          56     19JAN2001     36.50
           11     AA11       hammer         55     19JAN2001     32.26
           12     BB22       wrench         21     19JAN2001     17.35
           13     CC33       socket          7     19JAN2001     22.19

SAS writes the following message to the log:

NOTE: The data set WORK.INVENTORY has been updated.  There were 3 observations 
      rewritten, 3 observations added and 0 observations deleted.
CAUTION:
If you execute your program without the OUTPUT and REPLACE statements, then your master file might not update correctly.

Using OUTPUT or REPLACE in a DATA step overrides the default replacement of observations. If you use these statements in a DATA step, then you must explicitly program each action that you want to take.  [cautionend]

For more information about the MODIFY, OUTPUT, and REPLACE statements, see the Statements section in SAS Language Reference: Dictionary.

Previous Page | Next Page | Top of Page