DATASETS Procedure

APPEND Statement

Adds the observations from one SAS data set to the end of another SAS data set.
Default: If the BASE= data set is accessed through a SAS server and if no other user has the data set open at the time the APPEND statement begins processing, the BASE= data set defaults to CNTLLEV=MEMBER (member-level locking). When this behavior happens, no other user can update the file while the data set is processed.
Requirement: The BASE= data set must be a member of a SAS library that supports update processing.
Tips: You can specify most data set options for the BASE= argument and DATA= option. However, if you specify DROP=, KEEP=, or RENAME= data set option for the BASE= data set, the option is ignored. You can use any global statements as well.

If a failure occurs during processing, the data set is marked as damaged and is reset to its preappend condition at the next REPAIR statement. If the data set has an index, the index is not updated with each observation but is updated once at the end. (This behavior is Version 7 and later, as long as APPENDVER=V6 is not set.)

Concatenating Two SAS Data Sets

Syntax

Required Argument

BASE=<libref.> SAS-data-set
names the data set to which you want to add observations.
libref
specifies the library that contains the SAS data set. If you omit the libref, the default is the libref for the procedure input library. If you are using PROC APPEND, the default for libref is either WORK or USER.
SAS-data-set
names a SAS data set. If the APPEND statement cannot find an existing data set with this name, it creates a new data set in the library. That is, you can use the APPEND statement to create a data set by specifying a new data set name in the BASE= argument.
Whether you are creating a new data set or appending to an existing data set, the BASE= data set is the current SAS data set after all append operations.
Alias:OUT=

Optional Arguments

APPENDVER=V6
uses the Version 6 behavior for appending observations to the BASE= data set, which is to append one observation at a time. Beginning in Version 7, to improve performance, the default behavior changed so that all observations are appended after the data set is processed.
DATA=<libref.> SAS-data-set
names the SAS data set containing observations that you want to append to the end of the SAS data set specified in the BASE= argument.
libref
specifies the library that contains the SAS data set. If you omit libref, the default is the libref for the procedure input library. The DATA= data set can be from any SAS library. You must use the two-level name if the data set resides in a library other than the procedure input library.
SAS-data-set
names a SAS data set. If the APPEND statement cannot find an existing data set with this name, it stops processing.
Alias:NEW=
Default:the most recently created SAS data set, from any SAS library
FORCE
forces the APPEND statement to concatenate data sets when the DATA= data set contains variables that meet one of the following criteria:
  • are not in the BASE= data set
  • do not have the same type as the variables in the BASE= data set
  • are longer than the variables in the BASE= data set
Tip:You can use the GENNUM= data set option to append to or from a specific version in a generation group. Here are some examples:
/* appends historical version to base A */ 
proc datasets;    
   append base=a       
      data=a (gennum=2);  

/* appends current version of A to historical version */ 
proc datasets;    
   append base=a (gennum=1)       
      data=a;
GETSORT
copies the sort indicator from the DATA= data set to the BASE= data set. The sort indicator is established by either a PROC SORT or an ORDERBY clause in PROC SQL if the following criteria are met:
  • The BASE= data set must meet the following criteria:
    • be SAS Version 7 or higher
    • contain no observations
    • accept sort indicators
    CAUTION:
    Any pre-existing sort indicator on the BASE= data set is overwritten with no warning, even if the DATA= data set is not sorted at all.
  • The DATA= data set must meet the following criteria:
    • contain a sort indicator established by PROC SORT
    • be the same data representation as the BASE= data set
Restrictions:The GETSORT option has no effect on the data sets if the BASE= data set has an audit trail associated with it. This restriction causes a WARNING in the output while the APPEND process continues.

The GETSORT option has no effect on the data sets if there are dropped, kept, or renamed variables in the DATA= data file.

NOWARN
suppresses the warning when used with the FORCE option to concatenate two data sets with different variables.

Details

Appending Sorted Data Sets

You can append sorted data sets and maintain the sort using the following guidelines:
  • The DATA= data set and the BASE= data set contain sort indicators from the SORT procedure.
  • The DATA= data set and the BASE= data set are sorted using the same variables.
  • The observations added from the DATA= data set do not violate the sort order of the BASE= data set.
The sort indicator from the BASE= data set is retained.

Using the Block I/O Method to Append

The block I/O method is used to append blocks of data instead of one observation at a time. This method increases performance when you are appending large data sets. SAS determines whether to use the block I/O method. Not all data sets can use the block I/O method. There are restrictions set by the APPEND statement and the Base SAS engine.
To display information in the SAS log about the append method that is being used, you can specify the MSGLEVEL= system option as follows:
options msglevel=i;
The following message is written to the SAS log, if the block I/O method is not used:
INFO: Data set block I/O cannot be used because:
If the APPEND statement determines that the block I/O will not be used, one of the following explanations is written to the SAS log:
INFO: - The data sets use different engines, have different variables or have attributes that might differ.
INFO: - There is a WHERE clause present.
INFO: - There is no member level locking.
INFO: - The OBS option is active.
INFO: - The FIRSTOBS option is active.
If the Base SAS engine determines that the block I/O method will not be used, one of the following explanations is written to the SAS log:
INFO: - Referential Integrity Constraints exist.
INFO: - Cross Environment Data Access is being used.
INFO: - The file is compressed.
INFO: - The file has an audit file which is not suspended.

Restricting the Observations That Are Appended

You can use the WHERE= data set option with the DATA= data set in order to restrict the observations that are appended. Likewise, you can use the WHERE statement in order to restrict the observations from the DATA= data set. The WHERE statement has no effect on the BASE= data set. If you use the WHERE= data set option with the BASE= data set, WHERE= has no effect.
CAUTION:
For an existing BASE= data set:
If there is a WHERE statement in the BASE= data set, it takes effect only if the WHEREUP= option is set to YES.
CAUTION:
For the non-existent BASE= data set:
If there is a WHERE statement in the non-existent BASE= data set, regardless of the WHEREUP option setting, you use the WHERE statement.
Note: You cannot append a data set to itself by using the WHERE= data set option.

Choosing between the SET Statement and the APPEND Statement

If you use the SET statement in a DATA step to concatenate two data sets, SAS must process all the observations in both data sets to create a new one. The APPEND statement bypasses the processing of data in the original data set and adds new observations directly to the end of the original data set. Using the APPEND statement can be more efficient than using a SET statement if any of the following occurs:
  • The BASE= data set is large.
  • All variables in the BASE= data set have the same length and type as the variables in the DATA= data set and if all variables exist in both data sets.
    Note: You can use the CONTENTS statement to see the variable lengths and types.
The APPEND statement is especially useful if you frequently add observations to a SAS data set (for example, in production programs that are constantly appending data to a journal-type data set).

Appending Password-Protected SAS Data Sets

In order to use the APPEND statement, you need Read access to the DATA= data set and Write access to the BASE= data set. To gain access, use the READ= and WRITE= data set options in the APPEND statement the way you would use them in any other SAS statement, which is in parentheses immediately after the data set name. When you are appending password-protected data sets, use the following guidelines:
  • If you do not give the Read password for the DATA= data set in the APPEND statement, by default the procedure looks for the Read password for the DATA= data set in the PROC DATASETS statement. However, the procedure does not look for the Write password for the BASE= data set in the PROC DATASETS statement. Therefore, you must specify the Write password for the BASE= data set in the APPEND statement.
  • If the BASE= data set is read-protected only, you must specify its Read password in the APPEND statement.

Appending to a Compressed Data Set

You can concatenate compressed SAS data sets. Either or both of the BASE= and DATA= data sets can be compressed. If the BASE= data set allows the reuse of space from deleted observations, the APPEND statement might insert the observations into the middle of the BASE= data set to make use of available space.
For information about the COMPRESS= and REUSE= data set and system options, see SAS Data Set Options: Reference and SAS System Options: Reference.

Appending to an Indexed Data Set — Fast-Append Method

Beginning with Version 7, the behavior of appending to an indexed data set changed to improve performance.
  • In Version 6, when you appended to an indexed data set, the index was updated for each added observation. Index updates tend to be random. Therefore, disk I/O could have been high.
  • Currently, SAS does not update the index until all observations are added to the data set. After the append, SAS internally sorts the observations and inserts the data into the index sequentially. The behavior reduces most of the disk I/O and results in a faster append method.
The fast-append method is used by default when the following requirements are met. Otherwise, the Version 6 method is used:
  • The BASE= data set is open for member-level locking. If CNTLLEV= is set to record, then the fast-append method is not used.
  • The BASE= data set does not contain referential integrity constraints.
  • The BASE= data set is not accessed using the Cross Environment Data Access (CEDA) facility.
  • The BASE= data set is not using a WHERE= data set option.
To display information in the SAS log about the append method that is being used, you can specify the MSGLEVEL= system option as follows:
options msglevel=i;
Either a message appears if the fast-append method is in use, or a message or messages appear as to why the fast-append method is not in use.
The current append method initially adds observations to the BASE= data set regardless of the restrictions that are determined by the index. For example, a variable that has an index that was created with the UNIQUE option does not have its values validated for uniqueness until the index is updated. Then, if a nonunique value is detected, the offending observation is deleted from the data set. After observations are appended, some of them might subsequently be deleted.
For a simple example, consider that the BASE= data set has ten observations numbered from 1 to 10 with a UNIQUE index for the variable ID. You append a data set that contains five observations numbered from 1 to 5, and observations 3 and 4 both contain the same value for ID. The following actions occur:
  1. After the observations are appended, the BASE= data set contains 15 observations numbered from 1 to 15.
  2. SAS updates the index for ID, validates the values, and determines that observations 13 and 14 contain the same value for ID.
  3. SAS deletes one of the observations from the BASE= data set, resulting in 14 observations that are numbered from 1 to 15. For example, observation 13 is deleted. Note that you cannot predict which observation is deleted, because the internal sort might place either observation first. (In Version 6, you could predict that observation 13 would be added and observation 14 would be rejected.)
If you do not want the current behavior (which could result in deleted observations) or if you want to be able to predict which observations are appended, request the Version 6 append method by specifying the APPENDVER=V6 option:
proc datasets;
   append base=a data=b appendver=v6;
run;
Note: In Version 6, deleting the index and then re-creating it after the append could improve performance. The current method might eliminate the need to do that. However, the performance depends on the nature of your data.

Appending to Data Sets with Different Variables

If the DATA= data set contains variables that are not in the BASE= data set, use the FORCE option in the APPEND statement to force the concatenation of the two data sets. The APPEND statement drops the extra variables and issues a warning message. You can use the NOWARN option to suppress the warning message.
If the BASE= data set contains a variable that is not in the DATA= data set, the APPEND statement concatenates the data sets, but the observations from the DATA= data set have a missing value for the variable that was not present in the DATA= data set. The FORCE option is not necessary in this case.
If you use the DROP=, KEEP=, or RENAME= options on the BASE= data set, the options ONLY affect the APPEND processing and does not change the variables in the appended BASE= data set. Variables that are dropped or not kept using the DROP= and KEEP= options still exist in the appended BASE= data set. Variables that are renamed using the RENAME= option remain with their original name in the appended BASE= data set.

Appending to Data Sets That Contain Variables with Different Attributes

If a variable has different attributes in the BASE= data set than it does in the DATA= data set, the attributes in the BASE= data set prevail.
If the SAS formats in the DATA= data set are different from those in the BASE= data set, then the SAS formats in the BASE= data set are used. However, SAS does not convert the data from the DATA= data set in order to be consistent with the SAS formats in the BASE= data set. The result could be data that seems to be incorrect. A warning message is displayed in the SAS log. The following example illustrates appending data by using different SAS formats:
data format1;
   input Date date9.;
   format Date date9.;
datalines;
24sep1975
22may1952
;

data format2;
   input Date datetime20.;
   format Date datetime20.;
datalines;
25aug1952:11:23:07.4
;

proc append base=format1 data=format2;
run;
The following messages are displayed in the SAS log.
NOTE: Appending WORK.FORMAT2 to WORK.FORMAT1.
WARNING: Variable Date has format DATE9. on the BASE data set
   and format DATETIME20. on the DATA data set. DATE9. used.
NOTE: There were 1 observations read from the data set WORK.FORMAT2.
NOTE: 1 observations added.
NOTE: The data set WORK.FORMAT1 has 3 observations and 1 variables.
If the length of a variable is longer in the DATA= data set than in the BASE= data set, or if the same variable is a character variable in one data set and a numeric variable in the other, use the FORCE option. Using FORCE has the following consequences:
  • The length of the variables in the BASE= data set takes precedence. SAS truncates values from the DATA= data set to fit them into the length that is specified in the BASE= data set.
  • The type of the variables in the BASE= data set takes precedence. The APPEND statement replaces values of the wrong type (all values for the variable in the DATA= data set) with missing values.
Note: If a character variable's transcoding attribute is opposite in the BASE= and DATA= data sets (for example, one is YES and the other is NO), then a warning is issued. To determine the transcoding attributes, use the CONTENTS procedure for each data set. You set the transcoding attribute with the TRANSCODE= option in the ATTRIB statement or with the TRANSCODE= column modifier in PROC SQL.

Appending Data Sets That Contain Integrity Constraints

If the DATA= data set contains integrity constraints and the BASE= data set does not exist, the APPEND statement copies the general constraints. Note that the referential constraints are not copied. If the BASE= data set exists, the APPEND action copies only observations.

Appending with Generation Groups

You can use the GENNUM= data set option to append to a specific version in a generation group. Here are examples:
SAS Statements
Result
proc datasets;
   append base=a
      data=b(gennum=2);
Appends historical version B#002 to base A
proc datasets;
   append base=a(gennum=2)
      data=b(gennum=2);
Appends historical version B#002 to historical version A#002

Using the APPEND Procedure Instead of the APPEND Statement

The only difference between the APPEND procedure and the APPEND statement in PROC DATASETS, is the default for libref in the BASE= and DATA= arguments. For PROC APPEND, the default is either WORK or USER. For the APPEND statement, the default is the libref of the procedure input library.

System Failures

If a system failure or some other type of interruption occurs while the procedure is executing, the append operation might not be successful; it is possible that not all, perhaps none, of the observations are added to the BASE= data set. In addition, the BASE= data set might suffer damage. The APPEND operation performs an update in place, which means that it does not make a copy of the original data set before it begins to append observations. If you want to be able to restore the original observations, you can initiate an audit trail for the base data file and select to store a before-update image of the observations. Then you can write a DATA step to extract and reapply the original observations to the data file. For information about initiating an audit trail, see PROC DATASETS.