The DATASETS Procedure |
Default: | If the BASE= data set is accessed through a SAS server and if no other user has the data set open at the time the APPEND statement begins processing, the BASE= data set defaults to CNTLLEV=MEMBER (member-level locking). When this behavior happens, no other user can update the file while the data set is processed. |
Requirement: | The BASE= data set must be a member of a SAS library that supports update processing. |
Tip: | You can specify most data set options for the BASE= argument and DATA= option. However, if you specify DROP=, KEEP=, or RENAME= data set option for the BASE= data set, the option is ignored. You can use any global statements as well. See Global Statements. |
Tip: | If a failure occurs during processing, the data set is marked as damaged and is reset to its preappend condition at the next REPAIR statement. If the data set has an index, the index is not updated with each observation but is updated once at the end. (This behavior is Version 7 and later, as long as APPENDVER=V6 is not set.) |
Featured in: | Concatenating Two SAS Data Sets |
APPEND BASE=<libref.>SAS-data-set
<APPENDVER=V6> <DATA=<libref.>SAS-data-set> <FORCE> <GETSORT> <NOWARN>; |
Required Arguments |
names the data set to which you want to add observations.
specifies the library that contains the SAS data set. If you omit the libref, the default is the libref for the procedure input library. If you are using PROC APPEND, the default for libref is either WORK or USER.
names a SAS data set. If the APPEND statement cannot find an existing data set with this name, it creates a new data set in the library. That is, you can use the APPEND statement to create a data set by specifying a new data set name in the BASE= argument.
Whether you are creating a new data set or appending to an existing data set, the BASE= data set is the current SAS data set after all append operations.
Alias: | OUT= |
Featured in: | Concatenating Two SAS Data Sets |
Options |
uses the Version 6 behavior for appending observations to the BASE= data set, which is to append one observation at a time. Beginning in Version 7, to improve performance, the default behavior changed so that all observations are appended after the data set is processed.
See also: | Appending to an Indexed Data Set -- Fast-Append Method |
names the SAS data set containing observations that you want to append to the end of the SAS data set specified in the BASE= argument.
specifies the library that contains the SAS data set. If you omit libref, the default is the libref for the procedure input library. The DATA= data set can be from any SAS library. You must use the two-level name if the data set resides in a library other than the procedure input library.
names a SAS data set. If the APPEND statement cannot find an existing data set with this name, it stops processing.
Alias: | NEW= |
Default: | the most recently created SAS data set, from any SAS library |
See also: | Appending with Generation Groups |
Featured in: | Concatenating Two SAS Data Sets |
forces the APPEND statement to concatenate data sets when the DATA= data set contains variables that either
are not in the BASE= data set
do not have the same type as the variables in the BASE= data set
are longer than the variables in the BASE= data set.
See also: |
|
||||
Featured in: | Concatenating Two SAS Data Sets |
You can use the GENNUM= data set option to append to or from a specific version in a generation group. Here are some examples:
/* appends historical version to base A */ proc datasets; append base=a data=a (gennum=2); /* appends current version of A to historical version */ proc datasets; append base=a (gennum=1) data=a;
copies the sort indicator from the DATA= data set to the BASE= data set. The sort indicator is established by either a PROC SORT or an ORDERBY clause in PROC SQL if the following criteria are met:
The BASE= data set must:
be SAS Version 7 or higher
contain no observations
accept sort indicators
The DATA= data set must:
contain a sort indicator established by PROC SORT
be the same data representation as the BASE= data set
Restrictions: |
The GETSORT option has no effect
on the data sets if one of the following criteria applies:
|
Featured in: | Getting Sort Indicator Information |
suppresses the warning when used with the FORCE option to concatenate two data sets with different variables.
Appending Sorted Data Sets |
You can append sorted data sets and maintain the sort using the following guidelines:
The DATA= data set and the BASE= data set contain sort indicators from the SORT procedure.
The DATA= data set and the BASE= data set are sorted using the same variables.
The observations added from the DATA= data set do not violate the sort order of the BASE= data set.
Using the Block I/O Method to Append |
The block I/O method is used to append blocks of data instead of one observation at a time. This method increases performance when you are appending large data sets. SAS determines whether to use the block I/O method. Not all data sets can use the block I/O method. There are restrictions set by the APPEND statement and the Base SAS engine.
To display information in the SAS log about the append method that is being used, you can specify the MSGLEVEL= system option as follows:
options msglevel=i;
The following message is written to the SAS log, if the block I/O method is not used:
INFO: Data set block I/O cannot be used because:
If the APPEND statement determines that the block I/O will not be used, one of the following explanations is written to the SAS log:
INFO: - The data sets use different engines, have different variables or have attributes that might differ.
INFO: - There is a WHERE clause present.
INFO: - There is no member level locking.
INFO: - The OBS option is active.
INFO: - The FIRSTOBS option is active.
If the Base SAS engine determines that the block I/O method will not be used, one of the following explanations is written to the SAS log:
INFO: - Referential Integrity Constraints exist.
INFO: - Cross Environment Data Access is being used.
INFO: - The file is compressed.
INFO: - The file has an audit file which is not suspended.
Restricting the Observations That Are Appended |
You can use the WHERE= data set option with the DATA= data set in order to restrict the observations that are appended. Likewise, you can use the WHERE statement in order to restrict the observations from the DATA= data set. The WHERE statement has no effect on the BASE= data set. If you use the WHERE= data set option with the BASE= data set, WHERE= has no effect.
If there is a WHERE statement in the BASE= data set, it takes effect only if the WHEREUP= option is set to YES.
If there is a WHERE statement in the non-existent BASE= data set, regardless of the WHEREUP option setting, you use the WHERE statement.
Note: You cannot append a data set to itself by using the WHERE= data set option.
Choosing between the SET Statement and the APPEND Statement |
If you use the SET statement in a DATA step to concatenate two data sets, SAS must process all the observations in both data sets to create a new one. The APPEND statement bypasses the processing of data in the original data set and adds new observations directly to the end of the original data set. Using the APPEND statement can be more efficient than using a SET statement if any of the following list occurs:
the BASE= data set is large.
all variables in the BASE= data set have the same length and type as the variables in the DATA= data set and if all variables exist in both data sets.
Note: You can use the CONTENTS statement to see the variable lengths and types.
The APPEND statement is especially useful if you frequently add observations to a SAS data set (for example, in production programs that are constantly appending data to a journal-type data set).
Appending Password-Protected SAS Data Sets |
In order to use the APPEND statement, you need read access to the DATA= data set and write access to the BASE= data set. To gain access, use the READ= and WRITE= data set options in the APPEND statement the way you would use them in any other SAS statement, which is in parentheses immediately after the data set name. When you are appending password-protected data sets, use the following guidelines:
If you do not give the read password for the DATA= data set in the APPEND statement, by default the procedure looks for the read password for the DATA= data set in the PROC DATASETS statement. However, the procedure does not look for the write password for the BASE= data set in the PROC DATASETS statement. Therefore, you must specify the write password for the BASE= data set in the APPEND statement.
If the BASE= data set is read-protected only, you must specify its read password in the APPEND statement.
Appending to a Compressed Data Set |
You can concatenate compressed SAS data sets. Either or both of the BASE= and DATA= data sets can be compressed. If the BASE= data set allows the reuse of space from deleted observations, the APPEND statement might insert the observations into the middle of the BASE= data set to make use of available space.
For information on the COMPRESS= and REUSE= data set and system options, see SAS Language Reference: Dictionary.
Appending to an Indexed Data Set -- Fast-Append Method |
Beginning with Version 7, the behavior of appending to an indexed data set changed to improve performance.
In Version 6, when you appended to an indexed data set, the index was updated for each added observation. Index updates tend to be random; therefore, disk I/O could have been high.
Currently, SAS does not update the index until all observations are added to the data set. After the append, SAS internally sorts the observations and inserts the data into the index sequentially. The behavior reduces most of the disk I/O and results in a faster append method.
The fast-append method is used by default when the following requirements are met; otherwise, the Version 6 method is used:
The BASE= data set is open for member-level locking. If CNTLLEV= is set to record, then the fast-append method is not used.
The BASE= data set does not contain referential integrity constraints.
The BASE= data set is not accessed using the Cross Environment Data Access (CEDA) facility.
The BASE= data set is not using a WHERE= data set option.
To display information in the SAS log about the append method that is being used, you can specify the MSGLEVEL= system option as follows:
options msglevel=i;
Either a message displays if the fast-append method is in use or a message or messages display as to why the fast-append method is not in use.
The current append method initially adds observations to the BASE= data set regardless of the restrictions that are determined by the index. For example, a variable that has an index that was created with the UNIQUE option does not have its values validated for uniqueness until the index is updated. Then, if a nonunique value is detected, the offending observation is deleted from the data set. After observations are appended, some of them might subsequently be deleted.
For a simple example, consider that the BASE= data set has ten observations numbered from 1 to 10 with a UNIQUE index for the variable ID. You append a data set that contains five observations numbered from 1 to 5, and observations 3 and 4 both contain the same value for ID. The following actions occur:
After the observations are appended, the BASE= data set contains 15 observations numbered from 1 to 15.
SAS updates the index for ID, validates the values, and determines that observations 13 and 14 contain the same value for ID.
SAS deletes one of the observations from the BASE= data set, resulting in 14 observations that are numbered from 1 to 15. For example, observation 13 is deleted. Note that you cannot predict which observation is deleted, because the internal sort might place either observation first. (In Version 6, you could predict that observation 13 would be added and observation 14 would be rejected.)
If you do not want the current behavior (which could result in deleted observations) or if you want to be able to predict which observations are appended, request the Version 6 append method by specifying the APPENDVER=V6 option:
proc datasets; append base=a data=b appendver=v6; run;
Note: In Version 6, deleting the index and then recreating it after the append could improve performance. The current method might eliminate the need to do that. However, the performance depends on the nature of your data.
Appending to Data Sets with Different Variables |
If the DATA= data set contains variables that are not in the BASE= data set, use the FORCE option in the APPEND statement to force the concatenation of the two data sets. The APPEND statement drops the extra variables and issues a warning message. You can use the NOWARN option to suppress the warning message.
If the BASE= data set contains a variable that is not in the DATA= data set, the APPEND statement concatenates the data sets, but the observations from the DATA= data set have a missing value for the variable that was not present in the DATA= data set. The FORCE option is not necessary in this case.
If you use the DROP=, KEEP=, or RENAME= options on the BASE= data set, the options ONLY affect the APPEND processing and does not change the variables in the appended BASE= data set. Variables that are dropped or not kept using the DROP= and KEEP= options still exist in the appended BASE= data set. Variables that are renamed using the RENAME= option remain with their original name in the appended BASE= data set.
Appending to Data Sets That Contain Variables with Different Attributes |
If a variable has different attributes in the BASE= data set than it does in the DATA= data set, the attributes in the BASE= data set prevail.
If the SAS formats in the DATA= data set are different from those in the BASE= data set, then the SAS formats in the BASE= data set are used. However, SAS does not convert the data from the DATA= data set in order to be consistent with the SAS formats in the BASE= data set. The result could be data that seems to be incorrect. A warning message is displayed in the SAS log. The following example illustrates appending data by using different SAS formats:
data format1; input Date date9.; format Date date9.; datalines; 24sep1975 22may1952 ; data format2; input Date datetime20.; format Date datetime20.; datalines; 25aug1952:11:23:07.4 ; proc append base=format1 data=format2; run;
The following messages are displayed in the SAS log.
NOTE: Appending WORK.FORMAT2 to WORK.FORMAT1. WARNING: Variable Date has format DATE9. on the BASE data set and format DATETIME20. on the DATA data set. DATE9. used. NOTE: There were 1 observations read from the data set WORK.FORMAT2. NOTE: 1 observations added. NOTE: The data set WORK.FORMAT1 has 3 observations and 1 variables.
If the length of a variable is longer in the DATA= data set than in the BASE= data set, or if the same variable is a character variable in one data set and a numeric variable in the other, use the FORCE option. Using FORCE has the following consequences:
The length of the variables in the BASE= data set takes precedence. SAS truncates values from the DATA= data set to fit them into the length that is specified in the BASE= data set.
The type of the variables in the BASE= data set takes precedence. The APPEND statement replaces values of the wrong type (all values for the variable in the DATA= data set) with missing values.
Note: If a character variable's transcoding attribute is opposite in the BASE= and DATA= data sets (for example, one is YES and the other is NO), then a warning is issued. To determine the transcoding attributes, use the CONTENTS procedure for each data set. You set the transcoding attribute with the TRANSCODE= option in the ATTRIB statement or with the TRANSCODE= column modifier in PROC SQL.
Appending Data Sets That Contain Integrity Constraints |
If the DATA= data set contains integrity constraints and the BASE= data set does not exist, the APPEND statement copies the general constraints. Note that the referential constraints are not copied. If the BASE= data set exists, the APPEND action copies only observations.
Appending with Generation Groups |
You can use the GENNUM= data set option to append to a specific version in a generation group. Here are examples:
Using the APPEND Procedure Instead of the APPEND Statement |
The only difference between the APPEND procedure and the APPEND statement in PROC DATASETS, is the default for libref in the BASE= and DATA= arguments. For PROC APPEND, the default is either WORK or USER. For the APPEND statement, the default is the libref of the procedure input library.
System Failures |
If a system failure or some other type of interruption occurs while the procedure is executing, the append operation might not be successful; it is possible that not all, perhaps none, of the observations are added to the BASE= data set. In addition, the BASE= data set might suffer damage. The APPEND operation performs an update in place, which means that it does not make a copy of the original data set before it begins to append observations. If you want to be able to restore the original observations, you can initiate an audit trail for the base data file and select to store a before-update image of the observations. Then you can write a DATA step to extract and reapply the original observations to the data file. For information about initiating an audit trail, see the PROC DATASETS AUDIT Statement.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.