Concatenating SAS Data Sets
If two data sets contain the same variables and the variables have the same attributes, then concatenating them with the SET statement produces the same file as concatenating them with the APPEND procedure. The APPEND procedure concatenates much faster than the SET statement, particularly when the BASE= data set is large, because the APPEND procedure does not process the observations from the BASE= data set. However, the two methods behave differently when the variables or their attributes differ between the data sets. In that case, consider the differences in behavior before you decide which method to use.
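As a minimal sketch of the two methods, the following program concatenates two small data sets that share the same variables and attributes. The data set names (WORK.SALES_Q1, WORK.SALES_Q2, WORK.SALES_SET) and their contents are hypothetical and serve only as an illustration.

```sas
/* Hypothetical input data sets with identical variables and attributes. */
data work.sales_q1;
  input region $ amount;
  datalines;
East 100
West 200
;
run;

data work.sales_q2;
  input region $ amount;
  datalines;
East 150
West 250
;
run;

/* Method 1: the SET statement reads every observation from both */
/* input data sets and writes them to a new data set.            */
data work.sales_set;
  set work.sales_q1 work.sales_q2;
run;

/* Method 2: the APPEND procedure adds the observations from the  */
/* DATA= data set to the end of the BASE= data set without        */
/* processing the observations that are already in BASE=.         */
proc append base=work.sales_q1 data=work.sales_q2;
run;
```

After both steps run, WORK.SALES_SET and WORK.SALES_Q1 should contain the same four observations. Note that the SET statement leaves both inputs unchanged and creates a new data set, whereas the APPEND procedure modifies the BASE= data set in place.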
The following table summarizes the major differences between using the SET statement and using the APPEND procedure to concatenate files.
Criterion | SET statement | APPEND procedure |
---|---|---|
Number of data sets that you can concatenate | Uses any number of data sets. | Uses two data sets. |
Handling of data sets that contain different variables | Uses all variables and assigns missing values where appropriate. | Uses all variables in the BASE= data set and assigns missing values to observations from the DATA= data set where appropriate. Requires the FORCE option to concatenate data sets if the DATA= data set contains variables that are not in the BASE= data set. Cannot include variables found only in the DATA= data set when concatenating the data sets. |
Handling of different formats, informats, or labels | Uses explicitly defined formats, informats, and labels rather than defaults. If two or more data sets explicitly define the format, informat, or label, then SAS uses the definition from the data set you name first in the SET statement. | Uses formats, informats, and labels from the BASE= data set. |
Handling of different variable lengths | If the same variable has a different length in two or more data sets, then SAS uses the length from the data set you name first in the SET statement. | Requires the FORCE option if the length of a variable is longer in the DATA= data set. Truncates the values of the variable to match the length in the BASE= data set. |
Handling of different variable types | Does not concatenate the data sets. | Requires the FORCE option to concatenate data sets. Uses the type attribute from the BASE= data set and assigns missing values to the variable in observations from the DATA= data set. |
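
The differences summarized in the table can be explored with a small experiment. The sketch below assumes two hypothetical data sets, WORK.BASE_DS and WORK.NEW_DS, in which NAME has different lengths and GRADE exists only in the second data set. The comments describe the behavior that the table predicts. Check the SAS log when you run it, because SAS might issue warnings about the mismatched attributes.

```sas
/* Hypothetical BASE= data set: NAME has a length of 5. */
data work.base_ds;
  length name $ 5;
  input name $ score;
  datalines;
Alice 90
;
run;

/* Hypothetical DATA= data set: NAME is longer (length 10), */
/* and GRADE is an extra variable that is not in BASE_DS.   */
data work.new_ds;
  length name $ 10 grade $ 1;
  input name $ score grade $;
  datalines;
Alexander 85 B
;
run;

/* SET statement: the output keeps all variables (GRADE is missing  */
/* for the observation from BASE_DS) and takes the length of NAME   */
/* from the data set that is named first, so longer values from     */
/* NEW_DS are truncated to 5 characters.                            */
data work.all_set;
  set work.base_ds work.new_ds;
run;

/* APPEND procedure: the FORCE option is required here because      */
/* NEW_DS contains a variable that is not in BASE_DS and a longer   */
/* NAME. GRADE is not added to BASE_DS, and NAME values are         */
/* truncated to the length that is defined in BASE_DS.              */
proc append base=work.base_ds data=work.new_ds force;
run;
```

If GRADE must be kept in the result, the table suggests that the SET statement is the better fit, because the APPEND procedure cannot include variables that are found only in the DATA= data set.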
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.