Concatenating SAS Data Sets |
Understanding the APPEND Procedure |
The APPEND procedure adds the observations from one SAS data set to the end of another SAS data set. PROC APPEND does not process the observations in the first data set. It adds the observations in the second data set directly to the end of the original data set.
The APPEND procedure has the following form:
PROC APPEND BASE=base-SAS-data-set <DATA=SAS-data-set-to-append> <FORCE>; |
names the SAS data set to which you want to append the observations. If this data set does not exist, then SAS creates it. At the completion of PROC APPEND, the value of base-SAS-data-set becomes the current (most recently created) SAS data set.
names the SAS data set that contains the observations to add to the end of the base data set. If you omit this option, then PROC APPEND adds the observations in the current SAS data set to the end of the base data set.
forces PROC APPEND to concatenate the files in some situations in which the procedure would normally fail.
Using the APPEND Procedure: The Simplest Case |
The following program appends the data set CUSTOMER_SUPPORT to the data set SALES. Both data sets contain the same variables and each variable has the same attributes in both data sets.
options pagesize=60 linesize=80 pageno=1 nodate; proc append base=sales data=customer_support; run; proc print data=sales; title 'Employees in Sales and Customer Support Departments'; run;
The following output shows the results:
Employees in Sales and Customer Support Departments 1 Employee Home Obs ID Name HireDate Salary Phone 1 429685482 Martin, Virginia 09AUG1990 34800 493-0824 2 244967839 Singleton, MaryAnn 24APR1995 27900 929-2623 3 996740216 Leighton, Maurice 16DEC1993 32600 933-6908 4 675443925 Freuler, Carl 15FEB1998 29900 493-3993 5 845729308 Cage, Merce 19OCT1992 39800 286-0519 6 324987451 Sayre, Jay 15NOV1994 44800 933-2998 7 596771321 Tolson, Andrew 18MAR1998 41200 929-4800 8 477562122 Jensen, Helga 01FEB1991 47400 286-2816 9 894724859 Kulenic, Marie 24JUN1993 41400 493-1472 10 988427431 Zweerink, Anna 07JUL1995 43700 929-3885The resulting data set is identical to the data set that was created by naming SALES and CUSTOMER_SUPPORT in the SET statement (see The Concatenated DEPT1_2 Data Set). It is important to realize that PROC APPEND permanently alters the SALES data set, which is the data set for the BASE= option. SALES now contains observations from both the Sales and the Customer Support departments.
Using the APPEND Procedure When Data Sets Contain Different Variables |
Recall that the SECURITY data set contains the variable Gender, which is not in the SALES data set, and lacks the variable HomePhone, which is present in the SALES data set. What happens if you try to use PROC APPEND to concatenate data sets that contain different variables?
If you try to append SECURITY to SALES using the following program, then the concatenation fails:
proc append base=sales data=security; run;
SAS writes the following messages to the log:
2 proc append base=sales data=security; 3 run; NOTE: Appending WORK.SECURITY to WORK.SALES. WARNING: Variable Gender was not found on BASE file. WARNING: Variable HomePhone was not found on DATA file. ERROR: No appending done because of anomalies listed above. Use FORCE option to append these files. NOTE: 0 observations added. NOTE: The data set WORK.SALES has 5 observations and 5 variables. NOTE: Statements not processed because of errors noted above. NOTE: The SAS System stopped processing this step because of errors.
You must use the FORCE option with PROC APPEND when the DATA= data set contains a variable that is not in the BASE= data set. If you modify the program to include the FORCE option, then it successfully concatenates the files.
options pagesize=60 linesize=80 pageno=1 nodate; proc append base=sales data=security force; run; proc print data=sales; title 'Employees in the Sales and the Security Departments'; run;
The following output shows the results:
The SALES Data Set: Using FORCE with PROC APPEND
Employees in the Sales and the Security Departments 1 Employee Home Obs ID Name HireDate Salary Phone 1 429685482 Martin, Virginia 09AUG1990 34800 493-0824 2 244967839 Singleton, MaryAnn 24APR1995 27900 929-2623 3 996740216 Leighton, Maurice 16DEC1993 32600 933-6908 4 675443925 Freuler, Carl 15FEB1998 29900 493-3993 5 845729308 Cage, Merce 19OCT1992 39800 286-0519 6 744289612 Saparilas, Theresa 09MAY1998 33400 7 824904032 Brosnihan, Dylan 04JAN1992 38200 8 242779184 Chao, Daeyong 28SEP1995 37500 9 544382887 Slifkin, Leah 24JUL1994 45000 10 933476520 Perry, Marguerite 19APR1992 39900
This output illustrates two important points about using PROC APPEND to concatenate data sets with different variables:
If the BASE= data set contains a variable that is not in the DATA= data set (for example, HomePhone), then PROC APPEND concatenates the data sets and assigns a missing value to that variable in the observations that are taken from the DATA= data set.
If the DATA= data set contains a variable that is not in the BASE= data set (for example, Gender), then the FORCE option in PROC APPEND forces the procedure to concatenate the two data sets. But because that variable is not in the descriptor portion of the BASE= data set, the procedure cannot include it in the concatenated data set.
Note: In the current example, each data set contains a variable that is not in the other. It is only the case of a variable in the DATA= data set that is not in the BASE= data set that requires the use of the FORCE option. However, both cases display a warning in the log.
Using the APPEND Procedure When Variables Have Different Attributes |
When you use PROC APPEND with variables that have different attributes, the following applies:
If a variable has different attributes in the BASE= data set than it does in the DATA= data set, then the attributes in the BASE= data set prevail. In the cases of differing formats, informats, and labels, the concatenation succeeds.
If the length of a variable is longer in the BASE= data set than in the DATA= data set, then the concatenation succeeds.
If the length of a variable is longer in the DATA= data set than in the BASE= data set, or if the same variable is a character variable in one data set and a numeric variable in the other, then PROC APPEND fails to concatenate the files unless you specify the FORCE option.
Using the FORCE option has these consequences:
The length that is specified in the BASE= data set prevails. Therefore, SAS truncates values from the DATA= data set to fit them into the length that is specified in the BASE= data set.
The type that is specified in the BASE= data set prevails. The procedure replaces values of the wrong type (all values for the variable in the DATA= data set) with missing values.
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.