
Concatenating SAS Data Sets

Concatenating Data Sets with the SET Statement


Understanding the SET Statement

The SET statement reads observations from one or more SAS data sets and uses them to build a new data set.

The SET statement for concatenating data sets has the following form:

SET SAS-data-set(s);

where

SAS-data-set

is two or more SAS data sets to concatenate. The observations from the first data set that you name in the SET statement appear first in the new data set. The observations from the second data set follow those from the first data set, and so on. The list can contain any number of data sets.


Using the SET Statement: The Simplest Case

In the simplest situation, the data sets that you concatenate contain the same variables (variables with the same name). In addition, the type, length, informat, format, and label of each variable match across all data sets. In this case, SAS copies all observations from the first data set into the new data set, then copies all observations from the second data set into the new data set, and so on. Each observation is an exact copy of the original.

In the following example, a company that uses SAS to maintain personnel records for six separate departments decided to combine all personnel records. Two departments, Sales and Customer Support, store their data in the same form. Each observation in both data sets contains values for these variables:

EmployeeID

is a character variable that contains the employee's identification number.

Name

is a character variable that contains the employee's name in the form last name, comma, first name.

HireDate

is a numeric variable that contains the date the employee was hired. This variable has a format of DATE9.

Salary

is a numeric variable that contains the employee's annual salary in US dollars.

HomePhone

is a character variable that contains the employee's home telephone number.

The following program creates the SAS data sets SALES and CUSTOMER_SUPPORT:

options pagesize=60 linesize=80 pageno=1 nodate;

data sales;
   input EmployeeID $ 1-9 Name $ 11-29 @30 HireDate date9.
         Salary HomePhone $;
   format HireDate date9.;
   datalines;
429685482 Martin, Virginia   09aug1990 34800 493-0824
244967839 Singleton, MaryAnn 24apr1995 27900 929-2623
996740216 Leighton, Maurice  16dec1993 32600 933-6908
675443925 Freuler, Carl      15feb1998 29900 493-3993
845729308 Cage, Merce        19oct1992 39800 286-0519
;

proc print data=sales;
   title 'Sales Department Employees';
run;

data customer_support;
   input EmployeeID $ 1-9 Name $ 11-29 @30 HireDate date9.
         Salary HomePhone $;
   format HireDate date9.;
   datalines;
324987451 Sayre, Jay         15nov1994 44800 933-2998
596771321 Tolson, Andrew     18mar1998 41200 929-4800
477562122 Jensen, Helga      01feb1991 47400 286-2816
894724859 Kulenic, Marie     24jun1993 41400 493-1472
988427431 Zweerink, Anna     07jul1995 43700 929-3885
;

proc print data=customer_support;
   title 'Customer Support Department Employees';
run;

The following output shows the results of both DATA steps:

The SALES and the CUSTOMER_SUPPORT Data Sets

                           Sales Department Employees                          1

          Employee                                                    Home
   Obs       ID        Name                   HireDate    Salary     Phone

    1     429685482    Martin, Virginia      09AUG1990     34800    493-0824
    2     244967839    Singleton, MaryAnn    24APR1995     27900    929-2623
    3     996740216    Leighton, Maurice     16DEC1993     32600    933-6908
    4     675443925    Freuler, Carl         15FEB1998     29900    493-3993
    5     845729308    Cage, Merce           19OCT1992     39800    286-0519

                     Customer Support Department Employees                     2

            Employee                                                Home
     Obs       ID             Name          HireDate    Salary     Phone

      1     324987451    Sayre, Jay        15NOV1994     44800    933-2998
      2     596771321    Tolson, Andrew    18MAR1998     41200    929-4800
      3     477562122    Jensen, Helga     01FEB1991     47400    286-2816
      4     894724859    Kulenic, Marie    24JUN1993     41400    493-1472
      5     988427431    Zweerink, Anna    07JUL1995     43700    929-3885

To concatenate the two data sets, list them in the SET statement. Use the PRINT procedure to display the resulting DEPT1_2 data set.

options pagesize=60 linesize=80 pageno=1 nodate;

data dept1_2;
   set sales customer_support;
run;

proc print data=dept1_2;
   title 'Employees in Sales and Customer Support Departments';
run;

The following output shows the new DEPT1_2 data set. The data set contains all observations from SALES followed by all observations from CUSTOMER_SUPPORT:

The Concatenated DEPT1_2 Data Set

              Employees in Sales and Customer Support Departments              1

          Employee                                                    Home
   Obs       ID        Name                   HireDate    Salary     Phone

     1    429685482    Martin, Virginia      09AUG1990     34800    493-0824
     2    244967839    Singleton, MaryAnn    24APR1995     27900    929-2623
     3    996740216    Leighton, Maurice     16DEC1993     32600    933-6908
     4    675443925    Freuler, Carl         15FEB1998     29900    493-3993
     5    845729308    Cage, Merce           19OCT1992     39800    286-0519
     6    324987451    Sayre, Jay            15NOV1994     44800    933-2998
     7    596771321    Tolson, Andrew        18MAR1998     41200    929-4800
     8    477562122    Jensen, Helga         01FEB1991     47400    286-2816
     9    894724859    Kulenic, Marie        24JUN1993     41400    493-1472
    10    988427431    Zweerink, Anna        07JUL1995     43700    929-3885
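After concatenation, nothing in DEPT1_2 records which department an observation came from. The following is a minimal sketch of one way to keep that information, assuming the IN= data set option. The data set name DEPT1_2_SRC and the variables FromSales, FromSupport, and Department are hypothetical names used only for illustration.

```sas
data dept1_2_src;
   /* IN= creates a temporary 0/1 flag for each input data set */
   set sales(in=FromSales) customer_support(in=FromSupport);
   /* Declare the new variable long enough for the longest value */
   length Department $ 16;
   if FromSales then Department='Sales';
   else if FromSupport then Department='Customer Support';
run;

proc print data=dept1_2_src;
   title 'Employees with Source Department';
run;
```

The IN= variables exist only while the DATA step runs and are not written to the new data set, so only Department is added to the output.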

Using the SET Statement When Data Sets Contain Different Variables

The two data sets in the previous example contain the same variables, and each variable is defined the same way in both data sets. However, you might want to concatenate data sets when not all variables are common to the data sets that are named in the SET statement. In this case, each observation in the new data set includes all variables from the SAS data sets that are named in the SET statement.

The examples in this section show the SECURITY data set, and the concatenation of this data set to the SALES and the CUSTOMER_SUPPORT data sets. Not all variables are common to the three data sets. The personnel records for the Security department do not include the variable HomePhone, and do include the new variable Gender, which does not appear in the SALES or the CUSTOMER_SUPPORT data sets.

The following program creates the SECURITY data set:

options pagesize=60 linesize=80 pageno=1 nodate;

data security;
   input EmployeeID $ 1-9 Name $ 11-29 Gender $ 30
         @32 HireDate date9. Salary;
   format HireDate date9.;
   datalines;
744289612 Saparilas, Theresa F 09may1998 33400
824904032 Brosnihan, Dylan   M 04jan1992 38200
242779184 Chao, Daeyong      M 28sep1995 37500
544382887 Slifkin, Leah      F 24jul1994 45000
933476520 Perry, Marguerite  F 19apr1992 39900
;

proc print data=security;
   title 'Security Department Employees';
run;

The following output shows the results:

The SECURITY Data Set

                         Security Department Employees                         1

           Employee
    Obs       ID               Name           Gender     HireDate    Salary

     1     744289612    Saparilas, Theresa      F       09MAY1998     33400
     2     824904032    Brosnihan, Dylan        M       04JAN1992     38200
     3     242779184    Chao, Daeyong           M       28SEP1995     37500
     4     544382887    Slifkin, Leah           F       24JUL1994     45000
     5     933476520    Perry, Marguerite       F       19APR1992     39900

The following program concatenates the SALES, CUSTOMER_SUPPORT, and SECURITY data sets, and creates the new data set, DEPT1_3:

options pagesize=60 linesize=80 pageno=1 nodate;

data dept1_3;
   set sales customer_support security;
run;

proc print data=dept1_3;
   title 'Employees in Sales, Customer Support,';
   title2 'and Security Departments';
run;

The following output shows the results:

The Concatenated DEPT1_3 Data Set

                     Employees in Sales, Customer Support,                     1
                            and Security Departments

       Employee                                                Home
 Obs      ID       Name                  HireDate   Salary    Phone     Gender

   1   429685482   Martin, Virginia     09AUG1990    34800   493-0824         
   2   244967839   Singleton, MaryAnn   24APR1995    27900   929-2623         
   3   996740216   Leighton, Maurice    16DEC1993    32600   933-6908         
   4   675443925   Freuler, Carl        15FEB1998    29900   493-3993         
   5   845729308   Cage, Merce          19OCT1992    39800   286-0519         
   6   324987451   Sayre, Jay           15NOV1994    44800   933-2998         
   7   596771321   Tolson, Andrew       18MAR1998    41200   929-4800         
   8   477562122   Jensen, Helga        01FEB1991    47400   286-2816         
   9   894724859   Kulenic, Marie       24JUN1993    41400   493-1472         
  10   988427431   Zweerink, Anna       07JUL1995    43700   929-3885         
  11   744289612   Saparilas, Theresa   09MAY1998    33400                F   
  12   824904032   Brosnihan, Dylan     04JAN1992    38200                M   
  13   242779184   Chao, Daeyong        28SEP1995    37500                M   
  14   544382887   Slifkin, Leah        24JUL1994    45000                F   
  15   933476520   Perry, Marguerite    19APR1992    39900                F   

All observations in the data set DEPT1_3 have values for both the variable Gender and the variable HomePhone. Observations from data sets SALES and CUSTOMER_SUPPORT, the data sets that do not contain the variable Gender, have missing values for Gender (indicated by blanks under the variable name). Observations from SECURITY, the data set that does not contain the variable HomePhone, have missing values for HomePhone (indicated by blanks under the variable name).
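If the missing values are not wanted, one possible approach is to read only the variables that all three data sets share. The following sketch assumes the DROP= data set option on each input data set; the output name DEPT1_3_COMMON is hypothetical.

```sas
data dept1_3_common;
   /* Drop the variables that are not common to all three data sets */
   set sales(drop=HomePhone)
       customer_support(drop=HomePhone)
       security(drop=Gender);
run;

proc print data=dept1_3_common;
   title 'Variables Common to Sales, Customer Support, and Security';
run;
```

Because the options are applied as each data set is read, HomePhone and Gender never enter the DATA step, and the new data set contains only EmployeeID, Name, HireDate, and Salary.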


Using the SET Statement When Variables Have Different Attributes


Understanding Attributes

Each variable in a SAS data set can have as many as six attributes that are associated with it. These attributes are

name

identifies a variable. That is, when SAS looks at two or more data sets, it considers variables with the same name to be the same variable.

type

identifies a variable as character or numeric.

length

refers to the number of bytes that SAS uses to store each of the variable's values in a SAS data set. Length is an especially important consideration when you use character variables, because the default length of character variables is eight bytes. If your data values are greater than eight bytes, then you can use a LENGTH statement to specify the number of bytes of storage that you need so that your data is not truncated.

informat

refers to the instructions that SAS uses when reading data values. These instructions specify the form of an input value.

format

refers to the instructions that SAS uses when writing data values. These instructions specify the form of an output value.

label

refers to descriptive text that is associated with a specific variable.

If the data sets that you name in the SET statement contain variables with the same names and types, then you can concatenate the data sets without modification. However, if variable types differ, then you must modify one or more data sets before concatenating them. When lengths, formats, informats, or labels differ, you might want to modify one or more data sets before proceeding.


Using the SET Statement When Variables Have Different Types

If a variable is defined as a character variable in one data set that is named in the SET statement, and as a numeric variable in another, then SAS issues an error message and does not concatenate the data sets.

In the following example, the Accounting department in the company treats the employee identification number (EmployeeID) as a numeric variable, whereas all other departments treat it as a character variable.

The following program creates the ACCOUNTING data set:

options pagesize=60 linesize=80 pageno=1 nodate;

data accounting;
   input EmployeeID 1-9 Name $ 11-29 Gender $ 30
         @32 HireDate date9. Salary;
   format HireDate date9.;
   datalines;
634875680 Gardinski, Barbara F 29may1998 49800
824576630 Robertson, Hannah  F 14mar1995 52700
744826703 Gresham, Jean      F 28apr1992 54000
824447605 Kruize, Ronald     M 23may1994 49200
988674342 Linzer, Fritz      M 23jul1992 50400
;

proc print data=accounting;
   title 'Accounting Department Employees';
run;

The following output shows the results:

The ACCOUNTING Data Set

                        Accounting Department Employees                        1

            Employee
    Obs        ID              Name           Gender     HireDate    Salary

     1     634875680    Gardinski, Barbara      F       29MAY1998     49800
     2     824576630    Robertson, Hannah       F       14MAR1995     52700
     3     744826703    Gresham, Jean           F       28APR1992     54000
     4     824447605    Kruize, Ronald          M       23MAY1994     49200
     5     988674342    Linzer, Fritz           M       23JUL1992     50400

The following program attempts to concatenate the data sets for all four departments:

data dept1_4;
   set sales customer_support security accounting;
run;

The program fails because of the difference in variable type among the four departments, and SAS writes the following error message to the log:

ERROR: Variable EmployeeID has been defined as both character 
       and numeric.
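Before correcting the problem, it can help to see how each data set defines the variable. The following sketch queries the DICTIONARY.COLUMNS table with PROC SQL. It assumes that all of the data sets are in the WORK library, which is the case when one-level names are used and no USER library is defined.

```sas
proc sql;
   title 'Definition of EmployeeID in Each WORK Data Set';
   /* Library names are stored in uppercase in the dictionary tables */
   select memname, name, type, length
      from dictionary.columns
      where libname='WORK'
        and upcase(name)='EMPLOYEEID'
      order by memname;
quit;
```

With the data sets created so far, the type column is expected to show num for ACCOUNTING and char for the other data sets.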


Changing the Type of a Variable

One way to correct the error in the previous example is to change the type of the variable EmployeeID in ACCOUNTING from numeric to character. Because performing calculations on employee identification numbers is unlikely, EmployeeID can be a character variable.

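To change the type of the variable EmployeeID, you can either re-create the data set with an INPUT statement that reads EmployeeID as a character variable, or use the PUT function to create a new character variable and data set options to rename and drop variables.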

The following program uses the PUT function and data set options to change the variable type of EmployeeID from numeric to character:

options pagesize=60 linesize=80 pageno=1 nodate;

data new_accounting (rename=(TempVar=EmployeeID) drop=EmployeeID); /* 1 */
   set accounting; /* 2 */
   TempVar=put(EmployeeID, 9.); /* 3 */
run;

proc datasets library=work; /* 4 */
   contents data=new_accounting;
run;
quit;

The following list corresponds to the numbered items in the preceding program:

[1] The RENAME= data set option renames the variable TempVar to EmployeeID when SAS writes an observation to the output data set. The DROP= data set option is applied before the RENAME= option. The result is a change in the variable type for EmployeeID from numeric to character.

Note: Although this example creates a new data set called NEW_ACCOUNTING, you can create a data set that has the same name as the data set that is listed in the SET statement. If you do this, then the type attribute for EmployeeID will be permanently altered in the ACCOUNTING data set.

[2] The SET statement reads observations from the ACCOUNTING data set.

[3] The PUT function returns a character value by writing the numeric value of EmployeeID with the 9. format. The assignment statement assigns the result of the PUT function to the variable TempVar, which therefore becomes a character variable with a length of 9.

[4] The DATASETS procedure enables you to verify the new attribute type for EmployeeID.

The following output shows a partial listing from PROC DATASETS:

PROC DATASETS Output for the NEW_ACCOUNTING Data Set

             -----Alphabetic List of Variables and Attributes-----
 
                #    Variable      Type    Len    Pos    Format
                -----------------------------------------------
                5    EmployeeID    Char      9     36          
                2    Gender        Char      1     35          
                3    HireDate      Num       8      0    DATE9.
                1    Name          Char     19     16          
                4    Salary        Num       8      8          
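The same conversion can also be written with the options on the input data set instead of the output data set. This is a sketch of an alternative, not the method used above; NEW_ACCOUNTING2 and NumID are hypothetical names.

```sas
data new_accounting2;
   /* Rename the numeric variable as it is read, which frees the name EmployeeID */
   set accounting(rename=(EmployeeID=NumID));
   /* PUT returns a character value; the 9. format gives it a length of 9 */
   EmployeeID=put(NumID, 9.);
   /* The numeric copy is no longer needed */
   drop NumID;
run;

proc contents data=new_accounting2;
run;
```

Either form leaves EmployeeID as the last variable in the new data set, because the character version is created after the other variables have been read.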

Now that the types of all variables match, you can easily concatenate all four data sets using the following program:

options pagesize=60 linesize=80 pageno=1 nodate;

data dept1_4;
   set sales customer_support security new_accounting;
run;

proc print data=dept1_4;
   title 'Employees in Sales, Customer Support, Security,';
   title2 'and Accounting Departments';
run;

The following output shows the results:

The Concatenated DEPT1_4 Data Set

                Employees in Sales, Customer Support, Security,                1
                           and Accounting Departments

       Employee                                                Home
 Obs      ID       Name                  HireDate   Salary    Phone     Gender

   1   429685482   Martin, Virginia     09AUG1990    34800   493-0824         
   2   244967839   Singleton, MaryAnn   24APR1995    27900   929-2623         
   3   996740216   Leighton, Maurice    16DEC1993    32600   933-6908         
   4   675443925   Freuler, Carl        15FEB1998    29900   493-3993         
   5   845729308   Cage, Merce          19OCT1992    39800   286-0519         
   6   324987451   Sayre, Jay           15NOV1994    44800   933-2998         
   7   596771321   Tolson, Andrew       18MAR1998    41200   929-4800         
   8   477562122   Jensen, Helga        01FEB1991    47400   286-2816         
   9   894724859   Kulenic, Marie       24JUN1993    41400   493-1472         
  10   988427431   Zweerink, Anna       07JUL1995    43700   929-3885         
  11   744289612   Saparilas, Theresa   09MAY1998    33400                F   
  12   824904032   Brosnihan, Dylan     04JAN1992    38200                M   
  13   242779184   Chao, Daeyong        28SEP1995    37500                M   
  14   544382887   Slifkin, Leah        24JUL1994    45000                F   
  15   933476520   Perry, Marguerite    19APR1992    39900                F   
  16   634875680   Gardinski, Barbara   29MAY1998    49800                F   
  17   824576630   Robertson, Hannah    14MAR1995    52700                F   
  18   744826703   Gresham, Jean        28APR1992    54000                F   
  19   824447605   Kruize, Ronald       23MAY1994    49200                M   
  20   988674342   Linzer, Fritz        23JUL1992    50400                M   

Using the SET Statement When Variables Have Different Formats, Informats, or Labels

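When you concatenate data sets with the SET statement, the following rules determine which formats, informats, and labels are associated with variables in the new data set:

An explicitly defined format, informat, or label overrides a default, regardless of the position of the data sets in the SET statement.

If two or more data sets explicitly define different formats, informats, or labels for the same variable, then the variable in the new data set takes the attribute from the first data set in the SET statement that explicitly defines that attribute.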

Returning to the examples, you may have noticed that the DATA steps that created the SALES, CUSTOMER_SUPPORT, SECURITY, and ACCOUNTING data sets use a FORMAT statement to explicitly assign a format of DATE9. to the variable HireDate. Therefore, although HireDate is a numeric variable, it appears in all displays as DDMMMYYYY (for example, 13DEC2000). The SHIPPING data set that is created in the following example, however, uses a format of DATE7. for HireDate. The DATE7. format displays as DDMMMYY (for example, 13DEC00).

In addition, the SALES, CUSTOMER_SUPPORT, SECURITY, and ACCOUNTING data sets contain a default format for Salary, whereas the SHIPPING data set contains an explicitly defined format, COMMA6., for the same variable. The COMMA6. format inserts a comma in the appropriate place when SAS displays the numeric variable Salary.

The following program creates the data set for the Shipping department:

options pagesize=60 linesize=80 pageno=1 nodate;

data shipping;
   input employeeID $ 1-9 Name $ 11-29 Gender $ 30
         @32 HireDate date9.
         @42 Salary;
   format HireDate date7.
          Salary comma6.;
   datalines;
688774609 Carlton, Susan     F 28jan1995 29200
922448328 Hoffmann, Gerald   M 12oct1997 27600
544909752 DePuis, David      M 23aug1994 32900
745609821 Hahn, Kenneth      M 23aug1994 33300
634774295 Landau, Jennifer   F 30apr1996 32900
;

proc print data=shipping;
   title 'Shipping Department Employees';
run;

The following output shows the results:

The SHIPPING Data Set

                         Shipping Department Employees                         1

             employee                                      Hire
      Obs       ID              Name          Gender       Date    Salary

       1     688774609    Carlton, Susan        F       28JAN95    29,200
       2     922448328    Hoffmann, Gerald      M       12OCT97    27,600
       3     544909752    DePuis, David         M       23AUG94    32,900
       4     745609821    Hahn, Kenneth         M       23AUG94    33,300
       5     634774295    Landau, Jennifer      F       30APR96    32,900

Now consider what happens when you concatenate SHIPPING with the previous four data sets.

options pagesize=60 linesize=80 pageno=1 nodate;

data dept1_5;
   set sales customer_support security new_accounting shipping;
run;

proc print data=dept1_5;
   title 'Employees in Sales, Customer Support, Security,';
   title2 'Accounting, and Shipping Departments';
run;

The following output shows the results:

The DEPT1_5 Data Set: Concatenation of Five Data Sets

                Employees in Sales, Customer Support, Security,                1
                      Accounting, and Shipping Departments

       Employee                                                Home
 Obs      ID       Name                  HireDate   Salary    Phone     Gender

   1   429685482   Martin, Virginia     09AUG1990   34,800   493-0824         
   2   244967839   Singleton, MaryAnn   24APR1995   27,900   929-2623         
   3   996740216   Leighton, Maurice    16DEC1993   32,600   933-6908         
   4   675443925   Freuler, Carl        15FEB1998   29,900   493-3993         
   5   845729308   Cage, Merce          19OCT1992   39,800   286-0519         
   6   324987451   Sayre, Jay           15NOV1994   44,800   933-2998         
   7   596771321   Tolson, Andrew       18MAR1998   41,200   929-4800         
   8   477562122   Jensen, Helga        01FEB1991   47,400   286-2816         
   9   894724859   Kulenic, Marie       24JUN1993   41,400   493-1472         
  10   988427431   Zweerink, Anna       07JUL1995   43,700   929-3885         
  11   744289612   Saparilas, Theresa   09MAY1998   33,400                F   
  12   824904032   Brosnihan, Dylan     04JAN1992   38,200                M   
  13   242779184   Chao, Daeyong        28SEP1995   37,500                M   
  14   544382887   Slifkin, Leah        24JUL1994   45,000                F   
  15   933476520   Perry, Marguerite    19APR1992   39,900                F   
  16   634875680   Gardinski, Barbara   29MAY1998   49,800                F   
  17   824576630   Robertson, Hannah    14MAR1995   52,700                F   
  18   744826703   Gresham, Jean        28APR1992   54,000                F   
  19   824447605   Kruize, Ronald       23MAY1994   49,200                M   
  20   988674342   Linzer, Fritz        23JUL1992   50,400                M   
  21   688774609   Carlton, Susan       28JAN1995   29,200                F   
  22   922448328   Hoffmann, Gerald     12OCT1997   27,600                M   
  23   544909752   DePuis, David        23AUG1994   32,900                M   
  24   745609821   Hahn, Kenneth        23AUG1994   33,300                M   
  25   634774295   Landau, Jennifer     30APR1996   32,900                F   

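In this concatenation, the input data sets contain the variable HireDate, which was explicitly defined using two different formats. The data sets also contain the variable Salary, which has both a default and an explicit format. You can see from the output how SAS resolves each conflict. For HireDate, SAS uses the format from the first data set in the SET statement that explicitly defines one (DATE9. in SALES). For Salary, SAS uses the explicit COMMA6. format that is defined in SHIPPING rather than the default format, even though SHIPPING is named last.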

Notice the difference if you perform a similar concatenation but reverse the order of the data sets in the SET statement.

options pagesize=60 linesize=80 pageno=1 nodate;

data dept5_1;
   set shipping new_accounting security customer_support sales;
run;

proc print data=dept5_1;
   title 'Employees in Shipping, Accounting, Security,';
   title2 'Customer Support, and Sales Departments';
run;

The following output shows the results:

The DEPT5_1 Data Set: Changing the Order of Concatenation

                  Employees in Shipping, Accounting, Security,                 1
                    Customer Support, and Sales Departments

        employee                                     Hire              Home
  Obs      ID       Name                 Gender      Date   Salary    Phone

    1   688774609   Carlton, Susan         F      28JAN95   29,200           
    2   922448328   Hoffmann, Gerald       M      12OCT97   27,600           
    3   544909752   DePuis, David          M      23AUG94   32,900           
    4   745609821   Hahn, Kenneth          M      23AUG94   33,300           
    5   634774295   Landau, Jennifer       F      30APR96   32,900           
    6   634875680   Gardinski, Barbara     F      29MAY98   49,800           
    7   824576630   Robertson, Hannah      F      14MAR95   52,700           
    8   744826703   Gresham, Jean          F      28APR92   54,000           
    9   824447605   Kruize, Ronald         M      23MAY94   49,200           
   10   988674342   Linzer, Fritz          M      23JUL92   50,400           
   11   744289612   Saparilas, Theresa     F      09MAY98   33,400           
   12   824904032   Brosnihan, Dylan       M      04JAN92   38,200           
   13   242779184   Chao, Daeyong          M      28SEP95   37,500           
   14   544382887   Slifkin, Leah          F      24JUL94   45,000           
   15   933476520   Perry, Marguerite      F      19APR92   39,900           
   16   324987451   Sayre, Jay                    15NOV94   44,800   933-2998
   17   596771321   Tolson, Andrew                18MAR98   41,200   929-4800
   18   477562122   Jensen, Helga                 01FEB91   47,400   286-2816
   19   894724859   Kulenic, Marie                24JUN93   41,400   493-1472
   20   988427431   Zweerink, Anna                07JUL95   43,700   929-3885
   21   429685482   Martin, Virginia              09AUG90   34,800   493-0824
   22   244967839   Singleton, MaryAnn            24APR95   27,900   929-2623
   23   996740216   Leighton, Maurice             16DEC93   32,600   933-6908
   24   675443925   Freuler, Carl                 15FEB98   29,900   493-3993
   25   845729308   Cage, Merce                   19OCT92   39,800   286-0519

Compared with the output in The DEPT1_5 Data Set: Concatenation of Five Data Sets, this example shows that the order of the observations changes and that, for HireDate, the DATE7. format specified in SHIPPING now prevails because that data set now appears first in the SET statement. The COMMA6. format still prevails for the variable Salary because SHIPPING is the only data set that explicitly specifies a format for that variable.
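If the displayed form should not depend on the order of the data sets, one option is to state the formats in the DATA step that performs the concatenation. The following sketch assumes a FORMAT statement for that purpose; the data set name DEPT5_1_FMT is hypothetical.

```sas
data dept5_1_fmt;
   set shipping new_accounting security customer_support sales;
   /* A FORMAT statement in the DATA step overrides the formats
      that are inherited from the input data sets */
   format HireDate date9. Salary comma6.;
run;

proc print data=dept5_1_fmt;
   title 'Formats Assigned in the Concatenating DATA Step';
run;
```

Here HireDate is associated with DATE9. even though SHIPPING, which uses DATE7., is named first in the SET statement.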

Using the SET Statement When Variables Have Different Lengths

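If you use the SET statement to concatenate data sets in which the same variable has different lengths, then the outcome of the concatenation depends on whether the variable is character or numeric. The SET statement determines the length of variables as follows:

For a character variable, the first data set in the SET statement determines its length in the new data set.

For a numeric variable, an explicitly defined length overrides a default, regardless of the position of the data sets in the SET statement. If two or more data sets explicitly define different lengths for the same numeric variable, then the variable in the new data set has the same length as the variable in the data set that appears first in the SET statement.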

The following program creates the RESEARCH data set for the sixth department, Research. Notice that the INPUT statement for this data set creates the variable Name with a length of 27; in all other data sets, Name has a length of 19.

options pagesize=60 linesize=80 pageno=1 nodate;

data research;
   input EmployeeID $ 1-9 Name $ 11-37 Gender $ 38
         @40 HireDate date9. Salary;
   format HireDate date9.;
   datalines;
922854076 Schoenberg, Marguerite     F 19nov1994 39800
770434994 Addison-Hardy, Jonathon    M 23feb1992 41400
242784883 McNaughton, Elizabeth      F 24jul1993 45000
377882806 Tharrington, Catherine     F 28sep1994 38600
292450691 Frangipani, Christopher    M 12aug1990 43900
;

proc print data=research;
   title 'Research Department Employees';
run;

The following output shows the results:

The RESEARCH Data Set

                         Research Department Employees                         1

         Employee
  Obs       ID                 Name              Gender     HireDate    Salary

   1     922854076    Schoenberg, Marguerite       F       19NOV1994     39800
   2     770434994    Addison-Hardy, Jonathon      M       23FEB1992     41400
   3     242784883    McNaughton, Elizabeth        F       24JUL1993     45000
   4     377882806    Tharrington, Catherine       F       28SEP1994     38600
   5     292450691    Frangipani, Christopher      M       12AUG1990     43900

If you concatenate all six data sets, naming RESEARCH in any position except the first in the SET statement, then SAS defines Name with a length of 19.
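The effect can be seen by running such a concatenation and printing only the Research observations. This is an illustrative sketch; the data set name DEPT1_6 is hypothetical, and FIRSTOBS=26 assumes that the five preceding data sets each contribute five observations, as they do in this section.

```sas
data dept1_6;
   /* SALES is named first, so Name keeps a length of 19 */
   set sales customer_support security
       new_accounting shipping research;
run;

proc print data=dept1_6(firstobs=26);
   var EmployeeID Name;
   title 'Research Names Stored with a Length of 19';
run;
```

Every name in RESEARCH is longer than 19 characters, so each value of Name from that data set is truncated in DEPT1_6. Depending on the release, SAS might also write a warning about multiple lengths to the log.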

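If you want your program to use the Name variable that has a length of 27, then you have two options. You can change the order of the data sets in the SET statement, or you can change the length of Name in the new data set.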

In the first case, list the data set (RESEARCH) that uses the longer length first:

data dept6_1;
   set research shipping new_accounting
       security customer_support sales;
run;

In the second case, include a LENGTH statement in the DATA step that creates the new data set. If you change the length of a numeric variable, then the LENGTH statement can appear anywhere in the DATA step. However, if you change the length of a character variable, then the LENGTH statement must precede the SET statement.

The following program creates the data set DEPT1_6A. The LENGTH statement gives the character variable Name a length of 27, even though the first data set in the SET statement (SALES) assigns it a length of 19.

options pagesize=60 linesize=80 pageno=1 nodate;

data dept1_6a;
   length Name $ 27;
   set sales customer_support security
       new_accounting shipping research;
run;

proc print data=dept1_6a;
   title 'Employees in All Departments';
run;

The following output shows that all values of Name are complete. Note that the order of the variables in the new data set changes because Name is the first variable encountered in the DATA step.

The DEPT1_6A Data Set: Effects of Using a LENGTH Statement

                          Employees in All Departments                         1

                                Employee                        Home
  Obs  Name                        ID       HireDate  Salary   Phone    Gender

    1  Martin, Virginia         429685482  09AUG1990  34,800  493-0824        
    2  Singleton, MaryAnn       244967839  24APR1995  27,900  929-2623        
    3  Leighton, Maurice        996740216  16DEC1993  32,600  933-6908        
    4  Freuler, Carl            675443925  15FEB1998  29,900  493-3993        
    5  Cage, Merce              845729308  19OCT1992  39,800  286-0519        
    6  Sayre, Jay               324987451  15NOV1994  44,800  933-2998        
    7  Tolson, Andrew           596771321  18MAR1998  41,200  929-4800        
    8  Jensen, Helga            477562122  01FEB1991  47,400  286-2816        
    9  Kulenic, Marie           894724859  24JUN1993  41,400  493-1472        
   10  Zweerink, Anna           988427431  07JUL1995  43,700  929-3885        
   11  Saparilas, Theresa       744289612  09MAY1998  33,400              F   
   12  Brosnihan, Dylan         824904032  04JAN1992  38,200              M   
   13  Chao, Daeyong            242779184  28SEP1995  37,500              M   
   14  Slifkin, Leah            544382887  24JUL1994  45,000              F   
   15  Perry, Marguerite        933476520  19APR1992  39,900              F   
   16  Gardinski, Barbara       634875680  29MAY1998  49,800              F   
   17  Robertson, Hannah        824576630  14MAR1995  52,700              F   
   18  Gresham, Jean            744826703  28APR1992  54,000              F   
   19  Kruize, Ronald           824447605  23MAY1994  49,200              M   
   20  Linzer, Fritz            988674342  23JUL1992  50,400              M   
   21  Carlton, Susan           688774609  28JAN1995  29,200              F   
   22  Hoffmann, Gerald         922448328  12OCT1997  27,600              M   
   23  DePuis, David            544909752  23AUG1994  32,900              M   
   24  Hahn, Kenneth            745609821  23AUG1994  33,300              M   
   25  Landau, Jennifer         634774295  30APR1996  32,900              F   
   26  Schoenberg, Marguerite   922854076  19NOV1994  39,800              F   
   27  Addison-Hardy, Jonathon  770434994  23FEB1992  41,400              M   
   28  McNaughton, Elizabeth    242784883  24JUL1993  45,000              F   
   29  Tharrington, Catherine   377882806  28SEP1994  38,600              F   
   30  Frangipani, Christopher  292450691  12AUG1990  43,900              M   
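
If the original variable order matters, one possible variation is to declare the leading variables in the LENGTH statement in the order that you want. The following sketch assumes that EmployeeID has a length of 9 in every input data set, which matches the INPUT statements and the PROC DATASETS output shown earlier; the data set name DEPT1_6B is hypothetical.

```sas
data dept1_6b;
   /* Variables are positioned in the order in which the DATA step
      first encounters them, so declare EmployeeID before Name */
   length EmployeeID $ 9 Name $ 27;
   set sales customer_support security
       new_accounting shipping research;
run;

proc print data=dept1_6b;
   title 'Employees in All Departments';
run;
```

With this form, EmployeeID remains the first variable and Name the second, while Name still receives a length of 27.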
