Previous Page | Next Page

Starting with SAS Data Sets

Reading Selected Variables


Overview of Reading Selected Variables

You can create a subset of a larger data set not only by excluding observations but also by specifying which variables you want the new data set to contain. In a DATA step you can use the SET statement and the KEEP= or DROP= data set options (or the DROP and KEEP statements) to create a subset from a larger data set by specifying which variables you want the new data set to include.


Keeping Selected Variables

This example uses the KEEP= data set option in the SET statement to read only the variables that represent the services-related expenditures of the data set CITY.

data services;
   set city (keep=Year ServicesTotal ServicesPolice ServicesFire
            ServicesWater_Sewer);
run;

proc print data=services;
   title 'City Services-Related Expenditures';
run;

The following output shows the resulting data set. Note that the data set SERVICES contains only those variables that are specified in the KEEP= option.

Selecting Variables with the KEEP= Option

                       City Services-Related Expenditures                      1

                                                 Services
                         Services    Services     Water_     Services
          Obs    Year     Police       Fire        Sewer       Total

            1    1980      2819        1120         422        4361  
            2    1981      2477        1160         500        4137  
            3    1982      2028        1061         510        3599  
            4    1983      2754         893         540        4187  
            5    1984      2195         963         541        3699  
            6    1985      1877         926         535        3338  
            7    1986      1727        1111         535        3373  
            8    1987      1532        1220         519        3271  
            9    1988      1448        1156         577        3181  
           10    1989      1500        1076         606        3182  
           11    1990      1934         969         646        3549  
           12    1991      2195        1002         643        3840  
           13    1992      2204         964         692        3860  
           14    1993      2175        1144         735        4054  
           15    1994      2556        1341         813        4710  
           16    1995      2026        1380         868        4274  
           17    1996      2526        1454         946        4926  
           18    1997      2027        1486        1043        4556  
           19    1998      2037        1667        1152        4856  
           20    1999      2852        1834        1318        6004  
           21    2000      2787        1701        1317        5805  

The following example uses the KEEP statement instead of the KEEP= data set option to read all of the variables from the CITY data set. The KEEP statement creates a new data set (SERVICES) that contains only the variables listed in the KEEP statement. The following program gives results that are identical to those in the previous example:

data services;
   set city;
   keep Year ServicesTotal ServicesPolice ServicesFire
        ServicesWater_Sewer;
run;

The following example has the same effect as using the KEEP= data set option in the DATA statement. All of the variables are read into the program data vector, but only the specified variables are written to the SERVICES data set:

data services (keep=Year ServicesTotal ServicesPolice ServicesFire
               ServicesWater_Sewer);
   set city;
run;


Dropping Selected Variables

Use the DROP= option to create a subset of a larger data set when you want to specify which variables are being excluded rather than which ones are being included. The following DATA step reads all of the variables from the data set CITY except for those that are specified with the DROP= option, and then creates a data set named SERVICES2:

data services2;
   set city (drop=Total AdminTotal AdminLabor AdminSupplies 
             AdminUtilities);
run;

proc print data=services2;
   title 'City Services-Related Expenditures';
run;

The following output shows the resulting data set:

Excluding Variables with the DROP= Option

                       City Services-Related Expenditures                      1

                                                 Services
                         Services    Services     Water_     Services
          Obs    Year     Police       Fire        Sewer       Total

            1    1980      2819        1120         422        4361  
            2    1981      2477        1160         500        4137  
            3    1982      2028        1061         510        3599  
            4    1983      2754         893         540        4187  
            5    1984      2195         963         541        3699  
            6    1985      1877         926         535        3338  
            7    1986      1727        1111         535        3373  
            8    1987      1532        1220         519        3271  
            9    1988      1448        1156         577        3181  
           10    1989      1500        1076         606        3182  
           11    1990      1934         969         646        3549  
           12    1991      2195        1002         643        3840  
           13    1992      2204         964         692        3860  
           14    1993      2175        1144         735        4054  
           15    1994      2556        1341         813        4710  
           16    1995      2026        1380         868        4274  
           17    1996      2526        1454         946        4926  
           18    1997      2027        1486        1043        4556  
           19    1998      2037        1667        1152        4856  
           20    1999      2852        1834        1318        6004  
           21    2000      2787        1701        1317        5805  
The following example uses the DROP statement instead of the DROP= data set option to read all of the variables from the CITY data set and to exclude the variables that are listed in the DROP statement from being written to the new data set. The results are identical to those in the previous example:
data services2;
   set city;
   drop Total AdminTotal AdminLabor AdminSupplies AdminUtilities;
run;
proc print data=services2; 
run;


Choosing between Data Set Options and Statements

When you create only one data set in the DATA step, the data set options to drop and keep variables have the same effect on the output data set as the statements to drop and keep variables. When you want to control which variables are read into the program data vector, using the data set options in the statement (such as a SET statement) that reads the SAS data set is generally more efficient than using the statements. Later topics in this section show you how to use the data set options in some cases where the statements will not work.


Choosing between the DROP= and KEEP= Data Set Option

In a simple case, you might decide to use the DROP= or KEEP= option, depending on which method enables you to specify fewer variables. If you work with large jobs that read data sets, and you expect that variables might be added between the times your batch jobs run, you may want to use the KEEP= option to specify which variables are included in the subset data set.

The following figure shows two data sets named SMALL. They have different contents because the new variable F was added to data set BIG before the DATA step ran on Tuesday. The DATA step uses the DROP= option to keep variables D and E from being written to the output data set. The result is that the data sets contain different contents: the second SMALL data set has an extra variable, F. If the DATA step used the KEEP= option to specify A, B, and C, then both of the SMALL data sets would have the same variables (A, B, and C). The addition of variable F to the original data set BIG would have no effect on the creation of the SMALL data set.

Using the DROP= Option

[Using the DROP= Option]

Previous Page | Next Page | Top of Page