Starting with SAS Data Sets |
Overview of Reading Selected Variables |
You can create a subset of a larger data set not only by excluding observations but also by specifying which variables you want the new data set to contain. In a DATA step you can use the SET statement and the KEEP= or DROP= data set options (or the DROP and KEEP statements) to create a subset from a larger data set by specifying which variables you want the new data set to include.
Keeping Selected Variables |
This example uses the KEEP= data set option in the SET statement to read only the variables that represent the services-related expenditures of the data set CITY.
data services; set city (keep=Year ServicesTotal ServicesPolice ServicesFire ServicesWater_Sewer); run; proc print data=services; title 'City Services-Related Expenditures'; run;
The following output shows the resulting data set. Note that the data set SERVICES contains only those variables that are specified in the KEEP= option.
Selecting Variables with the KEEP= Option
City Services-Related Expenditures 1 Services Services Services Water_ Services Obs Year Police Fire Sewer Total 1 1980 2819 1120 422 4361 2 1981 2477 1160 500 4137 3 1982 2028 1061 510 3599 4 1983 2754 893 540 4187 5 1984 2195 963 541 3699 6 1985 1877 926 535 3338 7 1986 1727 1111 535 3373 8 1987 1532 1220 519 3271 9 1988 1448 1156 577 3181 10 1989 1500 1076 606 3182 11 1990 1934 969 646 3549 12 1991 2195 1002 643 3840 13 1992 2204 964 692 3860 14 1993 2175 1144 735 4054 15 1994 2556 1341 813 4710 16 1995 2026 1380 868 4274 17 1996 2526 1454 946 4926 18 1997 2027 1486 1043 4556 19 1998 2037 1667 1152 4856 20 1999 2852 1834 1318 6004 21 2000 2787 1701 1317 5805
The following example uses the KEEP statement instead of the KEEP= data set option to read all of the variables from the CITY data set. The KEEP statement creates a new data set (SERVICES) that contains only the variables listed in the KEEP statement. The following program gives results that are identical to those in the previous example:
data services; set city; keep Year ServicesTotal ServicesPolice ServicesFire ServicesWater_Sewer; run;
The following example has the same effect as using the KEEP= data set option in the DATA statement. All of the variables are read into the program data vector, but only the specified variables are written to the SERVICES data set:
data services (keep=Year ServicesTotal ServicesPolice ServicesFire ServicesWater_Sewer); set city; run;
Dropping Selected Variables |
Use the DROP= option to create a subset of a larger data set when you want to specify which variables are being excluded rather than which ones are being included. The following DATA step reads all of the variables from the data set CITY except for those that are specified with the DROP= option, and then creates a data set named SERVICES2:
data services2; set city (drop=Total AdminTotal AdminLabor AdminSupplies AdminUtilities); run; proc print data=services2; title 'City Services-Related Expenditures'; run;
The following output shows the resulting data set:
Excluding Variables with the DROP= Option
City Services-Related Expenditures 1 Services Services Services Water_ Services Obs Year Police Fire Sewer Total 1 1980 2819 1120 422 4361 2 1981 2477 1160 500 4137 3 1982 2028 1061 510 3599 4 1983 2754 893 540 4187 5 1984 2195 963 541 3699 6 1985 1877 926 535 3338 7 1986 1727 1111 535 3373 8 1987 1532 1220 519 3271 9 1988 1448 1156 577 3181 10 1989 1500 1076 606 3182 11 1990 1934 969 646 3549 12 1991 2195 1002 643 3840 13 1992 2204 964 692 3860 14 1993 2175 1144 735 4054 15 1994 2556 1341 813 4710 16 1995 2026 1380 868 4274 17 1996 2526 1454 946 4926 18 1997 2027 1486 1043 4556 19 1998 2037 1667 1152 4856 20 1999 2852 1834 1318 6004 21 2000 2787 1701 1317 5805The following example uses the DROP statement instead of the DROP= data set option to read all of the variables from the CITY data set and to exclude the variables that are listed in the DROP statement from being written to the new data set. The results are identical to those in the previous example:
data services2; set city; drop Total AdminTotal AdminLabor AdminSupplies AdminUtilities; run; proc print data=services2; run;
Choosing between Data Set Options and Statements |
When you create only one data set in the DATA step, the data set options to drop and keep variables have the same effect on the output data set as the statements to drop and keep variables. When you want to control which variables are read into the program data vector, using the data set options in the statement (such as a SET statement) that reads the SAS data set is generally more efficient than using the statements. Later topics in this section show you how to use the data set options in some cases where the statements will not work.
Choosing between the DROP= and KEEP= Data Set Option |
In a simple case, you might decide to use the DROP= or KEEP= option, depending on which method enables you to specify fewer variables. If you work with large jobs that read data sets, and you expect that variables might be added between the times your batch jobs run, you may want to use the KEEP= option to specify which variables are included in the subset data set.
The following figure shows two data sets named SMALL. They have different contents because the new variable F was added to data set BIG before the DATA step ran on Tuesday. The DATA step uses the DROP= option to keep variables D and E from being written to the output data set. The result is that the data sets contain different contents: the second SMALL data set has an extra variable, F. If the DATA step used the KEEP= option to specify A, B, and C, then both of the SMALL data sets would have the same variables (A, B, and C). The addition of variable F to the original data set BIG would have no effect on the creation of the SMALL data set.
Using the DROP= Option
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.