Performance Considerations |
Row and Column Selection |
Limiting the number of rows that the DBMS returns to SAS is an extremely important performance consideration. The less data that the SAS job requests, the faster the job runs.
Wherever possible, specify selection criteria that limits the number of rows that the DBMS returns to SAS. Use the SAS WHERE clause to retrieve a subset of the DBMS data.
If you are interested in only the first few rows of a table, consider adding the OBS= option. SAS passes this option to the DBMS to limit the number of rows to transmit across the network, which can significantly improve performance against larger tables. To do this if you are using SAS Enterprise Guide, select View Explorer, select the table that you want from the list of tables, and select the member that you want to see the contents of the table.
Likewise, select only the DBMS columns that your program needs. Selecting unnecessary columns slows your job.
The KEEP= and DROP= Options |
Just as with a SAS data set you can use the DROP= and KEEP= data set options to prevent retrieving unneeded columns from your DBMS table.
In this example the KEEP= data set option causes the SAS/ACCESS engine to select only the SALARY and DEPT columns when it reads the MYDBLIB.EMPLOYEES table.
libname mydblib db2 user=testid password=testpass database=testdb; proc print data (keep=salary dept); where dept='ACC024'; quit;
The generated SQL that the DBMS processes is similar to the following code:
SELECT "SALARY", "DEPT" FROM EMPLOYEES WHERE(DEPT="ACC024")
Without the KEEP option, the SQL processed by the DBMS would be similar to the following:
SELECT * FROM EMPLOYEES WHERE(DEPT="ACC024")
This would result in all of the columns from the EMPLOYEES table being read in to SAS.
The DROP= data set option is a parallel option that specifies columns to omit from the output table. Keep in mind that the DROP= and KEEP= data set options are not interchangeable with the DROP and KEEP statements. Use of the DROP and KEEP statements when selecting data from a DBMS can result in retrieval of all column into SAS, which can seriously impact performance.
For example, the following would result in all of the columns from the EMPLOYEES table being retrieved into SAS. The KEEP statement would be applied when creating the output data set.
libname mydblib db2 user=testid password=testpass database=testdb; data temp; set mydblib.employees; keep salary; run;
The following is an example of how to use the KEEP data set option to retrieve only the SALARY column:
data temp; set mydblib.employees(keep=salary); run;
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.