Selecting a Load Technique in the Table Loader

Problem

You want to load data into a permanent physical table that is structured to match your data model. As the designer or builder of a process flow in SAS Data Integration Studio, you must identify which one of these load styles best meets your process requirements:
  • appending all of the source data to any previously loaded data
  • replacing all previously loaded data with the source data
  • using the source data to update and add to the previously loaded data that is based on specific key columns
Once you know which load style is required, you can select the techniques and options that maximize the step's performance.
Note: All table loaders have similar Load Technique tabs, but this example is specific to the Table Loader Transformation. For specific instructions about other loaders, see the Help topics for the other loaders.

Solution

You can use the Table Loader transformation to perform any of the three load styles. The transformation generates the code that is required to load SAS data sets, database tables, and other types of data, such as an Excel spreadsheet. When you load a table type that supports indexing or constraints, you can use the Table Loader transformation to manage indexes and constraints on the table.
You select the load style in the Load style field on the Load Technique tab of the Table Loader transformation. After you have selected the load style, you can choose from a number of load techniques and options. Based on the load style that you select and the type of table that is being loaded, the choice of techniques and options can vary. The Table Loader transformation generates code to perform a combination of the following loading tasks:
The following sections describe the SAS code alternatives for each load task and provide tips for selecting the load technique (or techniques) that performs best.

Tasks

Remove All Rows

This task is associated with the Replace Load style. Based on the type of target table that is being loaded, two or three of the following selections are listed in the Replace field:
  • Replace Entire table: uses PROC DATASETS to delete the target table
  • Replace All rows using truncate: uses PROC SQL with TRUNCATE to remove all rows (only available for DBMS tables that support truncation)
  • Replace All rows using delete: uses PROC SQL with DELETE * to remove all rows
  • Replace Simulating truncate: uses the DATA step with SET and STOP statements to remove all rows (available only for SAS tables)
When you select Replace Entire table, the table is removed and disk space is freed. Then the table is recreated with 0 rows. Consider this option unless your security requirements restrict table deletion permissions (a restriction that is commonly imposed by a database administrator on database tables). Also, avoid this method if the table has any indexes or constraints that SAS Data Integration Studio cannot recreate from metadata (for example, check constraints).
If available, consider using Replace All rows using truncate. Either of the replace all rows selections enables you to keep all indexes and constraints intact during the load. By design, using TRUNCATE is the quickest way to remove all rows. In Replace All rows using delete, the DELETE * syntax also removes all rows. However, based on the database and table settings, this choice can incur overhead that can degrade performance. Consult your database administrator or the database documentation for a comparison of the two techniques.
CAUTION:
When DELETE * is used repeatedly to clear a SAS table, the size of that table should be monitored over time. DELETE * performs only logical deletes for SAS tables. Therefore, the table's physical size continues to increase, which can negatively affect performance.
Replace Simulating truncate is available for SAS tables. It does not remove rows from a table as Replace All rows using delete does, or as Replace All rows using truncate does for a DBMS. It actually behaves more like Replace Entire table in that the entire table is replaced with an empty table before being loaded. Unlike Replace All rows using delete, this replace style does not have the issue of ever-increasing table size.
Compared to Replace Entire table, Replace Simulating truncate offers an advantage in that it can maintain constraints such as check constraints that cannot be defined in metadata for SAS Data Integration Studio. If a target table is to have check constraints, the physical table must be created with all constraints applied before a Table Loader can load it with Replace Simulating truncate. This can be done once, outside of SAS Data Integration Studio or in user-written code in a SAS Data Integration Studio job. When the Loader step runs and the target table already exists, the step simulates a Truncate by creating an empty table with structure and constraints that are identical to the original, and then appends or inserts the data from the source table.
It is important to understand that Replace Simulating truncate, by design, ignores all constraint metadata when code is generated (except to create code to initialize the target if it does not already exist). Therefore, constraints on the physical table cannot be modified by changing constraint metadata and regenerated and rerunning with Replace Simulating truncate.
Note: If you are using Generation Data Sets, use the Simulating Truncate load technique instead of the DELETE * syntax.

Add New Rows

For this task, the Table Loader transformation provides two techniques for all three load styles: PROC APPEND with the FORCE option and PROC SQL with the INSERT statement. The two techniques handle discrepancies between source and target table structures differently.
PROC APPEND with the FORCE option is the default. If the source is a large table and the target is in a database that supports bulk-load, PROC APPEND can take advantage of the bulk-load feature. Consider bulk-loading the data into database tables with the optimized SAS/ACCESS engine bulk loaders. (It is recommended that you use native SAS/ACCESS engine libraries instead of ODBC libraries or OLEDB libraries for relational database data. SAS/ACCESS engines have native access to the databases and have superior bulk-loading capabilities.)
PROC SQL with the INSERT statement performs well when the source table is small because you do not incur the overhead that is needed to set up bulk-loading. PROC SQL with INSERT adds one row at a time to the database.

Match and Update Rows

The Table Loader transformation provides three techniques for matching and updating rows in a table. All the following techniques are associated with the Update/Insert load style:
  • DATA step with the MODIFY BY option
  • DATA step with the MODIFY KEY= option
  • PROC SQL with the WHERE and SET statements
For each of these techniques, you must select one or more columns or an index for matching. All three techniques update matching rows in the target table. The MODIFY BY and MODIFY KEY= options can take unmatched records and add them to the target table during the same pass-through on the source table.
Of these three choices, the DATA step with MODIFY KEY= option often outperforms the other update methods in tests conducted on loading SAS tables. An index is required. The MODIFY KEY= option can also perform adequately for database tables when indexes are used.
When the Table Loader uses PROC SQL with WHERE and SET statements to match and update rows, performance varies. When used in PROC SQL, neither of these statements requires data to be indexed or sorted, but indexing on the key columns can greatly improve performance. Both of these statements use WHERE processing to match each row of the source table with a row in the target table.
The update technique that you choose depends on the percentage of rows being updated. If the majority of target records are being updated, the DATA step with MERGE (or UPDATE) might perform better than the DATA step with MODIFY BY or MODIFY KEY= or PROC SQL because MERGE makes full use of record buffers. Performance results can vary by hardware and operating environment, so you should consider testing more than one technique.
Note: The general Table Loader transformation does not offer the DATA step with MERGE as a load technique. However, you can revise the code for the MODIFY BY technique to do a merge and save that as user-written code for the transformation.