Benefits of Dynamic Cluster Tables

Parallel Loading

Because dynamic cluster tables are virtual tables that consist of numerous smaller SPD Server tables, the architecture enables parallel loading and processing. Cluster table loads and refreshes are broken down into multiple tasks that can be performed concurrently. You can use separate SAS/CONNECT MP CONNECT jobs to manage the parallel loading and processing.
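The idea of splitting a load into independent per-member tasks can be sketched in Python. This is only an illustration of the concurrency pattern, not SPD Server or SAS/CONNECT code: load_member is a hypothetical stand-in for whatever job loads one member table, and the member names and row lists are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def load_member(name, rows):
    """Hypothetical stand-in for a job that loads one member table.

    Returns the member name and the number of rows it loaded.
    """
    return name, len(rows)


def load_members_in_parallel(members, max_workers=4):
    """Run one load task per member concurrently; return {name: rows loaded}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(load_member, name, rows)
                   for name, rows in members.items()]
        # Collect every result; an exception in any task is re-raised here.
        return dict(future.result() for future in futures)


# Invented example data: three monthly member tables.
monthly = {"sales_jan": [1, 2, 3], "sales_feb": [4, 5], "sales_mar": [6]}
loaded = load_members_in_parallel(monthly, max_workers=3)
```

The max_workers value plays the role of the number of concurrent jobs. In practice, it would be chosen to match what the server's processors and I/O can sustain.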

The scalability of parallel loading with dynamic cluster tables depends on the scalability of the server I/O and on the number of processors on the server.

Parallel loading issues multiple concurrent writes to disk. If the I/O hardware cannot sustain those concurrent writes, parallel loading can degrade performance instead of improving it.

SPD Server can create multiple indexes on the same table in parallel. Index creation is a CPU-intensive process. When sufficient processing power is available, parallel index creation in SPD Server is highly scalable. The creation process for each index is threaded. A single index creation can use multiple CPUs on a server if they are available, which greatly improves performance.

Fast and Economical Refreshes

Refreshing a dynamic cluster table uses a fraction of the disk space that refreshing a traditional SPD Server table with the same amount of data uses. The dynamic cluster table architecture enables you to refresh many large tables concurrently, while conserving disk and I/O resources. With very large traditional SAS or SPD Server tables, available disk space can limit the number of tables that you can refresh concurrently.

In the life cycle of a data warehouse, tables are refreshed to recapture disk space after rows have been updated or deleted. Refreshing a table can also reorder its data for optimized performance. However, refreshing a table can temporarily use twice the disk space of the table itself. With very large tables, disk space can become a limiting factor when updating a data warehouse or data mart. When disk space is limited on a server, the amount of data that can be refreshed simultaneously is constrained, and the total time that is required to load and refresh the data can grow considerably.

Because a dynamic cluster table can be quickly unbound into smaller SPD Server tables, refreshing a dynamic cluster table does not require twice the disk space of the entire table. Instead, the refresh requires at most twice the disk space of the largest member table in the dynamic cluster table.

After the dynamic cluster table is unbound, additional disk space equal to the size of the first member table is required to perform a refresh. A refreshed copy of the member table is created, and then the old version is deleted, which frees disk space for the next refresh. The refresh process repeats for each successive member table until all members in the dynamic cluster table have been refreshed and updated. Then, the member tables are bound into a dynamic cluster table again.
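The disk space arithmetic behind this sequence can be checked with a short Python sketch. It rests on assumptions that are not stated in the text: each refreshed copy is the same size as the member it replaces (a worst case, because a refresh usually recaptures space), members are refreshed one at a time, and the sizes shown are invented.

```python
def peak_space_traditional(table_size):
    """Peak disk usage when one large table is refreshed as a whole.

    The old table and its refreshed copy exist side by side.
    """
    return 2 * table_size


def peak_space_cluster(member_sizes):
    """Peak disk usage when member tables are refreshed one at a time.

    Assumes each refreshed copy is as large as the member it replaces
    and that the old version is deleted before the next member starts.
    """
    total = sum(member_sizes)
    peak = total
    for size in member_sizes:
        # While this member is refreshed, its old and new versions coexist.
        peak = max(peak, total + size)
    return peak


# Invented example: three member tables of 40, 25, and 35 GB.
members_gb = [40, 25, 35]
traditional_peak = peak_space_traditional(sum(members_gb))
cluster_peak = peak_space_cluster(members_gb)
extra_space_needed = cluster_peak - sum(members_gb)
```

Under these assumptions, the traditional refresh peaks at 200 GB, while the member-by-member refresh peaks at 140 GB. The extra 40 GB is the size of the largest member.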

When a server has enough disk space and I/O resources to refresh more than one member table at a time, parallel processing provides added benefits.
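One way to reason about how many member tables can be refreshed at the same time is to group them into batches that fit in the free disk space. The Python sketch below is a hypothetical planning aid, not an SPD Server feature. It assumes that each refresh needs extra space equal to the member's size, it takes members in the order given, and it does not model I/O bandwidth, which would also have to be sufficient.

```python
def plan_refresh_batches(member_sizes, free_space):
    """Group members into batches that can be refreshed concurrently.

    member_sizes: dict of member name -> size (any consistent unit).
    free_space: disk space available for refreshed copies (same unit).
    Members are taken in order; a new batch starts when the next member
    would not fit in the remaining free space.
    """
    batches, current, used = [], [], 0
    for name, size in member_sizes.items():
        if size > free_space:
            raise ValueError(f"{name} needs {size}, but only {free_space} is free")
        if used + size > free_space:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


# Invented example: four members (sizes in GB) and 70 GB of free space.
sizes_gb = {"m1": 40, "m2": 25, "m3": 35, "m4": 10}
batches = plan_refresh_batches(sizes_gb, free_space=70)
```

With these invented sizes, the members fall into two batches: m1 and m2 together, then m3 and m4. With only 40 GB free, every batch except the last would hold a single member, which corresponds to refreshing the members one at a time.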