Benefits of Dynamic Cluster Tables

Overview of Dynamic Cluster Tables

Organizing SPD Server data into dynamic cluster tables creates an architecture that supports parallelism, greater data flexibility, and easier manageability. This architecture can significantly speed up processing in data warehousing environments that use large and very large tables.

For example, you can add new data to, or remove historical data from, very large tables by accessing only the member tables that are affected by the change. You can access the individual member tables in parallel. This strategy reduces the time that the job needs to complete, and it uses simple commands. Furthermore, a complete refresh of a dynamic cluster table uses a fraction of the disk space that is needed to refresh a large traditional SAS or SPD Server table with the same amount of data.
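The idea that a change touches only the affected member tables can be illustrated with a small model. The following Python sketch is not SPD Server syntax. It treats a cluster as an ordered mapping of member names to row counts, and the function name `roll_window` and the member names are hypothetical.

```python
def roll_window(members, new_name, new_rows, keep):
    """Append one member and drop the oldest members beyond `keep`.

    `members` maps member-table names to row counts, oldest first.
    Returns the updated mapping and the names of the members touched.
    """
    updated = dict(members)          # copy; dicts preserve insertion order
    updated[new_name] = new_rows     # add the newest member
    touched = [new_name]
    while len(updated) > keep:
        oldest = next(iter(updated))  # first key is the oldest member
        del updated[oldest]
        touched.append(oldest)
    return updated, touched


# Hypothetical monthly members of a sales cluster.
members = {
    "sales_2023_01": 1_000_000,
    "sales_2023_02": 1_100_000,
    "sales_2023_03": 900_000,
}
updated, touched = roll_window(members, "sales_2023_04", 1_200_000, keep=3)
print(list(updated))  # remaining members, oldest first
print(touched)        # only these members were accessed
```

In this model, adding one month and retiring another touches two members, regardless of how many rows the other members hold.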

Parallel Loading

Because dynamic cluster tables are virtual tables that consist of numerous small SPD Server tables, the architecture enables parallel loading and processing. Cluster table loads and refreshes are broken down into multiple tasks that can be performed concurrently. Separate SAS/CONNECT MP CONNECT jobs manage the parallel loading and processing.
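As a rough analogy for breaking a load into concurrent tasks, the following Python sketch loads each member in its own worker and then collects the results into one structure. It does not use SAS/CONNECT or any SPD Server interface. The functions `load_member` and `load_cluster` and the sample data are hypothetical stand-ins for the separate load jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def load_member(name, rows):
    """Stand-in for one load job: consume the rows and report the row count."""
    return name, len(list(rows))


def load_cluster(sources, max_workers=4):
    """Load every member concurrently, then gather them in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(load_member, name, rows)
            for name, rows in sources.items()
        ]
        # result() blocks until each job finishes, like waiting for all tasks.
        return dict(future.result() for future in futures)


# Hypothetical source data for three member tables.
sources = {"m1": range(10), "m2": range(20), "m3": range(5)}
cluster = load_cluster(sources)
print(cluster)
```

The `max_workers` setting plays the role of the number of concurrent jobs. Raising it helps only while processors and I/O bandwidth can keep up.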

The scalability of parallel loading with dynamic cluster tables depends on the scalability of the server I/O and on the number of processors on the server.

Parallel loading requires multiple concurrent writes to disk. If the I/O hardware does not scale appropriately, parallel loading can degrade performance instead of improving it.

SPD Server can create multiple indexes on the same table in parallel. Index creation is a CPU-intensive process.

When sufficient processing power is available, parallel index creation in SPD Server is highly scalable.

The creation process for each index is threaded. A single index creation can use multiple CPUs on a server if they are available, which greatly improves performance.

Fast and Economical Refreshes

Refreshing a dynamic cluster table uses a fraction of the disk space that is needed to refresh a traditional SPD Server table with the same amount of data. The dynamic cluster table architecture enables users to refresh many large tables concurrently, while conserving disk and I/O resources. With very large traditional SAS or SPD Server tables, available disk space can limit the number of tables that can be concurrently refreshed.

In the life cycle of a data warehouse, tables can be refreshed to recapture disk space after rows have been updated or deleted. Refreshing a table can also reorder its data for optimized performance. However, refreshing a table can temporarily use twice the disk space of the table itself. With very large tables, disk space can be a problem when updating a data warehouse or data mart. When disk space is limited on a server, the amount of data that can be simultaneously refreshed is constrained, and the window of time that is required to load and refresh can become very long.

Because dynamic cluster tables can be quickly unbound into smaller SPD Server tables, refreshing dynamic cluster tables does not use twice the disk space of the original table itself. Instead, only twice the disk space of the largest member table in the dynamic cluster table is used.

After the dynamic cluster table is unbound, disk space equal to the size of the first member table is required to perform a refresh. A backup of the refresh is created, and then the old version is deleted, which frees disk space. The refresh process repeats for each successive member table until all members in the dynamic cluster table have been refreshed and updated. Then, the member tables are merged into a dynamic cluster table again.

When a server has enough disk space and I/O resources to refresh more than one member table at a time, the benefits of parallel processing can be realized.
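The disk-space argument in this section can be checked with simple arithmetic. The following Python sketch is a simplified model, not a measurement. It assumes that a refreshed copy is the same size as the original and that the old version is deleted before the next batch starts. The member sizes and function names are hypothetical.

```python
def extra_space_traditional(total_size):
    """Extra space to refresh one monolithic table: a full second copy."""
    return total_size


def extra_space_clustered(member_sizes, concurrent=1):
    """Peak extra space when members are refreshed `concurrent` at a time."""
    peak = 0
    for start in range(0, len(member_sizes), concurrent):
        batch = member_sizes[start:start + concurrent]
        # Each member in the batch needs room for its refreshed copy.
        peak = max(peak, sum(batch))
    return peak


member_sizes_gb = [40, 55, 50, 45]   # hypothetical member sizes in GB
total_gb = sum(member_sizes_gb)      # same data held in one table

traditional_gb = extra_space_traditional(total_gb)
one_at_a_time_gb = extra_space_clustered(member_sizes_gb)
two_at_a_time_gb = extra_space_clustered(member_sizes_gb, concurrent=2)

print(traditional_gb, one_at_a_time_gb, two_at_a_time_gb)  # 190 55 95
```

Under these assumptions, refreshing one member at a time needs 55 GB of working space (the largest member) instead of 190 GB. Refreshing two members at a time raises the requirement to 95 GB, which is the trade-off between disk space and parallelism.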