Organizing Domains for Scalability

Overview of Organizing Domains

SPD Server performance depends on scalable I/O. You can use the libnames.parm file to control where SPD Server stores files so that it can exploit scalable I/O. The Domain Access Options section in this document explains how to specify named paths for the three components of SPD Server tables (observation data, index data, and metadata), as well as paths for temporary intermediate calculation tables. LIBNAME domain declaration statements specify the system paths that are associated with each table space component, but the SPD Server administrator must still allocate the appropriate disk space and I/O redundancy for each path.
This section provides functional information about the table spaces that are defined by the DATAPATH=, INDEXPATH=, WORKPATH=, and METAPATH= options of the LIBNAME domain declaration statements. SPD Server administrators should use this information to determine the best sizing, I/O, and redundancy requirements to optimize performance and scalability for named SPD Server domain paths.
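As an illustration of how the four path options relate to a single domain, the following Python sketch models a domain layout and renders it as a libnames.parm-style entry. The DomainLayout class, the domain name, and all directory paths are hypothetical. The ROPTIONS= wrapper and the quoting of the path lists are assumptions about the entry layout; check them against the Domain Access Options section before using the output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DomainLayout:
    """Hypothetical description of the table spaces for one domain."""
    name: str
    pathname: str  # primary path; holds the metadata tables
    datapath: List[str] = field(default_factory=list)
    indexpath: List[str] = field(default_factory=list)
    metapath: List[str] = field(default_factory=list)
    workpath: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render a libnames.parm-style entry (entry layout is an assumption)."""
        options = []
        for keyword, paths in (
            ("DATAPATH", self.datapath),
            ("INDEXPATH", self.indexpath),
            ("METAPATH", self.metapath),
            ("WORKPATH", self.workpath),
        ):
            if paths:  # omit options that were not declared
                quoted = " ".join(f"'{p}'" for p in paths)
                options.append(f"{keyword}=({quoted})")
        entry = f"LIBNAME={self.name} PATHNAME={self.pathname}"
        if options:
            entry += ' ROPTIONS="' + " ".join(options) + '"'
        return entry + ";"


# Hypothetical domain with separate data, index, and work spaces.
layout = DomainLayout(
    name="sales",
    pathname="/spds/meta/sales",
    datapath=["/spds/data1/sales", "/spds/data2/sales"],
    indexpath=["/spds/index/sales"],
    workpath=["/spds/work/sales"],
)
print(layout.render())
```

A domain declared with only a name and a PATHNAME= value renders without any path options, which corresponds to the default case where data and index tables share the PATHNAME= space.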

Data Table Space

When a domain is declared in a LIBNAME statement, data tables are stored in the space defined in the PATHNAME= specification, unless the DATAPATH= option is specified. The PATHNAME= space is designed to contain metadata tables for a domain, but it can also contain data tables. As a domain's size and complexity increase, so do the benefits of organizing data tables into their own DATAPATH= space.
How you organize your data table space significantly affects I/O scalability. The disk space allocated to data tables stores the permanent warehouse tables that users access. This disk space must support scalable I/O, because scalable I/O facilitates both parallel processing and real-time multi-user access to the data. In a large warehouse, this disk space is likely to see the greatest proportion of Read and Write I/O.
Tables in the data table space are typically loaded or refreshed using batch processes during evenings or off-peak hours (such as weekends and holidays). Access to data table space is often restricted to read-only for all users except for the administrators who perform the load and refresh processes.
To ensure reliability, data table space is typically organized into RAID 1+0 or RAID 5 disk configurations. Very large warehouses should consider a RAID 5 configuration with a second storage array to mirror the data.
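When sizing data table space, it can help to compare how much usable capacity each RAID level leaves after redundancy. The sketch below uses the standard capacity formulas for RAID 1+0 (half of the raw capacity) and RAID 5 (one disk's worth of capacity used for parity). The function name and the disk counts are hypothetical, and the calculation ignores hot spares and file system overhead.

```python
def raid_usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Return the usable capacity in TB for a set of identical disks."""
    if level == "RAID 1+0":
        # Mirrored pairs that are striped: half of the raw capacity is usable.
        if disks < 4 or disks % 2 != 0:
            raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
        return disks * disk_tb / 2
    if level == "RAID 5":
        # Striping with distributed parity: one disk's capacity holds parity.
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb
    raise ValueError(f"unsupported RAID level: {level}")


# Hypothetical array of eight 4 TB disks.
for level in ("RAID 1+0", "RAID 5"):
    print(level, raid_usable_tb(level, disks=8, disk_tb=4.0), "TB usable")
```

For the same eight disks, RAID 5 yields 28 TB of usable space and RAID 1+0 yields 16 TB, which is one reason a very large warehouse might pair RAID 5 with a separately mirrored storage array.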

Index Table Space

When a domain is declared in a LIBNAME statement, index tables are stored in the space defined in the PATHNAME= specification, unless the INDEXPATH= option is specified. The PATHNAME= space is designed to contain metadata tables for a domain, but it can also contain index tables. As a domain's size and complexity increase, so do the benefits of organizing index tables into their own INDEXPATH= space.
Index space typically does not require the high level of I/O scalability that data space, temporary table space, or workspace needs. When a process uses an index, its read access pattern is very different from the parallel I/O pattern of reads against data tables or the access patterns of multiple users working against the data.
Index space is typically configured as a large striped file system across a large number of disks and I/O channels. A typical configuration, such as RAID 1+0 or RAID 5, provides redundancy to ensure index space availability.

Metadata Table Space

When a domain is declared in a LIBNAME statement, metadata tables are stored in the space defined in the PATHNAME= specification. If the space configured in PATHNAME= fills, SPD Server stores overflow metadata for existing tables in the space defined in the optional METAPATH= specification, if it is declared. The PATHNAME= and METAPATH= spaces are specifically designed to contain metadata tables for a domain.
Compared to the other space categories, metadata space is relatively small and usually does not require scalability. If compressed data in a given warehouse uses 10 terabytes of disk space, there will be approximately 10 gigabytes of metadata.
As a rule of thumb, when setting up metadata space, plan to allot 20 gigabytes of metadata space for every 10 terabytes of physical data disk space, which is about twice the typical amount of metadata noted above. When new data paths are added to expand a server, additional metadata space should be added within the primary path of the server.
A table's metadata becomes larger when rows in the table are marked as deleted. The metadata stores bitmaps that are used to filter out the deleted rows. The space required depends on the number of rows deleted and on their distribution within the table.
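The sizing guidance for metadata space can be worked through as simple arithmetic. The sketch below assumes that both figures scale linearly with data size: about 10 gigabytes of metadata per 10 terabytes of compressed data as the expected amount, and 20 gigabytes per 10 terabytes as the amount to allot. The function and constant names are hypothetical.

```python
EXPECTED_GB_PER_TB = 1.0  # about 10 GB of metadata per 10 TB of data
PLANNED_GB_PER_TB = 2.0   # rule of thumb: allot 20 GB per 10 TB of data


def metadata_space_gb(data_tb: float) -> dict:
    """Estimate expected and planned metadata space for a given data size."""
    if data_tb < 0:
        raise ValueError("data size cannot be negative")
    return {
        "expected_gb": data_tb * EXPECTED_GB_PER_TB,
        "planned_gb": data_tb * PLANNED_GB_PER_TB,
    }


# Hypothetical warehouse with 25 TB of data disk space.
estimate = metadata_space_gb(25)
print(estimate)  # {'expected_gb': 25.0, 'planned_gb': 50.0}
```

Because the allotment is a fixed multiple of the data size, rerunning the estimate whenever new data paths are added shows how much metadata space to add to the primary path.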
Although the space required for the metadata is small, the setup and configuration of the disk space is very important. The disk space must be expandable, mirrored, and backed up.

Work File Space

SPD Server administrators use statements in the body of the spdsserv.parm file to reserve a space for intermediate calculations and temporary files. The workspace that is configured in spdsserv.parm is shared by all SPD Server users.
Some SPD Server users have data needs that might be constrained by the common intermediate calculation and file space that is reserved for all users. SPD Server administrators can use the libnames.parm file to create and reserve a workspace that is specifically associated with a single domain and its approved users. This can improve both security and performance. As a domain's size and complexity increase, so do the benefits of organizing temporary and intermediate tables into their own workspace defined by WORKPATH=.
Workspace refers to the area on disk where SPD Server stores the files that it needs when available memory cannot hold an entire set of calculations. In these cases, some utility files are written to disk. Workspace is important to scalability. Tasks such as large sorts, index creation, parallel group-by operations, and SQL joins can require dedicated workspace to store temporary utility files.
Workspace is typically configured as part of a large striped file system that spans as many disks and I/O channels as possible. Workspace I/O can critically affect the performance of an SPD Server host.
Workspace on disk is typically a RAID 0 configuration or a hardware-redundant RAID design. RAID 0 carries risk because it provides no redundancy: if a disk in the RAID 0 set fails, the workspace becomes unavailable, the system is affected, and any process that was running at the time of the failure will probably be affected as well.