File System Performance Concepts

Overview of File System Performance

SPD Server uses several file types in its data storage model. Data objects in SPD Server consist of one or more component files. Each component file is itself a collection of one or more disk files. These disk files are called the partitions of the component.
SPD Server creates a new partition for a component file when any of the following conditions is true:
  • The current partition exceeds the user-specified PARTSIZE= value: Subsequent partitions are allocated in cyclical fashion across the set of directories that are specified in the DATAPATH= statement for the LIBNAME domain. Partitioning uses file-level striping to create PARTSIZE-sized files, complementing the disk-level striping that your operating system's volume manager software creates. SPD Server uses a default PARTSIZE= setting of 16 MB. PARTSIZE= determines a unit of work for parallel operations that require full table scans, such as WHERE clause evaluation and SQL GROUP BY summarization. The trade-off is the increased number of files used to store the table versus the work savings realized through parallel partitions: more partitions mean that more files must be opened to process the table, but each partition holds fewer rows. (A sketch follows this list.)
  • The current partition exceeds the RLIMIT_FILESIZE value: On UNIX systems, RLIMIT_FILESIZE is a system parameter that defines the maximum size of a single disk file. On Windows, SPD Server uses a default RLIMIT_FILESIZE value of 2 GB.
  • The current partition exceeds the available space in the file system where it was created.
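To make the PARTSIZE= trade-off concrete, the following is a minimal sketch of creating a table with an explicit partition size. The libref spdslib, the table name, and the 128 MB value are hypothetical; spdslib is assumed to be assigned to an SPD Server domain, and the PARTSIZE= table option value is given in megabytes (omit the option to accept the 16 MB default).

    /* Hypothetical sketch: create an SPD Server table with 128 MB partitions. */
    /* "spdslib" is assumed to be a libref assigned to an SPD Server domain;   */
    /* the PARTSIZE= table option value is in megabytes.                       */
    data spdslib.sales (partsize=128);
       set work.sales;   /* source table is a placeholder */
    run;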

Defining Directories

SPD Server allows you to define a set of directories that contain component files and their partitions. Normally, a single directory path is constrained by the size limit of its file system, that is, the maximum amount of disk space that the operating system can address in a single volume.
Most UNIX and Windows systems offer a volume manager utility. System administrators can use these utilities to create file systems (volumes) that are larger than the available space on a single disk: large, multi-gigabyte volumes that are spread across a number of disk partitions, or that even span multiple disk devices. Volume manager utilities generally support creation of disk volumes that implement one of the common RAID (redundant arrays of inexpensive disks) configuration levels.
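For illustration, a domain definition in the server's libnames.parm file along these lines would spread table partitions across two file systems. The domain name and paths are placeholders, and the exact parameter-file syntax (for example, whether DATAPATH= is wrapped in an ROPTIONS= string) can vary by release, so verify it against the administrator's guide for your version.

    libname=mydomain pathname=/spdsmeta/mydomain
       roptions="datapath=('/fs01/mydomain' '/fs02/mydomain')";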

Disk Striping

A defining feature of all RAID levels is disk striping. Striping organizes the linear address space of a volume into pieces that are spread across a collection of disk drive partitions. For example, a user can configure a volume across two 1 GB partitions on separate disk drives A and B with a stripe size of 64 KB. Stripe 0 lives on drive A, stripe 1 lives on drive B, stripe 2 lives on drive A, and so on; a short sketch of this mapping appears after the list below.
By distributing the stripes of a volume across multiple disks, it is possible to
  • achieve parallelism at the disk I/O level
  • use multiple kernel threads to drive a block of I/O
This also reduces contention and data transfer latency for large block I/O requests because the physical transfer can be split across multiple disk controllers and drives.
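The following DATA step is an illustrative sketch, not SPD Server code: it computes the stripe-to-drive mapping for the two-drive, 64 KB example above, using the usual striping arithmetic (stripe number = floor(offset / stripe size); drive = stripe number mod number of drives).

    /* Illustration only: map logical byte offsets to stripes and drives */
    /* for a two-drive volume with a 64 KB stripe size.                  */
    data stripe_map;
       stripe_size = 64 * 1024;                  /* 64 KB per stripe */
       ndrives     = 2;                          /* drives A and B   */
       do offset = 0 to 5*stripe_size by stripe_size;
          stripe = floor(offset / stripe_size);  /* logical stripe number    */
          drive  = mod(stripe, ndrives);         /* 0 = drive A, 1 = drive B */
          output;
       end;
    run;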

RAID Levels

The following is a brief summary of RAID levels relevant to SPD Server:
RAID-0
High performance with low availability. Physically losing a disk means that data is lost. No redundancy exists to recover volume stripes on a failed disk.
RAID-1
Disk mirroring for high availability. Every block is duplicated on a second, mirror disk, a technique sometimes referred to as shadowing. If one disk is lost, the mirror disk is likely to remain intact, preserving the data. RAID-1 can also improve read performance, because a device driver has two potential sources for the same data and can choose the drive with the least load or latency at a given point in time. The downside to RAID-1 is that it requires twice the number of disk drives as RAID-0 to store a given amount of data.
RAID-5
High performance and high availability at the expense of resources. An error correcting code (ECC) is generated for each stripe written to disk. The ECC distributes the data in each logical stripe across physical stripes in such a way that if a given disk in the volume is lost, data in the logical stripe can still be recovered from the remaining physical stripes. RAID-5's downside is resource utilization; RAID-5 requires extra CPU cycles and extra disk space to transform and manage data using the ECC model.
RAID-1+0
Many RAID systems offer a combination of RAID-1 (pure disk mirroring) and RAID-0 (striping) to provide both redundancy and I/O parallelism in a configuration known as RAID-1+0 (sometimes referred to as RAID-10). The advantages are the same as for RAID-1 and RAID-0. The only disadvantage is that it requires twice as much disk space as a pure RAID-0 solution. Generally, this configuration is a top performer if you have the disk resources to pursue it.
Regardless of RAID level, disk volumes should be hardware striped when using the SPD Server software. This is a significant way to improve performance. Without hardware striping, I/O will bottleneck and constrain SPD Server performance.

Transient Storage

You should configure a RAID-0 volume for WORKPATH= storage for your SPD Server. When sizing this RAID-0 volume, keep in mind that the WORKPATH= that you set up for a given SPD Server host must be shared by all of its SQL and LIBNAME proxy processes that exist at a given point in time. The SPD Server Frequently Asked Questions (FAQ) is a good source of information about estimating disk space requirements for WORKPATH=.
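As a sketch, a fragment of the server's spdsserv.parm parameter file might point WORKPATH= at a dedicated RAID-0 volume as follows. The path is a placeholder, and the exact quoting rules can differ by release; consult the FAQ and your administrator's guide before sizing the volume.

    workpath="/raid0_vol/spdswork";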
Consider using one or more RAID-0 volumes to hold the database domains that will support TEMP=YES LIBNAME assignments. This LIBNAME statement option creates a temporary storage domain that exists only for the duration of the LIBNAME assignment; it is the SPD Server equivalent of the SAS WORK library. All data objects (tables, catalogs, utility files) that are created in the TEMP=YES temporary domain are automatically deleted when you end the SAS session.
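A minimal client-side sketch follows; the server host, port number, domain name, and credentials are all placeholders.

    /* Hypothetical values throughout: host, port, domain, and user are */
    /* placeholders. TEMP=YES makes this libref a temporary, WORK-like  */
    /* domain whose contents are deleted when the assignment ends.      */
    libname spdtmp sasspds 'tmp'
       server=spdshost.5400
       user='myuser'
       password='XXXXXX'
       temp=yes;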