Organizing SAS Data Using the SPD Engine

How the SPD Engine Organizes SAS Data

Because the SPD Engine organizes data for high-performance processing, an SPD Engine data set is physically different from a default Base SAS engine data set. The default Base SAS engine stores data in a single file that contains both the data and the data descriptors (metadata). The SPD Engine creates separate files for the data and the data descriptors. In addition, if the data set is indexed, the SPD Engine creates two index files for each index. Each of these four kinds of files is called an SPD Engine component file, and each has an identifier embedded in the filename.
The metadata component is usually a single physical file, but it can occupy multiple physical files; each file has .mdf embedded in the filename. The data component is one or more physical files, and each file has .dpf embedded in the filename. If indexes have been defined, the index component exists, and each index has two physical files:
  • one file with .hbx embedded in the filename
  • one file with .idx embedded in the filename
Each of these components can consist of one or more physical files, so that the component can span volumes but still be referenced as one logical file. For example, the SPD Engine can create many physical files containing data, but it references them as a single data component in an SPD Engine data set.
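The grouping of physical files into logical components can be illustrated with a short Python sketch. It is not part of the SPD Engine. The helper names and the sample filenames are hypothetical, and the only fact taken from the text above is that each file has .mdf, .dpf, .hbx, or .idx embedded in its name.

```python
# Identifier embedded in the filename -> logical component (from the text above).
COMPONENT_IDS = {
    "mdf": "metadata",
    "dpf": "data",
    "hbx": "index",
    "idx": "index",
}

def classify_component(filename):
    """Return the component a file belongs to, or None if no identifier is found.

    Assumption: the identifier appears as (the start of) a dot-separated
    segment after the data set name. Actual naming conventions may differ.
    """
    for segment in filename.split(".")[1:]:
        for ident, component in COMPONENT_IDS.items():
            if segment.startswith(ident):
                return component
    return None

def group_by_component(filenames):
    """Group physical files so each component can be treated as one logical file."""
    groups = {}
    for name in filenames:
        groups.setdefault(classify_component(name), []).append(name)
    return groups

# Made-up filenames, for illustration only.
files = [
    "sales.mdf.0.0.0.spds9",
    "sales.dpf.0.0.1.spds9",
    "sales.dpf.0.1.1.spds9",
    "sales.hbx.0.0.1.spds9",
    "sales.idx.0.0.1.spds9",
]
components = group_by_component(files)
```

In this sketch the two .dpf files end up under the single key "data", which mirrors how many physical data files are referenced as one data component.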
The metadata and index components differ from the data component in two ways:
  1. You can specify a fixed-length partition size for data component files using the PARTSIZE= option. You cannot specify the partition size for the metadata or index components.
  2. The data component files are created in a cyclical fashion across all defined paths. The metadata and index component files are created in a single defined path until that path is full, and then the next defined path is used.
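The two placement strategies in the list above can be contrasted with a small Python model. This is a simplified sketch, not the engine's actual logic. The function names and paths are hypothetical, placement is assumed to start at the first defined path (the text does not say where the cycle begins), and "full" is modeled as a fixed number of files per path rather than real disk space.

```python
def place_cyclically(n_files, paths):
    """Data component model: file i goes to path i modulo the number of paths."""
    return [paths[i % len(paths)] for i in range(n_files)]

def place_fill_first(n_files, paths, capacity):
    """Metadata/index model: use one path until it is full, then the next.

    `capacity` is an illustrative stand-in for "path is full", expressed as
    the number of files a path can hold.
    """
    placement = []
    path_index = 0
    used = 0
    for _ in range(n_files):
        if used == capacity:
            # Current path is full, so move on to the next defined path.
            path_index += 1
            used = 0
        if path_index >= len(paths):
            raise RuntimeError("all defined paths are full")
        placement.append(paths[path_index])
        used += 1
    return placement

# Hypothetical paths, for illustration only.
data_layout = place_cyclically(5, ["/data1", "/data2", "/data3"])
index_layout = place_fill_first(5, ["/index1", "/index2"], capacity=3)
```

Under these assumptions the data files are spread evenly over all three paths, while the index files stay on the first path until its modeled capacity is reached.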

Metadata Component Files

An SPD Engine data set stores its descriptive metadata in a file with the file extension .mdf. Usually, an SPD Engine data set has only one .mdf file. The .mdf file includes the pathnames of all of the data set's other component files.

Index Component Files

If the data set is indexed, the SPD Engine creates two index files for each index. Each of these files contains a particular view of the index.
  • The index file with the .hbx file extension contains the global index.
  • The index file with the .idx file extension contains the segment index.

Data Component Files

The data component of an SPD Engine data set can be several files (partitions) per path or device, rather than just one. Each of these partitions is a fixed length, specified when you create the SPD Engine data set.
Specifying a partition size for the data component files enables you to tune the performance of your applications. The partitions are the threadable units; that is, each partition (file) is read in one thread. "Creating and Loading SPD Engine Files" provides details about how the SPD Engine stores data, metadata, and indexes.
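Because each partition is read in one thread, the partition size determines how many threadable units a data set of a given size yields. The following Python sketch shows that arithmetic. It is a simplification that treats the data as a flat byte count. The function name and the sizes used are illustrative assumptions, not values taken from this document.

```python
MIB = 1024 * 1024

def partition_count(data_bytes, partsize_bytes):
    """Number of fixed-length partitions needed to hold data_bytes.

    Uses integer ceiling division, so a partly filled last partition
    still counts as one partition.
    """
    if partsize_bytes <= 0:
        raise ValueError("partition size must be positive")
    return -(-data_bytes // partsize_bytes)

# Illustrative sizes: the same 10 GiB of data with two partition sizes.
data_size = 10 * 1024 * MIB
small_parts = partition_count(data_size, 128 * MIB)   # more, smaller units
large_parts = partition_count(data_size, 1024 * MIB)  # fewer, larger units
```

In this model a smaller partition size produces more units that can be read in parallel, and a larger one produces fewer files to manage. Which trade-off is better depends on the available CPUs and I/O paths.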