FOCUS AREAS

Hot Topics

SAS® Scalable Performance Data Engine

SAS 9 includes a new engine, the SAS Scalable Performance Data Engine (SPD Engine). Its purpose is to speed the processing of large data sets by accessing data that has been partitioned into multiple physical files called partitions. The SPD Engine initiates multiple threads, each with a direct path to one partition of the data set. Because each partition can be read in parallel by a separate processor, the application can analyze data as fast as the data is read from disk. This can reduce I/O bottlenecks and substantially decrease the elapsed time needed to process data. The SPD Engine is supported on all SAS 9.1 platforms.
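The partition-per-thread idea can be modeled in a few lines. The sketch below is a conceptual illustration in Python, not SAS code and not how the SPD Engine is implemented. It assumes a data set is simply a list of rows, and it uses hypothetical helpers (`partition`, `read_partition`, `parallel_read`) to split the rows into fixed-size partitions and read each one on its own worker thread.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, rows_per_partition):
    """Split rows into fixed-size partitions.

    Hypothetical model: each sublist stands in for one physical partition file.
    """
    return [rows[i:i + rows_per_partition] for i in range(0, len(rows), rows_per_partition)]


def read_partition(part):
    # Stand-in for one thread reading its partition directly from disk.
    return list(part)


def parallel_read(partitions):
    # One worker thread per partition (at least one, so an empty list is safe).
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        chunks = pool.map(read_partition, partitions)  # results keep partition order
    # Reassemble the rows in their original order.
    return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    rows = list(range(10))
    parts = partition(rows, 4)
    print(parts)                          # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    print(parallel_read(parts) == rows)   # True
```

In this model the gain comes from overlapping the reads of independent partitions. Python threads mainly help with I/O-bound work, which matches the article's point that the engine targets I/O bottlenecks.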

Support for partitioned data is the foundation of the performance gains possible with the SPD Engine. Partitioning must be specified in SAS syntax when a data set is created. Once the data has been partitioned, the SPD Engine can improve performance in several areas of SAS 9. The first is any DATA step or SAS procedure that does WHERE processing: the SPD Engine initiates multiple threads that apply the same WHERE clause to each of the partitions in parallel. The second is a set of SAS procedures that have been modified to read blocks of data in parallel from multiple partitions. The third is reducing I/O bottlenecks in a multi-user environment. For example, MP CONNECT could create multiple sessions that each use the SPD Engine to read input from a common data set with much less I/O contention.
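Parallel WHERE processing amounts to evaluating one predicate independently against every partition and then merging the qualifying rows. The following is a minimal Python sketch under that assumption. The function names and the dictionary row layout are hypothetical, and a Python lambda stands in for the SAS WHERE clause.

```python
from concurrent.futures import ThreadPoolExecutor


def where_partition(part, predicate):
    # Each thread applies the same WHERE predicate to its own partition only.
    return [row for row in part if predicate(row)]


def parallel_where(partitions, predicate):
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(where_partition, partitions, [predicate] * len(partitions))
    # Merge the per-partition results, preserving partition order.
    return [row for chunk in results for row in chunk]


if __name__ == "__main__":
    parts = [
        [{"region": "East", "sales": 120}, {"region": "West", "sales": 80}],
        [{"region": "East", "sales": 45}],
        [{"region": "West", "sales": 300}],
    ]
    # Roughly analogous to: where sales > 100;
    print(parallel_where(parts, lambda r: r["sales"] > 100))
    # [{'region': 'East', 'sales': 120}, {'region': 'West', 'sales': 300}]
```

Because no partition's result depends on any other partition, the work splits cleanly across threads. The only serial step is the final merge.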

The SPD Engine offers other benefits as well. When a BY statement is present, it implicitly sorts the data returned to the application. The copy of the data on disk is not sorted; only the data returned to the application in memory is. In addition, WHERE processing uses multiple indexes when possible, whereas the BASE engine uses only one.
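Both behaviors can be modeled conceptually. This is an illustrative Python sketch, not the SPD Engine's actual algorithm. `by_sorted_view` sorts only the rows handed back to the caller and leaves the stored partitions unchanged. `build_index` and `where_with_two_indexes` are hypothetical helpers that represent an index as a mapping from a column value to row numbers, and they satisfy a two-condition WHERE clause by intersecting the row sets of two indexes rather than relying on a single index.

```python
def by_sorted_view(partitions, by_key):
    # Gather rows from every partition and sort only the returned copy;
    # the stored partitions themselves are never reordered.
    rows = [row for part in partitions for row in part]
    return sorted(rows, key=lambda r: r[by_key])


def build_index(rows, column):
    # Hypothetical simple index: column value -> set of row numbers.
    index = {}
    for rownum, row in enumerate(rows):
        index.setdefault(row[column], set()).add(rownum)
    return index


def where_with_two_indexes(rows, idx_a, value_a, idx_b, value_b):
    # Use both indexes: intersect their row-number sets so only rows
    # matching both conditions are ever fetched.
    hits = idx_a.get(value_a, set()) & idx_b.get(value_b, set())
    return [rows[i] for i in sorted(hits)]


if __name__ == "__main__":
    parts = [
        [{"id": 3, "region": "East", "product": "A"},
         {"id": 1, "region": "West", "product": "B"}],
        [{"id": 2, "region": "East", "product": "B"}],
    ]
    print([r["id"] for r in by_sorted_view(parts, "id")])  # [1, 2, 3]
    print(parts[0][0]["id"])                                # 3 (stored order unchanged)

    rows = [row for part in parts for row in part]
    region_idx = build_index(rows, "region")
    product_idx = build_index(rows, "product")
    # Roughly analogous to: where region = 'East' and product = 'B';
    print(where_with_two_indexes(rows, region_idx, "East", product_idx, "B"))
    # [{'id': 2, 'region': 'East', 'product': 'B'}]
```

With only one index, the engine would fetch every East row and then test the product condition row by row. Intersecting two indexes narrows the candidate set before any row is read.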

The SPD Engine evolved from the SPD Server product, so many of its features are derived from SPD Server. SPD Server supports a client/server environment requiring multiple SAS sessions, and it provides more functionality than the SPD Engine. However, the need to bring support for partitioned data into Base SAS led to the creation of the SPD Engine. Unlike SPD Server, the engine runs entirely in the same SAS process or session as the rest of your SAS job. It is not the default engine for SAS 9; you must specify it with the engine name SPDE in a LIBNAME statement. The SPD Engine also does not support all of the features of the BASE engine. Supporting more of the BASE engine functionality in the SPD Engine is a goal for a future release. However, the BASE engine will continue to be supported because the data sets created by the two engines are not interchangeable.


We at SAS have created the Scalability Community to make you aware of the connectivity and scalability features and enhancements that you can leverage for your SAS installation. The success of this community depends on you. Send electronic mail to scalability@sas.com with your comments, requirements, and suggestions.