FOCUS AREAS

Technology Highlight: MP CONNECT

Increase processing power across your network

The business world keeps moving faster, and executives must make quicker decisions to stay one step ahead of the game. Unfortunately, they simultaneously find themselves dealing with larger and larger volumes of data. So how do we process huge amounts of data faster in order to make timely business decisions?

Hardware vendors have responded to this problem by creating symmetric multiprocessing (SMP) machines that provide increased horsepower for solving these problems. Even though many organizations have acquired these multiprocessor machines, the applications they use frequently cannot take full advantage of the computing power available in today's server platforms. Also, let's not forget that nearly all organizations have networked PCs or UNIX workstations. And with both multiprocessor machines and networked machines, many of the processors often sit idle.

The driving strength of the SAS System has always been the ability to analyze huge amounts of data and turn it into information -- information that can be used to make good, profitable decisions. However, SAS is a single-threaded application that, for the most part, can utilize only a single processor at a time.

MP CONNECT enables parallel processing with SAS Version 8, making use of idle processors to reduce the time required to process huge amounts of data. It gives users the ability to run the SAS System in parallel, multiplying the analytic capabilities of SAS for more timely business decisions. MP CONNECT not only lets SAS make better use of the processing power of stand-alone multiprocessor servers, it also extends easily to the processors available across your network.

With MP CONNECT, you perform multiprocessing with Version 8 of the SAS System by establishing a connection between a "parent" SAS session and one or more additional SAS sessions. Each of the sessions can then asynchronously execute tasks in parallel. You can continue processing with the parent session, query the status of any of the asynchronous tasks, and merge their results back into your parent session at the appropriate time. You gain the ability to exploit multiprocessor (MP/SMP) hardware as well as network resources to perform parallel processing of self-contained tasks and easily coordinate the results in the parent SAS session.
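The parent-and-tasks pattern described above can be sketched in a few lines. This is an analogy written in Python, not SAS syntax and not the MP CONNECT interface: worker threads stand in for the additional SAS sessions, and the function name `summarize`, the helper `run_parent_session`, and the sales figures are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical inputs: each entry is a self-contained unit of work.
SALES_BY_REGION = {
    "east": [120, 80, 45],
    "west": [200, 10],
    "north": [75, 75, 75],
}

def summarize(region, sales):
    """Independent task: total the sales for one region."""
    return region, sum(sales)

def run_parent_session(data):
    """Start the tasks asynchronously, then merge their results."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(data)) as pool:
        # Submit every task; the parent is not blocked while they run.
        futures = [pool.submit(summarize, r, s) for r, s in data.items()]
        # The parent can keep working and query task status at any time.
        still_running = sum(1 for f in futures if not f.done())
        print(f"{still_running} task(s) still running")
        # At the appropriate time, wait for the tasks and merge the results.
        wait(futures)
        for f in futures:
            region, total = f.result()
            results[region] = total
    return results

totals = run_parent_session(SALES_BY_REGION)
print(totals)
```

The shape is what matters here: submit, carry on, check status, then collect. The tasks share no data with one another, which is what makes them safe to run side by side.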

The current MP CONNECT functionality is designed to address independent parallelism, which is possible when you have two or more tasks to execute and those tasks do not depend on one another. An example of independent parallelism would be extraction and processing of data from multiple distinct, and possibly remote, data sources. Another example would be a hybrid OLAP (HOLAP) scenario that requires the creation of multiple multidimensional databases (MDDBs).

The primary purpose of parallel processing is to reduce the time that it would take to execute the same job serially. "The performance gains can be amazing," says John Bentley of First Union National Bank. "On a four-way UNIX box we cut the processing time of a 9-million-record data set from 46 minutes to 17 minutes. For another job run against a terabyte-class data warehouse, we cut the SQL processing time from just over an hour to 20 minutes. Even if it takes a few hours to modify and test a production program, you quickly recoup that time. Also, don't overlook the fact that not only are you speeding up your jobs, but you're also scaling up your server because it can handle more work in the same amount of time."
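To put the first of those figures in perspective, the short calculation below works out the speedup and the per-processor efficiency for the 46-minute job. It is a sketch under two stated assumptions: that all four processors of the four-way box took part, which the quote does not confirm, and that "a few hours" of rework means three hours, a number chosen purely for illustration.

```python
import math

def speedup(serial_minutes, parallel_minutes):
    """How many times faster the parallel run is than the serial run."""
    return serial_minutes / parallel_minutes

def efficiency(serial_minutes, parallel_minutes, processors):
    """Speedup divided by processor count (1.0 would be a perfect split)."""
    return speedup(serial_minutes, parallel_minutes) / processors

def runs_to_recoup(rework_minutes, serial_minutes, parallel_minutes):
    """Number of runs before the time saved covers the rework effort."""
    saved_per_run = serial_minutes - parallel_minutes
    return math.ceil(rework_minutes / saved_per_run)

s = speedup(46, 17)
e = efficiency(46, 17, processors=4)  # assumes all four processors were used
n = runs_to_recoup(rework_minutes=180,  # hypothetical: three hours of rework
                   serial_minutes=46, parallel_minutes=17)
print(f"speedup {s:.2f}x, efficiency {e:.0%}, break-even after {n} runs")
```

Under those assumptions the job runs about 2.7 times faster, and a nightly job would pay back three hours of rework in about a week.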

Not every application has the characteristics to benefit from MP CONNECT. If you can segment your application, or some portion of it, into independent units of work that process their data sources separately and independently, then taking the time to modify an application to use MP CONNECT can yield a very high return on investment. Lin Wen of China's Shanghai Baosteel Computer System Engineering Corp. uses MP CONNECT to reduce the time it takes to run several time-consuming nightly batch jobs. "Prior to SAS V8, I had to let them run one by one, or invoke several SAS sessions concurrently in order to save time, even though there is SMP hardware. With MP CONNECT, only minimal modifications need to be made to the existing jobs, and now they are running in 33 percent less time."

In addition to the many external customers achieving greater performance and scalability with MP CONNECT, several internal developers at SAS are realizing the benefits as well. Randy Tobias, manager of SAS Linear Models in R&D, is among them. "MP CONNECT combined with the SAS macro facility enabled Westfall et al. (2001) and O'Brien and Tobias (2001) to use simulation to address important problems in statistical testing for analysis of variance for which analytical results were difficult or even impossible," Tobias says. "The accuracy and completeness of these simulation approaches would not have been feasible without the distributed parallel processing capabilities in MP CONNECT. In particular, for the work of Westfall et al., MP CONNECT made it possible to condense three years' worth of computing time into just a month's worth of desktop time." Tobias has run his application on as many as 150 servers simultaneously.

The use of MP CONNECT extends to the software solutions developed by SAS, as well. Kevin DeBruhl, a systems developer, is currently incorporating MP CONNECT into WebHound Release 4.0. "WebHound makes use of MP CONNECT during all stages of data analysis, starting with reading the data from the disk, analyzing and grouping the data, and generating HTML reports," says DeBruhl. "All of these WebHound tasks can be run in parallel to dramatically reduce the overall time required to process millions of records of Web server data."

MP CONNECT is testimony to SAS' realization that applications need to make efficient use of processing resources to help organizations get the answers they need when they need them. Version 9 will bring enhancements to MP CONNECT as well as the SAS Scalable Architecture (SSA). Enhancements to MP CONNECT will address pipeline parallelism, which allows one procedure to pipe or stream its output directly as input to another procedure running in parallel, bypassing the need to write that data to disk. The SSA, for its part, will provide threaded execution inside selected SAS procedures, transparently to the user. MP CONNECT, by comparison, is designed to let users modify SAS jobs so that multiple SAS procedures execute in parallel. Used together or independently, MP CONNECT and the SSA maximize the use of server machines and minimize the time it takes to get answers.
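The idea behind pipeline parallelism, one step streaming its output straight into the next while both run at once, can be illustrated with a small sketch. Again this is a Python analogy and not a preview of Version 9 syntax: the two threads stand in for two procedures, the bounded in-memory queue stands in for the pipe, and the names `producer`, `consumer`, and `run_pipeline` are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the stream

def producer(rows, pipe):
    """First step: transform each row and stream it onward."""
    for row in rows:
        pipe.put(row * 2)
    pipe.put(DONE)

def consumer(pipe, out):
    """Second step: consume rows as they arrive and accumulate a total."""
    total = 0
    while True:
        item = pipe.get()
        if item is DONE:
            break
        total += item
    out.append(total)

def run_pipeline(rows):
    """Run both steps concurrently, joined by a bounded in-memory pipe."""
    pipe = queue.Queue(maxsize=4)  # nothing is written to disk
    out = []
    stages = [
        threading.Thread(target=producer, args=(rows, pipe)),
        threading.Thread(target=consumer, args=(pipe, out)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return out[0]

result = run_pipeline(range(1, 6))
print(result)
```

Unlike independent parallelism, the two steps here do depend on each other; they overlap because the second step starts on the first rows before the first step has finished producing the rest.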