SAS Institute. The Power to Know

Texas Tech University Performs Stock Price Analysis in Hours Instead of Days

The promise of extraordinary increases in computing ability for minimal investment in IT has made grid computing an attractive topic in business, academic and research environments worldwide. Facing limited IT budgets and a growing workload, Texas Tech University (TTU) deployed a SAS compute grid to optimize resources campus-wide. TTU uses the grid to harness idle and underutilized computing resources, producing results that would otherwise take weeks or months to compute.

Problem Description

TTU developed statistical resampling methods to determine whether announcements and other historical events affect stock prices. Resampling is a compute-intensive method in which the data are sampled repeatedly (say 10,000 times), with or without replacement. In addition, each resampled data set required costly matrix inversions. Adding to this computational complexity, the resampling procedure itself was studied using 10,000 simulations, for a total of 10,000 x 10,000 = 100,000,000 data sets to be processed. The problem grew tenfold when 10 parameters were attached to each simulation, which resulted in 1 billion data sets.
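The resampling idea and the workload count above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it bootstraps a simple sample mean with replacement, whereas TTU's method involved a regression model with matrix inversions; the `returns` values are made up, and the names `bootstrap_means` and `parameter_settings` are hypothetical.

```python
import random
import statistics

def bootstrap_means(data, n_resamples, seed=0):
    """Return the mean of each bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]

# Hypothetical daily returns, for illustration only (not TTU data).
returns = [0.012, -0.004, 0.007, -0.011, 0.003, 0.009, -0.002, 0.005]
means = bootstrap_means(returns, n_resamples=1000)

# Workload arithmetic taken from the text.
resamples_per_simulation = 10_000
simulations = 10_000
parameter_settings = 10  # the "10 parameters attached to each simulation"
total_data_sets = resamples_per_simulation * simulations * parameter_settings
print(total_data_sets)  # 1000000000
```

Each of the 1 billion data sets is independent of the others, which is what makes the problem a natural fit for distribution across many machines.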

Previous versions of this research relied on a "sneaker grid," where parcels of code reflecting portions of the billion data sets were given to graduate students to run overnight on their machines. The so-called "sneaker grid" is thus named because the process can be viewed as a person running from office to office "in his sneakers," handing out parcels of code. The results were then collated (essentially manually) from output files, and the "sneaker grid" process was repeated over multiple nights until the 1 billion data sets had been processed.

Solution

SAS/CONNECT provides an essential tool for distributing jobs across a network. TTU combined the distribution capabilities of SAS/CONNECT with high-powered SAS analytics to implement their financial application on a grid.

TTU's SAS grid is composed of 200+ high-powered Windows machines in the computer labs of the Rawls College of Business Administration at TTU; only 100 of these machines are available at any given time because of the number of SAS licenses purchased by TTU. Available licenses are managed through a keyserver application. Thus, the computing environment can be conceptually viewed as a virtual 100-node (2.66 GHz per node) supercomputer with 100 GB of combined RAM. These computers are used during the day by students to complete their daily assignments; SAS grid jobs have been run while students are using them without affecting performance. However, the prime opportunity to leverage these resources for grid computing is during off-peak hours and nights, when students have no need for these machines.
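The pattern of splitting a large batch of independent tasks across roughly 100 nodes and combining the partial results on a client can be sketched as follows. This is not SAS/CONNECT code and not TTU's implementation: Python's `concurrent.futures` thread pool stands in for the grid on a single machine, and `split_into_chunks`, `run_chunk` and `run_on_grid` are hypothetical names with placeholder work inside.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(n_tasks, n_nodes):
    """Split task ids 0..n_tasks-1 into n_nodes contiguous, near-equal ranges."""
    base, extra = divmod(n_tasks, n_nodes)
    chunks, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)  # spread the remainder evenly
        chunks.append(range(start, start + size))
        start += size
    return chunks

def run_chunk(task_ids):
    """Placeholder for per-node work; returns a partial summary (count, sum)."""
    values = [task_id % 7 for task_id in task_ids]  # stand-in for a real statistic
    return len(values), sum(values)

def run_on_grid(n_tasks, n_nodes):
    """Scatter chunks to workers, then gather and summarize on the client."""
    chunks = split_into_chunks(n_tasks, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(run_chunk, chunks))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_count, total_sum / total_count

count, mean = run_on_grid(n_tasks=10_000, n_nodes=100)
```

Returning small partial summaries (here a count and a sum) rather than raw output keeps the collation step on the client cheap and automatic.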

Benefits

The grid computing capabilities of SAS/CONNECT offer a major advantage over the sneaker grid: the jobs that process the one billion data sets are all sent at the same time, and all results are sent back to the client machine for automatic summarizing using SAS analytics. The run used 40 minutes of compute time on TTU's SAS compute farm; it would have taken 25 hours on a single machine. The real savings are larger once the "false starts" and minor errors that always accompany job execution are counted: there were approximately four false starts, so about five runs in total, which comes to 5 x 40 minutes = 3.3 hours on the grid versus 5 x 25 hours = 125 hours on a single machine. That is a reduction of roughly 97% in elapsed compute time.
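The timing comparison can be checked with a short calculation. The inputs are the figures quoted in the text; the variable names are illustrative, and the count of five runs assumes four false starts plus one successful run.

```python
grid_minutes_per_run = 40
single_machine_hours_per_run = 25
runs = 5  # assumption: four false starts plus one successful run

grid_hours = runs * grid_minutes_per_run / 60
single_hours = runs * single_machine_hours_per_run
speedup = single_hours / grid_hours
time_reduction = 1 - grid_hours / single_hours

print(f"grid: {grid_hours:.1f} h, single machine: {single_hours} h")
# grid: 3.3 h, single machine: 125 h
print(f"speedup: {speedup:.1f}x, time reduction: {time_reduction:.1%}")
# speedup: 37.5x, time reduction: 97.3%
```

The ratio is the same for one run as for five (40 minutes versus 25 hours); counting the false starts changes the absolute hours saved, not the percentage.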

As a result of TTU's initial success with its SAS grid, the university is currently implementing its next SAS grid application, a portfolio selection and analysis project. The study involves randomly forming 300 portfolios, each composed of 50 securities taken from the CRSP daily database, and then randomly choosing a one-year sequence of daily stock prices. There are over 20,000 securities in the CRSP database; a subsetted SAS data set with essential variables required 1.362 GB. Each portfolio requires 127,500 models using PROC AUTOREG of SAS/ETS. On an 866 MHz PC the computations for each portfolio take approximately 40 hours, so the entire analysis (300 portfolios x 40 hours = 12,000 hours) would require around 500 days of continuous compute time on a dedicated machine. The only feasible solution to this computation problem is to use SAS grid computing.
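The size of the portfolio study follows directly from the figures above. The sketch below reproduces that arithmetic; the `ideal_days` helper is hypothetical and assumes perfect parallel scaling across nodes that each run at the speed of the 866 MHz reference PC, so it is a rough bound rather than a prediction for TTU's grid.

```python
portfolios = 300
hours_per_portfolio = 40          # on the 866 MHz reference PC
models_per_portfolio = 127_500    # PROC AUTOREG fits per portfolio

total_hours = portfolios * hours_per_portfolio
total_days = total_hours / 24
total_models = portfolios * models_per_portfolio

def ideal_days(nodes):
    """Elapsed days if the work split perfectly across `nodes` equal machines."""
    return total_days / nodes

print(total_hours, total_days, total_models)  # 12000 500.0 38250000
print(ideal_days(100))                        # 5.0
```

Because each portfolio is processed independently, the portfolio is a convenient unit of work to hand to a node.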

In fall 2004, TTU will expand its SAS grid to 300 machines running SAS 9 and implement several more SAS applications from multiple areas across the university.

References

Hein, S.E. and Westfall, P.H. (2004). Improving Tests of Abnormal Returns by Bootstrapping the Multivariate Regression Model with Event Parameters. Journal of Financial Econometrics 2, 451-471.

Bremer, R., Perez, J., Smith, P. and Westfall, P.H. (2004). Grid Computing at Texas Tech University Using SAS. Manuscript.