Hot Topics

SAS Data Integration & Grid Benchmarking Results

The Data Integration Challenge: A New, More Complex Scenario:

A prior press release on data loading demonstrated that SAS is the world leader in throughput performance for data integration, with a bulk-loading result of 5.97 terabytes per hour. With this press release, SAS takes data integration testing and benchmarking to a new level, one that is more representative of the complexity customers experience in their day-to-day operations. SAS goes beyond the high-speed throughput testing that other data integration vendors focus on, and encourages its competitors to follow its lead in more realistic benchmarking. Details of the scenario and test results are provided below for customers and competitors:

SAS Test Results:

SAS executed this benchmarking scenario on a Sun E25K running Solaris 10 with 24 1.95GHz US-IV+ processors (48 cores). The E25K was partitioned into multiple domains, creating multiple nodes managed by SAS Grid Manager so the workload could execute in parallel and maximize throughput. SAS ran the entire scenario from start to finish in 2 hours, 36 minutes, well within the nightly ETL / data integration batch window of many large corporations. Scalability was nearly linear, and additional resources (CPU, RAM, and I/O) could be brought online to further reduce runtime if required.

Scenario Overview:

A large international retailer needs to quickly build a new report-ready data mart containing three years of transaction data. The benchmarking scenario measures the complete runtime for every phase of building the mart, from start to finish: dimension creation, integrity constraints, data validation, dimension lookups, and indexing for both the dimension and fact tables.
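One of the phases named above, dimension lookup combined with data validation, can be illustrated with a minimal sketch. This is not SAS's implementation; the function name `lookup_keys` and the field names are hypothetical, and the sketch only shows the general technique of swapping natural keys for surrogate keys while routing unmatched rows to a reject set:

```python
def lookup_keys(transactions, dim_index, unknown_key=-1):
    """Replace natural keys in transaction rows with dimension surrogate keys.

    dim_index maps natural_key -> surrogate_key. Rows whose natural key is
    missing from the dimension fail validation and are set aside for review
    rather than silently loaded into the fact table.
    """
    facts, rejects = [], []
    for txn in transactions:
        sk = dim_index.get(txn["customer_id"], unknown_key)
        if sk == unknown_key:
            rejects.append(txn)                      # failed the lookup
        else:
            facts.append({**txn, "customer_key": sk})  # enriched fact row
    return facts, rejects
```

Separating rejected rows this way is what lets a nightly batch both enforce integrity constraints and still complete its run when a handful of source records are bad.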

Execution Details:

The scenario has two phases, dimension table creation and fact table creation, which can be executed in parallel. SAS Data Integration Studio was used to create and execute the ETL workflows that build the report mart. From within SAS Data Integration Studio, users can create, schedule, and execute workflows that build report marts on a single server or across multiple servers using SAS Grid Manager, a capability built into SAS Data Integration Studio.
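The fan-out pattern described above can be sketched in a few lines. This is an illustration of the general idea, not SAS Grid Manager's scheduler: the helper names (`build_dimension`, `build_fact`, `run_flow`) are invented stand-ins for the real ETL jobs, and a thread pool stands in for the grid nodes that independent jobs are dispatched to:

```python
from concurrent.futures import ThreadPoolExecutor

def build_dimension(name):
    # Placeholder for a real dimension build: load, apply SCD logic, index.
    return f"{name}_dim"

def build_fact(dim_tables):
    # Placeholder for the fact build, which consumes the finished
    # dimensions for surrogate-key lookups.
    return ("fact", tuple(dim_tables))

def run_flow(dimensions, workers=4):
    # Fan the independent dimension builds out across workers, much as a
    # grid manager dispatches jobs to separate nodes, then build the fact.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        dim_tables = list(pool.map(build_dimension, dimensions))
    return build_fact(dim_tables)
```

The point of the sketch is the dependency structure: the four dimension builds have no ordering among themselves, so adding nodes shortens that phase almost linearly, which is consistent with the near-linear scalability reported for the test.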

Data / Model Details:

The target data model for the data mart is a star schema composed of a single transaction fact table and dimension tables for customer, product, store, and time. Note that Type-II slowly changing dimensions are used to build the dimension tables, a more complex task than a simple bulk-load creation of the dimension tables.
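To show why Type-II handling is more work than a bulk load, here is a minimal sketch of the standard Type-II technique: when an attribute changes, the current dimension row is expired and a new row with a fresh surrogate key is inserted, so history is preserved. The function and field names are illustrative, not SAS's implementation:

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # conventional open-ended "current row" marker

def apply_type2_changes(dimension, incoming, today, next_key):
    """Apply Type-II slowly-changing-dimension logic to a dimension table.

    dimension: list of row dicts with surrogate_key, natural_key, attrs,
               valid_from, valid_to.
    incoming:  mapping natural_key -> latest attribute values from the source.
    """
    current = {row["natural_key"]: row
               for row in dimension if row["valid_to"] == HIGH_DATE}
    for nk, attrs in incoming.items():
        row = current.get(nk)
        if row is not None and row["attrs"] == attrs:
            continue                       # unchanged: keep the current row
        if row is not None:
            row["valid_to"] = today        # expire the old version
        dimension.append({                 # insert a new current version
            "surrogate_key": next_key,
            "natural_key": nk,
            "attrs": attrs,
            "valid_from": today,
            "valid_to": HIGH_DATE,
        })
        next_key += 1
    return dimension
```

Unlike a bulk load, every incoming record requires a lookup against the current rows and a possible expire-and-insert pair, which is why benchmarks that include Type-II processing are a tougher and more realistic test.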

Future Efforts:

Data integration requirements vary greatly from customer to customer. By executing larger and more complex ETL workflows, however, SAS demonstrates that its software can tackle customers' toughest ETL requirements. SAS continues to improve its benchmarking scenarios by adding more complexity and larger data volumes to match real-world customer problems. If there are things you would like to see in future data integration benchmark efforts, please share them with your local SAS representative.


For further information and details on these tests, please contact your local SAS representative. SAS is happy to provide an in-depth review of this effort and to discuss how these tests and results can be leveraged in your environment to help accomplish your business requirements.