Scalability and Performance Community

Scalability is all about reducing the time-to-solution for your critical tasks. It can be approached from two directions, scaling up and scaling out, or from both directions simultaneously.

The purpose of this community is to help you achieve scalability by balancing the choice of scalable hardware with software that is specifically designed to leverage it.


Introduction to Scalability and Performance

Nearly every type of business has seen an explosion in the amount of data that it must store, process, and understand. At the same time, the acceptable time frame for turning all of this data into information continues to grow shorter. The organizations that are in the best position to handle this onslaught of data are those that are able to adjust or scale their hardware and applications to accommodate this increased demand.

Scalability is all about reducing the time-to-solution for your critical tasks. This can be accomplished by performing two or more tasks in parallel (independent parallelization) or by overlapping two or more tasks (pipeline parallelization). Either approach requires two things: scalable hardware, and software that is designed to take advantage of it.
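
The two forms of parallelism can be sketched in Python. This is a minimal illustration, not a SAS facility: `transform`, `independent_parallel`, and `pipeline_parallel` are hypothetical names, and the per-record work is a placeholder. Threads are used only to make the structure visible; in CPython, CPU-bound work would normally need separate processes (for example `ProcessPoolExecutor`) to run truly in parallel.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def transform(record: int) -> int:
    """Hypothetical per-record work; stands in for a real processing step."""
    return record * record


def independent_parallel(records: list[int], workers: int = 4) -> list[int]:
    """Independent parallelization: the same task runs on different records at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input records
        return list(pool.map(transform, records))


def pipeline_parallel(records: list[int]) -> list[int]:
    """Pipeline parallelization: a 'read' stage and a 'process' stage overlap in time."""
    handoff: queue.Queue = queue.Queue(maxsize=8)  # bounded buffer between the stages
    done = object()  # sentinel marking the end of the input
    results: list[int] = []

    def reader() -> None:
        for r in records:  # stage 1: stands in for reading records from disk
            handoff.put(r)
        handoff.put(done)

    def processor() -> None:
        while True:
            item = handoff.get()  # stage 2 starts as soon as the first record arrives
            if item is done:
                break
            results.append(transform(item))

    stages = [threading.Thread(target=reader), threading.Thread(target=processor)]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results


if __name__ == "__main__":
    data = list(range(10))
    print(independent_parallel(data))
    print(pipeline_parallel(data))
```

In the pipeline version, the processing stage does not wait for all input to be read; it works on each record while the reader is already fetching the next one, which is what "overlapping" tasks means.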

It is important to understand that not every application lends itself to scalability and not every hardware configuration is capable of providing scalability.

To decide whether an application should be scaled, it helps to determine whether it takes "too long" to run. This may mean that a job needs more time than the batch window you have available, or that it takes "too long" to get information from your application in time to make decisions. Next, identify the parts of the application that consume the most time, and then determine whether those parts are compute-intensive or I/O-bound. This will help you understand how scalable a particular task may be.
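
One rough way to tell whether a step is compute-intensive or I/O-bound is to compare the CPU time the process consumed with the elapsed wall-clock time. The sketch below assumes that heuristic and an arbitrary 0.5 threshold; `profile_step`, `rank_steps`, and the simulated steps are hypothetical, and a real investigation would rely on a profiler or operating system tools.

```python
import time
from typing import Callable


def profile_step(step: Callable[[], object]) -> tuple[float, float, str]:
    """Run one step and return (wall seconds, CPU seconds, rough classification).

    Heuristic (an assumption, not a rule): if the process was on the CPU for at
    least half of the elapsed time, treat the step as compute-bound; otherwise it
    was mostly waiting, typically on I/O.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    step()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    ratio = cpu / wall if wall > 0 else 0.0
    kind = "compute-bound" if ratio >= 0.5 else "I/O-bound or waiting"
    return wall, cpu, kind


def rank_steps(steps: dict[str, Callable[[], object]]) -> list[tuple[str, float, float, str]]:
    """Profile each named step and list them from most to least elapsed time."""
    rows = [(name, *profile_step(fn)) for name, fn in steps.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    steps = {
        "simulated_read": lambda: time.sleep(0.2),  # stands in for waiting on disk
        "summarize": lambda: sum(i * i for i in range(2_000_000)),  # stands in for computation
    }
    for name, wall, cpu, kind in rank_steps(steps):
        print(f"{name:15s} wall={wall:.2f}s cpu={cpu:.2f}s -> {kind}")
```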

Hardware capable of multiprocessing includes symmetric multiprocessing (SMP) machines and networks of machines, each containing a single processor. In addition to multiple processors, it is important to have multiple I/O channels. A network of machines provides these inherently, since each machine has its own I/O channels. On an SMP machine, multiple I/O channels can be provided by RAID arrays that stripe, or spread, your data across multiple physical disks. Even for a single-threaded application, striping can improve I/O performance because the operating system can read data from multiple drives simultaneously and combine the results for the application. For a threaded application, not only can the reads be done in parallel, but threads can also process the data in parallel.
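
Striping can be illustrated with a toy simulation in which in-memory lists stand in for physical drives. This is a sketch under stated assumptions, not how a RAID controller or operating system is implemented: chunk i is placed on disk i mod n (RAID 0 style), the simulated disks are read concurrently, and the chunks are interleaved back into their original order. The function names and chunk size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def stripe(data: bytes, disks: int, chunk: int) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them round-robin across
    `disks` simulated drives (chunk i goes to drive i % disks)."""
    layout: list[list[bytes]] = [[] for _ in range(disks)]
    for i in range(0, len(data), chunk):
        layout[(i // chunk) % disks].append(data[i:i + chunk])
    return layout


def read_disk(chunks: list[bytes]) -> list[bytes]:
    """Stand-in for one drive's sequential read; a real drive would block on I/O here."""
    return list(chunks)


def parallel_read(layout: list[list[bytes]]) -> bytes:
    """Read every simulated drive concurrently, then interleave the chunks back
    into their original order so the caller sees one contiguous byte string."""
    with ThreadPoolExecutor(max_workers=len(layout)) as pool:
        per_disk = list(pool.map(read_disk, layout))
    out = []
    for row in range(max(len(c) for c in per_disk)):
        for disk in per_disk:
            if row < len(disk):  # only the last row can be partially filled
                out.append(disk[row])
    return b"".join(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 3
    layout = stripe(data, disks=4, chunk=64)
    print([len(d) for d in layout])
    print(parallel_read(layout) == data)
```

The caller of `parallel_read` is single-threaded and simply receives the reassembled data, mirroring how striping can help even an application that is not threaded.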

Scalability can be addressed from two directions: scale up and scale out. These are not mutually exclusive choices. Scaling up, from a hardware perspective, means increasing the number of processors, disk drives, I/O channels, and so on in a single server. Hardware vendors have responded to the need for scalability by building SMP machines that provide more horsepower for solving large, CPU-intensive problems. Scaling out, on the other hand, means adding more hardware rather than bigger hardware. When you scale out, the size and speed of an individual machine do not limit the total capacity of your network.

Successfully scaled performance cannot be obtained simply by installing more or faster processors or more or faster I/O devices. Scalability involves deciding whether to invest in SMP hardware, upgrade I/O configurations, use networked machines, or reorganize your data, and how much you are willing to modify your application. Achieving true scalability is a balancing act: it requires scalable hardware along with software that is specifically designed to leverage it. The portion of the original problem that can actually be processed in parallel determines how much scalability the software solution can achieve.
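
The last point is commonly formalized as Amdahl's law, which the text above does not name but which is the standard way to express it: if a fraction p of the work can run in parallel on n processors, the speedup is at most 1 / ((1 - p) + p / n). A minimal sketch, with a hypothetical function name:

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work can run
    in parallel on `processors` processors (Amdahl's law):

        speedup = 1 / ((1 - p) + p / n)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial = 1.0 - parallel_fraction  # the part that still runs one step at a time
    return 1.0 / (serial + parallel_fraction / processors)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        bounds = ", ".join(f"n={n}: {amdahl_speedup(p, n):.2f}x" for n in (2, 8, 64))
        print(f"p={p}: {bounds}")
```

With 90% of the work parallel, 8 processors give at most about 4.7 times the single-processor speed, and no number of processors can push the speedup past 10, because the serial 10% still runs sequentially.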

Starting in SAS 8 and continuing in SAS 9, SAS has been enhanced both to scale up, fully utilizing SMP hardware, and to scale out, fully utilizing distributed processors.


We at SAS have created the Scalability Community to make you aware of the connectivity and scalability features and enhancements that you can leverage for your SAS installation. The success of this community depends on you. Send electronic mail to scalability@sas.com with your comments, requirements, and suggestions.