Using Queues

Understanding Queues

When a job is submitted for processing on the grid, it is placed in a queue and is held until resources are available for the job. LSF processes the jobs in the queues based on parameters in the queue definitions that establish criteria such as which jobs are processed first, what hosts can process a job, and when a job can be processed. All jobs submitted to the same queue share the same scheduling and control policy. By using multiple queues, you can control the workflow of jobs that are processed on the grid.
By default, SAS uses a queue named NORMAL. To use another queue that is already defined in the lsb.queues file, specify the queue using a queue=queue_name option. You can specify this option in the metadata for the SAS logical grid server (in the Grid Options field), in the job options macro variable referenced in the GRDSVC_ENABLE function, or in the Grid Options field of a grid options set. For information about specifying a queue in the logical grid server metadata, see Modifying SAS Logical Grid Server Definitions. For information about specifying a queue in the GRDSVC_ENABLE function, see GRDSVC_ENABLE Function.

Configuring Queues

Queues are defined in the lsb.queues file, which is located in the directory LSF-install-dir\conf\lsbatch\cluster-name\configdir. The file contains an entry for each defined queue. Each entry names and describes the queue and contains parameters that specify the queue's priority and the attributes associated with the queue. For a complete list of parameters allowed in the lsb.queues file, refer to Platform LSF Reference.

Using the Normal Queue

As installed, SAS Grid Manager uses a default queue called NORMAL. If you do not specify a different queue, all jobs are routed to this queue and are processed with the same priority. Defining additional queues enables you to use priorities to control the order in which work is processed. The queue definition for the normal queue looks like the following:
Begin Queue
QUEUE_NAME = normal
PRIORITY = 30
DESCRIPTION = default queue
End Queue
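
To make the entry structure concrete, the following Python sketch reads Begin Queue ... End Queue entries into dictionaries. It is an illustration only, not a tool supplied with SAS or Platform LSF. The function name parse_queues is hypothetical, and the sketch assumes that # starts a comment and that no parameter continues across lines.

```python
def parse_queues(text):
    """Collect each 'Begin Queue' ... 'End Queue' entry as a dict of parameters."""
    queues = []
    current = None
    for raw in text.splitlines():
        # Assumption: '#' starts a comment; continuation lines are not handled.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        marker = " ".join(line.split()).lower()
        if marker == "begin queue":
            current = {}
        elif marker == "end queue":
            if current is not None:
                queues.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return queues


SAMPLE = """
Begin Queue
QUEUE_NAME = normal
PRIORITY = 30
DESCRIPTION = default queue
End Queue
"""

queues = parse_queues(SAMPLE)
print(queues)
```

All values are kept as strings, so a caller that needs the priority as a number would convert PRIORITY explicitly.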

Example: A High-Priority Queue

This example shows a queue for high-priority jobs alongside the existing normal queue. Any jobs in the high-priority queue are sent to the grid for execution before jobs in the normal queue. The relative priorities are set by specifying a higher value for the PRIORITY attribute on the high-priority queue.
Begin Queue
QUEUE_NAME = normal
PRIORITY = 30
DESCRIPTION = default queue
End Queue

Begin Queue
QUEUE_NAME = priority
PRIORITY = 40
DESCRIPTION = high priority users
End Queue
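
The effect of the PRIORITY values can be sketched in a few lines of Python. This is a simplified model that considers queue priority only. The actual LSF scheduler also weighs factors such as host availability, and the job names and the dispatch_order function are hypothetical.

```python
# Queue priorities taken from the definitions above.
PRIORITIES = {"normal": 30, "priority": 40}

# Pending jobs in submission order: (job name, queue name).
pending = [("job1", "normal"), ("job2", "priority"), ("job3", "normal")]


def dispatch_order(jobs, priorities):
    """Order jobs by queue priority, highest first.

    sorted() is stable, so jobs in the same queue keep their submission order.
    """
    return sorted(jobs, key=lambda job: -priorities[job[1]])


order = dispatch_order(pending, PRIORITIES)
print([name for name, _ in order])
```

In this model, job2 moves ahead of the two jobs in the normal queue even though it was submitted after job1.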

Example: A Night Queue

This example shows a queue for processing jobs (such as batch jobs) at night, alongside the existing normal and priority queues. The queue uses these features:
  • The DISPATCH_WINDOW parameter specifies that jobs are sent to the grid for processing only between the hours of 6:00 PM and 7:30 AM.
  • The RUN_WINDOW parameter specifies that jobs from this queue can run only between 6:00 PM and 8:00 AM. Any job that has not completed by 8:00 AM is suspended and resumed the next day at 6:00 PM.
  • The HOSTS parameter specifies that all hosts on the grid except for host1 can run jobs from this queue. Excluding host1 from the hosts that are available for the night queue leaves one host always available for processing jobs from other queues.
Because the night queue uses the same priority as the normal queue, jobs from the high-priority queue are still dispatched first. The queue definitions are as follows:
    Begin Queue
    QUEUE_NAME = normal
    PRIORITY = 30
    DESCRIPTION = default queue
    End Queue
    
    Begin Queue
    QUEUE_NAME = priority
    PRIORITY = 40
    DESCRIPTION = high priority users
    End Queue
    
    Begin Queue
    QUEUE_NAME = night
    PRIORITY = 30
    DISPATCH_WINDOW = (18:00-07:30)
    RUN_WINDOW = (18:00-08:00)
    HOSTS = all ~host1
    DESCRIPTION = night time batch jobs
    End Queue
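
Both windows in the night queue cross midnight, so a time falls inside the window if it is after the start or before the end. The following Python sketch illustrates that check. It is a model of the arithmetic, not of LSF internals. The in_window function is hypothetical, and treating the start as inclusive and the end as exclusive is an assumption.

```python
from datetime import time


def in_window(now, start, end):
    """Return True if 'now' falls inside the window from start to end.

    Assumption: start is inclusive and end is exclusive.
    """
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, for example 18:00-07:30.
    return now >= start or now < end


DISPATCH_WINDOW = (time(18, 0), time(7, 30))
RUN_WINDOW = (time(18, 0), time(8, 0))

# At 7:45 AM no new jobs are dispatched, but running jobs can continue.
print(in_window(time(7, 45), *DISPATCH_WINDOW), in_window(time(7, 45), *RUN_WINDOW))
```

The half hour between the end of the dispatch window and the end of the run window gives jobs that were dispatched late a short period in which to run before they are suspended.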

Example: A Queue for Short Jobs

This example shows a queue for short jobs that need to preempt longer-running jobs. The PREEMPTION parameter specifies which queues a queue can preempt and which queues can preempt it. Adding a value of PREEMPTABLE[short] to the normal queue specifies that jobs from the normal queue can be preempted by jobs from the short queue. Adding a value of PREEMPTIVE[normal] to the short queue specifies that jobs from the short queue can preempt jobs from the normal queue. Setting PRIORITY to 35 on the short queue ensures that its jobs are dispatched before jobs from the normal queue (priority 30). However, jobs from the priority queue (priority 40) still take precedence.
Begin Queue
QUEUE_NAME = normal
PRIORITY = 30
PREEMPTION = PREEMPTABLE[short]
DESCRIPTION = default queue
End Queue

Begin Queue
QUEUE_NAME = priority
PRIORITY = 40
DESCRIPTION = high priority users
End Queue

Begin Queue
QUEUE_NAME = short
PRIORITY = 35
PREEMPTION = PREEMPTIVE[normal]
DESCRIPTION = short duration jobs
End Queue
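
The relationship that the two PREEMPTION values express can be checked with a short Python sketch. It models only the declarations shown above and omits any other conditions that LSF applies before it preempts a job. The names parse_preemption and can_preempt are hypothetical.

```python
import re

# PREEMPTION values taken from the queue definitions above.
QUEUES = {
    "normal": {"PRIORITY": 30, "PREEMPTION": "PREEMPTABLE[short]"},
    "priority": {"PRIORITY": 40},
    "short": {"PRIORITY": 35, "PREEMPTION": "PREEMPTIVE[normal]"},
}


def parse_preemption(value):
    """Split a PREEMPTION value into the queue names listed for each keyword."""
    parsed = {"PREEMPTIVE": set(), "PREEMPTABLE": set()}
    pattern = r"(PREEMPTIVE|PREEMPTABLE)\[([^\]]*)\]"
    for kind, names in re.findall(pattern, value or ""):
        parsed[kind].update(names.split())
    return parsed


def can_preempt(preemptor, target, queues=QUEUES):
    """True if either queue declares that 'preemptor' can preempt 'target'."""
    declared_by_preemptor = parse_preemption(queues[preemptor].get("PREEMPTION"))
    declared_by_target = parse_preemption(queues[target].get("PREEMPTION"))
    return (target in declared_by_preemptor["PREEMPTIVE"]
            or preemptor in declared_by_target["PREEMPTABLE"])


print(can_preempt("short", "normal"), can_preempt("priority", "normal"))
```

In this model the priority queue does not preempt the normal queue, because neither definition declares that relationship. Its higher PRIORITY value affects only the dispatch order.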

Specifying Job Slot Limits on a Queue

A job slot is a position on a grid node that can accept a single unit of work or SAS process. Each host has a specified number of available job slots. By default, each host is configured with a single job slot for each core on the machine, so a multiple-core machine would have multiple job slots. For information about specifying job slots for a host, see Platform LSF Reference.
You can also use a queue definition to control the number of job slots on the grid or on an individual host that are used by the jobs from a queue. The QJOB_LIMIT parameter specifies the maximum number of job slots on the grid that can be used by jobs from the queue. The HJOB_LIMIT parameter specifies the maximum number of job slots on any one host that can be used by the queue. The following example limits the normal queue to 60 job slots across the grid at any one time and to 2 job slots on any single host.
Begin Queue
QUEUE_NAME = normal
PRIORITY = 30
DESCRIPTION = default queue
QJOB_LIMIT = 60
HJOB_LIMIT = 2
End Queue
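
The way the two limits combine can be sketched in Python: a job from the queue can take another slot on a host only if both the grid-wide limit and the per-host limit still have room. This is an illustrative model, not LSF code. The function can_take_slot and the host names are hypothetical.

```python
QJOB_LIMIT = 60  # maximum slots the queue can use across the grid
HJOB_LIMIT = 2   # maximum slots the queue can use on any one host


def can_take_slot(slots_in_use, host, qjob_limit=QJOB_LIMIT, hjob_limit=HJOB_LIMIT):
    """Check whether the queue can use one more job slot on 'host'.

    slots_in_use maps each host name to the slots the queue already uses there.
    """
    total = sum(slots_in_use.values())
    on_host = slots_in_use.get(host, 0)
    return total < qjob_limit and on_host < hjob_limit


usage = {"host1": 2, "host2": 1}
print(can_take_slot(usage, "host1"), can_take_slot(usage, "host2"))
```

With these values, host1 is already at the per-host limit, while host2 can accept one more job from the queue.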

Working with Mismatches between Queues and Workspace Servers

Defining a queue that specifies fewer hosts than are available for processing can cause an error. For example, you might want to have jobs in a certain queue only be processed by the machine host2, so the queue definition would contain the line HOSTS = host2. You might also have a grid-launched logical workspace server definition that includes workspace servers for host1, host2, and host3. When the object spawner receives a request to start a workspace server, it submits a job to Platform LSF to start the workspace server. Because the logical workspace server definition includes more hosts (host1, host2, host3) than the queue (host2), an error message results.
To accommodate this type of environment, add the line ENABLE_HOST_INTERSECTION=Y to the lsb.params file. With this setting, LSF uses the hosts that appear in both lists (host2 in this example) instead of returning an error.
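
The mismatch can be modeled as a set comparison. The following Python sketch is an illustration of the scenario described above, not the LSF implementation. The function eligible_hosts is hypothetical, and raising an error when the two host lists share no host is an assumption.

```python
def eligible_hosts(server_hosts, queue_hosts, enable_host_intersection):
    """Return the hosts that can run a job, given the requested and queue hosts."""
    requested = set(server_hosts)
    allowed = set(queue_hosts)
    if requested <= allowed:
        # Every requested host belongs to the queue, so there is no mismatch.
        return requested
    if not enable_host_intersection:
        raise ValueError("requested hosts are not all defined in the queue")
    common = requested & allowed
    if not common:
        # Assumption: with no host in common, the job still cannot run.
        raise ValueError("no host in common between the request and the queue")
    return common


hosts = eligible_hosts(["host1", "host2", "host3"], ["host2"], True)
print(hosts)
```

With the intersection enabled, the workspace server job is restricted to host2. With it disabled, the same call raises an error, which corresponds to the error message described above.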