Richards Center at Yale University

Last Modified: Thursday, 15-Jun-2017 18:52:15 EDT

Using Batch Queues in the CSB Core

Table of Contents

  • What Types of Queues are Available?
  • Where Should I Run My Job?
  • What Specific Queues are Available?
  • Setting up a SGE Job
  • Common SGE Commands
  • Miscellaneous Information
  • How to Get More Information

    CSB Policy on Batch Queues

    The CSB policy on batch queues and compute-intensive jobs requires that batch queues be used in many instances. See the Batch Queues and Compute-Intensive Jobs page of the Policies and Practices section.

    What are Batch Queues?

    Batch queues are implemented in the CSB Core through the Sun Grid Engine (SGE). Batch queues allow one or more computer programs to be combined into a job, which is executed on one of several computers. Batch queues provide the following features:

  • Jobs can be started (submitted), monitored and controlled from any of the CSB unix computers, regardless of where they will be executed.
  • The SGE system will determine which computer is the best place to run your job based on existing job load.
  • Your job runs at an appropriate priority, guaranteeing an equitable distribution of computing time.
  • For most jobs which can be run using SGE, CSB policy requires that you use SGE.

    What Types of Queues are Available?

    Four types of queue are available based on maximum clock time, and one queue based on type of job (GPU-enabled cryo-EM applications). The resources short, medium, long and sponge specify that your job will take no more than 4 hours, 24 hours, 4 days and 28 days respectively on all servers. On a given host, short jobs will be allocated more time than medium jobs, which will get more time than long jobs. Sponge jobs, however, will generally get no time if any other category is also running. The queuing system will kill jobs which exceed the stated time limit.

    There is also a separate 7-day "cryo" queue for running on a multi-GPU server. This is primarily for cryo-EM jobs (hence the name), but molecular dynamics will soon be available as well.

    To submit a job named test.com to a medium queue on the most-available computer, you type:

    qsub -q medium test.com

    For other examples, see the section, Common SGE Commands.

    If you incorrectly specify the resource(s) that you need, SGE will complain. For example, if you type qsub -q med ... instead of qsub -q medium ..., your job submission will fail with an error message.

    Where Should I Run My Job?

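    Usually, you will want to run your job in the queue with the shortest time limit that your job fits within, and therefore the highest priority. For example, if you know your job will take about 1 day, you will want to run it in a long queue: the 24-hour limit of a medium queue leaves no margin, and the queuing system kills jobs that exceed their limit. The simplest thing is to use the command,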

    qsub -q long (jobname)
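
    The choice of queue described above can be sketched as a small shell helper. This is only an illustration: pick_queue is a hypothetical function, not part of SGE, and it assumes that an estimate which just reaches a queue's limit should go to the next longer queue, since jobs that exceed their limit are killed.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): map an estimated run time in whole
# hours to one of the queue names described in this document.
# Limits taken from the text: short 4 h, medium 24 h, long 4 days (96 h),
# sponge 28 days (672 h).
# Assumption: an estimate that reaches a limit is pushed to the next queue,
# because the queuing system kills jobs that exceed their time limit.
pick_queue() {
    hours=$1
    if [ "$hours" -lt 4 ]; then
        echo short
    elif [ "$hours" -lt 24 ]; then
        echo medium
    elif [ "$hours" -lt 96 ]; then
        echo long
    elif [ "$hours" -lt 672 ]; then
        echo sponge
    else
        echo "no queue allows $hours hours" >&2
        return 1
    fi
}

# Example: a job expected to take about a day
pick_queue 24    # prints "long"
```

    With this helper defined, a submission could look like qsub -q "$(pick_queue 24)" test.com.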
    

    At this writing (2/2011), the SGE system will select which computer will run your job based on current job load, with the more powerful systems given greater weight. If this is a problem, see Miscellaneous Information below.

    What Specific Queues are Available?

    These queues are available, as of 15 June 2017. However, the configuration changes faster than the documentation. For a list of active queues, use the command, qstat -f.
     

    Batch queues available in CSB Core.

    Host        Type   #CPUs  Mem (GB)  short   medium    long      sponge     cryo
                                        (4 hr)  (24 hrs)  (4 days)  (28 days)  (7 days)
    crunch6     linux  12     64        X       X         X         X
    cryocrunch  linux  16     64                                               X

    Setting up a SGE Job

    1. Create a SGE script file for your job. This is simply a script for your favorite shell, containing the commands that you want to execute in batch. Unless you specifically include a line to "cd" to a particular directory at the beginning of your script, your job will be run from your home directory, so define your file paths accordingly.

    2. Submit your job to the appropriate queue, using the qsub command.
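
    The two steps above might look like the following. This is a minimal sketch of a script file, assuming a Bourne shell; lines beginning with #$ are read by qsub as embedded options. The WORK_DIR variable and the echo commands are placeholders standing in for your own directory and programs.

```shell
#!/bin/sh
# Embedded SGE options: lines starting with "#$" are read by qsub.
#$ -S /bin/sh
#$ -o test.out
#$ -e test.err

# Without an explicit cd, the job runs from your home directory.
# WORK_DIR is a placeholder name; point it at the directory holding your data.
WORK_DIR="${WORK_DIR:-${HOME:-/tmp}}"
cd "$WORK_DIR" || exit 1

echo "Job started on $(uname -n) in $(pwd)"
# ... the commands you want to execute in batch go here ...
echo "Job finished"
```

    Saved as test.com, this script would be submitted with qsub -q medium test.com.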

    Common SGE Commands

    Summary Examples

    qsub  -q medium test.com
    qstat -f -q medium
    qdel 67
    

    Submitting a job -- qsub

    qsub -q medium test.com
    qsub -e test.err -o test.out -q short test.com
    

    See the SGE documentation for a list of all possible options. Generally, options can be included on the command line, or in your command file on lines beginning with #$.

    Discovering queue status -- qstat

    qstat
    qstat -f -q long 
    

    Deleting a job -- qdel

    Each SGE job has a unique job-id. The job-id is reported when you submit the job, and can be discovered at any time using the command qstat. To delete a job, use the command

    qdel jobid
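
    Because the job-id is reported at submission, it can be captured for later use with qdel. The following is a sketch only: parse_job_id is a hypothetical helper, and it assumes that qsub prints its usual confirmation message of the form shown in the comment.

```shell
#!/bin/sh
# Hypothetical helper: read qsub's confirmation message on stdin and print
# only the numeric job-id.
# Assumption: the message has the form
#   Your job 67 ("test.com") has been submitted
parse_job_id() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Demonstration with a sample message instead of a real submission
echo 'Your job 67 ("test.com") has been submitted' | parse_job_id    # prints 67
```

    In practice this could be used as jobid=$(qsub -q medium test.com | parse_job_id), followed later by qdel "$jobid".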
    

    Miscellaneous Information

    Trouble with specific batch computers

    With the current SGE setup, it is no longer possible to select a particular computer in the batch queuing system for running your job. Since all of the batch computers are supposed to be configured the same, this should not normally be a problem. However, if you find that your job doesn't run when submitted to a particular machine, contact the Core Staff for assistance.

    How to Get More Information

    The documentation that came with the SGE system is available in PDF format. SGE is a large and complex system, of which we use only a small part, so the full document set can be overwhelming.

    A User Guide describes the concepts and workings of SGE.

    The Installation and Administration manuals might be of some help to the staff.

    Documentation for the old DQS software is also available: User Guide, Reference, and Installation/Maintenance.

    Richards Center (www.rc.yale.edu) at Yale University (www.yale.edu)
    Contact: michael^strickler_at_yale^edu