Submitting Jobs Via PBS on Darwin

 

 

PBS, the Portable Batch System, is a networked subsystem for submitting, monitoring, and controlling a workload of batch jobs on one or more systems.  With PBS, jobs are scheduled for execution across the darwin nodes according to scheduling policies that attempt to fully utilize system resources without overcommitting them, while being fair to all users.  For more information about PBS, see the online manual page, which can be viewed by executing the command:

 

     man pbs

 

Jobs are submitted to be run under PBS via the qsub command.  For complete details on using qsub, see the online manual page, which can be viewed by executing the command:

 

     man qsub

 

A job is represented by a shell script that contains the commands to be run.  A simple PBS job script file might contain lines like the following:

 

     #PBS -l cput=10:00:00,ncpus=2

     cd project1

     ./myprog

 

At its simplest, the job script file is just a sequence of commands to be executed, but in general it can be any shell script.  The #PBS line is a special comment that passes job parameters to PBS.  Specify values for cput (CPU time required) and ncpus (number of CPUs required) that are appropriate for your job.  The parameters on the #PBS line can also be given to the qsub command on the command line when the job is submitted, but it is convenient to put them in the job file so that they aren’t forgotten.
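
As a minimal sketch of the command-line alternative, the following writes a job script without a #PBS line and passes the same resource limits to qsub with the -l option.  The file name myjob.sh is a made-up example, and the qsub call is guarded so that the sketch does nothing on a machine where PBS is not installed.

```shell
#!/bin/sh
# Write a minimal job script WITHOUT a #PBS line.
# "myjob.sh" is a hypothetical file name used only for this example.
cat > myjob.sh <<'EOF'
cd project1
./myprog
EOF

# Give the resource limits on the qsub command line instead of in the script.
# The guard skips submission when qsub is not available on this machine.
if command -v qsub >/dev/null 2>&1; then
    qsub -l cput=10:00:00,ncpus=2 myjob.sh
fi
```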

 

The cput parameter specifies the maximum amount of CPU time that the job will be allowed to consume.  If a job uses more CPU time than the amount specified by cput, it will be terminated by PBS.  If cput is not specified, a default time limit of 10 minutes (10:00) will be used.
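
Because the default limit (10:00) and the limit in the sample script (10:00:00) look similar but differ by a factor of 60, it can help to convert a cput value to seconds before submitting.  The helper below is only an illustration; cput_seconds is a hypothetical name and is not part of PBS.

```shell
#!/bin/sh
# Convert a time written as SS, MM:SS, or HH:MM:SS into total seconds.
# cput_seconds is a hypothetical helper, not a PBS command.
cput_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

cput_seconds 10:00      # default limit, 10 minutes: prints 600
cput_seconds 10:00:00   # limit in the sample script, 10 hours: prints 36000
```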

 

The ncpus resource specification is very important for job scheduling on darwin.  Job scheduling is based upon the load level of each node and the number of CPUs that are in use.  The PBS scheduler cannot “look inside” a job to determine how many CPUs it will use, so you must tell the scheduler via the ncpus resource.  Please be sure that the value of ncpus is appropriate for your job.  If the value is too high, your job may wait in the queue even though CPU resources are available.  If the value is too low, your job may be cancelled if it actually uses more CPUs than you specified.  If ncpus is not specified, PBS will assume that 1 CPU is needed.  (Note:  A job on darwin can use at most 4 CPUs, unless it is designed to run in a parallel environment.  However, it is recommended that no more than 2 CPUs be used.  Jobs that request more than 2 CPUs may have to wait longer in the job queue, depending upon system load.)

 

Once the job script has been created, it can be submitted for execution via the qsub command, as follows:

 

     qsub scriptfile


 

The qsub command will respond with a line like the following:

 

     nnnn.darwin0

 

where nnnn is the job number assigned to the job, and darwin0 is the name of the PBS server to which the job was submitted.  (It doesn’t matter which darwin node you execute the qsub command on.  The PBS server name, however, will always be darwin0.)  The job number can be used to identify your job with other PBS commands, like qstat and qdel.  The job number is also used to identify output files for your job.
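
The identifier printed by qsub can be saved in a shell variable and reused.  The sketch below splits an identifier into its job number and server name; the value 1234.darwin0 is a made-up example, and job_number and job_server are hypothetical helper names.

```shell
#!/bin/sh
# Split a job identifier of the form nnnn.server into its two parts.
# job_number and job_server are hypothetical helpers, not PBS commands.
job_number() { printf '%s\n' "${1%%.*}"; }   # text before the first dot
job_server() { printf '%s\n' "${1#*.}"; }    # text after the first dot

# Made-up identifier; in practice it would come from: jobid=$(qsub scriptfile)
jobid="1234.darwin0"
echo "job number: $(job_number "$jobid")"
echo "server:     $(job_server "$jobid")"

# With the identifier saved, the job can be checked or removed:
#   qstat "$jobid"
#   qdel  "$jobid"
```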

 

When the job is executed, PBS will create an environment for it that is as much like your normal login environment as possible.  The job will run under your userid, and the initial working directory will be your home directory.  If the files your job uses are not in your home directory, use relative pathnames or full pathnames as appropriate.
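
One common way to avoid path problems is to change directory explicitly at the top of the job script.  The sketch below assumes that the PBS installation sets the PBS_O_WORKDIR environment variable to the directory from which qsub was run, as PBS implementations generally do; the workdir variable is a name chosen for this example, and the script falls back to the current directory when run outside PBS.

```shell
#!/bin/sh
#PBS -l cput=10:00,ncpus=1

# PBS_O_WORKDIR is assumed to be set by PBS to the directory where qsub was run.
# Outside PBS the variable is unset, so fall back to the current directory.
workdir=${PBS_O_WORKDIR:-$PWD}
cd "$workdir" || exit 1
echo "job running in: $workdir"
```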

 

After the job completes execution, there will be two output files created in the directory from which the job was submitted:

 

     scriptfile.ennnn    (for output written to the error output stream)

     scriptfile.onnnn    (for output written to the standard output stream)

 

where scriptfile is the name of the script file you specified on the qsub command line (or STDIN if you did not specify a script file and instead entered the commands for the job from standard input), and nnnn is the job number.
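
The naming rule above can be sketched as a small helper that prints the two expected file names for a given script name and job identifier.  The function name output_files and the identifier 1234.darwin0 are made up for illustration, and the script name is assumed to be a bare file name with no directory part.

```shell
#!/bin/sh
# Print the expected output file names for a script name and a job identifier.
# output_files is a hypothetical helper; the script name is assumed to be a
# bare file name (no directory part).
output_files() {
    num=${2%%.*}                       # job number: text before the first dot
    printf '%s.o%s\n' "$1" "$num"      # standard output stream
    printf '%s.e%s\n' "$1" "$num"      # error output stream
}

output_files scriptfile 1234.darwin0
# prints:
#   scriptfile.o1234
#   scriptfile.e1234
```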

 

Several job queues have been created on darwin.  In general, however, you should not specify a queue name when submitting a job unless the system administrator asks you to do so for a specific type of job.  By default, jobs are routed into an appropriate execution queue based upon their resource requirements.

 

Use the qstat command to see which jobs are queued or executing on darwin.  See the man page for qstat for complete details.

 

 

For more information about accessing darwin in general, see Darwin Access Notes.