Torque: a quick introduction
Torque provides a management system for job processing. It is heavily
based on the Portable Batch System (PBS) and, before that, the Network
Queueing System (NQS). You will come across the PBS acronym from time
to time in documentation such as this.
Jobs are submitted to job queues, and are then scheduled to run
immediately or at some later time, depending on how busy the overall
system is. Users submit the jobs, but Torque decides when a job starts,
chooses which worker node the job runs on, makes sure the job doesn't
overstay its requested time, and manages the return of output files to
the job submitter.
You, the user, provide a script consisting of commands to be processed.
The script might contain just a single command: the name of another
script, perhaps with some options or parameters, or the name of a
pre-compiled binary file to be run. Or it might contain a mixture of
control statements and commands. The script is submitted using the
qsub command.
Simple job submission examples
Here's a simple example of job submission:
qsub myjob
where myjob is a file you
prepared earlier with a simple text-mode editor, containing the
following lines:
echo Hello World
date
hostname
echo See you later
That's worth trying as a first job. But here's a more likely example of
what your file myjob might contain:
#PBS -l cput=7200,walltime=7200
#PBS -j oe
cd "$PBS_O_WORKDIR"
gcc -o mybin mysource.c
./mybin
The first two lines here set job options, which could alternatively
have been supplied as options on the qsub command. The first line sets
a resource limit (with the -l option, lowercase L) of 7200 seconds of
processor time and 7200 seconds of wall (elapsed) time. The second line
requests that command errors are merged with the standard output in a
single file. The third line changes the current directory of the job
from the home directory to the one that was current when the job was
submitted. The next line compiles a C source file with the gcc command,
to produce a binary file called mybin. The last line runs that binary
file.
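As a rough sketch of the alternative mentioned above, the same two options can be given on the qsub command line rather than as #PBS lines in the script. The helper name submit_with_options and the QSUB variable are hypothetical, introduced here only so that the command can be previewed without a Torque installation.

```shell
# Hypothetical helper: submit a script with the same options that the
# two #PBS lines set, but given on the command line instead.
# QSUB is an assumed override hook (for example QSUB=echo to preview
# the arguments); it defaults to the real qsub command.
QSUB=${QSUB:-qsub}

submit_with_options() {
    # -l sets the resource limits, -j oe merges errors into the output
    "$QSUB" -l cput=7200,walltime=7200 -j oe "$1"
}

# Typical use on a system where Torque is installed:
#   submit_with_options myjob
```

With this approach the script file itself only needs the commands to be run, starting from the cd line.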
When you enter the qsub command, it replies with a jobid, containing
the number of the job. As soon as the job has finished running, you
will find a new file in the directory, or two files if you didn't
request merged errors, with name(s) by default based on the jobid. You
can look at that output using the cat, more, or less command (in more
and less, use the space-bar to scroll through the file, and q to quit),
for example:
less myjob.o3243
For more qsub options, see the manual pages for qsub, by entering the
command man qsub.
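The step from jobid to output file name, as in the less example above, can be sketched in the shell. This assumes that the jobid printed by qsub starts with the job number, optionally followed by a dot and a server name, and that the default output file is named after the script with a .o suffix and that number, as in myjob.o3243. The function name output_file_for is hypothetical.

```shell
# output_file_for JOBNAME JOBID
# Prints the assumed default output file name for a job.
output_file_for() {
    jobname=$1
    jobid=$2
    number=${jobid%%.*}    # keep only the part before the first dot
    printf '%s.o%s\n' "$jobname" "$number"
}

# Typical use on a system where Torque is installed:
#   jobid=$(qsub myjob)
#   less "$(output_file_for myjob "$jobid")"
```

If your site configures different output names, or you set one yourself with a qsub option, this guess will not apply.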
Checking the status of jobs
To check the status of your jobs, use the qstat command. It can be used
with various options. A qstat command on its own summarises jobs, one
line per job. With the -a option (qstat -a), a similar summary is shown
in an alternative format, giving some information on requested
resources. With the -f option (qstat -f), it shows extensive details
for those jobs, or for one particular job if you add its jobid.
Jobs can be in a Queued state or a Running state. For other states and
options, consult the manual pages for qstat, by entering the command
man qstat.
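For scripted checks, the state of a single job can be picked out of the qstat -f details. The sketch below assumes that the output includes a line of the form "job_state = R", with a one-letter code such as Q for Queued or R for Running. The function names parse_job_state and describe_state are hypothetical.

```shell
# parse_job_state: read qstat -f output on standard input and print
# the one-letter state code from the (assumed) "job_state = X" line.
parse_job_state() {
    awk -F' = ' '$1 ~ /job_state/ { print $2; exit }'
}

# describe_state: spell out the two states mentioned above; any other
# code is printed unchanged (see man qstat for its meaning).
describe_state() {
    case $1 in
        Q) echo Queued ;;
        R) echo Running ;;
        *) echo "$1" ;;
    esac
}

# Typical use on a system where Torque is installed (3243 is the jobid):
#   describe_state "$(qstat -f 3243 | parse_job_state)"
```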
L.S.Lowe