MPI and Torque
MPI implementations
Three implementations of the MPI standard are installed on the e-Science
cluster.
MPICH-GM is installed under /opt/MeSC/mpi-gm. To
use MPICH-GM, add the following path to your environment:
- For bash: export PATH=/opt/MeSC/mpi-gm/bin:${PATH}
- For csh: setenv PATH /opt/MeSC/mpi-gm/bin:${PATH}
MPICH 1.2.6 is installed under /opt/MeSC/mpich-1.2.6.
To use MPICH 1.2.6, add the following path to your environment:
- For bash: export PATH=/opt/MeSC/mpich-1.2.6/bin:${PATH}
- For csh: setenv PATH /opt/MeSC/mpich-1.2.6/bin:${PATH}
When you use MPICH, please note that MPI programs compiled with
MPICH 1.2.6 do not work with MPICH-GM, and vice versa: each program
must be run under the implementation it was compiled with.
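Since the two MPICH builds are binary-incompatible, it helps to select exactly one of them per shell session. A minimal bash sketch (the MPI_ROOT variable name is my own, not part of the cluster setup):

```shell
# Pick exactly one MPI installation per session; binaries built with
# MPICH-GM and MPICH 1.2.6 are not interchangeable.
MPI_ROOT=/opt/MeSC/mpi-gm            # or /opt/MeSC/mpich-1.2.6
export PATH=${MPI_ROOT}/bin:${PATH}
echo "Using MPI from: ${MPI_ROOT}/bin"
```

Putting the chosen line in your shell start-up file avoids accidentally mixing the two implementations.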
LAM MPI 7.1.1 is the lam package that comes with the Red Hat
distribution. It is installed on only a limited number of nodes and is
not currently recommended. If you want to use LAM MPI, please consult
mesc-cluster-approval first.
Preparing an MPI program for running
If you are writing an MPI program using message passing yourself
from scratch, you need to understand the MPI standard; that is beyond
the scope of this small document.
To compile such a program, set the PATH as above, then compile the
program using one of the MPI compiler wrappers: mpicc for C programs,
mpicxx for C++, and mpif77 for Fortran 77. The resulting binary is then
ready to run on the cluster.
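As a concrete illustration, here is a minimal, standard MPI-1 C program (the file name hello_mpi.c is my own choice for the example) that can be compiled with mpicc as described above:

```c
/* hello_mpi.c - minimal MPI example.
 * Compile (with the chosen MPI's bin directory on your PATH):
 *     mpicc -o hello_mpi hello_mpi.c
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();                         /* shut down MPI */
    return 0;
}
```

Each MPI process prints its own rank, so running it on 8 processors produces 8 lines of output.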
Running MPICH programs with Torque
When you start a job using Torque that asks for a number of
nodes and processors, those processors are allocated to your job, but
your job script begins executing on only one of them. To distribute
your MPI-compiled job across all of your allocated processors, you need
to use a special command as part of your job script:
/opt/MeSC/bin/mpiexec.
To learn more, please enter man
/opt/MeSC/man/man1/mpiexec.1. Unlike some other MPI start-up programs,
this mpiexec command automatically detects the number and names of the
nodes that Torque allocated to you.
Here is an example script using mpiexec to start a MPICH-GM
program with Torque using Myrinet:
#!/bin/sh
#PBS -l nodes=4:ppn=2:myri
#PBS -l walltime=5:00:00,cput=40:00:00
#PBS -j oe
cd $PBS_O_WORKDIR
/opt/MeSC/bin/mpiexec -comm mpich-gm mpi_program arg1 arg2
The second line, #PBS -l nodes=4:ppn=2:myri, requests
4 nodes with 2 processors per node. The myri property means that
this job requires Myrinet. In the third line, notice that the
requested cput (CPU time) of 40 hours allows for all those processors
and so exceeds the requested walltime (elapsed time) of 5 hours:
8 processors x 5 hours = 40 hours.
Here is another example script which starts a MPICH program with
Torque using
Gigabit Ethernet:
#!/bin/sh
#PBS -l nodes=4:ppn=2
#PBS -l walltime=5:00:00,cput=40:00:00
#PBS -q esq
#PBS -j oe
cd $PBS_O_WORKDIR
/opt/MeSC/bin/mpiexec -comm mpich-p4 mpi_program arg1 arg2
Please be sure to specify the full path for mpiexec, as shown.
LAM MPI also provides a program at /usr/bin/mpiexec, which does
not work with MPICH.
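Once a job script like the ones above is saved to a file (mpi_job.sh is a name chosen here for illustration), it is submitted and monitored with the standard Torque commands:

```shell
qsub mpi_job.sh     # submit the job script; prints the job id
qstat -u $USER      # check the status of your queued and running jobs
```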