Midlands e-Science Center University of Birmingham, dti e-Science Grid

MPI and Torque

MPI implementations

Three implementations of the MPI standard are installed on the e-Science cluster: MPICH-GM, MPICH 1.2.6, and LAM MPI 7.1.1.

MPICH-GM is installed under /opt/MeSC/mpi-gm. To use it, add its bin directory to the front of your PATH.

  • For bash, export PATH=/opt/MeSC/mpi-gm/bin:${PATH}
  • For csh, setenv PATH /opt/MeSC/mpi-gm/bin:${PATH}

MPICH 1.2.6 is installed under /opt/MeSC/mpich-1.2.6. To use it, add its bin directory to the front of your PATH.

  • For bash, export PATH=/opt/MeSC/mpich-1.2.6/bin:${PATH}
  • For csh, setenv PATH /opt/MeSC/mpich-1.2.6/bin:${PATH}

Note that MPI programs compiled with MPICH 1.2.6 do not work with MPICH-GM, and vice versa.
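
Because the two MPICH builds are not interchangeable, it can help to confirm which compiler wrapper your shell will actually pick up after changing PATH. The following is a minimal sketch for sh or bash; MPI_HOME is a hypothetical variable name used only for this illustration, and the directory is the MPICH-GM location given above.

```shell
#!/bin/sh
# MPI_HOME is a hypothetical variable for this sketch; the directory is the
# MPICH-GM install path from this page (use /opt/MeSC/mpich-1.2.6 for MPICH 1.2.6).
MPI_HOME=/opt/MeSC/mpi-gm
PATH=$MPI_HOME/bin:$PATH
export PATH

# Report which mpicc the shell will now run, if any.
if found=$(command -v mpicc); then
    echo "mpicc resolves to: $found"
else
    echo "mpicc not found on PATH"
fi
```

If the reported path is not under the directory you selected, an earlier PATH entry is taking precedence.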

LAM MPI 7.1.1 is the lam package supplied with the Red Hat distribution. It is installed on only a very limited number of nodes and is not currently recommended. If you want to use LAM MPI, consult mesc-cluster-approval first.

Preparing an MPI program for running

If you are writing a message-passing program from scratch, you need to understand the MPI standard, which is beyond the scope of this short document.

To compile such a program, set PATH as described above and then compile with one of the MPI compiler wrappers: mpicc for C programs, mpicxx for C++, and mpif77 for Fortran 77. The resulting binary is then ready to run on the cluster.
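
As a rough illustration of choosing the wrapper, the sketch below prints the compile command for a given source file. The helper name mpi_compile_command, the file-extension mapping, and the source file name mpi_program.c are assumptions made for this example; only the three wrapper names come from this page.

```shell
#!/bin/sh
# mpi_compile_command is a hypothetical helper: it prints (does not run)
# the compile command for one source file, choosing the MPI compiler
# wrapper from the file extension.
mpi_compile_command() {
    src=$1
    case $src in
        *.c)              wrapper=mpicc  ;;   # C
        *.cc|*.cpp|*.cxx) wrapper=mpicxx ;;   # C++
        *.f|*.f77)        wrapper=mpif77 ;;   # Fortran 77
        *) echo "unrecognised source type: $src" >&2; return 1 ;;
    esac
    out=${src%.*}   # strip the extension to name the binary
    echo "$wrapper -o $out $src"
}

mpi_compile_command mpi_program.c
```

Running the printed command in a shell whose PATH has been set as above would build a binary named mpi_program, the name used in the job scripts on this page.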


Running MPICH programs with Torque

When you submit a Torque job that requests several nodes and processors, all of those processors are allocated to your job, but your job script starts executing on only one of them. To distribute your MPI program across all of the allocated processors, your job script must launch it with a special command.

One such command is /opt/MeSC/bin/mpiexec. To learn more, run man /opt/MeSC/man/man1/mpiexec.1. Unlike some other MPI startup programs, mpiexec automatically detects the number and names of the nodes that Torque has allocated to you.
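
If you want to see the allocation for yourself, a job script can read the file named by $PBS_NODEFILE, which Torque sets inside a running job and which lists one host name per allocated processor. The sketch below is only an illustration: $PBS_NODEFILE is standard Torque behaviour rather than something this page documents, summarize_allocation is a hypothetical helper name, and nothing here implies that mpiexec itself reads this file.

```shell
#!/bin/sh
# summarize_allocation is a hypothetical helper for this sketch.
# Argument: path to a Torque node file (one host name per allocated processor).
summarize_allocation() {
    nodefile=$1
    # Each line corresponds to one allocated processor.
    total=$(wc -l < "$nodefile" | tr -d ' ')
    echo "processors allocated: $total"
    # Count how many processors were granted on each host.
    sort "$nodefile" | uniq -c
}

# Inside a Torque job, PBS_NODEFILE is set; outside one, do nothing.
if [ -n "$PBS_NODEFILE" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_allocation "$PBS_NODEFILE"
fi
```

Placed before the mpiexec line of a job script, this writes the allocation to the job's output file, which can be useful when checking that a request such as nodes=4:ppn=2 was granted as expected.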


Here is an example script that uses mpiexec to start an MPICH-GM program with Torque over Myrinet:


#!/bin/sh
#PBS -l nodes=4:ppn=2:myri
#PBS -l walltime=5:00:00,cput=40:00:00
#PBS -j oe

cd $PBS_O_WORKDIR
/opt/MeSC/bin/mpiexec -comm mpich-gm mpi_program arg1 arg2

The second line, #PBS -l nodes=4:ppn=2:myri, requests 4 nodes with 2 processors per node, 8 processors in total. The myri property means that the job requires Myrinet. In the third line, the requested cput (CPU time) of 40 hours covers all 8 processors running for the full requested walltime (elapsed time) of 5 hours, which is why it exceeds the walltime.
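
The relationship between the two limits can be worked out with a few lines of shell arithmetic. This is a sketch using the values from the example script; the variable names are chosen for illustration only.

```shell
#!/bin/sh
# Values taken from the example job script.
nodes=4            # nodes requested
ppn=2              # processors per node
walltime_hours=5   # elapsed-time limit, in hours

# Total CPU time = number of processors x elapsed time.
procs=$((nodes * ppn))
cput_hours=$((procs * walltime_hours))

echo "processors: $procs"
echo "cput=${cput_hours}:00:00"
```

This prints "processors: 8" and "cput=40:00:00", matching the cput value requested in the script.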


Here is another example script, which starts an MPICH program with Torque over Gigabit Ethernet:

#!/bin/sh
#PBS -l nodes=4:ppn=2
#PBS -l walltime=5:00:00,cput=40:00:00
#PBS -q esq
#PBS -j oe

cd $PBS_O_WORKDIR
/opt/MeSC/bin/mpiexec -comm mpich-p4 mpi_program arg1 arg2

Be sure to specify the full path to mpiexec, as shown. LAM MPI also provides a program at /usr/bin/mpiexec, which does not work with MPICH.