This article explains how to install and configure MPI to be able to run the ESTEL model in parallel on a network of computers. Note that this article is merely a quick run through the MPI Installer's Guide at http://www-unix.mcs.anl.gov/mpi/mpich2/.
This article is general and can be used to configure MPI for any application requiring a Fortran 90 compiler.
Prerequisites
ssh key authentication
To use MPI on a network of computers, you need to be able to log in to any of the computers without user interaction (passwords, etc.). This is easily achieved using secure shell key authentication. The method to set up ssh key authentication is described in the article "Configure ssh for MPI".
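As a quick check, you should be able to run a command on a remote node without being prompted for a password. This is only a minimal sketch: the node name slave1.full.domain matches the examples later in this article, and the output shown is illustrative.
master $ ssh slave1.full.domain hostname     # should return immediately, without asking for a password
slave1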
Fortran 90 compiler
You need a Fortran 90 compiler to compile and run the TELEMAC system. When running simulations in parallel mode, MPI uses a wrapper around your existing compiler, usually called mpif90. This wrapper is built when MPI is compiled and therefore you need to have a Fortran 90 compiler installed before you attempt to compile MPI.
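It is worth checking that the compiler is visible in your PATH before going further. A minimal sketch follows; f90compiler is the same placeholder used later in this article and stands for whatever compiler you actually use (for instance gfortran or ifort), and the output path is illustrative.
$ which f90compiler          # replace f90compiler with the actual compiler name
/usr/bin/f90compiler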
Download MPI
You can download MPICH2 from http://www-unix.mcs.anl.gov/mpi/mpich2/. You will end up with a gzipped tarball called something like mpich2.tar.gz, with a version number in the name as well. Note that the TELEMAC system uses MPI-1 statements, but we encourage you to install MPICH2 (which is backward compatible) as TELEMAC will probably move towards MPI-2 at some point in the future.
For the sake of this article, we assume that you have extracted the tarball into a directory called /path/to/mpi-download/.
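A sketch of that extraction step is shown below; the directory name mpich2-1.0.x is only a placeholder, as the exact version number will differ.
$ tar xzf mpich2.tar.gz                      # creates a directory such as mpich2-1.0.x
$ mv mpich2-1.0.x /path/to/mpi-download      # rename it to the directory name used in this article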
Compilation
The compilation of MPI is fairly straightforward, but beforehand you need to create an install folder (called /path/to/mpi/ here) and a build folder (called /tmp/mpi-build/ here):
$ mkdir /path/to/mpi
$ mkdir /tmp/mpi-build
To configure the MPI build, the only required step is to assign the name of your Fortran 90 compiler to the environment variable F90. This name needs to be in your PATH; if not, you have to give the full path to the Fortran 90 compiler. Then the configure command will automatically configure the build for you:
cd /tmp/mpi-build
export F90=f90compiler
/path/to/mpi-download/configure --prefix=/path/to/mpi 2>&1 | tee configure.log
This will run many tests on your machine and configure the build. If the configure command finished without problems, you are ready to build MPI. Note that you can inspect configure.log for problems. In particular, you want to make sure that the Fortran 90 compiler is OK for the build.
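A quick way to scan the log is sketched below; the exact wording in configure.log depends on the MPICH2 version, so this is only a rough check.
$ grep -i "f90" configure.log | head      # lines mentioning the Fortran 90 compiler
$ grep -i "error" configure.log           # should ideally report nothing serious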
To compile and install MPI, just issue the standard make and make install commands:
make
make install
This will install MPI in /path/to/mpi/.
Note that you will need to install MPI on all the nodes in your network that will be used for MPI jobs. As they probably have the same computer architecture, you could simply copy the /path/to/mpi directory across.
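For instance, a sketch of copying the installation to a node called slave1.full.domain (the placeholder name used later in this article), assuming ssh key authentication is already working and the same path layout exists on the remote machine:
master $ scp -r /path/to/mpi slave1.full.domain:/path/to/     # repeat for each slave node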
Configuration of MPI
PATH
Add /path/to/mpi/bin to your PATH. This often means adding the following lines to your .bashrc file:
PATH=/path/to/mpi/bin:$PATH
export PATH
Now, you should be able to see the MPI Fortran 90 wrapper mpif90 and the MPI multiprocessor daemon mpd:
$ which mpif90
/path/to/mpi/bin/mpif90
$ which mpd
/path/to/mpi/bin/mpd
Note that the PATH needs to be set on each node which will be used for MPI simulations. This is straightforward if your home directory is shared across all the nodes; otherwise you need to do it manually on each node.
Password in .mpd.conf
MPI requires a file in your home directory called .mpd.conf (yes, there are two dots) which contains the line:
secretword=something_secret_but_don't_use_your_real_password
This file should be readable and writable only by you.
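A sketch of how to create the file with the right permissions; the secret word shown is obviously a placeholder.
$ cd ~
$ touch .mpd.conf
$ chmod 600 .mpd.conf                              # readable and writable only by you
$ echo "secretword=something_secret" > .mpd.conf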
The .mpd.conf file needs to be present in your home directory on every node of the ring, and the secretword should be the same on all of them (this is automatic if your home directory is shared across the nodes).
Now that the password has been set, you should be able to start the multiprocessor daemon on one host with the command mpd &. You can check it is up with mpdtrace, run a simple command via MPI with mpiexec and bring it down with mpdallexit, for instance when trying on the master:
master $ mpd &
[1] 17187
master $ mpdtrace
master
master $ mpiexec -l -n 2 hostname
0: master
1: master
master $ mpdallexit
It is interesting to note that we asked for two processes with the mpiexec command (the -n 2 argument) although we have only one machine in our network yet. The mpd is intelligent enough to just wrap around. In the output above, hostname is used and we can see that processes 0 and 1 return the same value, so they are definitely run on the same host. This is very useful to test parallel programs on a single machine.
Hosts in mpd.hosts
To be able to send processes to other hosts on the network, create a file in your home directory called mpd.hosts which contains a list of the nodes to be used by MPI, one per line. If the network consists of the master and two slaves, slave1 and slave2, mpd.hosts would contain:
master.full.domain
slave1.full.domain
slave2.full.domain
This file should be created on the master node, i.e. the one that you will use to launch MPI jobs. All the hosts in the list need to have a working installation of MPI.
Using an mpd ring
To dispatch MPI processes to other hosts on the network, we need to start a ring of multiprocessor daemons, which we will simply call an mpd ring. The ring is started from the master node and will include the nodes listed in the mpd.hosts file.
Start the ring
The mpd ring is started from the master node with the mpdboot command. The syntax is self-explanatory:
master $ mpdboot -n 3 -f ~/mpd.hosts
In the example above, "3" is the number of nodes to include in the ring. By default, the master is always included in the ring. The option -f is used to specify the name of the hosts file.
Test the ring
The command mpdtrace can be used to list which machines are in the ring:
master $ mpdtrace
master
slave1
slave2
To test the ring, use mpdringtest. It sends a message around the ring (one loop) and tells you how long it took. One loop is very fast, so make the message circle round a few times:
master $ mpdringtest 1000
time for 1000 loops = 0.998109102249 seconds
You can also run a command in parallel using mpiexec. The option -l prefixes each line of output with the process number. The option -n 5 requests 5 processes; as we only have 3 hosts in the ring, the processes "wrap around" as shown in the listing below, where "master" and "slave2" have each been used twice:
master $ mpiexec -l -n 5 hostname
2: slave1
1: master
4: master
0: slave2
3: slave2
If you can do this, congratulations! You should be able to run parallel ESTEL jobs on your ring now!
Just remember to start an mpd ring before running your parallel job.
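In other words, a typical session looks like the sketch below; estel_job is only a placeholder for whatever command actually launches your parallel run.
master $ mpdboot -n 3 -f ~/mpd.hosts     # start the ring
master $ mpdtrace                        # optional: check that all nodes are present
master $ mpiexec -n 3 estel_job          # run the parallel job (placeholder command)
master $ mpdallexit                      # close the ring when finished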
Close the ring
The command mpdallexit is used to terminate the mpd ring:
master $ mpdallexit
Troubleshooting
The MPI Installer's Guide has a good section about troubleshooting (Appendix A). To summarise things that might be helpful:
- make sure you can start mpd on each host separately first
- check that /etc/hosts on each host is correct and has an entry for each host with the right IP address (see the sketch after this list)
- read the MPI manual!
- take your time...
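For reference, a minimal sketch of what /etc/hosts could look like on each machine for the example network used in this article; the IP addresses are made up and must be replaced by your own.
127.0.0.1      localhost
192.168.0.1    master.full.domain    master
192.168.0.2    slave1.full.domain    slave1
192.168.0.3    slave2.full.domain    slave2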