Add MPI support

Issue #27 new
Olivier Sallou repo owner created an issue

Using Docker containers may make MPI support difficult because of port range exposure: we would need to force a fixed port range. We also need to add a link between the main container and the child containers for ssh communication, using a port other than the standard one, which cannot be specified in the mpirun hostfile.

A solution could be to use native commands with Apache Mesos (not with Swarm), relying on MPI automatic port allocation, but the ssh port allocation/specification issue remains.
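Since the port cannot be given in the mpirun hostfile, one possible workaround, assuming mpirun starts remote processes through the OpenSSH client, is to put only alias names in the hostfile and map each alias to the container's address and custom port in the ssh client configuration (`~/.ssh/config`), which ssh reads on every connection. This is a minimal sketch: `render_ssh_config` is a hypothetical helper, and the alias names, addresses and ports are made up. `StrictHostKeyChecking no` is a convenience for short-lived containers whose host keys change, at the cost of host authentication.

```python
from typing import Dict, Tuple


def render_ssh_config(nodes: Dict[str, Tuple[str, int]]) -> str:
    """Build OpenSSH client config entries mapping alias -> (address, port).

    mpirun only sees the alias (e.g. from the hostfile); ssh then looks up the
    real address and the custom port in ~/.ssh/config.
    """
    blocks = []
    for alias, (address, port) in sorted(nodes.items()):
        blocks.append(
            f"Host {alias}\n"
            f"    HostName {address}\n"
            f"    Port {port}\n"
            # Containers are recreated often, so their host keys change.
            "    StrictHostKeyChecking no\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical child containers, each with sshd on its own published port.
    nodes = {"job27-child1": ("10.0.0.5", 2201), "job27-child2": ("10.0.0.6", 2202)}
    print(render_ssh_config(nodes))
```

The hostfile then lists `job27-child1` and `job27-child2`, and no port ever has to appear on the mpirun side.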

Proposal:

  1. manage like job arrays (launch N jobs), but the scheduler needs to schedule all or nothing
  2. child nodes are interactive nodes (they just run sshd)
  3. the main node executes the user command (an mpirun command) using a host list file generated by the scheduler ($GOD_HOME/hostlist.txt, for example)
  4. when the main node job ends, kill the child nodes
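Steps 1 and 3 of the proposal can be sketched as follows. This is a minimal Python sketch, not the project's scheduler code: `allocate_all_or_nothing` and `write_hostlist` are hypothetical helpers, free capacity is assumed to be known as a slot count per host, and the host list is assumed to use Open MPI's `host slots=N` hostfile syntax.

```python
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict, List, Optional


def allocate_all_or_nothing(free_slots: Dict[str, int], n: int) -> Optional[List[str]]:
    """Pick n slots across hosts, or return None if the whole job cannot fit.

    Nothing is reserved unless all n slots are found (gang scheduling).
    """
    if n <= 0:
        return []
    chosen: List[str] = []
    for host in sorted(free_slots):
        take = min(free_slots[host], n - len(chosen))
        chosen.extend([host] * take)
        if len(chosen) == n:
            return chosen
    return None  # not enough capacity: schedule nothing at all


def write_hostlist(hosts: List[str], god_home: Path) -> Path:
    """Write an Open MPI-style hostfile ("host slots=N") to <god_home>/hostlist.txt."""
    path = god_home / "hostlist.txt"
    counts = Counter(hosts)
    path.write_text("".join(f"{h} slots={c}\n" for h, c in sorted(counts.items())))
    return path


if __name__ == "__main__":
    free = {"node-a": 2, "node-b": 1, "node-c": 4}
    print(allocate_all_or_nothing(free, 10))  # None: only 7 slots are free
    hosts = allocate_all_or_nothing(free, 4)
    print(hosts)  # ['node-a', 'node-a', 'node-b', 'node-c']
    # In production god_home would be $GOD_HOME; a temp dir keeps the demo self-contained.
    print(write_hostlist(hosts, Path(tempfile.mkdtemp())).read_text(), end="")
```

Returning `None` instead of a partial list is what makes the placement all-or-nothing: a job that only partially fits stays queued rather than holding some nodes while it waits for the others.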

Comments (2)

  1. Olivier Sallou reporter

    What is the issue:

    • mpirun executes the program on each host of a host list via ssh, so it needs passwordless ssh access (in our case). The ssh port cannot be specified, but with containers all ssh traffic goes through custom ports to avoid clashes. This could be managed.
    • once the program is running, messages go through an unspecified range of IP ports that Docker needs to be aware of (for port mapping or container linking). This is the main issue. Using native access with Mesos would remove the need for such mapping, but still requires ssh access to the "container". With native Mesos there is no container, only program isolation (the command would be an sshd on a specific port?)
  2. Olivier Sallou reporter

    Start N containers with sshd

    Map X ports, according to the request, within a range

    Create an env file specifying the port range

    Start a master container that executes the command and loads the created env file, which contains the host list and port range. Must the port range be the same for all hosts?
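The steps in the second comment can be sketched as follows. This sketch takes the simplest option for the last question: one range of consecutive ports is reserved per job and published under the same numbers on every host and inside every container, so a single env file is valid everywhere, at the cost of that range having to be free on all hosts of the job. `allocate_port_range`, `docker_port_args`, `render_env_file` and the `GOD_*` variable names are hypothetical. The `OMPI_MCA_btl_tcp_port_*` lines assume Open MPI's TCP transport and should be checked with `ompi_info --param btl tcp --level 9` for the installed version; they do not necessarily cover every port the MPI runtime opens.

```python
from typing import List, Sequence, Tuple

PortRange = Tuple[int, int]  # inclusive (start, end)


def allocate_port_range(used: Sequence[PortRange], count: int,
                        low: int = 20000, high: int = 30000) -> PortRange:
    """Return the first block of `count` consecutive ports in [low, high]
    that overlaps none of the `used` ranges (ranges already given to other jobs)."""
    start = low
    for u_start, u_end in sorted(used):
        if start + count - 1 < u_start:
            break  # the candidate block ends before this used range begins
        start = max(start, u_end + 1)  # skip past the used range
    end = start + count - 1
    if end > high:
        raise RuntimeError("no free port range")
    return start, end


def docker_port_args(rng: PortRange) -> List[str]:
    """`docker run` arguments publishing the range under the same numbers."""
    start, end = rng
    return ["-p", f"{start}-{end}:{start}-{end}"]


def render_env_file(hosts: Sequence[str], rng: PortRange) -> str:
    """Content for `docker run --env-file` (one KEY=value per line)."""
    start, end = rng
    lines = [
        f"GOD_HOSTLIST={','.join(hosts)}",  # hypothetical variable names
        f"GOD_PORT_MIN={start}",
        f"GOD_PORT_MAX={end}",
        # Assumption: Open MPI TCP BTL parameters, set through OMPI_MCA_* env vars.
        f"OMPI_MCA_btl_tcp_port_min_v4={start}",
        f"OMPI_MCA_btl_tcp_port_range_v4={end - start + 1}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rng = allocate_port_range([(20000, 20009)], count=10)
    print(docker_port_args(rng))  # ['-p', '20010-20019:20010-20019']
    print(render_env_file(["node-a", "node-b"], rng), end="")
```

The same `-p` arguments go to the child containers and the master container, and the env file is loaded only by the master, which runs mpirun.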
