Test nonblocking MPI collectives (CLE >= 5.2)

Issue #16 new
jg piccinali repo owner created an issue
  • cd /apps/santis/sandbox/jgp/mpich/src/mpich-3.1.3/test/mpi/f90/coll
  • ftn -c mtestf90.f90
  • ftn nonblockingf90.f90 mtestf90.o -o $PE_ENV
  • aprun -n8 ./CRAY

Related ticket: https://webrt.cscs.ch/Ticket/Display.html?id=17909

Comments (5)

  1. jg piccinali reporter
    • cd /apps/santis/sandbox/jgp/mpich/src/mpich-3.1.4/test/mpi/coll
    • scorep --mpp=mpi cc 2jg.c -dynamic # nonblocking2.c
    • aprun -n 4 -N 4 -d 1 -j 1 ./a.out

    vampir.png
  2. jg piccinali reporter
    Nonblocking collectives such as MPI_Ibcast are features of MPI 3.0 (and later).
    Score-P supports MPI only up to v2.2.
    However, we are working on support for the new MPI 3.x features.
    
  3. jg piccinali reporter

    Speedup.ch/2015

    Compile

    • make MPICXX="scorep --mpp=mpi CC"

    Run

    • aprun -n 1 ./stencil 1024 1 50
    last heat: 2433.555556 time: 0.740542
    
    • aprun -n 8 -N 8 -d 1 -j 1 ./stencil_mpi_ddt+sc142 1024 1 50 4 2
    [0] last heat: 99.666667 time: 3.936471
    

    Screen Shot 2015-09-29 at 18.48.58.png v1.png

    • aprun -n 8 -N 8 -d 1 -j 1 ./stencil_mpi_carttopo_neighcolls+sc142 1024 1 50 4 2
    [0] last heat: 99.666667
    

    v2.png

    • Nonblocking collective info is missing from the trace file:

    MPI_Ineighbor_alltoallv(sbuf, counts, displs, MPI_DOUBLE, 
        rbuf, counts, displs, MPI_DOUBLE, topocomm, &req);
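
To make the `counts` and `displs` arguments of the call above concrete, here is a minimal C sketch of how they could be derived for a 4-neighbor 2D halo exchange. It assumes, without confirmation from the stencil source, that the run arguments `1024 1 50 4 2` describe a 1024x1024 grid split over a 4x2 process grid, and that the four halos are packed back to back in one buffer. `halo_layout_t` and `halo_layout` are hypothetical names, and the neighbor order used here is an assumption that would have to match the order defined by the Cartesian communicator.

```c
#include <stddef.h>

/* Hypothetical helper type: per-neighbor element counts and buffer offsets. */
typedef struct {
    int counts[4];
    int displs[4];
} halo_layout_t;

/* Fill the layout for an n x n grid on a px x py process grid.
 * Returns 0 on success, -1 if the arguments are invalid or the
 * grid does not divide evenly among the processes. */
int halo_layout(int n, int px, int py, halo_layout_t *out)
{
    if (out == NULL || n <= 0 || px <= 0 || py <= 0 ||
        n % px != 0 || n % py != 0)
        return -1;

    int bx = n / px;   /* local block width  */
    int by = n / py;   /* local block height */

    /* Assumed order: two edges of length bx, then two edges of length by. */
    out->counts[0] = bx;
    out->counts[1] = bx;
    out->counts[2] = by;
    out->counts[3] = by;

    /* Displacements are the exclusive prefix sums of the counts. */
    int offset = 0;
    for (int i = 0; i < 4; i++) {
        out->displs[i] = offset;
        offset += out->counts[i];
    }
    return 0;
}
```

Under these assumptions, `halo_layout(1024, 4, 2, &h)` gives a 256x512 local block, so the same `counts` and `displs` arrays can serve as both send and receive layouts, as in the call above.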
    
  4. jg piccinali reporter

    Speedup.ch/2015

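       void operator() (buffer_t* buffer) {
          // Number of doubles exchanged with each rank.
          int size = 2 * buffer->tile_size() * comm->max_n * comm->max_n;
          if (comm->nbc == comm_t::FFT_NBC) {
            // Nonblocking variant (libNBC): returns immediately; the request in
            // buffer->handle must be completed later with NBC_Wait or NBC_Test.
            NBC_Ialltoall(buffer->a2as, size, MPI_DOUBLE, buffer->a2ar, size,
                     MPI_DOUBLE, MPI_COMM_WORLD, &buffer->handle);
          } else {
            // Blocking reference variant.
            MPI_Alltoall(buffer->a2as, size, MPI_DOUBLE, buffer->a2ar, size,
                    MPI_DOUBLE, MPI_COMM_WORLD);
          }
       }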
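
As a sanity check on the `size` expression above, the following C sketch computes the number of bytes sent to each peer. The run log later in this comment reports `max_n: 160` and `1x131072000 byte`; the tile size of 320 used in the usage note is inferred from those two numbers and is not stated anywhere in the source. `a2a_bytes_per_peer` is a hypothetical helper name.

```c
/* Bytes sent to each peer by one all-to-all, following the
 * expression size = 2 * tile_size * max_n * max_n (in doubles).
 * The arithmetic is done in long long so that large inputs do not
 * overflow the way a plain int product could. */
long long a2a_bytes_per_peer(int tile_size, int max_n)
{
    long long count = 2LL * tile_size * max_n * max_n;
    return count * (long long)sizeof(double);
}
```

With 8-byte doubles, `a2a_bytes_per_peer(320, 160)` matches the 131072000 bytes in the log. The original code keeps `size` in an `int`, which is fine here (16384000 elements), but the product would exceed the `int` range for inputs such as a tile size and `max_n` of 1024 each.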
    

    Cray XC

    Compile

    MPICXX=CC CC=cc CXX=CC F77=ftn \
    ./configure \
    --prefix=/apps/escha/sandbox/jgp/hoefler/libNBC/1.1.1/xc/gnu_482
    
    • module swap PrgEnv-cray PrgEnv-gnu
    • module load fftw/3.3.4.4
    CC -w \
    -I../libNBC/1.1.1/xc/gnu_482/include \
    -L../libNBC/1.1.1/xc/gnu_482/lib \
    3d-fft.cpp  -lnbc
    

    Run

    • aprun -n2 -N1 a.out
    1 repetitions of N=320, testsize: 0, testint 0, tests: 0, max_n: 160
    approx. size: 1000.000000 MB
    normal (MPI): 5.617800 (NBC_A2A: 0.080495/0.000000) (Test: 0.000000) (2x1d-fft: 3.083646) - 1x131072000 byte
    normal (NBC): 5.597339 (NBC_A2A: 0.048640/0.020811) (Test: 0.000000) (2x1d-fft: 3.094804) - 1x131072000 byte
    pipe (NBC): 5.403412 (NBC_A2A: 0.036380/0.025574) (Test: 0.000000) (2x1d-fft: 3.074958) - 1x131072000 byte
    tile (NBC): 5.357859 (NBC_A2A: 0.040936/0.018718) (Test: 0.000000) (2x1d-fft: 3.037793) - 1x131072000 byte
    win (NBC): 5.298810 (NBC_A2A: 0.044356/0.015685) (Pack: 0.000000) (2x1d-fft: 2.955262) - 1x131072000 byte
    wintile (NBC): 5.219840 (NBC_A2A: 0.082914/0.060287) (Pack: 0.000000) (2x1d-fft: 2.887048) - 1x131072000 byte
    # real 42.13
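
To put the timings above in perspective, this small C sketch computes how much faster each NBC variant is than the blocking `normal (MPI)` run. The numbers are copied from the output above; `timing_t`, `fft_timings`, `percent_faster` and `print_gains` are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const char *name;
    double seconds;   /* total time reported for the variant */
} timing_t;

/* Totals copied from the aprun output; entry 0 is the blocking baseline. */
const timing_t fft_timings[] = {
    {"normal (MPI)",  5.617800},
    {"normal (NBC)",  5.597339},
    {"pipe (NBC)",    5.403412},
    {"tile (NBC)",    5.357859},
    {"win (NBC)",     5.298810},
    {"wintile (NBC)", 5.219840},
};
const size_t fft_timings_count = sizeof fft_timings / sizeof fft_timings[0];

/* Percentage reduction in time of t relative to the baseline. */
double percent_faster(double baseline, double t)
{
    return 100.0 * (baseline - t) / baseline;
}

/* Print every variant with its gain over the first entry. */
void print_gains(const timing_t *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%-14s %9.6f s  %6.2f %%\n", t[i].name, t[i].seconds,
               percent_faster(t[0].seconds, t[i].seconds));
}
```

Calling `print_gains(fft_timings, fft_timings_count)` shows gains ranging from under 0.4 % for `normal (NBC)` to roughly 7 % for `wintile (NBC)`. Since the log reports a single repetition, these differences should be read with caution.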
    