Topological view

Issue #59 new
jg piccinali repo owner created an issue

Issue

Cube can display "a topological view of the application's processes and threads (if available)".

Do I need to recompile Score-P or Cube in a different way to get this view?

Currently Loaded Modulefiles:
  1) craype-haswell
  2) binutils/2.24
  3) GCC/4.8.2-EB
  4) cudatoolkit/6.5.14
  5) MVAPICH2/2.0.1-GCC-4.8.2-EB
  6) gmvapich2/2015b
  7) PrgEnv-gnu/2015b
  8) OpenBLAS/0.2.13-GCC-4.8.2-EB-LAPACK-3.5.0
  9) FFTW/3.3.4-gmvapich2-2015b
 10) ScaLAPACK/2.0.2-gmvapich2-2015b-OpenBLAS-0.2.13-LAPACK-3.5.0
 11) gmvolf/2015b
 12) Cube/4.3.2-gmvolf-2015b
 13) OTF2/1.5.1-gmvolf-2015b
 14) OPARI2/1.1.4-gmvolf-2015b
 15) Score-P/1.4.2-gmvolf-2015b
 16) Scalasca/2.2.2-gmvolf-2015b

Comments (6)

  1. jg piccinali reporter

    kescha

    • module purge
    • module load craype-haswell PrgEnv-gnu/2015b Score-P/1.4.2-gmvolf-2015b Scalasca/2.2.2-gmvolf-2015b
    • make CLASS=C NPROCS=144 FFLAGS=-O3 F77="scorep --mpp=mpi mpif90"
    • export OMP_NUM_THREADS=1
    • srun --exclusive -n24 -N1 -c$OMP_NUM_THREADS GNU.KESCHLN-
         24 keschcn-0001
    
    • srun --exclusive -n48 -N2 -c$OMP_NUM_THREADS GNU.KESCHLN-
         24 keschcn-0002
         24 keschcn-0003
    
    • srun --exclusive -n96 -N4 -c$OMP_NUM_THREADS GNU.KESCHLN-
         24 keschcn-0002
         24 keschcn-0003
         24 keschcn-0004
         24 keschcn-0005
    
    • export SLURM_NPROCS=144 # 6 compute nodes
    • scan srun --exclusive -n144 -N6 -c$OMP_NUM_THREADS ./GNU.KESCHLN-.C.144
    • scorep-score -r scorep_GNU_144_sum/profile.cubex > scorep_GNU_144_sum/filterjg
    Estimated aggregate size of event trace:                   21GB
    
    • vim filterjg
    SCOREP_REGION_NAMES_BEGIN
     EXCLUDE
    ...
    SCOREP_REGION_NAMES_END
    
    • scorep-score scorep_GNU_144_sum/profile.cubex -f scorep_GNU_144_sum/filterjg
    Estimated aggregate size of event trace:                   7MB
    
    • scan -f scorep_GNU_144_sum/filterjg -t srun --exclusive -n144 -N6 -c$OMP_NUM_THREADS ./GNU.KESCHLN-.C.144
    (attachment: Screen Shot 2015-09-21 at 18.51.47.jpg)
    • cube_topoassist scorep_GNU_144_trace/scout.cubex -c
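
    The filter file edited above uses the Score-P region filter layout: a SCOREP_REGION_NAMES_BEGIN/END block containing EXCLUDE rules. Below is a minimal Python sketch that writes such a file from a list of region names. The function name and the region names are hypothetical placeholders; the regions actually excluded in this run are not shown in the thread.

```python
def make_scorep_filter(excluded_regions):
    """Return the text of a Score-P region filter excluding the given names."""
    lines = ["SCOREP_REGION_NAMES_BEGIN", " EXCLUDE"]
    # One pattern per line; Score-P accepts shell-style wildcards such as '*'.
    lines += [f"   {name}" for name in excluded_regions]
    lines.append("SCOREP_REGION_NAMES_END")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical region names, for illustration only.
    print(make_scorep_filter(["example_region_a", "example_region_b*"]), end="")
```

    The resulting text can be saved and passed to scorep-score -f and scan -f as in the steps above.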
  2. jg piccinali reporter

    cube_topoassist

    • cube_topoassist scorep+sca-T/scout.cubex -c
    # Reading scorep+sca-T/scout.cubex . Please wait... Done.  
    # Processes are ordered by rank. For more information about this file, use cube_info -S <cube experiment>
    # 
    # So far, only cartesian topologies are accepted.
    # Name for new topology?
    # kescha
    # Number of Dimensions?
    # 1
    # Do you want to name the dimensions (axis) of this topology? (Y/N)
    # N
    # Number of elements for dimension 0
    # 4
    # Is dimension 0 periodic?
    # n
    # Topology on THREAD level.
    # Master thread's (rank 0) coordinates in 1 dimensions, separated by spaces
    # 0 
    # Master thread's (rank 1) coordinates in 1 dimensions, separated by spaces
    # 1
    # Master thread's (rank 2) coordinates in 1 dimensions, separated by spaces
    # 2
    # Master thread's (rank 3) coordinates in 1 dimensions, separated by spaces
    # 3
    # Write topo.cubex......done.
    
    • cube topo.cubex eff.png
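
    In the session above each rank is given its own index as its coordinate in a 1-dimensional topology of 4 elements. For larger runs, typing the coordinates by hand gets tedious, so here is a rough Python sketch that computes the coordinates of every rank in a Cartesian grid. It assumes row-major ordering (last dimension varies fastest), which may not match the mapping you actually want; rank_to_coords is a hypothetical helper, not part of Cube.

```python
def rank_to_coords(rank, dims):
    """Row-major coordinates of `rank` in a Cartesian grid with extents `dims`."""
    total = 1
    for extent in dims:
        total *= extent
    if not 0 <= rank < total:
        raise ValueError(f"rank {rank} outside grid of {total} elements")
    coords = []
    # Peel off the fastest-varying (last) dimension first.
    for extent in reversed(dims):
        coords.append(rank % extent)
        rank //= extent
    return tuple(reversed(coords))


if __name__ == "__main__":
    dims = (4,)  # the 1-D, 4-element topology entered above
    for rank in range(4):
        # One line per rank, coordinates separated by spaces.
        print(" ".join(str(c) for c in rank_to_coords(rank, dims)))
```

    Each printed line corresponds to one "coordinates ... separated by spaces" answer in the cube_topoassist dialogue.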
  3. jg piccinali reporter

    Cube xml format

    • tar xf topo.cubex anchor.xml
      <system>      
        <systemtreenode Id="0">
          <name>machine kesch</name>
          <class>machine</class>
          <systemtreenode Id="1">
            <name>node keschcn-0001</name>
            <class>node</class>
            <locationgroup Id="0">
              <name>MPI Rank 0</name>
              <rank>0</rank>
              <type>process</type>
              <location Id="0">
    etc...
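
    The fragment above suggests that the system tree nests systemtreenode elements (machine, then node) around locationgroup elements that carry the MPI rank. Here is a small Python sketch that lists (node name, rank) pairs from such XML. The SAMPLE string is the fragment above with closing tags added by assumption, since the original is truncated; ranks_by_node is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The fragment from anchor.xml, closed off by assumption (the original is truncated).
SAMPLE = """
<system>
  <systemtreenode Id="0">
    <name>machine kesch</name>
    <class>machine</class>
    <systemtreenode Id="1">
      <name>node keschcn-0001</name>
      <class>node</class>
      <locationgroup Id="0">
        <name>MPI Rank 0</name>
        <rank>0</rank>
        <type>process</type>
        <location Id="0"/>
      </locationgroup>
    </systemtreenode>
  </systemtreenode>
</system>
"""


def ranks_by_node(xml_text):
    """Return (node name, MPI rank) pairs found under <class>node</class> entries."""
    root = ET.fromstring(xml_text)
    pairs = []
    for node in root.iter("systemtreenode"):
        if node.findtext("class") != "node":
            continue  # skip the machine level
        for group in node.findall("locationgroup"):
            pairs.append((node.findtext("name"), int(group.findtext("rank"))))
    return pairs


if __name__ == "__main__":
    for name, rank in ranks_by_node(SAMPLE):
        print(rank, name)
```

    To run it on a real experiment, the XML text would come from the anchor.xml extracted with the tar command above.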
    
  4. jg piccinali reporter

    TODO

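    #if defined(REPORT_TOPOLOGY)
    #include <pmi.h>
            /* Look up the node id of this rank, then that node's mesh coordinates. */
            int nid;
            int rc = PMI_Get_nid(rank, &nid);
            pmi_mesh_coord_t xyz;
            PMI_Get_meshcoord((uint16_t) nid, &xyz);
            /* rank, coords[] and name are defined by the surrounding code (not shown). */
            printf("RANK %d: (%d, %d, %d) -> %s (%d, %d, %d)\n", rank, 
                    coords[0], coords[1], coords[2], name, xyz.mesh_x, xyz.mesh_y, xyz.mesh_z);
    #endif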
    
    • aprun -n8 -N1 -d1
    RANK 0: (0, 0, 0) -> nid00012
    RANK 6: (1, 1, 0) -> nid00018
    RANK 4: (1, 0, 0) -> nid00016
    RANK 2: (0, 1, 0) -> nid00014
    RANK 1: (0, 0, 1) -> nid00013
    RANK 5: (1, 0, 1) -> nid00017
    RANK 7: (1, 1, 1) -> nid00019
    RANK 3: (0, 1, 1) -> nid00015
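
    The lines above arrive in arbitrary order. Here is a short Python sketch that parses them into a table keyed by rank, so the coordinates can be listed in rank order. The regular expression only covers the "RANK r: (x, y, z) -> node" form printed above; parse_rank_lines is a hypothetical helper.

```python
import re

# Matches lines such as "RANK 6: (1, 1, 0) -> nid00018".
LINE_RE = re.compile(r"RANK (\d+): \((\d+), (\d+), (\d+)\) -> (\S+)")

OUTPUT = """\
RANK 0: (0, 0, 0) -> nid00012
RANK 6: (1, 1, 0) -> nid00018
RANK 4: (1, 0, 0) -> nid00016
RANK 2: (0, 1, 0) -> nid00014
RANK 1: (0, 0, 1) -> nid00013
RANK 5: (1, 0, 1) -> nid00017
RANK 7: (1, 1, 1) -> nid00019
RANK 3: (0, 1, 1) -> nid00015
"""


def parse_rank_lines(text):
    """Map rank -> ((x, y, z), node name) for every matching line in `text`."""
    table = {}
    for match in LINE_RE.finditer(text):
        rank = int(match.group(1))
        coords = tuple(int(match.group(i)) for i in (2, 3, 4))
        table[rank] = (coords, match.group(5))
    return table


if __name__ == "__main__":
    for rank, (coords, node) in sorted(parse_rank_lines(OUTPUT).items()):
        # Coordinates separated by spaces, one rank per line.
        print(rank, " ".join(str(c) for c in coords), node)
```

    Sorted this way, the coordinates are in the order cube_topoassist asks for them (by rank).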
    