Refactoring during ESSEX Coding Week

Issue #270 resolved
Moritz Kreutzer created an issue

This is an continuously updated issue to gather ideas about some necessary refactoring of GHOST during the ESSEX Coding Week in September. Once everything is discussed with all project partners some refactoring will take place and hopefully we get a somehow stable API in GHOST 1.1

  1. Define behavior of location=HOST||DEVICE
    See #246

  2. Refactor densemat dimensions
    See #268

  3. Decide on what densemats need to know about their distribution
    Related issue: #267.
    In a previous commit I deleted the ghost_context from the ghost_densemat as suggested by Jonas. Also, I added a "goffs" field to the ghost_densemat which indicates its offset into a global densemat if it is distributed. My idea was that a densemat does not need to know anything about its context, and if a function requires densemats and communication (e.g., ghost_dot()), the MPI communicator needs to be passed to this function if communication (i.e., global reduction in this case) should be done or can be NULL otherwise.
    Maybe I was a bit hasty with this change. In PHIST, a multivec knows about it's distribution (i.e., its map) and has an MPI communicator attached. In a GHOST context, both information about the distribution and the communication (which depends on the sparse matrix) is stored. The densemat should maybe know about its distribution, but not the communication information. This would require splitting up the ghost_context into a ghost_map and a ghost_context, and a ghost_context contains a ghost_map. A sparsemat then should have attached a context (and with it a map) and a densemat should only have attached a ghost_map. Operations with different sparse/dense matrices can only be done if the maps match. For this, bool ghost_map_similar(ghost_map m1, ghost_map m2) should be added which compares all dimensions but omits the mpicomm.

    struct ghost_map {  
      ghost_mpi_comm mpicomm; // MPI communicator corresponding to this map  
      ghost_gidx gnrows; // global number of rows  
      ghost_gidx *offs; // row offset of each rank in mpicomm, length is nranks+1
      ghost_lidx *nrows; // number of rows of each rank in mpicomm
      ghost_lidx lnrows; // same value as nrows[own_rank] for convenience
    }
    struct ghost_context {
      ghost_map map;
      ghost_lidx nhalo;
      ghost_lidx *nwishes;
      ghost_lidx **wishes;
      int n_wishpartners;
      int *wishpartners;
      ghost_lidx *ndues;
      ghost_lidx **dues;
      int n_duepartners;
      int *duepartners;
      ghost_lidx *halooffs;
    }
    
  4. Check whether the ghost_complex class is (still) needed
    I think we could get rid of this.

  5. Discontinue support for arbitrarily scattered densemats?
    Does it make sense to support densemats with masked out rows? I think this is not even tested in PHIST and I can't think of a use case for that. However, support for masked out columns (also for row-major densemats with bitmaps) is needed of course. The viewScatteredVec and viewVec functions could be simplified by not accepting a number of rows and a row offset.

  6. Re-think padding
    Padding serves two reasons: maintaining alignment restrictions and simplify unrolling. In the latter case, padding elements are read and taken into computation in some kernels which requires setting them to zero and guaranteeing that they are always zero in those kernels. This resulted in the addition of DENSEMAT_ITER_INIT macros (see pull request #3). Those macros add complexity which is not really desired. For block vectors, having the column padding elements equal to zero (if GHOST_DENSEMAT_PAD_COLS is enabled) should be sufficient.
    The row padding only matters for TSMTTSM, as it involves a reduction. In this case, I think it would be worth the hassle to add remainder loops or set the padding elements to zero at kernel entry (adds less code and the overhead should be negligible). For other kernels, e.g., TSMM, the values of padding elements do not matter as they are only used for computation of padding elements in the result densemat.

  7. Refactor DENSEMAT_ITER macros
    For example, do we need the DENSEMAT_ITER2_OFFS macro? Actually, the val pointer of the second densemat could be adjusted according the offset and then the ordinary scattered or compact DENSEMAT_ITER2 macro could be used. If the value of row padding elements does no longer matter, we could get rid of all the DENSEMAT_ITER_INIT macros. In the unlikely case that GHOST_DENSEMAT_PAD_COLS is enabled, one could just set the column padding elements to zero in an additional sweep over the densemat. Of course this would add some overhead, but this option should only be used for benchmarking and testing anyway...

  8. Get rid of ghost_sell in ghost_sparsemat structure.
    This field is present due to an earlier version where we supported CRS and SELL-C-S. This can be deleted now and many things could be consolidated.

Please discuss and add more suggestions to the list @Jonas Thies, @Martin Galgon, @Faisal Shahzad, @christiealappatt

Comments (14)

  1. Jonas Thies

    some requirements for e.g. generalized EVP: - given sparsemat A, create sparsemat B with same row distribution and row/column permutation (maps) and either with the same pattern (shared context) or its own (requires different context with same permutations) - given A, create compatible densemats (from sparsemat get maps for LHS or RHS object) - given a densemat, use it's map to create a context for creating a sparseMat

  2. Moritz Kreutzer reporter

    do not use function pointers for storing, e.g., BLAS-1 functions but select an appropriate function at call time. this helps towards a good and defined behavior for heterogeneos runs, re #246, re #270

    → <<cset 19b304d0aff3>>

  3. Moritz Kreutzer reporter

    functions are no longer accessed via pointers in the ghost_densemat struct. this makes for a more consistent interface as we can no select the compute location of BLAS1 kernels and somehow addresses #270

    → <<cset 4306ddd0e69c>>

  4. Moritz Kreutzer reporter
    • changed milestone to 1.2

    Most of the things have been done. The possible re-thinking of padding, densemat iterations macros and scattered densemats should be postponed to the 1.2 milestone

  5. Log in to comment