Serial merge/write failure when NS is present in both meshes

Issue #8 resolved
Rajeev Jain created an issue

/homes/jain/moab/test/parallel
==> parmerge bug.inp out.h5m 1e-3
Writer with name MOAB for file out.h5m using extension h5m was unsuccessful
Using default writer WriteHDF5Parallel for file out.h5m
Writing output file failed. Code: MB_FAILURE
File Error: Requested write of rows 0 to 103 of a 24 row table.

/homes/jain/moab/test/parallel
==> mpiexec -np 2 parmerge bug.inp out.h5m 1e-3

Let me know if you need more details. Rajeev

Comments (8)

  1. Iulian Grindeanu

    So the parmerge utility is supposed to run in parallel, not in serial.

    In a usual run, it loads each file on a separate processor, then does the parallel merge. It uses "load_mesh" in a while loop, in round-robin fashion.

    Do we ever test on one processor?

    Is this the correct scenario?

    Can you create one more file for merging, move it to the correct place, and then launch with mpiexec -np 2, but with an inp file that lists all 3 files? That way the first file will be on proc 0, the second on proc 1, and the third on proc 0 (round robin).

    If it fails, it means we cannot load 2 files on the same proc (which is what happens when you run in serial: one proc for 2 files).

    If it does not fail, it means something is wrong with the parallel merge/writer.
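
    The round-robin assignment described above can be sketched as follows. This is a hedged, self-contained illustration (the file names and the `rank_for_file` helper are hypothetical, not parmerge's actual code), showing why running with 3 files on 2 procs forces proc 0 to load and serial-merge two meshes before the parallel merge.

    ```cpp
    #include <cassert>
    #include <string>
    #include <vector>

    // Hypothetical helper: which rank loads the i-th file listed in the
    // .inp file, under round-robin distribution.
    int rank_for_file(int file_index, int nprocs) {
        return file_index % nprocs;
    }

    int main() {
        int nprocs = 2;
        // Placeholder file names; the real list comes from the .inp file.
        std::vector<std::string> files = {"meshA.h5m", "meshB.h5m", "meshC.h5m"};
        // With 3 files on 2 procs: proc 0 gets files 0 and 2, proc 1 gets
        // file 1, so proc 0 must serial-merge two meshes on one processor.
        assert(rank_for_file(0, nprocs) == 0);
        assert(rank_for_file(1, nprocs) == 1);
        assert(rank_for_file(2, nprocs) == 0);
        return 0;
    }
    ```

    With 2 procs and 2 files, each proc holds exactly one mesh, so the serial (same-proc) merge path is never exercised; the third file is what triggers it.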

  2. Iulian Grindeanu

    OK, so if you run in parallel, it correctly updates the 2 Neumann sets; the second set, 22, has the merged quads. This is what should happen in serial too, and I think the bug is there. Still looking :(

  3. Rajeev Jain reporter

    This happens when loading any type of mesh file that needs quad merging. We do test in serial, but this is a special case. This seems to be the correct scenario. It is some allocation problem in the serial save (serial saving to an Exodus file works fine). Let me know if you still want me to create a third file...

  4. Iulian Grindeanu

    No, no need for other files, thanks. We do things differently in serial and in parallel: we call "serial merge" on each proc before doing the parallel merge. The problem is in the serial merge.

  5. Iulian Grindeanu

    https://bitbucket.org/fathomteam/moab/commits/ee050432985329e33f4456ad6d4e5a83c1b81932

    As noted there, merging of higher-dimensional entities does not work correctly in serial. We need to fix it.

    It could be related to this problem. So what is happening: vertices are merged fine, but quads are not merged.

    The merge seems to work when the parallel merge is used, because in that case quads on the skin are merged. So again, I think the problem is the serial merge.

    Even when you take 2 hexes, each with 6 quad faces, and try to merge them, 12 faces are left, along with 12 vertices and 20 edges. There should be 11 quads left.

    It causes problems only when faces are part of the Neumann sets; I don't know why yet. It may need changes to the hdf5 writer too. The error we are seeing is related to writing adjacency info: when we first estimate, we get one count; when we try to write, we get a different count for the adjacency table and bail out. Not sure yet how the adjacency table is related to this :(
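
    The expected counts for the two-hex case can be checked with a small self-contained sketch (simple integer vertex ids standing in for MOAB entity handles; this is not MOAB code). Two hexes sharing a face have 12 vertices after merging, and their 12 quad faces contain exactly one duplicated pair (the shared interface), so a correct merge leaves 11 quads:

    ```cpp
    #include <array>
    #include <cassert>
    #include <set>
    #include <vector>

    using Quad = std::array<int, 4>;

    // Count distinct quads, treating quads with the same vertex set as one
    // (orientation and starting vertex do not matter for duplicate detection).
    int count_unique_quads(const std::vector<Quad>& quads) {
        std::set<std::set<int>> seen;
        for (const Quad& q : quads)
            seen.insert(std::set<int>(q.begin(), q.end()));
        return (int)seen.size();
    }

    int main() {
        // Vertices 0..7 form hex A, vertices 4..11 form hex B; vertices 4..7
        // are shared (i.e., already merged). Six quad faces per hex.
        std::vector<Quad> quads = {
            // hex A
            {0,1,2,3}, {4,5,6,7}, {0,1,5,4}, {1,2,6,5}, {2,3,7,6}, {3,0,4,7},
            // hex B; its bottom face {4,5,6,7} duplicates hex A's top face
            {4,5,6,7}, {8,9,10,11}, {4,5,9,8}, {5,6,10,9}, {6,7,11,10}, {7,4,8,11},
        };
        assert(count_unique_quads(quads) == 11);  // 12 faces minus 1 duplicate
        return 0;
    }
    ```

    The serial merge leaving 12 faces means the duplicated interface quad is never collapsed, which is consistent with the Neumann-set and adjacency-count symptoms described above.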

  6. Iulian Grindeanu

    fix issue #8

    There are 2 important modifications. 1) Merging of higher-dimensional entities is done differently:

    merging of faces between 2 solids should work better now

    At this point the vertices are merged, but any edges and faces are not yet merged. We will not skin again, as skinning does not work well exactly for those entities that lie on top of each other. Instead, we keep track of the vertices that were merged and retrieve the entities adjacent to those vertices; then we look for matches like this:

    - loop over entities adjacent to those vertices
    - retrieve their connected vertices
    - get all entities adjacent to those vertices; if a match is found, put it in a list
    - delete the merged higher-dimensional entities

    2) hdf5 writing: adjacency information can contain sets that are adjacent to entities. Start counting adjacencies only after the sets are assigned file ids, because otherwise adjacencies are not counted correctly.

    This was the major source of problems for issue #8.

    Now, for both parallel and serial writing, we write the sets first and then the adjacencies.
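
    The matching step described in modification 1) can be sketched in a self-contained way. This is a hedged illustration of the described approach only: the `Entity` struct, `merge_duplicates` helper, and integer vertex ids are simplified stand-ins, not MOAB's Range/adjacency API.

    ```cpp
    #include <cassert>
    #include <map>
    #include <set>
    #include <vector>

    // Simplified stand-in for a higher-dimensional entity (edge/face).
    struct Entity { std::vector<int> conn; bool deleted = false; };

    // After vertices are merged, walk entities adjacent to the merged
    // vertices and mark any entity whose vertex set matches an earlier
    // one as a duplicate to delete. Returns the number removed.
    int merge_duplicates(std::vector<Entity>& ents,
                         const std::set<int>& merged_verts) {
        std::map<std::set<int>, int> first_seen;  // vertex set -> survivor index
        int removed = 0;
        for (size_t i = 0; i < ents.size(); ++i) {
            // Only entities touching a merged vertex can have a duplicate.
            bool touches = false;
            for (int v : ents[i].conn)
                if (merged_verts.count(v)) { touches = true; break; }
            if (!touches) continue;
            std::set<int> key(ents[i].conn.begin(), ents[i].conn.end());
            auto it = first_seen.find(key);
            if (it == first_seen.end()) {
                first_seen[key] = (int)i;      // first occurrence survives
            } else {
                ents[i].deleted = true;        // duplicate of an earlier entity
                ++removed;
            }
        }
        return removed;
    }

    int main() {
        // Two quads lying on top of each other after vertices {4,5,6,7}
        // were merged, plus one unrelated quad that must be left alone.
        std::vector<Entity> ents = { {{4,5,6,7}}, {{4,5,6,7}}, {{0,1,2,3}} };
        std::set<int> merged_verts = {4, 5, 6, 7};
        assert(merge_duplicates(ents, merged_verts) == 1);
        assert(ents[1].deleted && !ents[0].deleted && !ents[2].deleted);
        return 0;
    }
    ```

    Restricting the search to entities adjacent to merged vertices is what makes this cheaper than re-skinning, and it avoids the skinner's trouble with coincident entities.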

    → <<cset 32f2f6cac2e1>>

  7. Iulian Grindeanu

    Hmmmm... Bitbucket marked it resolved, probably because I said "fix issue #8" in the commit message. Testing should be extensive; rgg scenarios are affected a lot. I should have created a branch and a pull request.
