ParallelComm: double free or corruption

Issue #24 new
Nico Schlömer created an issue

When compiling

#include <memory>

#include <moab/Core.hpp>
#include <moab/ParallelComm.hpp>

class test {
public:
  test(
      std::shared_ptr<moab::Core> c,
      std::shared_ptr<moab::ParallelComm> p
      ):
   c_(c),
   p_(p)
  {}

  ~test() {};

private:
  std::shared_ptr<moab::ParallelComm> p_;
  std::shared_ptr<moab::Core> c_;
};

int main() {
  std::shared_ptr<test> t;
  {
    auto c = std::make_shared<moab::Core>();
    auto p = std::make_shared<moab::ParallelComm>(c.get(), MPI_COMM_WORLD);
    t = std::make_shared<test>(c, p);
  }
  return 0;
}

I'm getting

*** Error in `./mytest': double free or corruption (out): 0x0000000000922140 ***
[fuji:29542] *** Process received signal ***
[fuji:29542] Signal: Aborted (6)
[fuji:29542] Signal code:  (-6)
[...]

The error is worked around if the initialization order of the data members of the test class is reverted to

  std::shared_ptr<moab::Core> c_;
  std::shared_ptr<moab::ParallelComm> p_;

I suppose this has to do with how the moab::Core object is stored within moab::ParallelComm, and consequently, how it is destroyed.

Comments (9)

  1. Vijay M

    Yes we grappled with this issue in one of our tests where the order of deallocation was leading to free errors, like the one above. I think we had a resolution for this but my memory may be wrong.

  2. Vijay M

    I haven't checked this with the latest master version but I think this shouldn't be happening anymore. @iulian07 please do verify this so that we can close the issue.

  3. Iulian Grindeanu

    I compiled as an example with -std=c++11, with gnu 5.4.1
    I get now this:

    ./nico 
    Attempting to use an MPI routine before initializing MPICH
    

    These issues should have been solved by PR #143 https://bitbucket.org/fathomteam/moab/pull-requests/143/remove-mpi-init-from-moab-lib/diff

    I still get it if I revert the order . Maybe I am using it wrong? How do you compile?

    Not sure what is happening, if default copy constructor gets used behind the scenes. We do not test smart pointers anywhere;

    Core should be initialized first, of course. ParallelComm should not exist without a "Core", and it cannot exist

  4. Iulian Grindeanu

    sorry, we have to initialize MPI before doing any parallel comm stuff

    Not sure why we don't fail more gracefully, but the driver should call mpi init, Moab does not call anymore MPI_Init()

    (PR #143 ) :)
    so this one works fine

    #include <moab/Core.hpp>
    #include <moab/ParallelComm.hpp>
    
    class test {
    public:
      test(
          std::shared_ptr<moab::Core> c,
          std::shared_ptr<moab::ParallelComm> p
          ):
       c_(c),
       p_(p)
      {}
    
      ~test() {};
    
    private:
      std::shared_ptr<moab::Core> c_;
      std::shared_ptr<moab::ParallelComm> p_;
    };
    
    int main() {
      MPI_Init(0,0);
      std::shared_ptr<test> t;
      {
        auto c = std::make_shared<moab::Core>();
        auto p = std::make_shared<moab::ParallelComm>(c.get(), MPI_COMM_WORLD);
        t = std::make_shared<test>(c, p);
      }
      MPI_Finalize();
      return 0;
    }
    
  5. Iulian Grindeanu

    I think the solution is to test in parallel comm constructor if mpi is initialized; if not, bail out with a nice explaining message

  6. Vijay M

    So this issue has been resolved - or at least we understand the behavior ?

    @iulian07 We should introduce MOABInitialize/MOABFinalize so that command-line args can be parsed and various behaviors can be dynamically controlled, without recompilation. We should introduce this change now (in v5.0) and then can build upon the framework in the upcoming releases.

  7. Log in to comment