g++-10 possibly miscompiles parts of Carpet

Create issue
Issue #2426 open
Roland Haas created an issue

In #2410 its was found that while -O2 compiles Baikal very slowly using the various -f options reported by gcc -Q -O2 --help=optimizers compiles Baikal more quickly.

However using these options makes the test CarpetRegrid2/test/regrid2_2cen.par fail with an error enforcing the N13CarpetRegrid215combine_regionsE property:

Property N13CarpetRegrid215combine_regionsE
WARNING level 0 from host ekohaes8 process 0
  in thorn CarpetRegrid2, file /data/rhaas/postdoc/gr/cactus/ET_Turing/arrangements/Carpet/CarpetRegrid2/src/property.cc:34:
  -> Property does not hold after being enforced
WARNING level 0 from host ekohaes8 process 0
  in thorn CarpetRegrid2, file /data/rhaas/postdoc/gr/cactus/ET_Turing/arrangements/Carpet/CarpetRegrid2/src/property.cc:34:
  -> Property does not hold after being enforced

and adding some extra output and debug code:

diff --git a/CarpetRegrid2/src/property.cc b/CarpetRegrid2/src/property.cc
index 167acd00..0beb791d 100644
--- a/CarpetRegrid2/src/property.cc
+++ b/CarpetRegrid2/src/property.cc
@@ -171,6 +171,12 @@ bool combine_regions::test_impl(gh const &hh, dh const &dd,
   CCTK_REAL const regions_size = static_cast<CCTK_REAL>(regions.at(rl).size());
   CCTK_REAL const combined_size = static_cast<CCTK_REAL>(combined.size());

+  std::cout << "regions " << regions << "\n";
+  std::cout << "combined_region " << combined << "\n";
+  std::cout << "combined_region.size() " << combined.size() << "\n";
+  std::cout << "set size: " << regions.at(rl).setsize() << "\n";
+  std::cout << "min_fraction " << min_fraction << " combined_size " << combined_size << " regions_size " << regions_size << "\n";
   // Is the current setup "simple enough"? (It is "simple enough" if
   // either it already consists of a single box, or if using a
   // single bbox would be too inefficient.)
diff --git a/CarpetRegrid2/src/regrid.cc b/CarpetRegrid2/src/regrid.cc
index 3495f504..d53b8299 100644
--- a/CarpetRegrid2/src/regrid.cc
+++ b/CarpetRegrid2/src/regrid.cc
@@ -354,8 +354,10 @@ void Regrid(cGH const *const cctkGH, gh::rregs &regss) {
       ibset const old_regions = regions.at(rl);
       for (vector<property *>::iterator pi = properties.begin();
            pi != properties.end(); ++pi) {
+        std::cout << "Testing " << (*pi)->name() << "\n";
         bool const test_satisfied = (*pi)->test(hh, dd, bnd, regions, rl);
         if (not test_satisfied) {
+          std::cout << "already satisfied " << (*pi)->name() << "\n";
           (*pi)->enforce(hh, dd, bnd, regions, rl);
         done_enforcing = done_enforcing and test_satisfied;
diff --git a/CarpetRegrid2/test/regrid2_2cen.par b/CarpetRegrid2/test/regrid2_2cen.par
index d6bbf7f8..ceb497af 100644
--- a/CarpetRegrid2/test/regrid2_2cen.par
+++ b/CarpetRegrid2/test/regrid2_2cen.par
@@ -10,6 +10,8 @@ Cactus::cctk_run_title = "Test Run"
 Cactus::cctk_show_schedule = "yes"
 Cactus::cctk_itlast = 0

+CarpetRegrid2::veryverbose = yes
 # CartGrid3D
 CartGrid3D::type = "coordbase"
 CartGrid3D::avoid_origin = "no"

produces output

Refinement level 2: preliminary regions are bboxset<CCTK_INT4,3>(set<bbox>:{([35,35,7]:[45,45,17]:[1,1,1]/[35,35,7]:[45,45,17]/[11,11,11]/1331),([63,63,7]:[73,73,17]:[1,1,1]/[63,63,7]:[73,73,17]/[11,11,11]/1331)},stride:[1,1,1],offset:[0,0,0])
Determining domain outer boundary...
Refinement level 2: determining outer boundary...
Refinement level 2: physical coordinate boundary is at [[-24,-24,0],[24,24,24]]
Refinement level 2: spacing is [0.5,0.5,0.5]
Refinement level 2: exterior coordinate boundary is at [[-25.5,-25.5,-1.5],[25.5,25.5,25.5]]
Refinement level 2: stride is [1,1,1]
Refinement level 2: physical boundary is at [[12,12,12],[108,108,60]]
Refinement level 2: reconstructed physical coordinate boundary is at [[-24,-24,0],[24,24,24]]
Refinement level 2: exterior boundary is at [[9,9,9],[111,111,63]]
Refinement level 2: reconstructed exterior coordinate boundary is at [[-25.5,-25.5,-1.5],[25.5,25.5,25.5]]
Refinement level 2:
   Original regions are bboxset<CCTK_INT4,3>(set<bbox>:{([35,35,7]:[45,45,17]:[1,1,1]/[35,35,7]:[45,45,17]/[11,11,11]/1331),([63,63,7]:[73,73,17]:[1,1,1]/[63,63,7]:[73,73,17]/[11,11,11]/1331)},stride:[1,1,1],offset:[0,0,0])
Refinement level 2: adding buffer zones...
   New regions are bboxset<CCTK_INT4,3>(set<bbox>:{([35,35,7]:[45,45,17]:[1,1,1]/[35,35,7]:[45,45,17]/[11,11,11]/1331),([63,63,7]:[73,73,17]:[1,1,1]/[63,63,7]:[73,73,17]/[11,11,11]/1331)},stride:[1,1,1],offset:[0,0,0])
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
Testing combine_regions
regions [bboxset<CCTK_INT4,3>(set<bbox>:{([0,0,0]:[120,120,72]:[4,4,4]/[0,0,0]:[30,30,18]/[31,31,19]/18259)},stride:[4,4,4],offset:[0,0,0]),bboxset<CCTK_INT4,3>(set<bbox>:{([30,30,2]:[50,50,22]:[2,2,2]/[15,15,1]:[25,25,11]/[11,11,11]/1331),([58,58,2]:[78,78,22]:[2,2,2]/[29,29,1]:[39,39,11]/[11,11,11]/1331)},stride:[2,2,2],offset:[0,0,0]),bboxset<CCTK_INT4,3>(set<bbox>:{([35,35,7]:[45,45,17]:[1,1,1]/[35,35,7]:[45,45,17]/[11,11,11]/1331),([63,63,7]:[73,73,17]:[1,1,1]/[63,63,7]:[73,73,17]/[11,11,11]/1331)},stride:[1,1,1],offset:[0,0,0])]
combined_region ([1,1,7]:[0,0,17]:[1,1,1]/[1,1,7]:[0,0,17]/[0,0,0]/0)
combined_region.size() 0
set size: 2
min_fraction 0.9 combined_size 0 regions_size 2662
already satisfied combine_regions
Refinement level 2: combining regions...
   New regions are bboxset<CCTK_INT4,3>(set<bbox>:{},stride:[1,1,1],offset:[0,0,0])
regions [bboxset<CCTK_INT4,3>(set<bbox>:{([0,0,0]:[120,120,72]:[4,4,4]/[0,0,0]:[30,30,18]/[31,31,19]/18259)},stride:[4,4,4],offset:[0,0,0]),bboxset<CCTK_INT4,3>(set<bbox>:{([30,30,2]:[50,50,22]:[2,2,2]/[15,15,1]:[25,25,11]/[11,11,11]/1331),([58,58,2]:[78,78,22]:[2,2,2]/[29,29,1]:[39,39,11]/[11,11,11]/1331)},stride:[2,2,2],offset:[0,0,0]),bboxset<CCTK_INT4,3>(set<bbox>:{},stride:[1,1,1],offset:[0,0,0])]
combined_region ([1,1,1]:[0,0,0]:[1,1,1]/[1,1,1]:[0,0,0]/[0,0,0]/0)
combined_region.size() 0
set size: 0
min_fraction 0.9 combined_size 0 regions_size 0
Property N13CarpetRegrid215combine_regionsE
WARNING level 0 from host ekohaes8 process 0
  in thorn CarpetRegrid2, file /data/rhaas/postdoc/gr/cactus/ET_Turing/arrangements/Carpet/CarpetRegrid2/src/property.cc:34:
  -> Property does not hold after being enforced
WARNING level 0 from host ekohaes8 process 0
  in thorn CarpetRegrid2, file /data/rhaas/postdoc/gr/cactus/ET_Turing/arrangements/Carpet/CarpetRegrid2/src/property.cc:34:
  -> Property does not hold after being enforced

that shows that somehow the original regions

regions [bboxset<CCTK_INT4,3>(set<bbox>:{([0,0,0]:[120,120,72]:[4,4,4]/[0,0,0]:[30,30,18]/[31,31,19]/18259)},stride:[4,4,4],offset:[0,0,0]),bboxset<CCTK_INT4,3>(set<bbox>:{([30,30,2]:[50,50,22]:[2,2,2]/[15,15,1]:[25,25,11]/[11,11,11]/1331),([58,58,2]:[78,78,22]:[2,2,2]/[29,29,1]:[39,39,11]/[11,11,11]/1331)},stride:[2,2,2],offset:[0,0,0]),bboxset<CCTK_INT4,3>(set<bbox>:{([35,35,7]:[45,45,17]:[1,1,1]/[35,35,7]:[45,45,17]/[11,11,11]/1331),([63,63,7]:[73,73,17]:[1,1,1]/[63,63,7]:[73,73,17]/[11,11,11]/1331)},stride:[1,1,1],offset:[0,0,0])]

gets combined to

combined_region ([1,1,7]:[0,0,17]:[1,1,1]/[1,1,7]:[0,0,17]/[0,0,0]/0)

when calling ibset::container().

This happens on both Linux (g++-10 (Debian 10.1.0-3) 10.1.0) and OSX (g++-mp-10 (MacPorts gcc10 10.1.0_0) 10.1.0) for me using optimization flags obtained from:

CXX_OPTIMISE_FLAGS=$(filter -f%,$(shell /bin/bash -c '$(CXX) -Q -O2 --help=optimizer | grep enabled'))
C_OPTIMISE_FLAGS=$(filter -f%,$(shell /bin/bash -c '$(CC) -Q -O2 --help=optimizer | grep enabled'))
F90_OPTIMISE_FLAGS=$(filter -f%,$(shell /bin/bash -c '$(F90) -Q -O2 --help=optimizer | grep enabled'))

which for the CXX compiler turn out to be:

-faggressive-loop-optimizations -falign-functions -falign-jumps -falign-labels -falign-loops -fallocation-dce -fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg -fcaller-saves -fcode-hoisting -fcombine-stack-adjustments -fcompare-elim -fcprop-registers -fcrossjumping -fcse-follow-jumps -fdce -fdefer-pop -fdelete-null-pointer-checks -fdevirtualize -fdevirtualize-speculatively -fdse -fearly-inlining -fexpensive-optimizations -fforward-propagate -ffp-int-builtin-inexact -ffunction-cse -fgcse -fgcse-lm -fguess-branch-probability -fhoist-adjacent-loads -fif-conversion -fif-conversion2 -findirect-inlining -finline -finline-atomics -finline-functions -finline-functions-called-once -finline-small-functions -fipa-bit-cp -fipa-cp -fipa-icf -fipa-icf-functions -fipa-icf-variables -fipa-profile -fipa-pure-const -fipa-ra -fipa-reference -fipa-reference-addressable -fipa-sra -fipa-stack-alignment -fipa-vrp -fira-hoist-pressure -fira-share-save-slots -fira-share-spill-slots -fisolate-erroneous-paths-dereference -fivopts -fjump-tables -flifetime-dse -flra-remat -fmath-errno -fmove-loop-invariants -fomit-frame-pointer -foptimize-sibling-calls -foptimize-strlen -fpartial-inlining -fpeephole -fpeephole2 -fplt -fprefetch-loop-arrays -fprintf-return-value -free -frename-registers -freorder-blocks -freorder-blocks-and-partition -freorder-functions -frerun-cse-after-loop -fsched-critical-path-heuristic -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fschedule-fusion -fschedule-insns2 -fshort-enums -fshrink-wrap -fshrink-wrap-separate -fsigned-zeros -fsplit-ivs-in-unroller -fsplit-wide-types -fssa-backprop -fssa-phiopt -fstdarg-opt -fstore-merging -fstrict-aliasing -fstrict-volatile-bitfields -fthread-jumps -ftoplevel-reorder -ftrapping-math -ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-cselim -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-distribute-patterns -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize -ftree-phiprop -ftree-pre -ftree-pta -ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra -ftree-switch-conversion -ftree-tail-merge -ftree-ter -ftree-vrp -fvar-tracking -fvar-tracking-assignments -fweb

Using just -O2 does not show the failure.

This seems like a compiler bug or some subtle bug in bug in CarpetLib to me.

Comments (17)

  1. Erik Schnetter

    Did you enable various ATTRIBUTE_CONST and PURE flags in autoconf? Maybe respective annotations are wrong.


  2. Roland Haas reporter

    I forgot to say. This happens on the release branch which has the attributes disabled.

  3. Roland Haas reporter

    Looking at the code, this part has me a bit suspicious:

      typedef vector<bbox> iter_memo_t;
      typedef typename iter_memo_t::const_iterator const_iterator;
      mutable iter_memo_t iter_memo;
      int setsize() const {
        iter_memo_t im;
        return im.size();
      const_iterator begin() const {
        return iter_memo.begin();
      const_iterator end() const { return iter_memo.end(); }

    namely this is the only place where iter_memo is used. Reading this it would seem to me calling begin() twice will invalidate the iterator returned initially (due to it being an iterator on the iter_memo vector who just had clear() called on it). Note that I have a hard time understanding if this could possibly cause the issues that I am seeing, but it does look like fishy code to me.

  4. Erik Schnetter

    I was afraid so. The bboxset2 implementation is among the most complex pieces of CarpetLib. It also was written at a time when C++11 support was immature. The first step might be to clean up the code by removing support for non-C++11 compilers.

    Apart from this, the code is peppered by assert statements, and it used shared pointers (instead of more efficient unique pointers) to be more sure that it is correct. I wonder why the assert statements don’t trigger.

  5. Erik Schnetter

    Yes, this code looks fishy. I looked for static variables, but I didn’t see this mutable one. Creating the iterator is expensive, hence this memoized variables.

    The best way to rewrite this would be to add an explicit conversion (calling serialise) to a new type that has a cheap iterator (iter_memo_t). That requires rewriting all callers to begin and end, but that shouldn’t be too bad because it only makes expensive operations explicit.

  6. Roland Haas reporter

    Changing all users was too much for me, so to test this suspicion I added an almost-iterator class to bboxsset2 (with an ugly hack for end()) that is in here: https://bitbucket.org/eschnett/carpet/branch/rhaas/pseudoiterator it makes the failing test (and the other testsuites) pass so seems to confirm the suspicion that the memoization is causing issues.

    An alternative to changing callers might be to have all non-const members of bboxset2 explicitly clear the memoized state (instead of doing in in begin()) and only re-serialize in begin() if the state is not already serialized. I.e. change tracking in bboxset2. This would however, in contrast to changing the callers, keep the serialised state around even after it is no longer needed (just like the current implementation).

    Finally one could (possibly) keep track of how many iterators exist (essentially using a counter in bboxset2 objects that count how many iterators exist) by having the iterators keep a pointer to their “parent” and only re-serialise once all iterators have been destroyed. This also means that in becomes possible to track if the parent bboxset2 is modified while there are const-iterators in existence (which is technically allowed I assume, the only non-allowed action is likely doing anything with the iterators after their parent has been modified).

    I have never really had to worry about (correctly) implementing iterators so cannot say what the best approach would be.

    I do not know if this bug has the chance to fail non-catastrophically (ie without users noticing). However it seems serious enough to me to backport a fix to the current release branch, or at the very least document using CARPET_DISABLE_BBOXSET2 as workaround, which however then requires that https://bitbucket.org/eschnett/carpet/pull-requests/36 be backported.

  7. Erik Schnetter

    I wonder what triggers the problem. Are there multiple concurrent iterators? Is the iterator used in multiple threads? Is the bboxset2 set changed while an iterator is running? Or do I misunderstand the definition of mutable?

    Disabling bboxset2 is very expensive when running on many MPI processes. Since machines today have many more cores than they used to, I would be careful with that.

    My guess is that any failure in a low-level mechanism such as bboxset2 will lead to instant death for any simulation.

  8. Roland Haas reporter

    I am not sure exactly what is happening. I will have to add some debug code to check what is going on. Unfortunately the issue does not show up when compiling without optimization so I suspect that adding too much debug code may make the issue go away.

  9. Roland Haas reporter

    Concerning your questing about mutable: well I guess changing iter_memo was never quite correct since it would have always produced observable differences (since the old iterators would become invalid upon clear()). With C++11 and thread safety the concepts seem to have changed https://softwareengineering.stackexchange.com/questions/379516/is-the-meaning-of-const-still-thread-safe-in-c11 though I try to stay away from these things myself normally :-)

  10. Erik Schnetter

    I have put a patch into a new Carpet branch eschnett/mutable. It passes the test case regrid2_2cen.par on my laptop with GCC 10.

  11. Roland Haas reporter

    Thank you. I will give it a try and if this fixes the issue for me as well propose it for backporting to the release branch. Looking at the code in the branch the commit 412c8c06 "Carpet: Adapt math functions to recent flesh changes" of carpet probably will block a direct merge into master since those “recent flesh changes” never made it the flesh’s master since they break on QueenBee (see https://bitbucket.org/einsteintoolkit/tickets/issues/2407/cactus-math-work-arounds-break-carpetx#comment-57679212 in #2470),

    This patch most likely makes Carpet fail (more) to compile when CARPET_DISABLE_BBOXSET2 is set. Given that BBOXSET1 is no longer used on any clusters by default and thus slowly goes out of sync with bboxset2 over time, do you want me to add code to bboxset1 to provide iterator() (should be trivial, namely returning *this ought to do) or plan to retire bboxset1?

  12. Erik Schnetter

    The math changes are orthogonal. I didn’t intend to include them into this pull request; they were simply present when I started. We can take them out.

    I plan to retire bboxset1, as well as the non-C++11 code in bboxset2. I also planned the non-tree-search in CarpetLib, but since that helped us debug here, I’m not sure any more. Since these codes are disabled and don’t hurt, I don’t have immediate plans to remove them.

    Are you sure that Queen Bee is still around? It must be 15 years old by now. What version of libc / g++ are installed there by default? Would using a newer version of GCC help, since it comes with its own STL?

  13. Roland Haas reporter

    From what I understand QueenBee is currently “QueenBee 2” but LSU kept both the hostname and the name used on documentation the same. I am told “QueenBee 3” has just been deployed (see eg. http://www.hpc.lsu.edu/resources/hpc/system.php?system=QueenBee vs http://www.hpc.lsu.edu/docs/guides.php?system=QB2).

    The Intel compiler we use is very old 2013.1.046 (with 14.0.2 being the machine default) however there are much newer ones available (up to 2018.1) which we could easily use.

    My approach would be to first update the machine definition files for queenbee (and other clusters in simfactory) to use the newer compiler then update the flesh with the breaking change, then update Carpet. This would keep the time of breaking things in the ET to a minimum.

    The other option would be to make the breaking change in the flesh, and count on QB users that care about their system remaining usable with the ET to update the simfactory files.

    Currently qb.ini says:

    # last-tested-on: 2016-06-04
    # last-tested-by: Erik Schnetter <schnetter@gmail.com>

    which is not quite correct anymore.

  14. Roland Haas reporter

    Since the commit d1de2e1a "CarpetLib: Remove mutable member in bboxset2" of carpet itself was reviewed positively I will cherry pick it to master (leaving out the math changes) unless objected by Saturday.

  15. Roland Haas reporter

    Patch handling “mutable" applied as git hash 3db87233 "CarpetLib: Remove mutable member in bboxset2" of carpet

    Failing test (if using explicit optimization options) persists even with this patch though, so leaving this ticket open.

  16. Log in to comment