g++-10 possibly miscompiles parts of Carpet
In #2410 its was found that while -O2
compiles Baikal very slowly using the various -f
options reported by gcc -Q -O2 --help=optimizers
compiles Baikal more quickly.
However using these options makes the test CarpetRegrid2/test/regrid2_2cen.par
fail with an error enforcing the N13CarpetRegrid215combine_regionsE
property:
Property N13CarpetRegrid215combine_regionsE
WARNING level 0 from host ekohaes8 process 0
in thorn CarpetRegrid2, file /data/rhaas/postdoc/gr/cactus/ET_Turing/arrangements/Carpet/CarpetRegrid2/src/property.cc:34:
-> Property does not hold after being enforced
WARNING level 0 from host ekohaes8 process 0
in thorn CarpetRegrid2, file /data/rhaas/postdoc/gr/cactus/ET_Turing/arrangements/Carpet/CarpetRegrid2/src/property.cc:34:
-> Property does not hold after being enforced
and adding some extra output and debug code:
diff --git a/CarpetRegrid2/src/property.cc b/CarpetRegrid2/src/property.cc
index 167acd00..0beb791d 100644
--- a/CarpetRegrid2/src/property.cc
+++ b/CarpetRegrid2/src/property.cc
@@ -171,6 +171,12 @@ bool combine_regions::test_impl(gh const &hh, dh const &dd,
CCTK_REAL const regions_size = static_cast<CCTK_REAL>(regions.at(rl).size());
CCTK_REAL const combined_size = static_cast<CCTK_REAL>(combined.size());
+ std::cout << "regions " << regions << "\n";
+ std::cout << "combined_region " << combined << "\n";
+ std::cout << "combined_region.size() " << combined.size() << "\n";
+ std::cout << "set size: " << regions.at(rl).setsize() << "\n";
+ std::cout << "min_fraction " << min_fraction << " combined_size " << combined_size << " regions_size " << regions_size << "\n";
+
// Is the current setup "simple enough"? (It is "simple enough" if
// either it already consists of a single box, or if using a
// single bbox would be too inefficient.)
diff --git a/CarpetRegrid2/src/regrid.cc b/CarpetRegrid2/src/regrid.cc
index 3495f504..d53b8299 100644
--- a/CarpetRegrid2/src/regrid.cc
+++ b/CarpetRegrid2/src/regrid.cc
@@ -354,8 +354,10 @@ void Regrid(cGH const *const cctkGH, gh::rregs ®ss) {
ibset const old_regions = regions.at(rl);
for (vector<property *>::iterator pi = properties.begin();
pi != properties.end(); ++pi) {
+ std::cout << "Testing " << (*pi)->name() << "\n";
bool const test_satisfied = (*pi)->test(hh, dd, bnd, regions, rl);
if (not test_satisfied) {
+ std::cout << "already satisfied " << (*pi)->name() << "\n";
(*pi)->enforce(hh, dd, bnd, regions, rl);
}
done_enforcing = done_enforcing and test_satisfied;
diff --git a/CarpetRegrid2/test/regrid2_2cen.par b/CarpetRegrid2/test/regrid2_2cen.par
index d6bbf7f8..ceb497af 100644
--- a/CarpetRegrid2/test/regrid2_2cen.par
+++ b/CarpetRegrid2/test/regrid2_2cen.par
@@ -10,6 +10,8 @@ Cactus::cctk_run_title = "Test Run"
Cactus::cctk_show_schedule = "yes"
Cactus::cctk_itlast = 0
+CarpetRegrid2::veryverbose = yes
+
# CartGrid3D
CartGrid3D::type = "coordbase"
CartGrid3D::avoid_origin = "no"
produces output
Refinement level 2: preliminary regions are bboxset<CCTK_INT4,3>(set<bbox>:{([35,35,7]:[45,45,17]:[1,1,1]/[35,35,7]:[45,45,17]/[11,11,11]/1331),([63,63,7]:[73,73,17]:[1,1,1]/[63,63,7]:[73,73,17]/[11,11,11]/1331)},stride:[1,1,1],offset:[0,0,0])
Determining domain outer boundary...
Refinement level 2: determining outer boundary...
Refinement level 2: physical coordinate boundary is at [[-24,-24,0],[24,24,24]]
Refinement level 2: spacing is [0.5,0.5,0.5]
Refinement level 2: exterior coordinate boundary is at [[-25.5,-25.5,-1.5],[25.5,25.5,25.5]]
Refinement level 2: stride is [1,1,1]
Refinement level 2: physical boundary is at [[12,12,12],[108,108,60]]
Refinement level 2: reconstructed physical coordinate boundary is at [[-24,-24,0],[24,24,24]]
Refinement level 2: exterior boundary is at [[9,9,9],[111,111,63]]
Refinement level 2: reconstructed exterior coordinate boundary is at [[-25.5,-25.5,-1.5],[25.5,25.5,25.5]]
Refinement level 2:
Original regions are bboxset<CCTK_INT4,3>(set<bbox>:{([35,35,7]:[45,45,17]:[1,1,1]/[35,35,7]:[45,45,17]/[11,11,11]/1331),([63,63,7]:[73,73,17]:[1,1,1]/[63,63,7]:[73,73,17]/[11,11,11]/1331)},stride:[1,1,1],offset:[0,0,0])
Refinement level 2: adding buffer zones...
New regions are bboxset<CCTK_INT4,3>(set<bbox>:{([35,35,7]:[45,45,17]:[1,1,1]/[35,35,7]:[45,45,17]/[11,11,11]/1331),([63,63,7]:[73,73,17]:[1,1,1]/[63,63,7]:[73,73,17]/[11,11,11]/1331)},stride:[1,1,1],offset:[0,0,0])
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
Testing combine_regions
regions [bboxset<CCTK_INT4,3>(set<bbox>:{([0,0,0]:[120,120,72]:[4,4,4]/[0,0,0]:[30,30,18]/[31,31,19]/18259)},stride:[4,4,4],offset:[0,0,0]),bboxset<CCTK_INT4,3>(set<bbox>:{([30,30,2]:[50,50,22]:[2,2,2]/[15,15,1]:[25,25,11]/[11,11,11]/1331),([58,58,2]:[78,78,22]:[2,2,2]/[29,29,1]:[39,39,11]/[11,11,11]/1331)},stride:[2,2,2],offset:[0,0,0]),bboxset<CCTK_INT4,3>(set<bbox>:{([35,35,7]:[45,45,17]:[1,1,1]/[35,35,7]:[45,45,17]/[11,11,11]/1331),([63,63,7]:[73,73,17]:[1,1,1]/[63,63,7]:[73,73,17]/[11,11,11]/1331)},stride:[1,1,1],offset:[0,0,0])]
combined_region ([1,1,7]:[0,0,17]:[1,1,1]/[1,1,7]:[0,0,17]/[0,0,0]/0)
combined_region.size() 0
set size: 2
min_fraction 0.9 combined_size 0 regions_size 2662
already satisfied combine_regions
Refinement level 2: combining regions...
New regions are bboxset<CCTK_INT4,3>(set<bbox>:{},stride:[1,1,1],offset:[0,0,0])
regions [bboxset<CCTK_INT4,3>(set<bbox>:{([0,0,0]:[120,120,72]:[4,4,4]/[0,0,0]:[30,30,18]/[31,31,19]/18259)},stride:[4,4,4],offset:[0,0,0]),bboxset<CCTK_INT4,3>(set<bbox>:{([30,30,2]:[50,50,22]:[2,2,2]/[15,15,1]:[25,25,11]/[11,11,11]/1331),([58,58,2]:[78,78,22]:[2,2,2]/[29,29,1]:[39,39,11]/[11,11,11]/1331)},stride:[2,2,2],offset:[0,0,0]),bboxset<CCTK_INT4,3>(set<bbox>:{},stride:[1,1,1],offset:[0,0,0])]
combined_region ([1,1,1]:[0,0,0]:[1,1,1]/[1,1,1]:[0,0,0]/[0,0,0]/0)
combined_region.size() 0
set size: 0
min_fraction 0.9 combined_size 0 regions_size 0
Property N13CarpetRegrid215combine_regionsE
WARNING level 0 from host ekohaes8 process 0
in thorn CarpetRegrid2, file /data/rhaas/postdoc/gr/cactus/ET_Turing/arrangements/Carpet/CarpetRegrid2/src/property.cc:34:
-> Property does not hold after being enforced
WARNING level 0 from host ekohaes8 process 0
in thorn CarpetRegrid2, file /data/rhaas/postdoc/gr/cactus/ET_Turing/arrangements/Carpet/CarpetRegrid2/src/property.cc:34:
-> Property does not hold after being enforced
that shows that somehow the original regions
regions [bboxset<CCTK_INT4,3>(set<bbox>:{([0,0,0]:[120,120,72]:[4,4,4]/[0,0,0]:[30,30,18]/[31,31,19]/18259)},stride:[4,4,4],offset:[0,0,0]),bboxset<CCTK_INT4,3>(set<bbox>:{([30,30,2]:[50,50,22]:[2,2,2]/[15,15,1]:[25,25,11]/[11,11,11]/1331),([58,58,2]:[78,78,22]:[2,2,2]/[29,29,1]:[39,39,11]/[11,11,11]/1331)},stride:[2,2,2],offset:[0,0,0]),bboxset<CCTK_INT4,3>(set<bbox>:{([35,35,7]:[45,45,17]:[1,1,1]/[35,35,7]:[45,45,17]/[11,11,11]/1331),([63,63,7]:[73,73,17]:[1,1,1]/[63,63,7]:[73,73,17]/[11,11,11]/1331)},stride:[1,1,1],offset:[0,0,0])]
gets combined to
combined_region ([1,1,7]:[0,0,17]:[1,1,1]/[1,1,7]:[0,0,17]/[0,0,0]/0)
when calling ibset::container()
.
This happens on both Linux (g++-10 (Debian 10.1.0-3) 10.1.0) and OSX (g++-mp-10 (MacPorts gcc10 10.1.0_0) 10.1.0) for me using optimization flags obtained from:
CXX_OPTIMISE_FLAGS=$(filter -f%,$(shell /bin/bash -c '$(CXX) -Q -O2 --help=optimizer | grep enabled'))
C_OPTIMISE_FLAGS=$(filter -f%,$(shell /bin/bash -c '$(CC) -Q -O2 --help=optimizer | grep enabled'))
F90_OPTIMISE_FLAGS=$(filter -f%,$(shell /bin/bash -c '$(F90) -Q -O2 --help=optimizer | grep enabled'))
which for the CXX compiler turn out to be:
-faggressive-loop-optimizations -falign-functions -falign-jumps -falign-labels -falign-loops -fallocation-dce -fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg -fcaller-saves -fcode-hoisting -fcombine-stack-adjustments -fcompare-elim -fcprop-registers -fcrossjumping -fcse-follow-jumps -fdce -fdefer-pop -fdelete-null-pointer-checks -fdevirtualize -fdevirtualize-speculatively -fdse -fearly-inlining -fexpensive-optimizations -fforward-propagate -ffp-int-builtin-inexact -ffunction-cse -fgcse -fgcse-lm -fguess-branch-probability -fhoist-adjacent-loads -fif-conversion -fif-conversion2 -findirect-inlining -finline -finline-atomics -finline-functions -finline-functions-called-once -finline-small-functions -fipa-bit-cp -fipa-cp -fipa-icf -fipa-icf-functions -fipa-icf-variables -fipa-profile -fipa-pure-const -fipa-ra -fipa-reference -fipa-reference-addressable -fipa-sra -fipa-stack-alignment -fipa-vrp -fira-hoist-pressure -fira-share-save-slots -fira-share-spill-slots -fisolate-erroneous-paths-dereference -fivopts -fjump-tables -flifetime-dse -flra-remat -fmath-errno -fmove-loop-invariants -fomit-frame-pointer -foptimize-sibling-calls -foptimize-strlen -fpartial-inlining -fpeephole -fpeephole2 -fplt -fprefetch-loop-arrays -fprintf-return-value -free -frename-registers -freorder-blocks -freorder-blocks-and-partition -freorder-functions -frerun-cse-after-loop -fsched-critical-path-heuristic -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fschedule-fusion -fschedule-insns2 -fshort-enums -fshrink-wrap -fshrink-wrap-separate -fsigned-zeros -fsplit-ivs-in-unroller -fsplit-wide-types -fssa-backprop -fssa-phiopt -fstdarg-opt -fstore-merging -fstrict-aliasing -fstrict-volatile-bitfields -fthread-jumps -ftoplevel-reorder -ftrapping-math -ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-cselim -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-distribute-patterns -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize -ftree-phiprop -ftree-pre -ftree-pta -ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra -ftree-switch-conversion -ftree-tail-merge -ftree-ter -ftree-vrp -fvar-tracking -fvar-tracking-assignments -fweb
Using just -O2
does not show the failure.
This seems like a compiler bug or some subtle bug in bug in CarpetLib to me.
Comments (18)
-
-
reporter I forgot to say. This happens on the release branch which has the attributes disabled.
-
reporter The issue goes away if I set
CARPET_DISABLE_BBOXSET2
though that requires the fixes in https://bitbucket.org/eschnett/carpet/pull-requests/36 to compile. Note that this just tells me that the file which g++ might miscompiles (or which may be slightly wrong) tobboxset2.hh
and does not really provide a fix that is generically applicable. -
reporter Looking at the code, this part has me a bit suspicious:
private: typedef vector<bbox> iter_memo_t; public: typedef typename iter_memo_t::const_iterator const_iterator; private: mutable iter_memo_t iter_memo; public: int setsize() const { iter_memo_t im; serialise(im); return im.size(); } const_iterator begin() const { iter_memo.clear(); serialise(iter_memo); return iter_memo.begin(); } const_iterator end() const { return iter_memo.end(); }
namely this is the only place where
iter_memo
is used. Reading this it would seem to me callingbegin()
twice will invalidate the iterator returned initially (due to it being an iterator on theiter_memo
vector who just hadclear()
called on it). Note that I have a hard time understanding if this could possibly cause the issues that I am seeing, but it does look like fishy code to me. -
I was afraid so. The bboxset2 implementation is among the most complex pieces of CarpetLib. It also was written at a time when C++11 support was immature. The first step might be to clean up the code by removing support for non-C++11 compilers.
Apart from this, the code is peppered by assert statements, and it used shared pointers (instead of more efficient unique pointers) to be more sure that it is correct. I wonder why the assert statements don’t trigger.
-
Yes, this code looks fishy. I looked for static variables, but I didn’t see this mutable one. Creating the iterator is expensive, hence this memoized variables.
The best way to rewrite this would be to add an explicit conversion (calling
serialise
) to a new type that has a cheap iterator (iter_memo_t
). That requires rewriting all callers tobegin
andend
, but that shouldn’t be too bad because it only makes expensive operations explicit. -
reporter Changing all users was too much for me, so to test this suspicion I added an almost-iterator class to bboxsset2 (with an ugly hack for
end()
) that is in here: https://bitbucket.org/eschnett/carpet/branch/rhaas/pseudoiterator it makes the failing test (and the other testsuites) pass so seems to confirm the suspicion that the memoization is causing issues.An alternative to changing callers might be to have all non-const members of bboxset2 explicitly clear the memoized state (instead of doing in in
begin()
) and only re-serialize inbegin()
if the state is not already serialized. I.e. change tracking in bboxset2. This would however, in contrast to changing the callers, keep the serialised state around even after it is no longer needed (just like the current implementation).Finally one could (possibly) keep track of how many iterators exist (essentially using a counter in bboxset2 objects that count how many iterators exist) by having the iterators keep a pointer to their “parent” and only re-serialise once all iterators have been destroyed. This also means that in becomes possible to track if the parent bboxset2 is modified while there are const-iterators in existence (which is technically allowed I assume, the only non-allowed action is likely doing anything with the iterators after their parent has been modified).
I have never really had to worry about (correctly) implementing iterators so cannot say what the best approach would be.
I do not know if this bug has the chance to fail non-catastrophically (ie without users noticing). However it seems serious enough to me to backport a fix to the current release branch, or at the very least document using
CARPET_DISABLE_BBOXSET2
as workaround, which however then requires that https://bitbucket.org/eschnett/carpet/pull-requests/36 be backported. -
I wonder what triggers the problem. Are there multiple concurrent iterators? Is the iterator used in multiple threads? Is the bboxset2 set changed while an iterator is running? Or do I misunderstand the definition of
mutable
?Disabling bboxset2 is very expensive when running on many MPI processes. Since machines today have many more cores than they used to, I would be careful with that.
My guess is that any failure in a low-level mechanism such as bboxset2 will lead to instant death for any simulation.
-
reporter I am not sure exactly what is happening. I will have to add some debug code to check what is going on. Unfortunately the issue does not show up when compiling without optimization so I suspect that adding too much debug code may make the issue go away.
-
reporter Concerning your questing about
mutable
: well I guess changingiter_memo
was never quite correct since it would have always produced observable differences (since the old iterators would become invalid uponclear()
). With C++11 and thread safety the concepts seem to have changed https://softwareengineering.stackexchange.com/questions/379516/is-the-meaning-of-const-still-thread-safe-in-c11 though I try to stay away from these things myself normally :-) -
I have put a patch into a new Carpet branch
eschnett/mutable
. It passes the test caseregrid2_2cen.par
on my laptop with GCC 10. -
reporter Thank you. I will give it a try and if this fixes the issue for me as well propose it for backporting to the release branch. Looking at the code in the branch the commit 412c8c06 "Carpet: Adapt math functions to recent flesh changes" of carpet probably will block a direct merge into master since those “recent flesh changes” never made it the flesh’s master since they break on QueenBee (see https://bitbucket.org/einsteintoolkit/tickets/issues/2407/cactus-math-work-arounds-break-carpetx#comment-57679212 in #2470),
This patch most likely makes Carpet fail (more) to compile when
CARPET_DISABLE_BBOXSET2
is set. Given that BBOXSET1 is no longer used on any clusters by default and thus slowly goes out of sync with bboxset2 over time, do you want me to add code to bboxset1 to provideiterator()
(should be trivial, namely returning*this
ought to do) or plan to retire bboxset1? -
The math changes are orthogonal. I didn’t intend to include them into this pull request; they were simply present when I started. We can take them out.
I plan to retire bboxset1, as well as the non-C++11 code in bboxset2. I also planned the non-tree-search in CarpetLib, but since that helped us debug here, I’m not sure any more. Since these codes are disabled and don’t hurt, I don’t have immediate plans to remove them.
Are you sure that Queen Bee is still around? It must be 15 years old by now. What version of libc / g++ are installed there by default? Would using a newer version of GCC help, since it comes with its own STL?
-
reporter From what I understand QueenBee is currently “QueenBee 2” but LSU kept both the hostname and the name used on documentation the same. I am told “QueenBee 3” has just been deployed (see eg. http://www.hpc.lsu.edu/resources/hpc/system.php?system=QueenBee vs http://www.hpc.lsu.edu/docs/guides.php?system=QB2).
The Intel compiler we use is very old 2013.1.046 (with 14.0.2 being the machine default) however there are much newer ones available (up to 2018.1) which we could easily use.
My approach would be to first update the machine definition files for queenbee (and other clusters in simfactory) to use the newer compiler then update the flesh with the breaking change, then update Carpet. This would keep the time of breaking things in the ET to a minimum.
The other option would be to make the breaking change in the flesh, and count on QB users that care about their system remaining usable with the ET to update the simfactory files.
Currently qb.ini says:
# last-tested-on: 2016-06-04 # last-tested-by: Erik Schnetter <schnetter@gmail.com>
which is not quite correct anymore.
-
reporter -
reporter - changed status to open
-
reporter -
reporter - marked as minor
- Log in to comment
Did you enable various
ATTRIBUTE_CONST
andPURE
flags in autoconf? Maybe respective annotations are wrong.-erik