Carpet PreSync triggering syncs for all refinement levels unnecessarily
In Carpet, the function PreSyncGroups calls SyncProlongateGroups to automatically sync variables based on their declared reads and writes. I include the function below.
```c++
void PreSyncGroups(cFunctionData *attribute, cGH *cctkGH,
                   const std::vector<int> &pre_groups) {
  DECLARE_CCTK_PARAMETERS;
  assert(not CCTK_EQUALS(presync_mode, "off") and
         not CCTK_EQUALS(presync_mode, "warn-only"));

  if (pre_groups.empty())
    return;

  if (reflevel > 0) {
    // recurse to check that all coarser levels are properly SYNCed
    CCTK_REAL previous_time = cctkGH->cctk_time;
    const int parent_reflevel = reflevel - 1;
    BEGIN_GLOBAL_MODE(cctkGH) {
      ENTER_LEVEL_MODE(cctkGH, parent_reflevel) {
        cctkGH->cctk_time = tt->get_time(mglevel, reflevel, timelevel);
        PreSyncGroups(attribute, cctkGH, pre_groups);
      }
      LEAVE_LEVEL_MODE;
    }
    END_GLOBAL_MODE;
    cctkGH->cctk_time = previous_time;
  }

  // ask Carpet to do the SYNC, this will apply BC as well
  SyncProlongateGroups(cctkGH, pre_groups, attribute);
}
```
The recursion block in the middle of this function (the part commented "recurse to check that all coarser levels are properly SYNCed") claims to be a check. However, this "check" amounts to blindly syncing every coarser refinement level! This is very inefficient, and the performance penalty grows with the number of refinement levels: with a parfile for a binary black hole simulation using BaikalVacuum and 12 refinement levels, I found that each iteration was taking 6 times longer. I am also not convinced that this recursive behavior is necessary at all; do we actually need to do this? For now, in my own work I am commenting the recursion block out and assuming that coarser levels are fine, but this definitely needs to be fixed in a more 'proper' way.
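Concretely, that stopgap amounts to running the same function with the recursion removed, i.e. assuming the coarser levels are already valid:

```c++
// Stopgap only, not a proposed fix: PreSyncGroups with the recursion over
// coarser refinement levels stripped out.
void PreSyncGroups(cFunctionData *attribute, cGH *cctkGH,
                   const std::vector<int> &pre_groups) {
  DECLARE_CCTK_PARAMETERS;
  assert(not CCTK_EQUALS(presync_mode, "off") and
         not CCTK_EQUALS(presync_mode, "warn-only"));
  if (pre_groups.empty())
    return;
  // coarser levels are assumed to be properly SYNCed already
  // ask Carpet to do the SYNC, this will apply BC as well
  SyncProlongateGroups(cctkGH, pre_groups, attribute);
}
```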
This ties into a similar issue I ran into when I didn't have boundary conditions registered (through my own mistake). Since the boundaries were never getting filled, every READ(everywhere) triggered a sync+BC pass, but there were no BCs to apply; so even though the variable should already have been synced, the next READ(everywhere) would sync it again. As an aside, I feel this case probably should have generated an error somewhere: a READ(everywhere) was running fine even though the variable was only valid on the interior + ghosts.
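To illustrate the kind of error I have in mind (a hypothetical sketch; where exactly to hook it in and the message wording are made up, but the validity query is the same one used elsewhere in PreSync), something like this after the BC step would have caught my mistake:

```c++
// Hypothetical check after the sync+BC pass for a READ(everywhere) request:
// if the variable still isn't valid everywhere, no boundary condition can
// have been applied, so fail loudly instead of re-syncing on every READ.
int const valid = ff->valid(mglevel, reflevel, timelevel);
if (not is_set(valid, CCTK_VALID_EVERYWHERE)) {
  char *const fullname = CCTK_FullName(vi);
  CCTK_VERROR("READ(everywhere) requested for '%s', but the variable is "
              "still only valid on interior+ghosts after sync; is a "
              "boundary condition registered for it?",
              fullname);
}
```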
I propose that a reasonable fix would be to add a check in SyncProlongateGroups for what is actually needed. That function eventually builds a list of goodgroups that are to be synced and have boundary conditions applied. Once this preliminary list is constructed, PreSync should use the validity information to further split it into the groups that need syncs and the groups that only need BCs, and then pass the two lists on separately if necessary. I don't know whether PreSync simulations can reach SyncGroups by any route other than this function; if they can, a corresponding check may be needed in SyncGroups as well.
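To sketch the shape of what I mean (every helper name here is made up; the real version would reuse whatever validity queries SyncProlongateGroups already has, like the per-variable check in my first comment below):

```c++
// Hypothetical sketch: split the preliminary goodgroups list by what each
// group actually needs. group_needs_sync/apply_bc_only do not exist; they
// stand in for logic over the per-variable validity flags.
std::vector<int> sync_groups, bc_groups;
for (size_t g = 0; g < goodgroups.size(); g++) {
  const int gi = goodgroups[g];
  if (group_needs_sync(gi))
    sync_groups.push_back(gi); // ghost zones stale: needs a real SYNC
  else
    bc_groups.push_back(gi);   // ghosts fine: only boundary conditions
}
if (not sync_groups.empty())
  SyncGroups(cctkGH, sync_groups);  // signature abbreviated
if (not bc_groups.empty())
  apply_bc_only(cctkGH, bc_groups); // hypothetical BC-only path
```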
Comments (8)
- reporter: I attempted to put in some logic to check whether it should actually call SyncProlongateBoundaries. Leaving the recursion loop unchanged, I have:
```c++
std::vector<int> check_groups;
std::set<int> tmpgroups;
for (int g = 0; g < pre_groups.size(); g++) {
  int gi = pre_groups[g];
  for (int vi = 0; vi < CCTK_NumVarsInGroupI(gi); vi++) {
    int const map0 = 0;
    ggf *const ff = arrdata.AT(gi).AT(map0).data.AT(vi);
    assert(ff);
    int const valid = ff->valid(mglevel, reflevel, timelevel);
    if (not is_set(valid, CCTK_VALID_EVERYWHERE)) {
      tmpgroups.insert(gi);
      break;
    }
  }
}
check_groups.assign(tmpgroups.begin(), tmpgroups.end());

if (!check_groups.empty()) {
  // ask Carpet to do the SYNC, this will apply BC as well
  SyncProlongateGroups(cctkGH, check_groups, attribute);
}
```
This loops over the groups in pre_groups, then over the variables in each group, checking whether each variable is valid everywhere. If one is not, the group is added to tmpgroups and the loop moves on to the next group. At the end, check_groups is assigned all the group indices that need to be passed on. I mimicked what is done in the PreCheckValid function, so all of this should be fine.
I did the check here because doing it in SyncProlongateBoundaries would be a bit more complicated: there, syncs and BC application would have to be handled separately. There is also a ProlongateBoundaries call for which I don't know what logic would be needed.
- @Samuel Cupp does this fix the performance?
- reporter: I haven't yet run a simulation with a lot of reflevels, but I verified that this code doesn't call SyncProlongateGroups except for the current reflevel when the coarser levels don't need to be synced, so it is equivalent to commenting out the loop. The only added time comes from looping over the groups and comparing ints at each reflevel, which should be negligible. I'll run the BaikalVacuum parfile with this and check performance. Since I don't have a code where the coarser levels aren't synced before getting to the finer level, I can't do a 'real' test of the case where this loop is actually necessary. I'm still not convinced that can happen without a thorn having incorrect reads/writes.
- reporter: I pushed a branch and made a pull request. I'm doing some final testing now to verify that it behaves as I expect (that is, it removes the extra calls just like commenting out the recursion does). I'll comment again when I'm done.
- reporter: I've verified that this fixes the behavior, though actually iterating through all the variables in pre_groups on every level for every sync is slower than simply commenting out the recursion (~17 vs. ~19 time/hr). We should consider whether this check is necessary at all, and if it is, see whether there's any way to make it cheaper.
- reporter: Please review.
- assigned issue to
- assigned issue to