Carpet does not currently restrict into ghost points, and it does not synchronise after restricting. Synchronisation is instead expected to be performed by user thorns in MoL_PostStep, at the same time as boundary conditions are applied. MoL schedules MoL_PostStep in CCTK_POSTRESTRICT, which runs after restriction.
Unfortunately, CCTK_POSTRESTRICT is traversed from coarse to fine, whereas restriction happens from fine to coarse, so the synchronisation applied by the user thorns does not occur in the correct order. This leads to incorrect data on the coarse grid. I noticed this when comparing 3D output between otherwise identical simulations run on different numbers of processes.
The simple fix is to synchronise after restricting on each level. This ensures that the result is correct, but it introduces a performance penalty due to the additional sync. One optimisation is possible. The sync after restriction is needed only because Carpet does not restrict into ghost zones. Carpet can be made to restrict into ghost zones, but only if the restriction operator has a single-point stencil (e.g. the point-copying used in vertex-centred mesh refinement). In that case, the sync after restriction is no longer necessary.
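To illustrate the ordering problem and the simple fix, here is a minimal toy model in Python. It is not Carpet code. Everything in it is hypothetical: the names (make_levels, restrict, sync and the two schedule functions), the layout of one owned value and one ghost value per process, and in particular the assumption that restricting onto a coarser level reads the ghost values of the finer level. It does not model the ghost-zone optimisation.

```python
# Toy model: 3 refinement levels (index 2 is the finest), 2 processes.
# On each level every process holds one owned value and one ghost value,
# where the ghost value is meant to mirror the other process's owned value.

def make_levels():
    """Coarse levels hold stale zeros; the finest level is fresh and synced."""
    levels = [[{"own": 0.0, "ghost": 0.0} for _ in range(2)] for _ in range(3)]
    levels[2][0] = {"own": 1.0, "ghost": 3.0}
    levels[2][1] = {"own": 3.0, "ghost": 1.0}
    return levels

def restrict(levels, coarse):
    """Fill the owned value on level `coarse` from level `coarse + 1`.

    ASSUMPTION: the restriction reads the finer level's ghost value.
    Ghost values on level `coarse` are deliberately left untouched.
    """
    for proc in range(2):
        fine = levels[coarse + 1][proc]
        levels[coarse][proc]["own"] = 0.5 * (fine["own"] + fine["ghost"])

def sync(levels, level):
    """Copy each process's owned value into the other process's ghost value."""
    levels[level][0]["ghost"] = levels[level][1]["own"]
    levels[level][1]["ghost"] = levels[level][0]["own"]

def sync_in_postrestrict():
    """Restrict fine to coarse, then sync coarse to fine afterwards."""
    levels = make_levels()
    for coarse in (1, 0):
        restrict(levels, coarse)
    for level in (0, 1):
        sync(levels, level)
    return levels[0][0]["own"]

def sync_after_each_restrict():
    """Restrict fine to coarse, syncing each level right after it is filled."""
    levels = make_levels()
    for coarse in (1, 0):
        restrict(levels, coarse)
        sync(levels, coarse)
    return levels[0][0]["own"]

print(sync_in_postrestrict())      # coarsest value built from a stale ghost
print(sync_after_each_restrict())  # coarsest value built from synced data
```

In this model the first schedule restricts onto the coarsest level while the intermediate level's ghost value is still stale, so the coarsest value differs from the one obtained when each level is synchronised immediately after it is restricted.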
The branch ianhinder/restrictsync implements these changes.
(aside: the CSS styling on git.carpetcode.org has been broken for a while)
1. Carpet: Add restriction sync test. Test data generated on 1 process. Test fails on 2 processes due to lack of synchronisation in Carpet after restriction.
2. Carpet: Sync after restriction on each level. Synchronising in POSTRESTRICT (e.g. in MoL_PostStep) is not sufficient, as there it happens coarse-to-fine, whereas it needs to happen fine-to-coarse, like restriction. This introduces an additional sync of all restricted variables, which will have a performance impact. The coarse grid was, however, incorrect before. test_restrict_sync now passes on 1, 2 and 4 processes. It fails on 8 processes due to an additional blank line in the output which the test system does not tolerate.
3. Carpet, CarpetLib: Restrict into ghost zones and skip sync after restrict if not using higher order restriction. test_restrict_sync still passes on 1 and 2 processes.
- We could enable commit (3), i.e. restricting into ghost zones and skipping the sync after restriction, via a parameter, since it is an optimisation. However, Erik believes the optimisation is always correct, and all the tests continue to pass, so I am tentatively proposing that no additional parameter is required.
- Can we make this sort of problem easier to detect? e.g. by poisoning the ghost points which are not set by restriction? When relying on user thorns to do something, can we poison the corresponding points first?
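As a rough illustration of the poisoning idea, the following Python toy (not Carpet code) overwrites a level's ghost value with NaN before restricting onto that level, so that any later read of an unsynchronised ghost value propagates NaN to the coarse grid instead of silently using stale data. The names restrict and run are hypothetical, as is the assumption that restriction reads the finer level's ghost value. The sync is a stand-in that relies on the symmetry of this toy setup.

```python
import math

NAN = float("nan")

def restrict(fine):
    """Toy restriction: average the finer level's owned and ghost values.

    ASSUMPTION: restriction reads the finer level's ghost value.
    """
    return 0.5 * (fine["own"] + fine["ghost"])

def run(sync_after_restrict):
    """Return the coarsest owned value as seen by one process."""
    # Three levels; index 2 is the finest and is fresh and already synced.
    levels = [{"own": 0.0, "ghost": 0.0} for _ in range(3)]
    levels[2] = {"own": 1.0, "ghost": 3.0}
    for level in (1, 0):  # fine to coarse, like restriction
        # Poison the ghost value that restriction will not set.
        levels[level]["ghost"] = NAN
        levels[level]["own"] = restrict(levels[level + 1])
        if sync_after_restrict:
            # Stand-in for a real sync: in this symmetric toy setup the
            # neighbouring process ends up with the same owned value.
            levels[level]["ghost"] = levels[level]["own"]
    return levels[0]["own"]

print(math.isnan(run(sync_after_restrict=False)))  # missing sync is visible
print(run(sync_after_restrict=True))               # poison overwritten by sync
```

Without the sync, the poisoned ghost value reaches the coarsest level as NaN, which is much easier to spot than a slightly wrong number.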
Thanks to Erik for helping to diagnose the problem and suggesting possible fixes. ET test results are unchanged on 1 and 2 processes.
Comments on the commits?