out-of-bounds write access checking in Cactus
A possibly very useful (and simple to implement) debugging aid in Cactus would be for the driver to provide some means of detecting array accesses outside the array/grid-function bounds. In general that is hard to do (in C; for Fortran there are compiler switches). However, a useful partial solution might already be to put canary values before and after the user-visible data of grid functions/grid arrays. The flesh/driver could then check after each scheduled routine whether any of the canary values were modified and, if so, output a warning.
Schematically the layout in memory would be
Canary1 data Canary2
and CCTK_VarDataPtr would return a pointer to "data" only. After each scheduled function call we would check Canary1 and Canary2 and output an error if they are corrupted. Similarly, the IncreaseGroupStorage/DecreaseGroupStorage routines could set/check the canary values.
This would prevent these errors from triggering failures at some later, unrelated call to malloc or free. glibc's malloc function provides some of this if _MALLOC_DEBUG is set, though I am not sure how well that actually works in practice, in particular since e.g. OpenMP provides its own malloc function.
I don't have an implementation of this right now and am mostly fishing for comments.
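The Canary1/data/Canary2 layout described above can be sketched roughly as follows. This is only a minimal illustration, not code from Cactus, PUGH, or Carpet: the names `FencedBuffer`, `fenced_alloc`, `fenced_data`, `fenced_check`, and `fenced_free`, the fence width, and the canary byte value are all hypothetical. `fenced_data` plays the role of the pointer that CCTK_VarDataPtr would hand out, and `fenced_check` is what a driver might call after each scheduled routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical constants: width of each canary region and its fill pattern.
constexpr std::size_t fence_bytes = 16;
constexpr unsigned char canary_byte = 0xA5;

// One allocation laid out as: Canary1 | data | Canary2.
struct FencedBuffer {
  unsigned char *base;     // start of the whole allocation (start of Canary1)
  std::size_t data_bytes;  // size of the user-visible region
};

// Allocate the fenced region and fill both canaries with the pattern.
inline FencedBuffer fenced_alloc(std::size_t data_bytes) {
  unsigned char *base = static_cast<unsigned char *>(
      std::malloc(data_bytes + 2 * fence_bytes));
  assert(base != nullptr);
  std::memset(base, canary_byte, fence_bytes);
  std::memset(base + fence_bytes + data_bytes, canary_byte, fence_bytes);
  return {base, data_bytes};
}

// Pointer to the user-visible data only; the canaries stay hidden.
inline void *fenced_data(const FencedBuffer &buf) {
  return buf.base + fence_bytes;
}

// Returns true if both canaries are intact, false if either was overwritten.
inline bool fenced_check(const FencedBuffer &buf) {
  const unsigned char *lo = buf.base;
  const unsigned char *hi = buf.base + fence_bytes + buf.data_bytes;
  for (std::size_t i = 0; i < fence_bytes; ++i) {
    if (lo[i] != canary_byte || hi[i] != canary_byte) return false;
  }
  return true;
}

// Release the whole allocation, canaries included.
inline void fenced_free(FencedBuffer &buf) {
  std::free(buf.base);
  buf.base = nullptr;
}
```

A check like this only detects writes that land inside the canary regions, and only if they change the pattern; reads out of bounds and writes that jump past the fence go unnoticed.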
Comments (10)
-
-
reporter - removed comment
Sorry, I meant OpenMPI, not OpenMP, but could not figure out how to modify the description after creating the ticket (not enough privileges, I think). See e.g. http://svn.open-mpi.org/svn/ompi/trunk/opal/mca/memory/linux/malloc.c though I cannot find a really good reference for this.
-
reporter - removed comment
I have a proof-of-concept implementation of a canary-based electric fence around CarpetLib's mem objects. These are used for grid functions, scalars and arrays. The current code checks after each scheduled routine whether any out-of-bounds (write) access happened in any of the variables accessible through CCTK_VarDataPtr. This is mostly OK, though one can construct situations (involving Carpet's ENTER_XXX_MODE functions etc.) where e.g. a GLOBAL routine accesses grid functions, which is currently not caught.
The functionality is activated by setting CarpetLib::electric_fence = yes. Right now the implementation passes the test suite and there is a (designed to fail) test in CarpetExtra/CarpetEFenceTest.
Most likely the interface will change in the future so that the check_fence function becomes a member of ggf (which is CarpetLib's representation of a grid function) rather than of the data objects. The user could then just call ggf->check_fence(warning_callback), passing in a callback routine, rather than iterating over all timelevels/maps/whatnot of each variable.
The code is in a branch on bitbucket: https://bitbucket.org/rhaas80/carpet/commits/branch/fence You can click on the "Compare" button near the upper right to see the differences to (my) current Carpet master branch. Note that this branch will be rebased often.
Comments welcome.
-
- removed comment
The line
storage_ = (T*)align_up(size_t(storage_base_ + electric_fence), alignment);
is wrong; electric_fence should be canary/2 instead.
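To illustrate why the offset matters, here is a small sketch under stated assumptions. It is not CarpetLib's code: `align_up` is reimplemented here on integer addresses, `data_address` is a hypothetical helper, and all sizes are taken to be in bytes. The point is that the data pointer has to be moved past the full width of the leading canary before it is aligned upwards; adding only the on/off flag (value 1) leaves room for a fence of width 1 but not necessarily for a wider one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round addr up to the next multiple of alignment (must be a power of two).
inline std::uintptr_t align_up(std::uintptr_t addr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  return (addr + mask) & ~mask;
}

// Hypothetical helper: address of the user data inside a raw allocation that
// starts at base, leaving at least fence_bytes for the leading canary.
inline std::uintptr_t data_address(std::uintptr_t base,
                                   std::size_t fence_bytes,
                                   std::size_t alignment) {
  return align_up(base + fence_bytes, alignment);
}
```

With `base = 1024`, an 8-byte alignment and a 16-byte fence, offsetting by 1 gives `align_up(1025, 8) == 1032`, which leaves only 8 bytes in front of the data, whereas `data_address(1024, 16, 8) == 1040` leaves the full 16.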
-
reporter - removed comment
Right you are. Fixed now. It was a leftover from when the code only supported a fence_width of 1.
-
- changed status to open
- removed comment
LGTM.
-
reporter - removed comment
Well, I was not really aiming for an actual review since I plan to make further changes. It's good to see that the concept and current implementation are approved, though. Thank you.
-
reporter - changed status to resolved
- removed comment
Committed as 0e435fca01c0ce3356565c661afb0ff9d64880a1 "CarpetLib: add some code for electric fence like functionality", d746b25592639a0f8eecab042553fe2a639be9d9 "Carpet: use electric fence provided by CarpetLib", d8cc74e37df4ff3750ccaed11b67e152cc0295f3 "CarpetEFenceTest: test electric fence in CarpetLib".
-
reporter - edited description
- changed status to closed
This would be straightforward to implement in PUGH and Carpet. In Carpet, the canaries would be initialized when the memory is allocated, and would be checked regularly (e.g. during poison checking), not just before deallocating.
I doubt that OpenMP provides its own malloc function. I expect all memory allocation (including operator new from C++) to go through libc's malloc.