Recently I've encountered some camera soft-bricking issues on the 5D3 4K branch (all recovered, fortunately), caused by a null pointer error that resulted in writes at some random memory address (which could have been in the middle of some memory used by Canon to store their settings, so the overwritten value ended up being flashed into ROM). This made me think about whether it's better to have some safeguards in the first place, to prevent such cases from happening.
The big problem is that any programming mistake that results in writing at the wrong memory address can cause such issues (because each task is allowed to write at any memory address). That's a limitation of the CPU and OS (no MMU), though it can probably be worked around to some extent (we do have an MPU with at least one unused entry - see mem_prot).
You'll say - wait a minute, aren't all the memory writes performed into RAM? What's the big deal?
Yes, but... Canon code saves some of their settings into ROM (yes, by reflashing). That's a problem - if their data structure gets corrupted by our programming mistake (which can happen anywhere), the side effect of our mistake is going to be saved into ROM.
And... guess what! The memory blocks used by Canon to store the settings are on the AllocateMemory heap (so, a simple buffer overflow on memory allocated from there can be enough for overwriting their data structures).
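To make the overflow scenario concrete, here is a minimal host-side sketch (not camera code). It assumes a hypothetical layout in which a 16-byte scratch buffer is immediately followed by a settings block in the same arena; the names `fake_settings`, `BUF_SIZE` and `overflow_corrupts_settings` are made up for illustration and do not correspond to anything in Canon firmware or ML.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE        16        /* hypothetical scratch buffer size */
#define SETTINGS_OFFSET BUF_SIZE  /* settings block sits right after it */

/* Stand-in for a settings block living on the same heap. */
struct fake_settings {
    uint32_t video_system;  /* e.g. a PAL/NTSC-like setting */
    uint32_t flags;
};

/* Writes write_len bytes starting at the buffer, without checking it
 * against BUF_SIZE (the bug). Returns 1 if the neighbouring settings
 * block was modified, 0 if it survived intact. */
int overflow_corrupts_settings(size_t write_len)
{
    uint8_t arena[BUF_SIZE + sizeof(struct fake_settings)];
    const struct fake_settings original = { 1, 0x1234 };
    struct fake_settings after;

    /* Keep the simulation itself within the arena. */
    assert(write_len <= sizeof(arena));

    memset(arena, 0, sizeof(arena));
    memcpy(arena + SETTINGS_OFFSET, &original, sizeof(original));

    /* Buggy write: length is not clamped to BUF_SIZE. */
    memset(arena, 0xFF, write_len);

    memcpy(&after, arena + SETTINGS_OFFSET, sizeof(after));
    return memcmp(&after, &original, sizeof(original)) != 0;
}
```

In this sketch a write of exactly `BUF_SIZE` bytes leaves the settings untouched, while a single extra byte already changes them. On the camera, the corrupted block would then be picked up by Canon's save routine like any legitimate change.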
Of course, as long as the bootloader is not erased, it should be recoverable. BTW, the bootloader CAN be erased, even accidentally! (Can JTAG be used in this case?)
The current implementation is cherry-picked from the crop_rec_4k branch (explained here).
find stubs for all other cameras
confirm the following behavior on all models (this safeguard is very important IMO)
understand whether/when the other setting blocks (besides RING and RASEN) are saved into ROM (in particular, the CUSTOM block)
figure out how to disable ROM flashing (to prevent accidental writes, but as a side effect, Canon code will not be able to save their settings)
Tests to run:
ML should catch the battery door opening event (you should see a short LED blink)
Modules should load after opening the battery door (if there was no hard crash)
Modules should not load after a camera lock-up (e.g. after calling cli(); while(1);)
Canon settings* should NOT be saved when opening the battery door (they are, by default).
Canon settings should be saved on normal shutdown (including opening the card door).
Canon settings should NOT be saved after a crash, on any kind of shutdown (there is a dummy assert under "Don't click me" for this test).
ML settings should be saved when opening the card door, aka normal shutdown (test by opening ML menu, changing some settings, then opening the card door without first closing ML menu).
*) For now, only some Canon settings are prevented from being saved (the ones from the RING block being the most important). The PAL/NTSC setting is one of them (useful for testing). Some of the settings are managed by the MPU; I'm not sure whether we can do anything about them.
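The expected behavior for Canon settings in the tests above can be summarized as a small decision function. This is only a sketch of the intended policy, assuming a hypothetical `shutdown_reason` enum and a `crashed` flag; it is not the actual implementation from the branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shutdown reasons, named after the test cases. */
enum shutdown_reason {
    SHUTDOWN_NORMAL,        /* power switch */
    SHUTDOWN_CARD_DOOR,     /* treated as a normal shutdown */
    SHUTDOWN_BATTERY_DOOR   /* power is about to be cut */
};

/* Returns true if Canon settings are expected to be saved to ROM.
 * - after a crash: never, regardless of how the camera shuts down
 * - battery door: never
 * - normal shutdown (including card door): yes */
bool should_save_canon_settings(enum shutdown_reason reason, bool crashed)
{
    assert(reason == SHUTDOWN_NORMAL ||
           reason == SHUTDOWN_CARD_DOOR ||
           reason == SHUTDOWN_BATTERY_DOOR);

    if (crashed)
        return false;

    if (reason == SHUTDOWN_BATTERY_DOOR)
        return false;

    return true;
}
```

When testing a model, each observed combination (shutdown type, crash or no crash, setting saved or not) can be compared against this table; any mismatch points at a missing or wrong stub.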
I found the stubs for all of the other camera models. I’d like to test the cameras that I have available before committing the changes, but I don’t quite understand how to run all of the tests. Module loading after opening the battery door is working on EOSM, 700D and 7D. The LED blink also happens with the unified build, so it is hard to see a difference. I’m not sure what differences to expect between this branch and unified in the ML and Canon settings tests.
I’m testing on 50D from your branch:
ML modules restart after opening the battery door without error messages, but video setting changes are also saved when they should not be.
Running “Don’t click me” generates a crash log, but Canon settings changes are still saved and ML modules still load after closing the battery door.
A short LED blink is observed right after opening the battery door; sometimes there is another, shorter blink, which is also present in other branches without these changes.