Fix Mario Kart SNES Data Abort exception

Issue #14 resolved
Coto repo owner created an issue

The other SnemulDS branches (starting from the commit date below) now run with the Memory Protection Unit (MPU) enabled, so it’ll be easier for other devs maintaining SnemulDS TGDS to debug crashes. When an exception is caused by SnemulDS codebase / emulator bugs, the default exception handler dumps the ARM9 processor state as it was the moment the exception happened.

Well, we’ve got a first use case now: Super Mario Kart crashes when transitioning into the character screen, as evidenced by an MPU Data Abort exception here:

Coto88 / snemulds / Commit 3c5fa67b7f35 — Bitbucket

Fix this.

//////////// HOW THE NINTENDO DS MEMORY PROTECTION UNIT (MPU) WORKS ////////////

When building SnemulDS on TGDS, you’ll get 2 binaries: NTR and TWL modes. In this example I’ll show how to debug the NTR one.

The NintendoDS has 2 types of exception: Prefetch Abort Exception & Data Abort Exception.

Prefetch Abort Exception (PAE): Takes place when you’re executing garbage, that is, the ARM9 (ARM946E-S model) Program Counter register (R15) jumps into memory that doesn’t contain ARM946E-S ARM or Thumb-1 code.

Data Abort Exception (DAE): Takes place when you try to read or write any of the following:

  • Unmapped memory in the NTR/TWL hardware.
  • Memory outside a user-defined whitelist region (that is, you define one of the 8 MPU regions as a kind of read/write whitelist by granting read/write rights to the ARM9 core; anything else is forbidden and will trigger a DAE).
  • Memory inside a user-defined blacklist region (that is, you define one of the 8 MPU regions as a kind of read/write blacklist by prohibiting read/write rights for the ARM9 core; touching it will trigger a DAE. Anything else is allowed and won’t trigger a DAE, although there you may get garbage on reads and have writes ignored).
  • Mixed MPU region settings combining the last 2 scenarios, where some regions (0-7) are marked as prohibited while others are authorized for read/write. An access inside a blacklisted region triggers a DAE, and so does an access that falls outside the bounds of every whitelisted region. The normal use case among ARM developers is a giant whitelist region covering the entire IO address space of the system, with smaller whitelist or blacklist regions (0-7) defined inside it at the same time. The effect is that you force the application running as baremetal ARM/Thumb-1 code to only read/write the memory sections you’ve marked as whitelisted; the rest will throw exceptions if the ARM9 CPU tries to access it. Where regions overlap, the MPU resolves the access by region priority (on the ARM946E-S the higher-numbered region wins, so region 0 acts as the lowest-priority background whitelist and regions 1-7 override it). If the application reads/writes any of the prohibited regions, the MPU raises a DAE and triggers the Debug Exception Handler (hardcoded into the NTR/TWL BIOS vector), as long as the MPU is running and the exception vectors are mapped at 0xFFFF0000+.
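
To make the mixed whitelist/blacklist case concrete, here is a minimal host-side sketch of how an access could be resolved against 8 regions. It is only a model and touches no real CP15 registers. The names `mpu_region_t`, `mpu_access_ok` and the `size` field are hypothetical, and it assumes that the highest-numbered matching region takes precedence, which is how the ARM946E-S protection unit is documented to behave.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side model only: nothing here programs the real MPU. */
typedef enum { ACCESS_READ = 1, ACCESS_WRITE = 2 } mpu_access_t;

typedef struct {
    bool     enabled;
    uint32_t base;     /* first address covered by the region */
    uint64_t size;     /* size in bytes (64-bit so one region can span 4GB) */
    unsigned allowed;  /* bitmask of ACCESS_READ / ACCESS_WRITE; 0 = blacklist */
} mpu_region_t;

#define MPU_REGION_COUNT 8

/* Returns true if the access is allowed, false if it would raise a DAE. */
static bool mpu_access_ok(const mpu_region_t regions[MPU_REGION_COUNT],
                          uint32_t addr, mpu_access_t access)
{
    /* Assumption: walk from region 7 down to 0 so the
       highest-numbered matching region decides the outcome. */
    for (int i = MPU_REGION_COUNT - 1; i >= 0; --i) {
        const mpu_region_t *r = &regions[i];
        if (!r->enabled)
            continue;
        if (addr >= r->base && (uint64_t)(addr - r->base) < r->size)
            return (r->allowed & (unsigned)access) != 0;
    }
    return false; /* no enabled region covers the address */
}
```

With region 0 set up as a read/write background over the whole address space and region 1 as a read-only window, a write inside region 1 is rejected while a write anywhere else still passes.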

So, after you build SnemulDS TGDS, head into the /arm9 folder and look for the “arm9.map” text file. Open it and examine where all SnemulDS codebase/emulator data and program sections are (program sections are C functions, and data sections are C data).
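
Once the symbol addresses from the map file are at hand, matching a crash address to a function is a "nearest symbol at or below the address" search. The sketch below assumes you have already copied the symbols into a table sorted by address; `map_symbol_t` and `find_symbol` are hypothetical helper names, and it does not parse the map file itself.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    addr;  /* start address of the symbol */
    const char *name;  /* symbol name as listed in the map file */
} map_symbol_t;

/* Returns the last symbol whose start address is <= pc, or NULL if
   pc lies below every symbol. The table must be sorted ascending. */
static const map_symbol_t *find_symbol(const map_symbol_t *table,
                                       size_t count, uint32_t pc)
{
    const map_symbol_t *best = NULL;
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].addr <= pc) {
            best = &table[mid];  /* candidate; look for a closer one */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}
```

The result only says which symbol starts closest below the address; whether the address really belongs to that function still has to be checked against the next symbol in the map.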

Now you need to get an MPU PAE or DAE blue screen issued (in other words, a segfault, but trapped by the NintendoDS MPU hardware). If you get one and want to fix your software bug, continue reading below.

Now inspect the R15 value shown on the NTR/TWL hardware bottom screen. You’ll quickly know if:

  • it’s a PAE: R15 holds some garbage address instead of an address in the 0x020nnnnn~0x023nnnnn range (because the SnemulDS ARM9 binary lives in the 0x02000000 ~ 0x023FFFFF or 4MB area).
  • it’s a DAE: R15 holds a valid address in the 0x020nnnnn~0x023nnnnn range (because the SnemulDS ARM9 binary lives in the 0x02000000 ~ 0x023FFFFF or 4MB area), but when you inspect the other registers (R0 through R14) you’ll see garbage. Since we’re using a C compiler (GCC for ARM, maintained by ARM corporation themselves) that adopts the “ARM calling convention”, that’s the context you use from now on. Under that convention the first four function arguments are passed in R0-R3, so 99% of the time a DAE will show garbage in one of R0, R1, R2 or R3: we’re passing an invalid value into a function, and the function will try to read/write from that address. Yes, you guessed right: an out-of-bounds buffer read/write or maybe an out-of-bounds struct pointer reference.
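
The R15 rule of thumb above can be written down as a tiny range check. This is only a heuristic sketch using the 0x02000000 ~ 0x023FFFFF main RAM range quoted in this issue; the names `in_main_ram` and `guess_abort_kind` are hypothetical, and it ignores code that may legitimately run from other memory (ITCM, for example).

```c
#include <stdbool.h>
#include <stdint.h>

#define MAIN_RAM_START 0x02000000u
#define MAIN_RAM_END   0x023FFFFFu  /* inclusive end of the 4MB NTR area */

typedef enum { ABORT_LIKELY_PREFETCH, ABORT_LIKELY_DATA } abort_guess_t;

/* True if addr falls inside the range where the ARM9 binary lives. */
static bool in_main_ram(uint32_t addr)
{
    return addr >= MAIN_RAM_START && addr <= MAIN_RAM_END;
}

/* Heuristic from the text: a sane R15 points at a data abort,
   a garbage R15 points at a prefetch abort. */
static abort_guess_t guess_abort_kind(uint32_t r15)
{
    return in_main_ram(r15) ? ABORT_LIKELY_DATA : ABORT_LIKELY_PREFETCH;
}
```

It is meant as a first triage step when reading the blue screen, not as a replacement for checking the registers by hand.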

Tips to deal with DAE:

  • Sanitize buffer inputs inside functions, or make sure whatever data you’re working with stays within its boundaries. For instance, on real SNES hardware, due to programming bugs, some games will try to write outside their OAM or background memory (maybe because the character is outside the map it’s supposed to be in) and the SNES hardware will ignore those writes. But on an emulator those bounds aren’t sanitized, and you’ll end up writing outside the buffer allocated at C level. On a debugger, it’ll warn you about this. But on NintendoDS, you’ll get a DAE instead.

  • Check array boundaries. If you’ve got a struct array with 16 entries and you’re passing a reference to one of them, it’s a pointer. Make sure the function receiving that pointer checks that it falls within the 16 entries (0 ~ 15 in programs, because indexing starts from 0). You can do this by getting the size of the struct collection and comparing the pointer you want to work with against the pointer of each entry, and by checking that its element index doesn’t go lower than 0 or higher than the collection size minus 1 (again, because we start from 0). In other words, sanitize inputs.
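
Both tips can be sketched in a few lines of C. This is a minimal illustration, not SnemulDS code: `entry_t`, `ENTRY_COUNT`, `entry_index` and `write_entry` are hypothetical names for a 16-entry struct table. Out-of-range writes are silently dropped, mimicking how the real SNES hardware ignores them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_COUNT 16

typedef struct { uint16_t x, y; } entry_t;  /* hypothetical table entry */

/* Valid indexes are 0 .. ENTRY_COUNT - 1. */
static bool index_in_bounds(ptrdiff_t index)
{
    return index >= 0 && index < ENTRY_COUNT;
}

/* Returns the index of p inside table, or -1 if p is NULL, outside
   the table, or not aligned to the start of an entry. Addresses are
   compared as integers so foreign pointers can be tested safely. */
static ptrdiff_t entry_index(const entry_t *table, const entry_t *p)
{
    uintptr_t start = (uintptr_t)table;
    uintptr_t addr  = (uintptr_t)p;
    if (p == NULL || addr < start)
        return -1;
    uintptr_t off = addr - start;
    if (off >= ENTRY_COUNT * sizeof(entry_t) || off % sizeof(entry_t) != 0)
        return -1;
    return (ptrdiff_t)(off / sizeof(entry_t));
}

/* Writes value at index; an out-of-range write is ignored. */
static bool write_entry(entry_t *table, ptrdiff_t index, entry_t value)
{
    if (!index_in_bounds(index))
        return false;
    table[index] = value;
    return true;
}
```

A function that receives an entry pointer can call `entry_index` first and bail out on -1 instead of dereferencing a stray address and ending up on the DAE screen.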

Tips to deal with PAE:

  • Sanitize callbacks.
  • Sanitize jump tables.
  • Self-modifying code loading the wrong page. SnemulDS crashes are mostly this if the ROM size is over 3MB and you’re using NTR mode memory (4MB), which defaults to ROM page streaming, which is bugged. The reason is that the emulator fetches its next opcode depending on a SNES CPU register context (the Program Counter), mapped dynamically per page, and the Program Counter table comes from SnezziDS’s ARM core SNES CPU opcode implementation, which is a jump table. Reading a garbage SNES page block makes the jump table go out of bounds, since the index is stored for each 65c816 opcode implemented in that core.
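
As a sketch of what sanitizing a callback or jump table can look like in C: check the index against the table size and check the entry for NULL before jumping. The handler names and the 4-entry table below are hypothetical and only illustrate the idea; this is not how the SnezziDS core (an ARM implementation) is actually laid out.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*opcode_handler_t)(uint8_t opcode);

/* Hypothetical handlers: 1 means "executed", -1 means "rejected". */
static int op_nop(uint8_t opcode)     { (void)opcode; return 1; }
static int op_invalid(uint8_t opcode) { (void)opcode; return -1; }

#define HANDLER_COUNT 4

/* Slot 2 is left empty on purpose to show the NULL check. */
static const opcode_handler_t handlers[HANDLER_COUNT] = {
    op_nop, op_nop, NULL, op_nop
};

/* Never jumps through an out-of-range index or an empty slot;
   both cases fall back to a known-safe handler instead. */
static int dispatch(size_t index, uint8_t opcode)
{
    if (index >= HANDLER_COUNT || handlers[index] == NULL)
        return op_invalid(opcode);
    return handlers[index](opcode);
}
```

The fallback handler is also a convenient place to log the bad index, which tells you which garbage page produced it.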
