Spc_Cpu.cpp:492 always assert()s

Issue #18 resolved
R J created an issue

The assert is meant to make sure that the read actually addresses valid memory, but the math only works out correctly if the result is signed.

The last clause is meant to capture situations where the emulated CPU is wrapping around from addresses at the very top of memory to the very bottom – i.e. where A16 would be high, if it existed.

But the signedness of an enum's underlying type is left to the implementation, and if the expression is compiled with the enum promoting to an unsigned 32-bit integer (as apparently happens by default with gcc), any SPC file that tries to use this wrap-around will trigger the assert().

I'm pretty confident that this is the reason that the assert() triggers, but I'm not completely certain that it's sufficient to guarantee safety. It'd be nice if you could double-check me.
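
For illustration, the sub-expression behaves very differently depending on whether the enumerator ends up promoting to a signed or an unsigned type. This is just a standalone sketch, not the libgme declarations; the value 0xD for r_t0out is inferred from the arithmetic worked through later in this thread, and the fixed underlying types are only there to make both behaviours reproducible:

    // Standalone sketch (not libgme code): how the constant part of the
    // check changes meaning with the enum's signedness.
    #include <cstdio>

    enum SignedReg   : int          { r_t0out_s = 0xD };
    enum UnsignedReg : unsigned int { r_t0out_u = 0xD };

    int main()
    {
        // Signed arithmetic folds this to a small negative offset, which is
        // what the wrap-around handling expects.
        long as_signed   = r_t0out_s + 0xF0 - 0x10000;             // -65283

        // If the enumerator promotes to an unsigned 32-bit integer, the same
        // expression wraps to a huge positive value instead.
        long as_unsigned = (long)( r_t0out_u + 0xF0u - 0x10000u ); // 4294902013 on LP64

        std::printf( "%ld vs %ld\n", as_signed, as_unsigned );
        return 0;
    }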

There are a few example SPC files that trigger the assert here - https://forums.nesdev.com/viewtopic.php?p=215169#p215169 - specifically tracks 2, 3, 10, 29, and 44.

Comments (17)

  1. Michael Pyne repo owner

    Taking a look. Though at this point I'm just amazed that someone was able to essentially emulate the Genesis audio in SNES SPC; this is really cool!

  2. Michael Pyne repo owner

    So at first I thought I was going crazy, because I was unable to reproduce any crashes even after making sure I was building in debug mode and not disabling assertions by accident. I even made sure to check out 0.6.2 instead of current git master, since the linked thread mentioned 0.6.2 was also crashing.

    None of that caused any of the tracks you mentioned to crash the sample gme_player or GStreamer.

    But... I rebuilt with GCC sanitizers (address sanitizer and undefined behavior sanitizer) and I'm finally getting a useful error message:

    kde@midna ~/projects/bitbucket/game-music-emu/gme-0.6.2/build/player $ ./gme_player ~/Music/vgm/spc/sonic/10_BOSS_Sonic12_Continue.spc 
    ../gme/Spc_Cpu.h:298:20: runtime error: left shift of negative value -12033
    ../gme/Spc_Cpu.h:298:43: runtime error: left shift of negative value -788594688
    ../gme/Spc_Cpu.cpp:471:32: runtime error: index 65539 out of bounds for type 'unsigned char [65536]'
    ../gme/Spc_Dsp.cpp:502:50: runtime error: left shift of negative value -4
    

    In particular the third line is likely what is causing the crash -- it doesn't appear for some of the other files that were said to work, but the other warnings do still appear even with the other tracks I've tried.
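
    For reference, the first, second, and fourth lines are the classic "left shift of a negative value" problem, which is undefined behaviour before C++20. A sketch of the general shape (not libgme's code) and the usual well-defined alternatives:

    // Not libgme code -- just the class of problem UBSan is flagging above.
    #include <cstdint>
    #include <cstdio>

    int main()
    {
        int sample = -12033;              // value from the UBSan report above

        // Undefined behaviour before C++20:
        // int shifted = sample << 4;

        // Well-defined ways to express the same intent:
        int via_multiply = sample * 16;                     // arithmetic form
        int via_unsigned = (int)( (uint32_t) sample << 4 ); // shift in unsigned; converting
                                                            // back is implementation-defined
                                                            // pre-C++20, but not UB

        std::printf( "%d %d\n", via_multiply, via_unsigned );
        return 0;
    }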

  3. Michael Pyne repo owner

    Also, for posterity's sake, the thread had some complaints about the quality of the fixes instituted in the wake of the libgme CVE from a couple of years back... I don't disagree, but I don't actually know a ton about SPC in the first place. That was Blargg's area, but he's busy and turned maintainership of this over to me since I had contributed the build system support for platforms other than Windows.

    I'll fix this to the extent that I can (at least to stop it crashing) but if there's anyone available on that forum with expertise who wants to help, that is always appreciated. :)

    I can keep doing the packaging and other scut tasks, but I'm never going to have time to investigate all these deeper emulation nuances by myself; I have a day job and other higher-priority OSS coding I do.

  4. Michael Pyne repo owner

    Reproduced the crash, based on the hints in the thread about relying on undefined behavior with enums => signed int.

    By compiling with optimization (-O3) in my case but without using CMake's specific Release or RelWithDebInfo build types (those will disable assertions), I was able to get the assertion to fire.

    In essence this means you need to supply your own build flags with optimization enabled.

  5. R J reporter

    Yeah, blargg put in an extra 256-byte array after the 65536-byte "actual SPC RAM" to allow skipping bounds checking — you can see it in Snes_Spc.h:190

    The 6502-like CPU inside the SPC700 can, like the 6502 itself, address memory as high as 0xFFFF+0xFF. In hardware, the overflow into the fictional A16 is entirely ignored, so this becomes an access to 0x00FE.
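
    In other words (a tiny worked example, assuming the address is simply truncated to 16 bits, i.e. the A16 bit is dropped):

    // Tiny illustration of the hardware wrap-around described above.
    #include <cstdint>
    #include <cstdio>

    int main()
    {
        uint32_t effective   = 0xFFFFu + 0xFFu;        // 0x100FE -- needs 17 bits
        uint16_t on_hardware = (uint16_t) effective;   // A16 discarded -> 0x00FE
        std::printf( "%#x -> %#x\n", (unsigned) effective, (unsigned) on_hardware );
        return 0;
    }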

    Does the expression "r_t0out + 0xF0 -0x10000" evaluate to a negative value for you? Maybe this is a 32-bit vs 64-bit issue?

    ... How on earth does uint16_t addr evaluate as 65539 on line 457? Does compiling with -O3 cause the uint16_t to be promoted to a 32-bit type? (loads in gdb, disassembles)

    ... is this actually a bug in gcc's optimizer? It asserts on -Os, -O2, or -O3, but not on -O1. But it also doesn't assert if I start with -O1 and add all the -f options that -O2 theoretically adds... ???

  6. Michael Pyne repo owner

    So now that I've been able to reproduce, I've tried it on the latest git master (62a9e9e185ab997277047ed0489c1fbe034abce1) and I believe I've already fixed the crash bug unintentionally.

    It would have been commit 9c5cad7cd1fb83a12e08ba330184901842719ad8 (though there were a couple of cleanup commits to that).

    If you're able to test whether you experience the crash with current git that would be very helpful. I've been holding off on a new release because there were a couple of large changes that I wanted to allow time to test, but there have been no significant reports other than a build failure (now fixed), so this would be a good reason to release 0.7.0 if it fixes this.

  7. R J reporter

    I agree, you appear to have already fixed it. My comment about uint16_t was because I kept on getting confused as to whether I was staring at HEAD or 0.6.2.

    As far as I can tell, if addr is actually 16-bit, then the else{} that contains the assert shouldn't ever trigger at all—the math should work out that (reg-0xFD) will never be more than 3 anymore, and (reg-0x100) is also less than 0.

    Thanks, and sorry for causing the worry!

  8. Michael Pyne repo owner

    Does the expression "r_t0out + 0xF0 -0x10000" evaluate to a negative value for you? Maybe this is a 32-bit vs 64-bit issue?

    I'd have to check but I suspect it's actually literally some weird undefined behavior calculation. I can only assume the compiler decides that half the calculation can't happen anyways because it would imply signed overflow => dead code elimination pass occurs => all of a sudden the assertion fires because only half the calculation got performed.

    ... How on earth does uint16_t addr evaluate as 65539 on line 457? Does compiling with -O3 cause the uint16_t to be promoted to a 32-bit type?

    I don't think this can happen in the latest git version because I changed the parameter type of addr. In 0.6.2 this is possible because there the type of addr was plain int (which is why the error is on line 471).

    I think it's still an error in that case, but I was never able to figure out why the addr would have been larger than 0xFFFF+0xFF (I had figured out that the padding was probably due to some kind of hardware magic but didn't know the details).

  9. Michael Pyne repo owner

    No worries, this was at least an interesting troubleshooting exercise. I've been trying to understand the SPC better since I maintain this thing and all.

  10. Michael Pyne repo owner

    So while the bug is closed, I think the original cause is an off-by-one of some sort, probably elsewhere in libgme. The assertion itself looks OK given what you've explained about the 6502-style mapping of registers to memory addresses.

    See https://godbolt.org/g/T1Ggr9

    In the disassembly, the assertion check is on lines 19-20 (testing whether to jump to label .L5 or not). GCC determines that a reg greater than 65538 should trip the assertion... and we've already seen in the earlier error messages that libgme 0.6.2 could generate an index of 65539.

    Also you're right about the math on the 16-bit addr... if you convert the type of addr to u_int16_t in the godbolt window then the compiler removes the entire assertion.

  11. R J reporter

    Now that I've been digging around for a bit using godbolt and familiarizing myself with x86-64 calling convention...

    There's definitely something fishy going on with gcc and -O3. If the assert didn't abort, it's trivially doing the wrong math: sub edi, 131072 is right out. (Should be 65536)

    On all lower optimization levels, it's using the correct number.

    I don't think it's an off-by-one, however. Manually executing the function, the flow looks correct up until the assert itself:

    • edi contains int addr
    • edx contains int reg

    and the compiler decides whatever cleverness with reusing reg is stupid and re-evaluates all the expressions relative to addr again.

    • sub edx, 240 / js .L1 is 'reg = addr - 0xF0; if ( reg >= 0)'
    • lea ecx, [rdi-256] / cmp ecx, 65279 / jbe .L1 is 'reg = addr - 0xF0 - 0x10; if ((unsigned) reg > 0xFEFF)'
    • lea eax, [rdi-253] / cmp eax, 2 / jbe .L8 is 'reg = addr - 0xF0 - 0x10 + 0x10 - r_t0out; if ((unsigned) reg < 3)'
    • test eax, eax / js .L3 is 'else if (reg < 0)'

    For the assert, if you do some standard rearranging:

    • reg + r_t0out + 0xF0 - 0x10000 < 0x100
    • reg + r_t0out + 0xF0 < 0x10100
    • reg + r_t0out < 0x10010
    • reg + 13 < 0x10010
    • reg < 0x10003
    • addr - 253 < 0x10003
    • addr < 0x10100 -- looks correct

    ... but now I'm confused why it's assert()ing.

    If I look at the disassembly of Snes_Spc::read_cpu ... when compiled with -O3, it never does the test for the assert. It just always __assert_fail()s.

  12. Michael Pyne repo owner

    There's definitely something fishy going on with gcc and -O3. If the assert didn't abort, it's trivially doing the wrong math: sub edi, 131072 is right out. (Should be 65536)

    The math is wrong but the effect might be identical (this is the problem with optimizers in general). Since the result is immediately sign-extended into the same register, subtracting any multiple of 65536 would seem to be equivalent in effect.

    addr < 0x10100 -- looks correct

    I agree, and that is actually what gcc seems to be calculating (just in different terms, which is why it looks for eax to be <= 65538: that is the proper bound for addr once the same series of calculations has been applied to reg by the time the immediately preceding conditional is checked). 65539 is just the 0x10003 you calculated by hand as the first out-of-bounds value, so I think gcc's math is right here.

    That's why I think the off-by-one might be elsewhere in libgme: something is mistakenly allowing the emulated hardware to attain a value of 65539 (i.e. calculated from an input addr of 0x10101) that the real hardware wouldn't generate, breaking this function as a result.

    The assert is incredibly confusing in retrospect: it's just trying to ensure that addr is < 0x10100, by reversing the calculations used to derive reg from addr. But addr is still in scope; we could just use that directly.
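
    Something like this (hypothetical, not code that's in the tree) would state the intent directly:

    // Hypothetical direct form of the same check: bound addr itself instead
    // of reconstructing the bound from reg.
    assert( addr >= 0 && addr < 0x10100 );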

    To make this all the more interesting, even the broken version of libgme (0.6.2) doesn't seem to crash for me under clang-6.0.0, even with assertions enabled (and I've verified the check for the assertion is there in the disassembly, so it's not that clang got too smart somehow).

    So if it is a problem with undefined behavior somewhere, it may be that it's breaking compilation elsewhere but the bug is only noticed when it reaches this code path.

    If I look at the disassembly of Snes_Spc::read_cpu ... when compiled with -O3, it never does the test for the assert. It just always __assert_fail()s.

    This is expected if you're looking at HEAD. Though given the behavior you describe for accesses to [0x10000-0x10100), it looks like I need to revert that part of the change even if it's not perceptibly changing the audio output?

  13. R J reporter

    The math is wrong but the effect might be identical (this the problem with optimizers in general). Since the result is immediately sign-extended into the same register then subtracting by any multiple of 65536 would seem to be equivalent in effect.

    I don't think that's true in this specific case. It's sign extending from a 32-bit type to a 64-bit type here; errors modulo 0x10000 shouldn't be concealed by that.

    As far as I can tell, the function "should" be called with 0 ≤ addr < 0x10100. Wraparound seems to be implemented here and nowhere else.

    I'm definitely working with 0.6.2 (via Debian).

    I've been testing against 02_BOSS_Sonic1_Green_Hill.spc just because it's the first one in the list. With full debugging (on an -O3 binary, sigh) it shows ... why is there so much preprocessor magic :(

    given the breakpoint Snes_Spc::cpu_read if addr >= 0x10000, it enters the function with addr = 0x10003 (as anticipated), and the "pc" variable in the caller points to a byte 0xCA ( pc-ram = 0xA46 — in the file this shows up at 0xB46 because there's a 0x100 byte header)...

    Digging through the preprocessor magic, this is actually misleading—PC has already been incremented once and we want to look at the previous byte, which is 0xF7. That should be MOV A, [d]+Y or LDA (z),y in 6502 syntax. The immediate parameter—0xCA—means we should look at the two bytes in RAM at that address (they're 0xFF 0xFF), treat that as a little-endian pointer (0xFFFF), add Y (which is 4) and that gets us ... 0x10003.
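
    Spelled out (a worked sketch; the names are made up, but the byte values are the ones observed in gdb above):

    // Effective-address calculation for MOV A,[d]+Y with d = 0xCA, Y = 4,
    // and RAM[0xCA..0xCB] = FF FF (direct page assumed to be page 0 here).
    #include <cstdio>

    int main()
    {
        unsigned char ram [0x10000 + 0x100] = {};
        ram [0xCA] = 0xFF;                               // pointer, low byte
        ram [0xCB] = 0xFF;                               // pointer, high byte

        int y       = 4;
        int pointer = ram [0xCA] | (ram [0xCB] << 8);    // 0xFFFF, little-endian
        int addr    = pointer + y;                       // 0x10003 -- past 0xFFFF

        std::printf( "effective address: %#x\n", (unsigned) addr ); // prints 0x10003
        return 0;
    }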

    I'm pretty certain it's doing the right thing here. I mean, the emulated machine's trying to do some kind of nonsense (and whether it should be is a harder question), but I could just as easily set up a tiny trivial test SPC file that just tries to MOV Y, #$FF / MOV A, !$FFFF+Y or LDY #$FF / LDA $FFFF,y and I bet it would still crash in the same way.

    Anyway, when I look at the optimizer output of read_cpu:

    Dump of assembler code for function Snes_Spc::cpu_read(int, int):
     0x00007ffff72f95f0 <+0>:   mov    r8d,esi
     0x00007ffff72f95f3 <+3>:   movsxd rax,esi
     0x00007ffff72f95f6 <+6>:   sub    r8d,0xf0
     0x00007ffff72f95fd <+13>:  movzx  eax,BYTE PTR [rdi+rax*1+0xb64]
     0x00007ffff72f9605 <+21>:  js     0x7ffff72f9770 <Snes_Spc::cpu_read(int, int)+384> ; returns
     0x00007ffff72f960b <+27>:  lea    ecx,[rsi-0x100]
     0x00007ffff72f9611 <+33>:  cmp    ecx,0xfeff
     0x00007ffff72f9617 <+39>:  jbe    0x7ffff72f9770 <Snes_Spc::cpu_read(int, int)+384> ; returns
     0x00007ffff72f961d <+45>:  lea    ecx,[rsi-0xfd]
     0x00007ffff72f9623 <+51>:  push   rbx
     0x00007ffff72f9624 <+52>:  mov    rbx,rdi
     0x00007ffff72f9627 <+55>:  cmp    ecx,0x2
     0x00007ffff72f962a <+58>:  jbe    0x7ffff72f9650 <Snes_Spc::cpu_read(int, int)+96> ; when addr was in range of 0xFD through 0xFF
     0x00007ffff72f962c <+60>:  test   ecx,ecx
     0x00007ffff72f962e <+62>:  js     0x7ffff72f9688 <Snes_Spc::cpu_read(int, int)+152> ; when addr was in range of 0xF0 through 0xFC
     0x00007ffff72f9630 <+64>:  lea    rcx,[rip+0x118a9]        # 0x7ffff730aee0 <_ZZN8Snes_Spc8cpu_readEiiE19__PRETTY_FUNCTION__>
     0x00007ffff72f9637 <+71>:  lea    rsi,[rip+0x11a22]        # 0x7ffff730b060
     0x00007ffff72f963e <+78>:  lea    rdi,[rip+0x11a83]        # 0x7ffff730b0c8
     0x00007ffff72f9645 <+85>:  mov    edx,0x1f0
     0x00007ffff72f964a <+90>:  call   0x7ffff72b5020 <__assert_fail@plt> ; there's no test here, it just goes straight to __assert_fail !
    
  14. Michael Pyne repo owner

    The optimizer output that you list is correct.

    The direct jump at +90 is guarded by the js opcode at +62. That is testing whether ecx is negative (test ecx,ecx sets the sign flag from ecx's own value). This is the "else if ( reg < 0 )" clause from the source. If reg is < 0 then +62 jumps to +152 to call cpu_read_smp_reg.

    Just more optimizer magic I think, this is the third different way I've seen this test compiled in 2 days :)

    Either way I do think the assertion is OK in what it's looking for and that the error (if any) is outside of this function. I wonder what the real hardware does for memory accesses this far out of bounds (clearly it can't assert, but maybe it halts?).

    So maybe there's a gcc optimizer problem after all (like I said, no crash with clang even with 0.6.2)? It's all very weird...

  15. Michael Pyne repo owner

    Actually, no, I was wrong, you were right on the optimizer output thing. I somehow forgot all about the conditional test baked into assert itself, which is what you were talking about.

    I mention this because I finally figured it out... it's a miscompilation based on undefined behavior.

    If you change the line of code int result = RAM [addr]; to int result = (addr < sizeof RAM) ? RAM [addr] : 0; then you get a working function that is able to access memory above 0x10000.

    I believe the issue is that GCC was treating the RAM [addr] access as an implicit acknowledgement that addr's value had to be < 0x10000. Combine that with trying to subtract 0x10000 from a value that is "clearly" already less than 0x10000, and now you're about to pass a negative value in the recursive call to cpu_read.

    Something something something optimizers, and basically __assert_fail is just allowed to fire unconditionally if you end up in that if/else branch for any reason due to undefined behavior.

    If you change the access to RAM in the way that I described then there's never a "bad" access to memory and the compiler has to behave...
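
    In sketch form (simplified; this is not the exact libgme source, just the shape of the change), the difference is roughly:

    // Simplified sketch of the change described above -- the point is only
    // where the RAM access happens relative to the bounds check.
    static unsigned char RAM [0x10000];   // the 0x100-byte guard lives elsewhere in 0.6.2

    int cpu_read_sketch( int addr )
    {
        // Before: an unconditional RAM [addr]. The compiler may treat the
        // access itself as proof that addr < sizeof RAM, and then fold the
        // later wrap-around branch under that (false) assumption.
        // int result = RAM [addr];

        // After: the access is guarded, so no such range fact can be
        // inferred and the above-0x10000 path has to be compiled honestly.
        int result = ( addr < (int) sizeof RAM ) ? RAM [addr] : 0;

        return result;
    }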

  16. R J reporter

    ... Wow, that's definitely in the realm of "clever enough to be obfuscatory".

    I've noticed that elsewhere in game-music-emu the protective layer (permitting unchecked out-of-bounds access) is sometimes included as part of the ram array (see Ay_Emu.h "byte ram[0x10000 + 0x100]") and sometimes kept as a separate array (Sap_Emu.h, Spc_Emu.h). I'm certain there's no answer available, but I do wonder why they differ.
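
    Roughly (a sketch, not the literal declarations from those headers), the two layouts are:

    // Combined: the padding is part of the same array object, so an index a
    // little past 0x10000 still lands inside the object and is well-defined.
    struct CombinedRam
    {
        unsigned char ram [0x10000 + 0x100];
    };

    // Separate: even if the padding happens to sit adjacent in memory,
    // indexing ram past its end is undefined behaviour, and the compiler is
    // free to assume it never happens -- which is what bit this issue.
    struct SeparateRam
    {
        unsigned char ram     [0x10000];
        unsigned char padding [0x100];
    };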

  17. Michael Pyne repo owner

    Ensure padding to guard emulator memory access is in the memory block.

    Using a separate padding block, even if it can be immediately adjacent in the overall struct, is undefined behavior and newer compilers are starting to miscompile code using this kind of guard.

    See Issue 18 for an example.

    → <<cset b2dc7f059da9>>
