Spc_Cpu.cpp:492 always assert()s
The assert is meant to make sure that the read actually addresses valid memory, but the math only works out correctly if the arithmetic is done in a signed type.
The last clause is meant to capture situations where the emulated CPU is wrapping around from addresses at the very top of memory to the very bottom, i.e. where A16 would be high, if it existed.
But the signedness of an enum's underlying type is implementation-defined, and if the expression is compiled with the enum constants promoting to unsigned 32-bit integers (as is apparently the default with gcc), any SPC file that tries to use this wrap-around will trigger the assert().
I'm pretty confident that this is the reason that the assert() triggers, but I'm not completely certain that it's sufficient to guarantee safety. It'd be nice if you could double-check me.
There are a few example SPC files that trigger the assert here - https://forums.nesdev.com/viewtopic.php?p=215169#p215169 - specifically tracks 2, 3, 10, 29, and 44.
Comments (17)
-
repo owner -
repo owner So at first I thought I was going crazy because I was unable to reproduce crashes, even after making sure I was building in debug mode and not disabling assertions by accident. I even made sure to check out 0.6.2 instead of current git master, since the linked thread mentioned 0.6.2 was also crashing.
None of that caused any of the tracks you mentioned to crash the sample `gme_player` or GStreamer. But... I rebuilt with GCC sanitizers (address sanitizer and undefined behavior sanitizer) and I'm finally getting a useful error message:

```
kde@midna ~/projects/bitbucket/game-music-emu/gme-0.6.2/build/player $ ./gme_player ~/Music/vgm/spc/sonic/10_BOSS_Sonic12_Continue.spc
../gme/Spc_Cpu.h:298:20: runtime error: left shift of negative value -12033
../gme/Spc_Cpu.h:298:43: runtime error: left shift of negative value -788594688
../gme/Spc_Cpu.cpp:471:32: runtime error: index 65539 out of bounds for type 'unsigned char [65536]'
../gme/Spc_Dsp.cpp:502:50: runtime error: left shift of negative value -4
```

In particular the third line is likely what is causing the crash -- it doesn't appear on some of the other files that were said to work, but the other warnings do still appear even with the other tracks I've tried.
-
repo owner Also, for posterity's sake, the thread had some complaints about the quality of the fixes instituted in the wake of the libgme CVE from a couple years back... I don't disagree, but I don't actually know a ton about SPC in the first place. That was Blargg's area, but he's busy and turned over maintainership of this to me since I had contributed the build system support for platforms other than Windows.
I'll fix this to the extent that I can (at least to stop it crashing) but if there's anyone available on that forum with expertise who wants to help, that is always appreciated. :)
I can keep doing the packaging and other scut tasks but I'm never going to have time to investigate all these deeper emulation nuances by myself, I have a day job and other higher-priority OSS coding I do.
-
repo owner Reproduced the crash, based on the hints in the thread about relying on undefined behavior with enums => signed int.
By compiling with optimization (-O3 in my case) but without using CMake's specific `Release` or `RelWithDebInfo` build types (those will disable assertions), I was able to get the assertion to fire. In essence this means you need to supply your own build flags with optimization enabled.
-
reporter Yeah, blargg put in an extra 256-byte array after the 65536-byte "actual SPC RAM" to allow skipping bounds checking; you can see it in Snes_Spc.h:190.
The 6502-like CPU inside the SPC700 can, like the 6502 itself, address memory up as high as 0xFFFF+0xFF. In hardware, the overflow into the fictional A16 is entirely ignored, so this becomes access to 0x00FE.
Does the expression "r_t0out + 0xF0 -0x10000" evaluate to a negative value for you? Maybe this is a 32-bit vs 64-bit issue?
... How on earth does uint16_t addr evaluate as 65539 on line 457? Does compiling with -O3 cause the uint16_t to be promoted to a 32-bit type? (loads it in gdb, disassembles)... is this actually a bug in gcc's optimizer? It asserts on -Os, -O2, or -O3, but not on -O1. But it also doesn't assert if I start with -O1 and add all the -f options that -O2 theoretically adds... ???
-
repo owner - changed status to resolved
So now that I've been able to reproduce, I've tried it on the latest git master (62a9e9e185ab997277047ed0489c1fbe034abce1) and I believe I've already fixed the crash bug unintentionally.
It would have been commit 9c5cad7cd1fb83a12e08ba330184901842719ad8 (though there were a couple of cleanup commits to that).
If you're able to test whether you experience the crash with current git, that would be very helpful. I've been holding off on a new release because there were a couple of large changes that I wanted to allow time to test, but there have been no significant reports other than a build failure (now fixed), so this would be a good reason to release 0.7.0 if it fixes this.
-
reporter I agree, you appear to have already fixed it. My comment about uint16_t was because I kept on getting confused as to whether I was staring at HEAD or 0.6.2.
As far as I can tell, if addr is actually 16-bit, then the else{} that contains the assert shouldn't ever trigger at all: the math should work out so that (reg - 0xFD) will never be more than 3 anymore, and (reg - 0x100) is also less than 0.
Thanks, and sorry for causing the worry!
-
repo owner
> Does the expression "r_t0out + 0xF0 - 0x10000" evaluate to a negative value for you? Maybe this is a 32-bit vs 64-bit issue?
I'd have to check, but I suspect it's literally some weird undefined-behavior calculation. I can only assume the compiler decides that half the calculation can't happen anyway because it would imply signed overflow => a dead code elimination pass occurs => all of a sudden the assertion fires because only half the calculation got performed.
> ... How on earth does uint16_t addr evaluate as 65539 on line 457? Does compiling with -O3 cause the uint16_t to be promoted to a 32-bit type?
I don't think this can happen in the latest git version because I changed the parameter type of `addr`. In 0.6.2 this is possible because there the type of `addr` was plain `int` (which is why the error is on line 471). I think it's still an error in that case, but I was never able to figure out why the `addr` would have been larger than 0xFFFF+0xFF (I had figured out that the padding was probably due to some kind of hardware magic but didn't know the details).
-
repo owner No worries, this was at least an interesting troubleshooting exercise; I've been trying to understand the SPC better since I maintain this thing and all.
-
repo owner So while the bug is closed, I think the original cause is an off-by-one of some sort, probably elsewhere in libgme. The assertion itself looks OK given what you've explained about the 6502-style mapping of registers to memory addresses.
See https://godbolt.org/g/T1Ggr9
In the disassembly, the assertion check is on lines 19-20 (testing whether to jump to label .L5 or not). GCC determines that a `reg` greater than 65538 should trip the assertion... and we've already seen in earlier error messages that libgme 0.6.2 could generate an index of 65539.
Also you're right about the math on the 16-bit addr... if you convert the type of `addr` to `uint16_t` in the godbolt window then the compiler removes the entire assertion.
-
reporter Now that I've been digging around for a bit using godbolt and familiarizing myself with x86-64 calling convention...
There's definitely something fishy going on with gcc and -O3. If the assert didn't abort, it's trivially doing the wrong math: `sub edi, 131072` is right out. (Should be 65536.) On all lower optimization levels, it's using the correct number.
I don't think it's an off-by-one, however. Manually executing the function, the flow looks correct up until the assert itself: `edi` contains int addr, `edx` contains int reg, and the compiler decides whatever cleverness with reusing reg is stupid and re-evaluates all the expressions relative to addr again.
- `sub edx, 240 / js .L1` is 'reg = addr - 0xF0; if ( reg >= 0)'
- `lea ecx, [rdi-256] / cmp ecx, 65279 / jbe .L1` is 'reg = addr - 0xF0 - 0x10; if ((unsigned) reg > 0xFEFF)'
- `lea eax, [rdi-253] / cmp eax, 2 / jbe .L8` is 'reg = addr - 0xF0 - 0x10 + 0x10 - r_t0out; if ((unsigned) reg < 3)'
- `test eax, eax / js .L3` is 'else if (reg < 0)'
for the assert, if you do some standard moving things around:
- reg + r_t0out + 0xF0 - 0x10000 < 0x100
- reg + r_t0out + 0xF0 < 0x10100
- reg + r_t0out < 0x10010
- reg + 13 < 0x10010
- reg < 0x10003
- addr - 253 < 0x10003
- addr < 0x10100 -- looks correct
... but now I'm confused why it's assert()ing.
If I look at the disassembly of Snes_Spc::read_cpu ... when compiled with -O3, it never does the test for the assert. It just always __assert_fail()s.
-
repo owner
> There's definitely something fishy going on with gcc and -O3. If the assert didn't abort, it's trivially doing the wrong math: `sub edi, 131072` is right out. (Should be 65536)
The math is wrong but the effect might be identical (this is the problem with optimizers in general). Since the result is immediately sign-extended into the same register, subtracting any multiple of 65536 would seem to be equivalent in effect.
> addr < 0x10100 -- looks correct
I agree, and that is actually what gcc seems to be calculating (but in different terms, which is why it looks for `eax` to be <= 65538, since that is the proper bound for `addr` once the same series of calculations applied to `reg` has been carried out by the time the immediately prior conditional statement is checked). 65539 is just the 0x10003 you calculated by hand as the first out-of-bounds value, so I think gcc's math is right here.
That's why I think the off-by-one might be elsewhere in libgme: something is mistakenly allowing the emulated hardware to attain a value of 65539 (i.e. calculated from an input `addr` of 0x10101) that the real hardware wouldn't generate, breaking this function as a result.
The assert is incredibly confusing in retrospect: it's just trying to ensure that `addr` is < 0x10100, by reversing the calculations used to derive `reg` from `addr`. But `addr` is still in scope; we could just use that directly.
To make this all the more interesting, even the broken version of libgme (0.6.2) doesn't seem to crash for me under clang-6.0.0, even with assertions enabled (and I've verified that the check for the assertion is there in the disassembly, so it's not that clang got too smart somehow).
So if it is a problem with undefined behavior somewhere, it may be that it's breaking compilation elsewhere but the bug is only noticed when it reaches this code path.
> If I look at the disassembly of Snes_Spc::read_cpu ... when compiled with -O3, it never does the test for the assert. It just always __assert_fail()s.
This is expected if you're looking at HEAD. Though given the behavior you describe for accesses to [0x10000-0x10100), it looks like I need to revert that part of the change even if it's not perceptibly changing the audio output?
-
reporter
> The math is wrong but the effect might be identical (this is the problem with optimizers in general). Since the result is immediately sign-extended into the same register then subtracting by any multiple of 65536 would seem to be equivalent in effect.
I don't think that's true in this specific case. It's sign-extending from a 32-bit type to a 64-bit type here; errors modulo 0x10000 shouldn't be concealed by that.
As far as I can tell, the function "should" be called with 0 ≤ addr < 0x10100. Wraparound seems to be implemented here and nowhere else.
—
I'm definitely working with 0.6.2 (via Debian).
I've been testing against `02_BOSS_Sonic1_Green_Hill.spc` just because it's the first one in the list. With full debugging (on an -O3 binary, sigh) it shows ... why is there so much preprocessor magic :(
Given the breakpoint `Snes_Spc::cpu_read if addr >= 0x10000`, it enters the function with addr = 0x10003 (as anticipated), and the "pc" variable in the caller points to a byte 0xCA (pc - ram = 0xA46; in the file this shows up at 0xB46 because there's a 0x100-byte header)...
Digging through the preprocessor magic, this is actually misleading: PC has already been incremented once and we want to look at the previous byte, which is 0xF7. That should be `MOV A, [d]+Y`, or `LDA (z),y` in 6502 syntax. The immediate parameter, 0xCA, means we should look at the two bytes in RAM at that address (they're 0xFF 0xFF), treat that as a little-endian pointer (0xFFFF), add Y (which is 4), and that gets us ... 0x10003.
I'm pretty certain it's doing the right thing here. I mean, the emulated machine's trying to do some kind of nonsense (and whether it should be is a harder question), but I could just as easily set up a tiny trivial test SPC file that just tries to `MOV Y, #$FF / MOV A, !$FFFF+Y` or `LDY #$FF / LDA $FFFF,y` and I bet it would still crash in the same way.
—
Anyway, when I look at the optimizer output of read_cpu:

```
Dump of assembler code for function Snes_Spc::cpu_read(int, int):
   0x00007ffff72f95f0 <+0>:   mov    r8d,esi
   0x00007ffff72f95f3 <+3>:   movsxd rax,esi
   0x00007ffff72f95f6 <+6>:   sub    r8d,0xf0
   0x00007ffff72f95fd <+13>:  movzx  eax,BYTE PTR [rdi+rax*1+0xb64]
   0x00007ffff72f9605 <+21>:  js     0x7ffff72f9770 <Snes_Spc::cpu_read(int, int)+384>  ; returns
   0x00007ffff72f960b <+27>:  lea    ecx,[rsi-0x100]
   0x00007ffff72f9611 <+33>:  cmp    ecx,0xfeff
   0x00007ffff72f9617 <+39>:  jbe    0x7ffff72f9770 <Snes_Spc::cpu_read(int, int)+384>  ; returns
   0x00007ffff72f961d <+45>:  lea    ecx,[rsi-0xfd]
   0x00007ffff72f9623 <+51>:  push   rbx
   0x00007ffff72f9624 <+52>:  mov    rbx,rdi
   0x00007ffff72f9627 <+55>:  cmp    ecx,0x2
   0x00007ffff72f962a <+58>:  jbe    0x7ffff72f9650 <Snes_Spc::cpu_read(int, int)+96>   ; when addr was in range of 0xFD through 0xFF
   0x00007ffff72f962c <+60>:  test   ecx,ecx
   0x00007ffff72f962e <+62>:  js     0x7ffff72f9688 <Snes_Spc::cpu_read(int, int)+152>  ; when addr was in range of 0xF0 through 0xFC
   0x00007ffff72f9630 <+64>:  lea    rcx,[rip+0x118a9]  # 0x7ffff730aee0 <_ZZN8Snes_Spc8cpu_readEiiE19__PRETTY_FUNCTION__>
   0x00007ffff72f9637 <+71>:  lea    rsi,[rip+0x11a22]  # 0x7ffff730b060
   0x00007ffff72f963e <+78>:  lea    rdi,[rip+0x11a83]  # 0x7ffff730b0c8
   0x00007ffff72f9645 <+85>:  mov    edx,0x1f0
   0x00007ffff72f964a <+90>:  call   0x7ffff72b5020 <__assert_fail@plt>  ; there's no test here, it just goes straight to __assert_fail !
```
-
repo owner The optimizer output that you list is correct.
The direct jump at +90 is guarded by the `js` opcode at +62. That is testing whether the result of comparing `ecx` against itself set the sign flag (i.e. whether it's < 0). This is the "else if ( reg < 0 )" clause from the source. If `reg` is < 0 then +62 jumps to +152 to call `cpu_read_smp_reg`.
Just more optimizer magic, I think; this is the third different way I've seen this test compiled in 2 days :)
Either way I do think the assertion is OK in what it's looking for and that the error (if any) is outside of this function. I wonder what the real hardware does for memory accesses this far out of bounds (clearly it can't assert, but maybe it halts?).
So maybe there's a gcc optimizer problem after all (like I said, no crash with clang even with 0.6.2)? It's all very weird...
-
repo owner Actually, no, I was wrong, and you were right about the optimizer output. I somehow forgot all about the conditional test baked into assert itself, which is what you were talking about.
I mention this because I finally figured it out... it's a miscompilation based on undefined behavior.
If you change the line of code `int result = RAM [addr];` to `int result = (addr < sizeof RAM) ? RAM [addr] : 0;` then you get a working function that is able to access memory above `0x10000`.
I believe the issue is that GCC was treating the `RAM [addr]` access as an implicit acknowledgement that `addr`'s value had to be < 0x10000. Combine that with trying to subtract 0x10000 from a value that is "clearly" already less than 0x10000, and now you're about to pass a negative value in the recursive call to `cpu_read`.
Something something something optimizers; basically `__assert_fail` is just allowed to fire unconditionally if you end up in that if/else branch for any reason due to undefined behavior.
If you change the access to `RAM` in the way that I described then there's never a "bad" access to memory and the compiler has to behave...
-
reporter ... Wow, that's definitively in the realm of "clever enough to be obfuscatory".
I've noticed that elsewhere in game-music-emu the protective layer (permitting unchecked out-of-bounds access) is sometimes included as part of the ram array (see Ay_Emu.h: `byte ram[0x10000 + 0x100]`) and sometimes kept as a separate array (Sap_Emu.h, Spc_Emu.h). I'm certain there's no answer available, but I do wonder why they differ.
-
repo owner Ensure padding to guard emulator memory access is in the memory block.
Using a separate padding block, even if it can be immediately adjacent in the overall struct, is undefined behavior and newer compilers are starting to miscompile code using this kind of guard.
See Issue 18 for an example.
→ <<cset b2dc7f059da9>>
-
Taking a look. Though at this point I'm just amazed that someone was able to essentially emulate the Genesis audio in SNES SPC, this is really cool!