Immediate segfault on ARM target... would like debugging tips...
As the title states, I've cross-compiled the latest Inferno source to produce an "emu-g" and tried running it on an ARM9 processor w/o FPU where it immediately segfaults.
I've managed to capture a core file and have copied shared libraries from the target to my development host to open with arm-*-gdb, but due to a lack of symbols in the target shared libraries, I'm not getting any quick hints at where the problem happens (other than something trying to access or de-reference memory location 0x0).
For background, I've successfully run this same binary on a Raspberry Pi and then under qemu-arm using the same shared libraries from the target.
Also, I've compiled and run other fairly large projects (LuaJIT, RSC's libtask+custom code) for this target, so I'm relatively comfortable that the dev tools aren't (obviously at least) at fault.
Can any of you suggest a strategy for debugging this? I'll probably start littering the code with print statements to narrow this down, but I know there's a more elegant path.
I'll also post this to the mailing list (don't flame me dudes).
Thanks!
Comments (10)
-
-
- marked as critical
-
reporter Thanks for the gut-check and hint. I'll give this a try and follow-up here.
-
reporter Ok, I'm being a bit of a keyboard-monkey now, but I'm questioning my sanity at this point.
To test the "coherence" thing, I went to my dev machine and blindly did a "hg pull ; hg update" then did a "mk CONF=emu-g nuke mkdirs install" (before making any additions) and got the following message:
arm-gcc -o o.emu-g asm-arm.o os.o kproc-pthreads.o segflush-arm.o emu-g.root.o lock.o devmnt.o devsrv.o srv.o ipif6-posix.o devmem.o devpipe.o devcmd.o cmd.o devssl.o devfs.o devcons.o devenv.o devprof.o devroot.o devprog.o devip.o devcap.o devdup.o devindir.o deveia.o ipaux.o cache.o proc.o random.o error.o uqid.o qio.o discall.o inferno.o exception.o chan.o env.o devtab.o dis.o print.o dial.o exportfs.o main.o parse.o alloc.o latin1.o dev.o sysfile.o pgrp.o errstr.o emu-g.o /home/josephs/arm/inferno-os/Linux/arm/lib/libinterp.a /home/josephs/arm/inferno-os/Linux/arm/lib/libmath.a /home/josephs/arm/inferno-os/Linux/arm/lib/libkeyring.a /home/josephs/arm/inferno-os/Linux/arm/lib/libsec.a /home/josephs/arm/inferno-os/Linux/arm/lib/libmp.a /home/josephs/arm/inferno-os/Linux/arm/lib/lib9.a -lm -lpthread
lock.o: In function `lock':
lock.c:(.text+0x8): undefined reference to `_tas'
lock.c:(.text+0x1c): undefined reference to `_tas'
lock.c:(.text+0x40): undefined reference to `_tas'
lock.o: In function `canlock':
lock.c:(.text+0x74): undefined reference to `_tas'
collect2: ld returned 1 exit status
mk: arm-gcc -c -DROOT="/home/josephs/arm/inferno-os" ... : exit status=exit(1)
mk: echo "(cd $SYSTARG; ... : exit status=exit(1)
mk: for j in ... : exit status=exit(1)
Unless I'm highly forgetful, I didn't see this the last time I built. Granted, I probably shouldn't have done the "hg update", but it's just a habit I've gotten into since I only re-visit Inferno every few months.
I see emu/Linux/arm-tas-v[57].S in the codebase. I'm trying to learn how these do (or don't) get selected.
What other demons have I awakened here, or is it just a personal hell?
VBR, -joe
-
It's my fault. If you do another pull -uv you'll get the tas file, and some mkfile changes to use it.
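For anyone puzzled by the undefined `_tas` above: lock and canlock need an atomic test-and-set primitive, which the arm-tas-v[57].S files supply in assembly. The following is only a minimal C sketch of that idea, assuming GCC's `__sync` builtins and assuming the usual contract (set the word, return its previous value). The names tas_sketch, LockSketch, canlock_sketch, lock_sketch and unlock_sketch are hypothetical and are not the contents of the Inferno sources.

```c
/* Hypothetical stand-in for _tas: atomically set *p to 1 and
 * return the value it held before. Uses a GCC/Clang builtin. */
int tas_sketch(volatile int *p)
{
	return __sync_lock_test_and_set(p, 1);
}

/* Hypothetical lock type holding the single word that is tested and set. */
typedef struct {
	volatile int val;
} LockSketch;

/* Try once: returns 1 if the lock was free and is now held, else 0. */
int canlock_sketch(LockSketch *l)
{
	return tas_sketch(&l->val) == 0;
}

/* Spin until the previous value was 0, meaning this caller took the lock. */
void lock_sketch(LockSketch *l)
{
	while (tas_sketch(&l->val) != 0)
		;
}

/* Release the lock by storing 0 with release semantics. */
void unlock_sketch(LockSketch *l)
{
	__sync_lock_release(&l->val);
}
```

On older ARM cores the compiler may have to lower these builtins to a helper call rather than a single instruction, which is presumably why the tree carries per-architecture assembly instead.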
-
reporter I gave this a try. I had to tweak my gcc options for this to compile:
added -march=armv6, as -march=armv7 or -march=armv9 were complaining about illegal opcodes
Anyway, I still think I have an architecture mis-match, as when I run the code on the target now, I get an illegal instruction trap.
I've run out of time to investigate this today but will re-visit later.
-
reporter Update w/o success (one last try before stepping away):
I changed the target to -march=armv5 and edited emu/Linux/mkfile-arm to use the arm-tas-v5.S file and re-built, but still get a segfault.
I'll resume with print-style debugging later.
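For the print-style debugging mentioned above, here is a minimal sketch of a tracing helper that should be usable even before main, assuming a POSIX host. It avoids stdio on purpose so that it does not go through malloc or any locks. trace_write, TRACE and trace_before_main are hypothetical names, not anything in the Inferno tree, and the constructor attribute is a GCC/Clang extension.

```c
#include <unistd.h>

/*
 * Write a fixed marker straight to file descriptor 2 (stderr).
 * No stdio buffering and no allocation, so it can be called
 * from code that runs before main.
 */
long trace_write(const char *s, unsigned long n)
{
	return (long)write(2, s, n);
}

/*
 * TRACE takes a string literal. sizeof(msg) counts the literal's
 * terminating NUL, which equals the length of msg plus the appended
 * newline, so exactly "msg\n" is written.
 */
#define TRACE(msg) trace_write(msg "\n", sizeof(msg))

/* Hypothetical marker in code that runs before main. */
__attribute__((constructor))
static void trace_before_main(void)
{
	TRACE("trace: constructor reached");
}
```

Dropping TRACE("...") lines at the top of main, in malloc and in lock would show whether the crash happens before main is entered. The order in which constructors run relative to other start-up code is not something this sketch controls.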
-
I've just run into the problem with "coherence" on NetBSD. What happens is that emu provides its own malloc() that does locking and so eventually calls coherence(). If emu is dynamically linked, the init/fini code runs before main() has a chance to set coherence. If the init/fini code calls malloc() it calls emu's malloc, which calls coherence(), which is still null. In the case of NetBSD it was some libc init code that was reading some sysctl nodes.
Note, BTW, that emu/port/fns.h defines, rather than declares, "coherence", so a common copy ends up in each object and the linker takes care of merging them. This is not a problem per se, but still kinda icky.
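The sequence described above can be sketched in a few lines of C. This is only an illustration under assumptions: fake_lock stands in for emu's malloc/lock path, early_init stands in for the libc init code, and late_setup stands in for what main does. All three names are hypothetical, and the constructor attribute is a GCC/Clang extension. The nil check exists only so the sketch can count the early call instead of crashing.

```c
#include <stddef.h>

/* Stand-in for the no-op barrier. */
void nofence(void)
{
}

/* No initializer: the pointer is nil until some code assigns it. */
void (*coherence)(void);

/* Counts calls that arrived while coherence was still nil. */
int early_calls_skipped;

/* Hypothetical stand-in for the malloc -> lock -> coherence path. */
void fake_lock(void)
{
	if (coherence == NULL) {
		/* Without this guard the call below would jump to address 0. */
		early_calls_skipped++;
		return;
	}
	coherence();
}

/* Runs before main, like init code in a dynamically linked binary. */
__attribute__((constructor))
static void early_init(void)
{
	fake_lock();
}

/* Hypothetical stand-in for the assignment main performs on entry. */
void late_setup(void)
{
	coherence = nofence;
}
```

By the time main starts, early_calls_skipped is already 1, which matches the symptom of a fault at location 0x0 before main is reached.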
I've fixed this in my copy with https://bitbucket.org/nbuwe/inferno-os/commits/dbaf2f1a92f6b939c5c2bf6c5d4380bd71a2a7f9
-
The definition of coherence deliberately takes advantage of a C rule to avoid anyone having to change an existing fork of emu (e.g., for a new platform) to account for the arrival of "coherence". If a given port doesn't define it, it will be nil and main will fill it in. If, as on Windows or now NetBSD, it needs to be defined from the start, the os.c for that platform should do it, but still no other platforms need source changes.
The curious thing is that I can't find any implementation that I've got that does define it as anything other than nofence.
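As a rough illustration of the C rule being discussed, the sketch below shows tentative definitions inside a single translation unit, which standard C allows. It does not show the cross-object case, where each object file carries its own copy and the linker merges them as common symbols; that part is toolchain behaviour rather than something one file can demonstrate (GCC 10 and later default to -fno-common, for example, and then reject such duplicates). coherence_is_preset is a hypothetical helper added only for checking.

```c
/* Stand-in for the no-op barrier. */
void nofence(void)
{
}

/* Tentative definition, as a shared header might provide it. */
void (*coherence)(void);

/* Repeating it in the same translation unit is legal C. */
void (*coherence)(void);

/* A port's os.c could supply the one definition with an initializer.
 * Without this line the pointer would be zero-initialised (nil). */
void (*coherence)(void) = nofence;

/* Hypothetical helper: reports whether the pointer is set at load time. */
int coherence_is_preset(void)
{
	return coherence == nofence;
}
```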
-
I haven't pushed the change yet but I've removed the reliance on being able to have multiple definitions.
-
"Can any of you suggest a strategy for debugging this? I'll probably start littering the code with print statements to narrow this down, but I know there's a more elegant path."
That's what I ended up doing on Windows, where I had a similar problem recently, and I found it next to impossible to get Visual Studio's debugger to do anything useful, because I couldn't reproduce the magic I'd successfully used a few months earlier. I'd see how far it gets in startup. On Windows, it didn't get as far as main, which was a big clue. The problem turned out to be that something in the C language start-up before main called something in Inferno (malloc probably) that needed lock, which needed coherence, which wasn't set.
I added
void (*coherence)(void) = nofence;
to emu/Nt/os.c. For ARM, it would need to be the right barrier for the particular architecture, but initially it will be running uniprocessor, so nofence is also good at start-up. If this turns out to be widespread on Linux and other host systems, I'll make that the default everywhere and override it later.
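A minimal sketch of the "default to nofence, override later" idea, assuming a GCC/Clang toolchain. fullfence, osinit_sketch and the two counters are hypothetical names used for illustration; `__sync_synchronize` is the compiler's full-barrier builtin and is only a placeholder for whatever barrier is right for a given architecture.

```c
/* Counters so the sketch can show which barrier was used. */
int nofence_calls;
int fence_calls;

/* No-op barrier: adequate while start-up is still single-threaded. */
void nofence(void)
{
	nofence_calls++;
}

/* Hypothetical real barrier, built on the compiler's full-barrier builtin. */
void fullfence(void)
{
	__sync_synchronize();
	fence_calls++;
}

/* Statically initialised, so it is valid from load time onwards. */
void (*coherence)(void) = nofence;

/* Runs before main, like start-up code that ends up taking a lock. */
__attribute__((constructor))
static void early_init(void)
{
	coherence();	/* safe: the pointer already refers to nofence */
}

/* Hypothetical later step: switch to the real barrier once running. */
void osinit_sketch(void)
{
	coherence = fullfence;
}
```

With the static initializer in place, the early call lands in nofence instead of at address 0, and platform code can still install a stronger barrier afterwards.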