Immediate segfault on ARM target... would like debugging tips...

Issue #335 new
Joseph Stewart created an issue

As the title states, I've cross-compiled the latest Inferno source to produce an "emu-g" and tried running it on an ARM9 processor w/o FPU where it immediately segfaults.

I've managed to capture a core file and have copied shared libraries from the target to my development host to open with arm-*-gdb, but due to a lack of symbols in the target shared libraries, I'm not getting any quick hints at where the problem happens (other than something trying to access or de-reference memory location 0x0).

For background, I've successfully run this same binary on a Raspberry Pi and then under qemu-arm using the same shared libraries from the target.

Also, I've compiled and run other fairly large projects (LuaJIT, RSC's libtask+custom code) for this target, so I'm relatively comfortable that the dev tools aren't (obviously at least) at fault.

Can any of you suggest a strategy for debugging this? I'll probably start littering the code with print statements to narrow this down, but I know there's a more elegant path.

I'll also post this to the mailing list (don't flame me dudes).

Thanks!

Comments (10)

  1. Charles Forsyth

    "Can any of you suggest a strategy for debugging this? I'll probably start littering the code with print statements to narrow this down, but I know there's a more elegant path."

    That's what I ended up doing on Windows, where I had a similar problem recently and found it next to impossible to get Visual Studio's debugger to do anything useful, because I couldn't reproduce the magic I'd successfully used a few months earlier. I'd see how far it gets in startup. On Windows, it didn't get as far as main, which was a big clue. The problem turned out to be that something in the C language start-up code, before main, called something in Inferno (malloc, probably) that needed lock, which needed coherence, which wasn't set.
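
A minimal sketch of print-style tracing that is safe to use very early in start-up. It assumes a POSIX host and calls write(2) directly, so it touches neither stdio buffering nor malloc. The helper name trace is hypothetical and is not part of emu.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of emu: write a checkpoint message
 * straight to file descriptor 2 (stderr).  write(2) does not allocate,
 * so this is usable even if malloc or locking is what is broken.
 * Returns the number of bytes written, or -1 on error.
 */
long trace(const char *msg)
{
	assert(msg != NULL);
	return (long)write(2, msg, strlen(msg));
}
```

Calling trace("reached main\n") as the first statement of main, and similar checkpoints in the early initialisation functions, shows how far start-up gets before the fault.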

    I added

    void (*coherence)(void) = nofence;

    to emu/Nt/os.c. For ARM, it would need to be the right barrier for the particular architecture, but initially it will be running uniprocessor, so nofence is also good at start-up. If this turns out to be widespread on Linux and other host systems, I'll make that the default everywhere and override it later.
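
A minimal sketch of the idea, assuming a GCC or Clang toolchain. Only coherence and nofence are names from this thread; fullfence and set_coherence are hypothetical, and the __sync_synchronize builtin is an assumption that may not be the right barrier for a given ARM core.

```c
#include <assert.h>
#include <stddef.h>

/* No-op barrier: enough while the code runs on a single processor. */
static void nofence(void)
{
}

/*
 * Assumption: a full memory barrier through the GCC/Clang builtin.
 * The right choice depends on the particular architecture.
 */
void fullfence(void)
{
	__sync_synchronize();
}

/*
 * Statically initialised, so the pointer is already valid when code
 * that runs before main (for example C library start-up) calls it.
 */
void (*coherence)(void) = nofence;

/* Hypothetical helper: override the default later in start-up. */
void set_coherence(void (*f)(void))
{
	assert(f != NULL);
	coherence = f;
}
```

Because the initialiser is a constant, the loader fills in the pointer and no code has to run first. A port can then call set_coherence(fullfence) once it knows it is running on more than one processor.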

  2. Joseph Stewart reporter

    Ok, I'm being a bit of a keyboard-monkey now, but I'm questioning my sanity at this point.

    To test the "coherence" thing, I went to my dev machine and blindly did a "hg pull ; hg update" then did a "mk CONF=emu-g nuke mkdirs install" (before making any additions) and got the following message:

    arm-gcc  -o o.emu-g asm-arm.o os.o kproc-pthreads.o segflush-arm.o emu-g.root.o lock.o devmnt.o devsrv.o srv.o ipif6-posix.o devmem.o devpipe.o devcmd.o cmd.o devssl.o devfs.o devcons.o devenv.o devprof.o devroot.o devprog.o devip.o devcap.o devdup.o devindir.o deveia.o ipaux.o cache.o proc.o random.o error.o uqid.o qio.o discall.o inferno.o exception.o chan.o env.o devtab.o dis.o print.o dial.o exportfs.o main.o parse.o alloc.o latin1.o dev.o sysfile.o pgrp.o errstr.o emu-g.o /home/josephs/arm/inferno-os/Linux/arm/lib/libinterp.a /home/josephs/arm/inferno-os/Linux/arm/lib/libmath.a /home/josephs/arm/inferno-os/Linux/arm/lib/libkeyring.a /home/josephs/arm/inferno-os/Linux/arm/lib/libsec.a /home/josephs/arm/inferno-os/Linux/arm/lib/libmp.a /home/josephs/arm/inferno-os/Linux/arm/lib/lib9.a -lm -lpthread
    lock.o: In function `lock':
    lock.c:(.text+0x8): undefined reference to `_tas'
    lock.c:(.text+0x1c): undefined reference to `_tas'
    lock.c:(.text+0x40): undefined reference to `_tas'
    lock.o: In function `canlock':
    lock.c:(.text+0x74): undefined reference to `_tas'
    collect2: ld returned 1 exit status
    mk: arm-gcc -c -DROOT="/home/josephs/arm/inferno-os" ...  : exit status=exit(1)
    mk: echo "(cd $SYSTARG; ...  : exit status=exit(1)
    mk: for j in ...  : exit status=exit(1)
    

    Unless I'm highly forgetful, I didn't see this the last time I built. Granted, I probably shouldn't have done the "hg update", but it's just a habit I've gotten into since I only re-visit Inferno every few months.

    I see emu/Linux/arm-tas-v[57].S in the codebase. Trying to learn how these get (or don't get) selected.
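
For orientation, a hedged C sketch of what a test-and-set primitive does and how a lock can be built on it. It uses the GCC/Clang __sync builtins and is not the code in the arm-tas-v[57].S files. The names toy_tas, toy_canlock and toy_unlock are hypothetical, and the real _tas signature may differ.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Atomically store 1 into *p and return the previous value.
 * A result of 0 means the caller has just taken the lock.
 */
int toy_tas(volatile int *p)
{
	assert(p != NULL);
	return __sync_lock_test_and_set(p, 1);
}

/* Try once to take the lock; returns 1 on success, 0 if already held. */
int toy_canlock(volatile int *p)
{
	return toy_tas(p) == 0;
}

/* Release the lock by storing 0 with release semantics. */
void toy_unlock(volatile int *p)
{
	__sync_lock_release(p);
}
```

The link error above is what appears when no object file supplies such a primitive: lock.o references the symbol, but nothing on the link line defines it.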

    What other demons have I awakened here, or is it just a personal hell?

    VBR, -joe

  3. Charles Forsyth

    It's my fault. If you do another pull -uv you'll get the tas file, and some mkfile changes to use it.

  4. Joseph Stewart reporter

    I gave this a try. I had to tweak my gcc options for this to compile:

    added -march=armv6, as -march=armv7 or -march=armv9 were complaining about illegal opcodes

    Anyway, I still think I have an architecture mis-match, as when I run the code on the target now, I get an illegal instruction trap.

    I've run out of time to investigate this today but will re-visit later.

  5. Joseph Stewart reporter

    Update w/o success (one last try before stepping away):

    I changed the target to -march=armv5 and edited emu/Linux/mkfile-arm to use the arm-tas-v5.S file and re-built, but still get a segfault.

    I'll resume with print-style debugging later.

  6. Valery Ushakov

    I've just run into the problem with "coherence" on NetBSD. What happens is that emu provides its own malloc() that does locking and so eventually calls coherence(). If emu is dynamically linked, the init/fini code runs before main() has a chance to set coherence. If the init/fini code calls malloc(), it calls emu's malloc, which calls coherence(), which is still null. In the case of NetBSD, it was some libc init code that was reading some sysctl nodes.
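
The ordering problem can be reproduced in miniature. This sketch assumes GCC or Clang, whose constructor attribute stands in here for libc init code. All names (hook, call_hook, set_hook, early_init) are hypothetical. Unlike the failing emu, call_hook checks for a null pointer and counts the early call instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

static void (*hook)(void);   /* null until set_hook runs, like a late-set coherence */
static int early_calls;      /* calls that arrived while hook was still null */
static int ran_before_main;  /* set by the constructor below */

/* Guarded call: a real crash would come from calling hook() unchecked. */
static void call_hook(void)
{
	if (hook == NULL) {
		early_calls++;
		return;
	}
	hook();
}

/* What main would normally do, too late for the constructor. */
void set_hook(void (*f)(void))
{
	assert(f != NULL);
	hook = f;
}

/* Runs before main, the way init code in a shared library would. */
__attribute__((constructor))
static void early_init(void)
{
	ran_before_main = 1;
	call_hook();
}
```

By the time main starts, ran_before_main and early_calls are both 1: the hook was needed before main had any chance to call set_hook.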

    Note, BTW, that emu/port/fns.h defines, rather than declares, "coherence", so a common copy ends up in each object and the linker takes care of merging them. This is not a problem per se, but still kinda icky.

    I've fixed this in my copy with https://bitbucket.org/nbuwe/inferno-os/commits/dbaf2f1a92f6b939c5c2bf6c5d4380bd71a2a7f9

  7. Charles Forsyth

    The definition of coherence deliberately takes advantage of a C rule to avoid anyone having to change an existing fork of emu (e.g., for a new platform) to account for the arrival of "coherence". If a given port doesn't define it, it will be nil and main will fill it in. If, as on Windows or now NetBSD, it needs to be defined from the start, the os.c for that platform should do it, but still no other platforms need source changes.
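
A sketch of that pattern in a single file. The line without an initialiser is a tentative definition, which C allows to coexist with one initialised definition in the same translation unit. Merging copies across separate object files relies on common symbols, a widespread extension rather than something the C standard guarantees. The function startup_fill_defaults is hypothetical and only stands in for what main does.

```c
#include <assert.h>
#include <stddef.h>

static void nofence(void)
{
}

/* As a shared header might have it: no initialiser, so it starts out nil. */
void (*coherence)(void);

/*
 * A port that needs the pointer from the very start would add, in its
 * own os.c, an initialised definition such as:
 *
 *     void (*coherence)(void) = nofence;
 *
 * Ports that do not care add nothing and rely on the step below.
 */

/* Hypothetical stand-in for the start-up code that fills in a default. */
void startup_fill_defaults(void)
{
	if (coherence == NULL)
		coherence = nofence;
	assert(coherence != NULL);
}
```

The weakness is the window before startup_fill_defaults runs: any call through coherence in that window dereferences a null pointer, which matches the failures reported on Windows and NetBSD.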

    The curious thing is that I can't find any implementation that I've got that does define it as anything other than nofence.

  8. Charles Forsyth

    I haven't pushed the change yet but I've removed the reliance on being able to have multiple definitions.
