Issue #5 new

Valgrind + two or more fork()s + relatively deep call stack = segfault in child process

Alexey Kryuchkov
created an issue

Bug summary

Using valgrind on a program that forks at least two children causes a segmentation fault in the second child once it has allocated a certain amount of memory on the call stack.

Valgrind dies along with the killed child, so no valgrind summary is printed for that process. The parent process and its first child continue to run and terminate normally, and valgrind prints the usual summary for both of them.

Under Linux, running the same program with valgrind does not result in any segmentation faults or other errors.

How to reproduce

Test program

Consider the following C program:

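#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define PER_LEVEL_BUF_N 1024   /* stack bytes reserved per recursion level */
#define MAX_LEVEL 10           /* deepest recursion level reached */

/* Recurse MAX_LEVEL levels deep, reserving PER_LEVEL_BUF_N bytes of
 * (otherwise unused) stack in every frame. */
void stack_filler(int level) {
    char per_level[PER_LEVEL_BUF_N];
    printf("I (%d) am on level %d\n", (int)getpid(), level);
    if (level >= MAX_LEVEL) {
        return;
    }
    stack_filler(level+1);
}

/* Fork a child that fills its stack and exits; the parent returns
 * immediately without waiting for it. */
void fork_one(void) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(1);
    } else if (pid == 0) {
        stack_filler(0);
        exit(0);
    } else {
        printf("Spawned child %d\n", (int)pid);
    }
}

int main(void) {
    fork_one();
    fork_one();
    return 0;
}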

(The above program is also attached to this issue as a file.)

Save it as test.c and compile with default options:

$ gcc test.c
$ ls
a.out   test.c

Normal run

Run the compiled program without valgrind to produce the expected output:

$ ./a.out
Spawned child 60117
Spawned child 60118
I (60117) am on level 0
I (60117) am on level 1
I (60117) am on level 2
I (60117) am on level 3
I (60117) am on level 4
I (60117) am on level 5
I (60117) am on level 6
I (60117) am on level 7
I (60117) am on level 8
I (60117) am on level 9
I (60117) am on level 10
I (60118) am on level 0
I (60118) am on level 1
I (60118) am on level 2
I (60118) am on level 3
I (60118) am on level 4
I (60118) am on level 5
I (60118) am on level 6
I (60118) am on level 7
I (60118) am on level 8
I (60118) am on level 9
I (60118) am on level 10
$

Crash under valgrind

Now run the test program with valgrind:

$ valgrind ./a.out 
==60119== Memcheck, a memory error detector
==60119== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==60119== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==60119== Command: ./a.out
==60119== 
Spawned child 60120
I (60120) am on level 0
I (60120) am on level 1
I (60120) am on level 2
I (60120) am on level 3
I (60120) am on level 4
I (60120) am on level 5
I (60120) am on level 6
I (60120) am on level 7
I (60120) am on level 8
I (60120) am on level 9
I (60120) am on level 10
==60120== 
==60120== HEAP SUMMARY:
==60120==     in use at exit: 4,096 bytes in 1 blocks
==60120==   total heap usage: 1 allocs, 0 frees, 4,096 bytes allocated
==60120== 
==60120== LEAK SUMMARY:
Spawned child 60121
I (60121) am on level 0
==60120==    definitely lost: 0 bytes in 0 blocks
I (60121) am on level 1
==60120==    indirectly lost: 0 bytes in 0 blocks
I (60121) am on level 2
==60120==      possibly lost: 0 bytes in 0 blocks
==60119== 
==60120==    still reachable: 4,096 bytes in 1 blocks
==60119== HEAP SUMMARY:
==60119==     in use at exit: 4,096 bytes in 1 blocks
==60119==   total heap usage: 1 allocs, 0 frees, 4,096 bytes allocated
==60119== 
==60120==         suppressed: 0 bytes in 0 blocks
==60119== LEAK SUMMARY:
I (60121) am on level 3
==60120== Rerun with --leak-check=full to see details of leaked memory
==60119==    definitely lost: 0 bytes in 0 blocks
I (60121) am on level 4
==60120== 
==60119==    indirectly lost: 0 bytes in 0 blocks
==60120== For counts of detected and suppressed errors, rerun with: -v
==60119==      possibly lost: 0 bytes in 0 blocks
==60120== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==60119==    still reachable: 4,096 bytes in 1 blocks
==60119==         suppressed: 0 bytes in 0 blocks
==60119== Rerun with --leak-check=full to see details of leaked memory
==60119== 
==60119== For counts of detected and suppressed errors, rerun with: -v
==60119== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
$ 

No further lines are printed.

Notice that the last line printed by the second child (60121 in this case) is:

I (60121) am on level 4

Also notice that no valgrind summary is printed for this process.

It turns out that the second child process is killed by a segmentation fault (signal 11):

$ dmesg | tail -1
pid 60121 (memcheck-amd64-free), uid 1001: exited on signal 11 (core dumped)

And there is indeed a core dump lying nearby:

$ ls     
a.out               memcheck-amd64-free.core    test.c

The core dump

Examining the dump with gdb yields nothing interesting besides a broken call stack:

$ gdb /usr/local/lib/valgrind/memcheck-amd64-freebsd memcheck-amd64-free.core 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...

warning: core file may not match specified executable file.
Core was generated by `memcheck-amd64-free'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000400901f13 in ?? ()
(gdb) bt
#0  0x0000000400901f13 in ?? ()
#1  0x000000000000b876 in ?? ()
#2  0x0000000039ba2f90 in ?? ()
#3  0x00000004008a5f30 in ?? ()
#4  0x0000000039ba2f80 in ?? ()
#5  0x0000000000001880 in ?? ()
#6  0x00000007ff000560 in ?? ()
#7  0x0000000038003510 in ?? ()
#8  0x0000000000000000 in ?? ()
#9  0x0000000039ba32a0 in ?? ()
#10 0x0000000039ba3298 in ?? ()
#11 0x0000000000000001 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000039ba4798 in ?? ()
#14 0x00000004008a5ed8 in ?? ()
#15 0x0000000000000049 in ?? ()
#16 0x000000003806302e in ?? ()
#17 0x000000003806419c in ?? ()
#18 0x000000003807dd24 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0xdeadbeefdeadbeef in ?? ()
#21 0xdeadbeefdeadbeef in ?? ()
#22 0xdeadbeefdeadbeef in ?? ()
#23 0xdeadbeefdeadbeef in ?? ()
Cannot access memory at address 0x4008a6000
(gdb) info registers
rax            0x5555   21845
rbx            0x7feffeda8  34342956456
rcx            0x40079fb6a  17187863402
rdx            0x40079c000  17187848192
rsi            0x0  0
rdi            0x7feffeda8  34342956456
rbp            0x39ba2f90   0x39ba2f90
rsp            0x4008a5e20  0x4008a5e20
r8             0x1531ca0    22224032
r9             0x7feffeda8  34342956456
r10            0x0  0
r11            0x38007ff0   939556848
r12            0x7feffed40  34342956352
r13            0x1534c30    22236208
r14            0x0  0
r15            0x400733 4196147
rip            0x400901f13  0x400901f13
eflags         0x10246  66118
cs             0x43 67
ss             0x3b 59
ds             0x0  0
es             0x0  0
fs             0x0  0
gs             0x0  0
(gdb)

Or perhaps I was supposed to run gdb with the test program as the argument? It makes no real difference:

$ gdb ./a.out memcheck-amd64-free.core 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...

warning: core file may not match specified executable file.
Core was generated by `memcheck-amd64-free'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/valgrind/vgpreload_core-amd64-freebsd.so...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/valgrind/vgpreload_core-amd64-freebsd.so
Reading symbols from /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0000000400901f13 in ?? ()
(gdb) bt
#0  0x0000000400901f13 in ?? ()
#1  0x000000000000b876 in ?? ()
#2  0x0000000039ba2f90 in ?? ()
#3  0x00000004008a5f30 in ?? ()
#4  0x0000000039ba2f80 in ?? ()
#5  0x0000000000001880 in ?? ()
#6  0x00000007ff000560 in ?? ()
#7  0x0000000038003510 in ?? ()
#8  0x0000000000000000 in ?? ()
#9  0x0000000039ba32a0 in ?? ()
#10 0x0000000039ba3298 in ?? ()
#11 0x0000000000000001 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000039ba4798 in ?? ()
#14 0x00000004008a5ed8 in ?? ()
#15 0x0000000000000049 in ?? ()
#16 0x000000003806302e in ?? ()
#17 0x0000000000000049 in ?? ()
#18 0x00000001380636f3 in ?? ()
#19 0x0000000039ba2f80 in ?? ()
#20 0x0000004900000000 in ?? ()
#21 0x0000000039ba2f80 in ?? ()
#22 0x0000b87e39ba2f80 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000049 in ?? ()
#25 0x0000000000000001 in ?? ()
#26 0x00000007ff000560 in ?? ()
#27 0x0000000000001880 in ?? ()
#28 0x0000000039ba2f80 in ?? ()
#29 0x00000004008a5f30 in ?? ()
#30 0x000000003806419c in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x00000004008a5f4c in ?? ()
#35 0x00000004008a5f40 in ?? ()
#36 0x000000003802eb83 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000001 in ?? ()
#40 0x000000003802e9a0 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000001 in ?? ()
#45 0x0000000000000001 in ?? ()
#46 0x0000000039ba2f80 in ?? ()
#47 0x0000000000000000 in ?? ()
#48 0x0000000000000000 in ?? ()
#49 0x0000000000000000 in ?? ()
#50 0x000000003807dd24 in ?? ()
#51 0xdeadbeefdeadbeef in ?? ()
#52 0x0000000000000000 in ?? ()
#53 0x0000000000000000 in ?? ()
#54 0x0000000000000000 in ?? ()
#55 0x0000000000000000 in ?? ()
#56 0x0000000000000000 in ?? ()
#57 0xdeadbeefdeadbeef in ?? ()
#58 0xdeadbeefdeadbeef in ?? ()
#59 0xdeadbeefdeadbeef in ?? ()
#60 0xdeadbeefdeadbeef in ?? ()
Cannot access memory at address 0x4008a6000
(gdb)

Some limited analysis

The segmentation fault in the child process seems to occur once a certain amount of memory has been allocated on the call stack. In the example above, the segfault occurred on the 5th recursive call, with each stack frame occupying at least PER_LEVEL_BUF_N=1024 bytes. Changing PER_LEVEL_BUF_N to 512 and letting the program run longer (MAX_LEVEL=20) results in a segfault on the 9th recursive call.

Tinkering with PER_LEVEL_BUF_N and MAX_LEVEL suggests that the call stack of the crashing second child is somehow limited to N = C1 + (1024 + C2) * 4 bytes, where C1 is the amount of memory allocated on the call stack before the first call to stack_filler(), and C2 is the overhead of each call to stack_filler().

The first child process is not affected by this behaviour. Removing the second call to fork_one() from the test program above eliminates the segmentation fault and any other abnormal behaviour.

Environment

Reproduced at least on FreeBSD 9.0 (amd64) running in VirtualBox.

$ uname -a
FreeBSD frya 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30 UTC 2012     root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
$ valgrind --version
valgrind-3.6.1
$ gcc --version
gcc (GCC) 4.2.1 20070831 patched [FreeBSD]

The described behaviour was reproduced both with valgrind installed as a binary package from the official FreeBSD repository and with valgrind built from source cloned from Bitbucket.
