Segfault in multi-thread mode

Issue #56 resolved
Mate Soos created an issue

I have some serious problem with M4RI in multi-threaded CryptoMiniSat: I often get a segfault. I'm not sure if I'm doing something wrong, or if there is something else at play. Here is how I'm using m4ri. Every thread does the following, using thread-local variables everywhere, including mzd_t *mat, etc:

mzd_t *mat = mzd_init(thisXors.size(), numCols);
assert(mzd_is_zero(mat));
[...]
mzd_write_bit(mat, row, var, 1); <- lots
mzd_echelonize(mat, true);
[..]
const bool rhs = mzd_read_bit(mat, i, numCols-1); <-lots
mzd_free(mat);

The backtrace I'm getting is:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffbdffb700 (LWP 3557)]
malloc_consolidate (av=av@entry=0x7fff9c000020) at malloc.c:4088
4088    malloc.c: No such file or directory.
(gdb) bt
#0  malloc_consolidate (av=av@entry=0x7fff9c000020) at malloc.c:4088
#1  0x00007ffff6af95d4 in _int_malloc (av=av@entry=0x7fff9c000020, bytes=bytes@entry=128) at malloc.c:3727
#2  0x00007ffff6afa64e in _int_memalign (av=av@entry=0x7fff9c000020, alignment=alignment@entry=64, 
    bytes=bytes@entry=16) at malloc.c:4340
#3  0x00007ffff6afbeca in __GI___libc_memalign (alignment=64, bytes=16) at malloc.c:3033
#4  0x00007ffff6afdadc in __posix_memalign (memptr=0x7fffbdffa378, alignment=140736381034976, size=0, 
    size@entry=64) at malloc.c:4953
#5  0x00007ffff6653210 in _mm_malloc (size=16, alignment=<optimized out>)
    at /usr/lib/gcc/x86_64-linux-gnu/4.7/include/mm_malloc.h:45
#6  m4ri_mm_malloc (size=16) at ./m4ri/misc.h:678
#7  m4ri_mmc_malloc (size=size@entry=16) at m4ri/mmc.c:68
#8  0x00007ffff6631833 in m4ri_mmc_calloc (size=16, count=1) at m4ri/mmc.h:80
#9  mzd_init (r=r@entry=2, c=c@entry=6) at m4ri/mzd.c:182
#10 0x00007ffff662ce99 in _mzd_echelonize_m4ri (A=0x7ffff685a580 <mzd_cache+4032>, full=1, k=1, heuristic=1, 
    threshold=0.14999999999999999) at m4ri/brilliantrussian.c:699
#11 0x00007ffff7941e8b in CMSat::XorFinder::extractInfoFromBlock (this=this@entry=0x744810, block=..., 
    blockNum=<optimized out>)

where frame 10 is:

#10 0x00007ffff662ce99 in _mzd_echelonize_m4ri (A=0x7ffff685a580 <mzd_cache+4032>, full=1, k=1, heuristic=1, 
    threshold=0.14999999999999999) at m4ri/brilliantrussian.c:699
699       mzd_t *T3 = mzd_init(__M4RI_TWOPOW(k), ncols);
(gdb) l
694
695       mzd_t *U  = mzd_init(kk, ncols);
696       mzd_t *T0 = mzd_init(__M4RI_TWOPOW(k), ncols);
697       mzd_t *T1 = mzd_init(__M4RI_TWOPOW(k), ncols);
698       mzd_t *T2 = mzd_init(__M4RI_TWOPOW(k), ncols);
699       mzd_t *T3 = mzd_init(__M4RI_TWOPOW(k), ncols);
700       mzd_t *T4 = mzd_init(__M4RI_TWOPOW(k), ncols);
701       mzd_t *T5 = mzd_init(__M4RI_TWOPOW(k), ncols);
702       rci_t *L0 = (rci_t*)m4ri_mm_calloc(__M4RI_TWOPOW(k), sizeof(rci_t));
703       rci_t *L1 = (rci_t*)m4ri_mm_calloc(__M4RI_TWOPOW(k), sizeof(rci_t));

I can't go into frame 0. ("malloc.c: No suh file or directory). But I can go into frame 5:

(gdb) f 5
#5  0x00007ffff6653210 in _mm_malloc (size=16, alignment=<optimized out>)
    at /usr/lib/gcc/x86_64-linux-gnu/4.7/include/mm_malloc.h:45
45        if (posix_memalign (&ptr, alignment, size) == 0)
(gdb) l
40        void *ptr;
41        if (alignment == 1)
42          return malloc (size);
43        if (alignment == 2 || (sizeof (void *) == 8 && alignment == 4))
44          alignment = sizeof (void *);
45        if (posix_memalign (&ptr, alignment, size) == 0)
46          return ptr;
47        else
48          return NULL;
49      }

Comments (13)

  1. Martin Albrecht repo owner

    Hi, can you recompile with

    ./configure --enable-openmp
    

    and check if this fixes the issue? M4RI pays no attention - it's foolish I know! - to thread safety unless this is specified. If specifying this switch fixes the problem, then fixing the issue means to separate thread safety stuff from the OpenMP stuff.

  2. Mate Soos reporter

    Thanks for the reply! I'm a bit surprised... wouldn't this mean that if two processes are using m4ri at the same time on a multi-core machine it would also cause a segfault? I'm not sure but I think this means you are using static variables or variables that are only initialized once the library is loaded, and every user (unwittingly) shares them? If so, there are ways around this, .e.g "__thread" in gcc and a new thread-local stroage specifier for C++11 (though rarely implemented...). Anyway, let me get back to you with the --enable-openmp :)

  3. Sean Weaver

    I ran across a similar problem when trying to use pthreads w/ mzd_echelonize_m4ri I don't want to enable openmp because I am not trying to solve one large problem, but rather millions of small problems. To fix, I made the following three changes:

    In mzd.c, add a bit of code to two #if statements as follows:

    static mzd_t* mzd_t_malloc() {
    #if 1 || __M4RI_HAVE_OPENMP
      return (mzd_t*)m4ri_mm_malloc(sizeof(mzd_t));
    #else
    
    static void mzd_t_free(mzd_t *M) {
    #if 1 || __M4RI_HAVE_OPENMP
      m4ri_mm_free(M);
    #else
    

    In mmc.h, comment out #define __M4RI_ENABLE_MMC, like so:

    /**                                                                                                                                                                                                                                          
     * \brief Enable memory block cache (default: enabled).                                                                                                                                                                                      
     */
    //#define __M4RI_ENABLE_MMC
    
  4. Martin Albrecht repo owner

    Yep, we only have guards around this code when OpenMP is enabled. We should have a configure switch to disable mmc.

  5. Sean Weaver

    Would it be possible to make the code thread safe? (or at least a ./configure option to do so) I could create a small test case today that exercises the safety issue using pthreads and send it over.

  6. Martin Albrecht repo owner

    Sounds like a plan, if I have a program to test I'd be happy to work on a configure switch to fix this.

  7. Sean Weaver

    Code pasted below.

    To compile:

    $ gcc -std=gnu99 -g m4ri_pthread_test.c -o m4ri_pthread_test -lpthread -lm4ri -L$HOME/local/packages/m4ri/.libs -I$HOME/local/packages/m4ri
    

    Of course, you'll have to change the paths to match where your copy of m4ri resides.

    To run:

    $ ./m4ri_pthread_test 7
    

    Where the one argument is the number of threads to use. If you use 1, there should be no errors. If you use a larger number, such as 7, you should get some error. The errors you get can vary due to the nature of the parallel program.

    #include <stdlib.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <pthread.h>
    #include "m4ri/m4ri.h"
    
    typedef struct build_thread_t {
      pthread_t pthread;
      uint32_t thread_number;
      uint32_t number_of_runs;
      uint32_t runs_finished;
      uint8_t thread_done;
    } build_thread_t;
    
    
    mzd_t *build_random_matrix() {
      uint32_t rows = 500;
      uint32_t vars = 550;
      mzd_t *matrix = mzd_init(rows, vars+2);
      uint32_t i, j;
    
      for(i = 0; i < rows; i++) {
        for(j = 0; j < 1; j++) {
          uint32_t bit_loc = (abs(rand()) % (vars-1)) + 1;
          mzd_write_bit(matrix, i, bit_loc, 1);
        }
        if(rand()&1)
          mzd_write_bit(matrix, i, vars+1, 1);
      }
      assert(matrix->nrows == rows);
      return matrix;
    }
    
    void solve_random_matrix(mzd_t *matrix) {
      mzd_echelonize_m4ri(matrix,1,0);
      mzd_free(matrix);
    }
    
    void *parallel_solve(void *build_thread_v) {
      build_thread_t *build_thread = (build_thread_t *)build_thread_v; //Type fixing
      uint32_t i;
    
      for(i = 0; i < build_thread->number_of_runs; i++) {
        mzd_t *matrix = build_random_matrix();
        solve_random_matrix(matrix);
        build_thread->runs_finished++;
      }
      build_thread->thread_done = 1;
      return NULL;
    }
    
    int main(int argc, char **argv) {
      uint32_t number_of_runs = 1000000;
      uint32_t i;
    
      if(argc != 2) {
        fprintf(stderr, "Usage: ./m4ri_pthread_test <n>\n");
        fprintf(stderr, "  where n is the number of threads to use\n");
        return 1;
      }
    
      uint32_t number_of_threads = atoi(argv[1]);
    
      if(number_of_threads == 0 || number_of_threads > 100) {
        //Some sanity checking using a pretty arbitrary thread limit of 100.
        fprintf(stderr, "Number of threads must be between 1 and 99 inclusive\n");
        return 1;
      }
    
      build_thread_t *build_thread = malloc(number_of_threads * sizeof(build_thread_t));
    
      uint32_t chunk = number_of_runs/number_of_threads + (number_of_runs%number_of_threads == 0 ? 0 : 1);
      uint32_t start = 0;
      for(i = 0; i < number_of_threads; i++) {
        build_thread[i].thread_number = i;
        build_thread[i].number_of_runs = chunk;
        build_thread[i].runs_finished = 0;
        build_thread[i].thread_done = 0;
        if(pthread_create(&build_thread[i].pthread, NULL, parallel_solve, (void *)&build_thread[i])) {
          fprintf(stderr, "Error creating thread %u\n", i);
          return 1;
        }
      }
    
      uint32_t threads_finished = 0;
      uint32_t runs_finished = 0;
      while(threads_finished != number_of_threads) {
        sleep(1);
        threads_finished = 0;
        runs_finished = 0;
        for(i = 0; i < number_of_threads; i++) {
          threads_finished += build_thread[i].thread_done;
          runs_finished += build_thread[i].runs_finished;
        }
        fprintf(stderr, "\r%u/%u runs finished, %u/%u threads finished", runs_finished, number_of_runs, threads_finished, number_of_threads);
      }
    
      fprintf(stderr, "\r%u/%u runs finished, %u/%u threads finished\n", runs_finished, number_of_runs, threads_finished, number_of_threads);
    
      return 0;
    }
    
  8. Sean Weaver

    Also, if you want to test thread safety of other parts of m4ri, just change the code inside the for(i = 0; i < build_thread->number_of_runs; i++) { ... } loop to call the other functionality instead of the mzd_echelonize_m4ri code I am concerned with.

  9. Martin Albrecht repo owner

    Can you give 1f1ba00 a spin to see if that fixes the issue for you? You'll have to recompile with --enable-thread-safe set.

  10. Sean Weaver

    I did some tests and it works! Thanks so much! I will test it more over the next week and let you know if I find anything that gives trouble.

    While compiling I noticed these two warnings (I am compiling w/ clang on osx).

    In file included from m4ri/ple_russian.c:312:
    ./m4ri/ple_russian_template.h:3:141: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
      ...*M, rci_t startrow, rci_t stoprow, rci_t startcol, int const k[N], const ple_table_t const *table[N]) {
    
    m4ri/io.c:36:120: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'word' (aka 'unsigned long long')
          [-Wformat]
      printf("nrows: %6zu, ncols: %6zu, density: %6.5f, hash: 0x%016zx",(size_t)A->nrows,(size_t)A->ncols,mzd_density(A,1),mzd_hash(A));
                                                                ~~~~~~                                                     ^~~~~~~~~~~
                                                                %016llx
    
  11. Log in to comment