Try using a different implementation of Mersenne Twister

Issue #1901 (new)
Created by Maciej Fijalkowski

The idea is to try using an external C library for random() to make it faster, e.g. SFMT: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html
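Whatever replaces the current generator has to be measured against it. A minimal timing sketch using the standard-library timeit module, assuming the implementation under test exposes a zero-argument random() callable; the helper name time_random is hypothetical, and CPython's random.Random is only a stand-in for the generator being measured:

```python
import random
import timeit


def time_random(func, calls=100000, repeat=5):
    """Return the best observed time per call, in seconds, for a
    zero-argument callable such as a generator's random() method."""
    timer = timeit.Timer(func)
    # Taking the minimum over several repeats filters out noise from other
    # processes; under PyPy it also helps discard runs dominated by JIT warm-up.
    best_total = min(timer.repeat(repeat=repeat, number=calls))
    return best_total / calls


if __name__ == "__main__":
    # Stand-in generator: pass the candidate implementation's random()
    # method here instead to compare implementations.
    rng = random.Random(42)
    print("random(): %.1f ns per call" % (time_random(rng.random) * 1e9))
```

Running the same harness once with the old implementation and once with the C-backed one gives directly comparable per-call numbers.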

Comments

  1. ThomasNyberg

    I have an initial attempt at implementing this in this branch/folder of my repo:

    https://bitbucket.org/ThomasNyberg/pypy/src/issue_1901/issue_1901/
    

    I added a README.md with a lot of information to that folder, but I'll paste its contents here so no one needs to jump over there. I don't have a full implementation, and I'm not even sure whether what I'm doing is right (or even possible), but either way this has been a fun intro to PyPy and cffi. Hopefully my work here is useful; the experience has been nice regardless.

    Here's the README's contents:

    Issue 1901: Speed up PyPy's random functionality

    This is an attempt to sort out this issue:

    https://bitbucket.org/pypy/pypy/issues/1901
    

    I.e. the goal is to take the code found here

    https://bitbucket.org/pypy/pypy/src/default/rpython/rlib/rrandom.py
    

    and change it to use compiled C code instead. Following the guidance in the issue linked above, I extracted the necessary portions of the numpy random implementation from:

    https://bitbucket.org/pypy/numpy/src/HEAD/numpy/random/mtrand.py
    https://bitbucket.org/pypy/numpy/src/HEAD/numpy/random/setup.py
    

    I tried to take only the pieces that were necessary (i.e. I did not bring over any of the distributions code). The C code is in the mtrand/ folder and is built with cffi by the _mtrand_build.py script. Note that I simplified _mtrand_build.py and setup.py so that I could understand them myself; in doing so, I removed some parts that are probably needed to compile on Windows. I figure I can add those back once I actually understand things correctly.
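    For reference, rrandom.py and numpy's randomkit C code both implement the standard MT19937 generator, so swapping one for the other should leave the output stream unchanged for a given seed (SFMT, mentioned in the issue, is a different generator with a different stream). A minimal pure-Python sketch of the pieces involved, written from the published algorithm rather than copied from either file; the class name MT19937 is hypothetical, while init_genrand, genrand_int32 and the 53-bit random() conversion follow the reference C code's naming:

```python
# Parameters of the standard MT19937 generator.
N = 624
M = 397
MATRIX_A = 0x9908B0DF
UPPER_MASK = 0x80000000
LOWER_MASK = 0x7FFFFFFF


class MT19937(object):
    """Hypothetical pure-Python MT19937, for reference and cross-checking."""

    def __init__(self, seed=5489):
        self.state = [0] * N
        self.index = N
        self.init_genrand(seed)

    def init_genrand(self, seed):
        mt = self.state
        mt[0] = seed & 0xFFFFFFFF
        for i in range(1, N):
            prev = mt[i - 1]
            mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
        self.index = N  # forces a full regeneration ("twist") on first use

    def _twist(self):
        # Regenerate all N words in place; the modular indices reproduce the
        # three loops of the reference implementation.
        mt = self.state
        for i in range(N):
            y = (mt[i] & UPPER_MASK) | (mt[(i + 1) % N] & LOWER_MASK)
            mt[i] = mt[(i + M) % N] ^ (y >> 1) ^ (MATRIX_A if y & 1 else 0)
        self.index = 0

    def genrand_int32(self):
        if self.index >= N:
            self._twist()
        y = self.state[self.index]
        self.index += 1
        # Tempering step (y stays within 32 bits throughout).
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y

    def random(self):
        # genrand_res53: 27 + 26 random bits -> float in [0, 1).
        a = self.genrand_int32() >> 5
        b = self.genrand_int32() >> 6
        return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)


if __name__ == "__main__":
    import random
    # Load the same 624 words (plus the index) into CPython's generator.
    mt = MT19937(1234)
    ref = random.Random()
    ref.setstate((3, tuple(mt.state) + (mt.index,), None))
    assert [mt.genrand_int32() for _ in range(5)] == \
        [ref.getrandbits(32) for _ in range(5)]
    assert mt.random() == ref.random()
    print("matches CPython's random module")
```

    Because CPython's random module also uses MT19937, loading the state words plus the index into random.Random via setstate((3, state + (index,), None)) is a cheap way to cross-check any reimplementation, including a C-backed one.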

    Assuming you have cloned this repo and built everything, you should be able to just run commands like:

    make venv
    make
    make test
    make install
    make clean
    make clean-venv
    

    Nothing requires you to use these make commands, but they are a convenient way for me to automate things. Also, some commands don't work well if run in the wrong order, but honestly I don't care about that. The test in tests/ is the same as the test for the original random implementation found here:

    https://bitbucket.org/pypy/pypy/src/default/rpython/rlib/test/test_rrandom.py
    

    except that my version adds one test, test_state_assignment. All tests pass except test_translate, which fails because the cffi code objects are apparently not valid RPython.
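    The contents of test_state_assignment aren't shown here. A hypothetical test of that shape could check that copying one generator's state into another makes both produce identical output from then on. In this sketch CPython's random.Random stands in for rrandom.Random, so getstate/setstate take the place of whatever state accessors the real class exposes:

```python
import random


def test_state_assignment():
    # Advance a generator so its state is not the freshly seeded one.
    src = random.Random(17)
    for _ in range(1000):
        src.random()

    # Assign that state to a second, differently seeded generator.
    dst = random.Random(99)
    dst.setstate(src.getstate())

    # From here on both must produce exactly the same values.
    assert [src.random() for _ in range(10)] == [dst.random() for _ in range(10)]


if __name__ == "__main__":
    test_state_assignment()
    print("ok")
```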

    At a minimum there are the following things left to do:

    1. Move this into the main library. Here I'm a bit confused: the original file is RPython, so to do the same I think I would need to switch everything here from cffi to rffi. I looked at the repo code and docs for this and don't really understand how to do that (or whether it's even the right approach).

    2. Make test_translate pass.

    3. Clean things up. E.g. right now I have a weird proxy class, StateRepr, that exposes the internal state so that the old API stays consistent (at least from the perspective of the tests). This may be the wrong approach, though; a better proxy (or no proxy at all) might be preferable.

    4. This should of course be timed to make sure it's actually faster in the first place.

    5. Once 1-4 are done, some optimizations could perhaps be attempted in the code I wrote, but there seems to be no reason to optimize it until things fit together correctly and timing scripts exist.
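    A proxy like StateRepr can forward item access straight to the C-side state, so code written against the old state list keeps working without copying. A minimal sketch using the standard-library ctypes module; RkState is a hypothetical partial mirror of randomkit's rk_state (only the leading key[624] array and pos field; the real struct has more fields after them), and a cffi- or rffi-based version would index a C pointer instead:

```python
import ctypes

RK_STATE_LEN = 624


class RkState(ctypes.Structure):
    # Hypothetical partial mirror of randomkit's rk_state: only the
    # leading key array and pos field are declared.
    _fields_ = [
        ("key", ctypes.c_ulong * RK_STATE_LEN),
        ("pos", ctypes.c_int),
    ]


class StateRepr(object):
    """List-like view onto the C-side key array; nothing is copied."""

    def __init__(self, c_state):
        self._c_state = c_state

    def __len__(self):
        return RK_STATE_LEN

    def _index(self, i):
        # Support negative indices and bounds-check like a list.
        if i < 0:
            i += RK_STATE_LEN
        if not 0 <= i < RK_STATE_LEN:
            raise IndexError("state index out of range")
        return i

    def __getitem__(self, i):
        if isinstance(i, slice):
            return [self._c_state.key[j] for j in range(*i.indices(RK_STATE_LEN))]
        return self._c_state.key[self._index(i)]

    def __setitem__(self, i, value):
        # MT19937 words are 32-bit even where unsigned long is 64-bit.
        self._c_state.key[self._index(i)] = value & 0xFFFFFFFF


if __name__ == "__main__":
    c_state = RkState()
    state = StateRepr(c_state)
    state[0] = 12345                  # a write through the proxy...
    assert c_state.key[0] == 12345    # ...lands in the C struct
    c_state.key[623] = 7              # and C-side changes are visible
    assert state[-1] == 7
```

    Since nothing is copied, writes through the proxy are immediately visible to any C function that receives a pointer to the same struct; the cost is a Python-level method call on every element access, which matters only if old-API code touches the state in a hot loop.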

  2. ThomasNyberg

    I've made some progress beyond what I listed above and updated the README to reflect the changes: https://bitbucket.org/ThomasNyberg/pypy/src/issue_1901/issue_1901/

    By the way, at this point there are still some fairly weird stylistic issues as well as some minor missing functionality. Right now I'm mainly focused on getting the build fully sorted out so that I can stop thinking about architecture and think more about those other details. In any case, I've extracted the changes from the README and posted them here for convenience:

    I extracted the necessary portions of the numpy random implementation from:

    https://bitbucket.org/pypy/numpy/src/HEAD/numpy/random/mtrand.py
    https://bitbucket.org/pypy/numpy/src/HEAD/numpy/random/setup.py
    

    I tried to take only the pieces that were necessary (i.e. I did not bring over any of the distributions code). The C code is in the mtrand/ folder, and I integrated it into the codebase in two separate ways. The first, intended as a guide, is a standalone RPython script in this folder, rmtrand.py, which can be compiled by running:

    make venv
    make
    ./rmtrand-rpy
    

    Basically that file does the following:

    1. It defines the interface to the C code using rffi.
    2. It embeds the tests (as many as possible, anyway) from rpython/rlib/test/test_rrandom.py. The tests it contains do pass.
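    RPython standalone programs, as in the RPython getting-started examples, expose a target() function that returns the entry_point(argv) to translate, so embedding tests amounts to calling check functions from entry_point and turning failures into an exit code. A minimal sketch of that shape; check_res53_bounds is a hypothetical example check on the 53-bit float conversion random() uses, and this exact code has not been run through the translator:

```python
import os


def res53(a32, b32):
    # Combine two 32-bit outputs into a float in [0, 1), the same
    # arithmetic random() uses (genrand_res53).
    a = a32 >> 5
    b = b32 >> 6
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)


def check_res53_bounds():
    # Hypothetical example check: extreme inputs must map into [0, 1).
    if res53(0, 0) != 0.0:
        return False
    top = res53(0xFFFFFFFF, 0xFFFFFFFF)
    return 0.0 < top < 1.0


def entry_point(argv):
    failures = 0
    if not check_res53_bounds():
        os.write(2, b"FAIL: res53 bounds\n")
        failures += 1
    # Further checks ported from test_rrandom.py would be called here.
    if failures:
        return 1
    os.write(1, b"all checks passed\n")
    return 0


def target(*args):
    # Hook the RPython toolchain calls to find the program to translate.
    return entry_point, None


if __name__ == "__main__":
    import sys
    sys.exit(entry_point(sys.argv))
```

    Because entry_point is plain Python too, the same file can be run untranslated for a quick check before compiling it.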

    The second way is to embed it directly into rpython/rlib/rrandom.py. That file is basically the same as rmtrand.py in this folder, except with some paths changed to support the full build. I also moved the C files themselves to ../rpython/translator/c/mtrand/. I'm not sure this is a reasonable place for them, but it puts them alongside other C files in the interpreter.

    With the current setup, the interpreter does build (i.e. with just a regular make in the root of the repo). However, the code does not work at run time. If you go into the pypy/goal folder and try to use the module, it fails as follows:

    pypy/pypy/goal$ PYTHONPATH=../.. ./pypy-c -c 'from rpython.rlib.rrandom import Random; Random()'
    Traceback (most recent call last):
      File "<module>", line 1, in <module>
      File "/home/twn/hg/ThomasNyberg/pypy/rpython/rlib/rrandom.py", line 62, in __init__
        self.init_genrand(seed)
      File "/home/twn/hg/ThomasNyberg/pypy/rpython/rlib/rrandom.py", line 85, in init_genrand
        rk_seed(s, self.internal_state)
      File "/home/twn/hg/ThomasNyberg/pypy/rpython/rtyper/lltypesystem/rffi.py", line 304, in wrapper
        res = call_external_function(*real_args)
      File "<12-codegen /home/twn/hg/ThomasNyberg/pypy/rpython/rtyper/lltypesystem/rffi.py:203>", line 9, in call_external_function
        res = funcptr(a0, a1)
      File "/home/twn/hg/ThomasNyberg/pypy/rpython/rtyper/lltypesystem/lltype.py", line 1380, in __call__
        return callb(*args)
      File "/home/twn/hg/ThomasNyberg/pypy/rpython/rtyper/lltypesystem/ll2ctypes.py", line 1300, in __call__
        cfunc = get_ctypes_callable(self.funcptr, self.calling_conv)
      File "/home/twn/hg/ThomasNyberg/pypy/rpython/rtyper/lltypesystem/ll2ctypes.py", line 1273, in get_ctypes_callable
        funcname, place))
    NotImplementedError: function 'rk_seed' not found in any of the libraries ('m', '/tmp/usession-issue_1901-108/shared_cache/externmod_0.so')
    

    It looks like some sort of linking problem. The code does seem to compile and hook up correctly while the interpreter is being built, so I suspect I'm just not including the compiled C code in the final interpreter binary correctly (presumably due to a misunderstanding of what llexternal actually does).
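    The traceback passes through rffi.py and ll2ctypes.py, which suggests that rpython.rlib.rrandom is being executed as ordinary Python on top of pypy-c, with the llexternal call emulated via ctypes, rather than through code compiled into the binary. In that mode the function has to be found by name in one of the listed shared libraries. The failure can be reproduced with the standard library alone; find_symbol below is a hypothetical helper that mimics that search (it is not ll2ctypes' actual code), and None stands for the symbols already loaded into the current process on POSIX systems:

```python
import ctypes
import ctypes.util


def find_symbol(name, libraries):
    """Return the first ctypes function called `name` found in `libraries`.

    Each entry is a short library name such as "m" (resolved with
    ctypes.util.find_library), a path to a shared object, or None for the
    symbols already loaded into this process.
    """
    for lib in libraries:
        if lib is None:
            path = None
        elif "/" in lib:
            path = lib
        else:
            path = ctypes.util.find_library(lib) or lib
        try:
            handle = ctypes.CDLL(path)
        except OSError:
            continue  # library could not be loaded; try the next one
        try:
            return getattr(handle, name)
        except AttributeError:
            continue  # library loaded, but it does not export this symbol
    raise NotImplementedError(
        "function %r not found in any of the libraries %r"
        % (name, tuple(libraries)))


if __name__ == "__main__":
    strlen = find_symbol("strlen", [None])  # libc is already loaded
    strlen.restype = ctypes.c_size_t
    strlen.argtypes = [ctypes.c_char_p]
    print(strlen(b"abc"))
    try:
        find_symbol("rk_seed", [None, "m"])
    except NotImplementedError as exc:
        print(exc)
```

    On Linux, running nm -D on the shared object lists the symbols it actually exports, which is a quick way to check whether rk_seed made it into externmod_0.so at all.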
