pypy3 performance regression vs pypy2

Issue #3153 new
Bernat Gabor created an issue

Here is a concrete example:

https://github.com/pypa/virtualenv/blob/rewrite/tasks/make_zipapp.py

This script runs in:

  • CPython 3.8 - 63.82s
  • pypy3 - 447.57s
  • pypy2 - 99.13s

Benchmarks from https://dev.azure.com/pypa/virtualenv/_build/results?buildId=17548. I'd expect pypy3 to be at least on par with pypy2 here.

Comments (19)

  1. Carl Friedrich Bolz-Tereick

    can you give some hints on how to run this? check out the rewrite branch then run

    pypy3 tasks/make_zipapp.py
    

    I suppose? but how does it work under pypy2, given that the zipapp module is new in Python3 and it doesn’t seem to be on pypi as a backport either?

  2. Bernat Gabor reporter

    Hmm, now that I look through it, the script might actually fall back to some CPython interpreter, which would explain how pypy2 is so close to CPython. 🤔 Sorry for the false positive. I suppose my ask then becomes whether pypy, in general, can be brought closer to CPython. 🤔

  3. Carl Friedrich Bolz-Tereick

    on my machine the runtime between cpython3 and pypy3 is way less extreme:

    • cpython38: 43.322s
    • pypy3: 86.63s

    the azure link gives me “build not found” so I can’t check that out. which version of pypy3 are you using?

  4. Bernat Gabor reporter

    platform linux -- Python 3.6.9[pypy-7.2.0-final] - 447.57s setup    tests/integration/test_zipapp.py::test_zipapp_help
    platform win32 -- Python 3.6.9[pypy-7.2.0-final] - 600.50s setup    tests/integration/test_zipapp.py::test_zipapp_help
    platform darwin -- Python 3.6.9[pypy-7.3.0-final] - 668.17s setup    tests/integration/test_zipapp.py::test_zipapp_help
    platform linux -- Python 3.8.0                    - 63.82s setup    tests/integration/test_zipapp.py::test_zipapp_help
    platform win32 -- Python 3.8.0                    - 97.50s setup    tests/integration/test_zipapp.py::test_zipapp_help
    platform darwin -- Python 3.8.1                   - 87.63s setup    tests/integration/test_zipapp.py::test_zipapp_help
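
To make the size of the gap easier to compare across platforms, here is a small sketch that turns the setup times quoted in this thread into pypy3/CPython ratios. The numbers are copied from comments 3 and 4; the dictionary layout and the `slowdown` helper are only illustrative. Note that the darwin row compares pypy 7.3.0 against CPython 3.8.1, so it is not strictly like for like with the other CI rows.

```python
# Setup times in seconds, copied from the thread (CI numbers and one local run).
timings = {
    "linux (azure)": {"pypy3": 447.57, "cpython": 63.82},
    "win32 (azure)": {"pypy3": 600.50, "cpython": 97.50},
    "darwin (azure)": {"pypy3": 668.17, "cpython": 87.63},
    "linux (local, comment 3)": {"pypy3": 86.63, "cpython": 43.322},
}


def slowdown(row):
    """Return how many times slower pypy3 is than CPython for one row."""
    return row["pypy3"] / row["cpython"]


for name, row in timings.items():
    print(f"{name}: pypy3 is {slowdown(row):.1f}x slower than CPython")
```

On these figures the CI runs sit at roughly 6x to 8x, while the local run is about 2x, which is the discrepancy the rest of the thread tries to explain.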
    

  5. Carl Friedrich Bolz-Tereick

    the script mainly seems to run pip a lot, so we can use that as a proxy of what is going on. eg the following invocation:

    pip wheel -w /tmp/tmpwt2xea_r/wheel-store --no-deps .
    

    in the toplevel of the virtualenv repo with a suitable temp dir. results for me:

    • cpython27: 3.490s
    • cpython38: 3.329s
    • pypy2: 5.700s
    • pypy3: 7.265s
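
A comparison like the one above can be scripted so that every interpreter runs the same command on the same machine. The following is a minimal sketch, not the method used in this thread: the helper names `pip_wheel_cmd` and `time_command`, the choice of best-of-N timing, and the interpreter names in the usage note are all assumptions. It only relies on `subprocess.run` and `time.perf_counter` from the standard library and on the `pip wheel -w ... --no-deps .` invocation quoted above.

```python
import subprocess
import sys
import time


def pip_wheel_cmd(interpreter, wheel_dir):
    """Build the `pip wheel` command from the thread for a given interpreter."""
    return [interpreter, "-m", "pip", "wheel", "-w", wheel_dir, "--no-deps", "."]


def time_command(cmd, repeats=3, cwd=None):
    """Run `cmd` `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a broken run is never reported as a (fast) timing.
        subprocess.run(
            cmd,
            cwd=cwd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - start)
    return best
```

Usage, assuming `python3.8` and `pypy3` are on `PATH`, each has pip installed, and the current directory is the virtualenv checkout: `time_command(pip_wheel_cmd("pypy3", "/tmp/wheel-store"))`. Taking the best of several runs reduces noise from a cold disk cache, but it does not remove differences caused by the machine itself.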

  6. Carl Friedrich Bolz-Tereick

    aaaah, I see. are those tests run with coverage? if yes, that’s a likely candidate (coverage is known to be slow on pypy)

  7. Bernat Gabor reporter

    The coverage is not run on the script itself, only on the test suite. So I don’t think that could be the cause 🤔

  8. Carl Friedrich Bolz-Tereick

    right, coverage or not makes no difference to me locally. so we still don’t know where the extra factor of 5 comes from on azure

  9. Carl Friedrich Bolz-Tereick

    (the pip numbers I reported above aren’t great, but also not catastrophic. I am much more concerned by the jump from 63.82s to 447.57s)

  10. mattip

    Has anyone been able to reproduce the relative timings on a single machine?

    I think a first step would be to reproduce the original problem before trying to analyze where the slowdown comes from. It could be anything: the VM on azure runs on different or differently-loaded hardware, network problems, a slower disk on the VM, …

  11. mattip

    I don’t see how this is running. The azure pipeline yml does not mention pypy in the matrix. Is it via tox? How does tox find a pypy image? Maybe the timings include downloading and installing pypy?

  12. Bernát Gábor

    You need to look at the rewrite branch; it’s there in the pipeline configuration. I'll say this: the numbers are similar for reruns and new builds, which should rule out hardware/image differences 🤔

  13. Bernát Gábor

    I'll say that locally my numbers are not as bad, but pypy3 is still a bit slower than pypy and significantly slower than CPython 🤔

  14. mattip

    Thanks, I missed the part about the rewrite branch. The actual command executed is coverage run -m pytest --junitxml /home/vsts/work/1/s/.tox/junit.py38.xml tests --int (the link points to python38 linux but the one for pypy3 is the same except for the output file name), so coverage is involved in the test run. Steps to reproduce more or less without too much tox:

    git clone https://github.com/pypa/virtualenv.git
    cd virtualenv
    git checkout rewrite
    <python-or-pypy-in-virtualenv> -mpip install .
    <python-or-pypy-in-virtualenv> -mpip install coverage pytest
    <python-or-pypy-in-virtualenv> -mcoverage run -m pytest --int -k test_zipapp_help
    

    Unfortunately I could not get the test to run to completion on cpython3.7 nor on pypy2 nor on pypy3. But I think much of the slowdown might be due to coverage. Any hints what else might be missing to get the test to run to completion? On cpython3.7 it fails with

    _____________________________________________ ERROR at setup of test_zipapp_help ______________________________________________
    
    tmp_path_factory = TempPathFactory(_given_basetemp=None, _trace=<pluggy._tracing.TagTracerSub object at 0x7f4cf69bdfd0>, _basetemp=PosixPath('/tmp/pytest-of-matti/pytest-27'))
    
        @pytest.fixture(scope="session")
        def zipapp_test_env(tmp_path_factory):
            base_path = tmp_path_factory.mktemp("zipapp-test")
    >       session = run_via_cli(["-v", "--activators", "", "--seed", "none", str(base_path / "env")])
    
    base_path  = PosixPath('/tmp/pytest-of-matti/pytest-27/zipapp-test0')
    tmp_path_factory = TempPathFactory(_given_basetemp=None, _trace=<pluggy._tracing.TagTracerSub object at 0x7f4cf69bdfd0>, _basetemp=PosixPath('/tmp/pytest-of-matti/pytest-27'))
    
    tests/integration/test_zipapp.py:63: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    ../cpython3_virt/lib/python3.6/site-packages/virtualenv/run/__init__.py:21: in run_via_cli
        session = session_via_cli(args)
    ../cpython3_virt/lib/python3.6/site-packages/virtualenv/run/__init__.py:44: in session_via_cli
        creator, seeder, activators = tuple(e.create(options) for e in elements)  # create types
    ../cpython3_virt/lib/python3.6/site-packages/virtualenv/run/__init__.py:44: in <genexpr>
        creator, seeder, activators = tuple(e.create(options) for e in elements)  # create types
    ../cpython3_virt/lib/python3.6/site-packages/virtualenv/run/plugin/creators.py:41: in create
        return super(CreatorSelector, self).create(options)
    ../cpython3_virt/lib/python3.6/site-packages/virtualenv/run/plugin/base.py:60: in create
        return self._impl_class(options, self.interpreter)
    ../cpython3_virt/lib/python3.6/site-packages/virtualenv/interpreters/create/venv.py:17: in __init__
        self.can_be_inline = interpreter is CURRENT and interpreter.executable == interpreter.system_executable
    ../cpython3_virt/lib/python3.6/site-packages/virtualenv/interpreters/discovery/py_info.py:154: in system_executable
        return self.find_exe_based_of(inside_folder=env_prefix)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
    self = PythonInfo({'platform': 'linux', 'implementation': 'CPython', 'pypy_version_info': None, 'version_info': VersionInfo(m...aphy/vectors', '/home/matti/pypy_stuff/cryptography/src'], 'file_system_encoding': 'utf-8', 'stdout_encoding': 'UTF8'})
    inside_folder = '/usr'
    
        def find_exe_based_of(self, inside_folder):
            # we don't know explicitly here, do some guess work - our executable name should tell
            possible_names = self._find_possible_exe_names()
            possible_folders = self._find_possible_folders(inside_folder)
            for folder in possible_folders:
                for name in possible_names:
                    candidate = os.path.join(folder, name)
                    if os.path.exists(candidate):
                        info = PythonInfo.from_exe(candidate)
                        keys = {"implementation", "architecture", "version_info"}
                        if all(getattr(info, k) == getattr(self, k) for k in keys):
                            return candidate
            what = "|".join(possible_names)  # pragma: no cover
    >       raise RuntimeError("failed to detect {} in {}".format(what, "|".join(possible_folders)))  # pragma: no cover
    E       RuntimeError: failed to detect cpython3.6.7-64|cpython3.6.7|cpython3.6-64|cpython3.6|cpython3-64|cpython3|cpython-64|cpython|CPython3.6.7-64|CPython3.6.7|CPython3.6-64|CPython3.6|CPython3-64|CPython3|CPython-64|CPython|CPYTHON3.6.7-64|CPYTHON3.6.7|CPYTHON3.6-64|CPYTHON3.6|CPYTHON3-64|CPYTHON3|CPYTHON-64|CPYTHON|python3.6.7-64|python3.6.7|python3.6-64|python3.6|python3-64|python3|python-64|python|PYTHON3.6.7-64|PYTHON3.6.7|PYTHON3.6-64|PYTHON3.6|PYTHON3-64|PYTHON3|PYTHON-64|PYTHON in /usr/bin|/usr
    
    candidate  = '/usr/PYTHON'
    folder     = '/usr'
    info       = PythonInfo({'platform': 'linux2', 'implementation': 'CPython', 'pypy_version_info': None, 'version_info': VersionInfo(...dist-packages', '/usr/lib/python2.7/dist-packages/gtk-2.0'], 'file_system_encoding': 'UTF-8', 'stdout_encoding': None})
    inside_folder = '/usr'
    keys       = {'architecture', 'implementation', 'version_info'}
    name       = 'PYTHON'
    possible_folders = ['/usr/bin', '/usr']
    possible_names = ['cpython3.6.7-64', 'cpython3.6.7', 'cpython3.6-64', 'cpython3.6', 'cpython3-64', 'cpython3', ...]
    self       = PythonInfo({'platform': 'linux', 'implementation': 'CPython', 'pypy_version_info': None, 'version_info': VersionInfo(m...aphy/vectors', '/home/matti/pypy_stuff/cryptography/src'], 'file_system_encoding': 'utf-8', 'stdout_encoding': 'UTF8'})
    what       = 'cpython3.6.7-64|cpython3.6.7|cpython3.6-64|cpython3.6|cpython3-64|cpython3|cpython-64|cpython|CPython3.6.7-64|CPython...hon3-64|python3|python-64|python|PYTHON3.6.7-64|PYTHON3.6.7|PYTHON3.6-64|PYTHON3.6|PYTHON3-64|PYTHON3|PYTHON-64|PYTHON'
    
    ../cpython3_virt/lib/python3.6/site-packages/virtualenv/interpreters/discovery/py_info.py:173: RuntimeError
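
The `RuntimeError` above lists 40 executable names that were tried in `/usr/bin` and `/usr`. As a reading aid, here is a sketch that reproduces that list from the implementation name, version and word size. It is reconstructed from the error message only and is not virtualenv's actual `_find_possible_exe_names`; the function name `candidate_names` and its parameters are hypothetical.

```python
def candidate_names(implementation, version, bits):
    """Build executable names from most to least specific (illustrative only)."""
    major, minor, micro = version
    # Version suffixes: full version, major.minor, major, and none.
    versions = [f"{major}.{minor}.{micro}", f"{major}.{minor}", f"{major}", ""]
    # Architecture suffixes: with and without the word size.
    arches = [f"-{bits}", ""]
    # Base names: the implementation name and plain "python",
    # each in lower, original and upper case, without duplicates.
    bases = []
    for base in (implementation, "python"):
        for variant in (base.lower(), base, base.upper()):
            if variant not in bases:
                bases.append(variant)
    return [b + v + a for b in bases for v in versions for a in arches]


names = candidate_names("CPython", (3, 6, 7), 64)
print(len(names))
print("|".join(names[:8]))
```

In the traceback, a name only counts as a match if the file exists and the interpreter found there reports the same `implementation`, `architecture` and `version_info`, so an existing `/usr/bin/python3` of a different version is rejected.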
    

    On PyPy it fails with

    ______________________________________________________ test_zipapp_help _______________________________________________________
    
    call_zipapp = <function call_zipapp.<locals>._run at 0x00007fd1c9be9a60>
    capsys = <_pytest.capture.CaptureFixture object at 0x00007fd1ca0f6758>
    
        def test_zipapp_help(call_zipapp, capsys):
    >       call_zipapp("-h")
    
    call_zipapp = <function call_zipapp.<locals>._run at 0x00007fd1c9be9a60>
    capsys     = <_pytest.capture.CaptureFixture object at 0x00007fd1ca0f6758>
    
    tests/integration/test_zipapp.py:79: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    tests/integration/test_zipapp.py:73: in _run
        subprocess.check_call(cmd)
    ../pypy3.6-7.2.0-linux_x86_64-portable/lib-python/3/subprocess.py:306: in check_call
        retcode = call(*popenargs, **kwargs)
    ../pypy3.6-7.2.0-linux_x86_64-portable/lib-python/3/subprocess.py:287: in call
        with Popen(*popenargs, **kwargs) as p:
    ../pypy3.6-7.2.0-linux_x86_64-portable/lib-python/3/subprocess.py:744: in __init__
        restore_signals, start_new_session)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
    self = <subprocess.Popen object at 0x00007fd1ca0f6250>
    args = ['/tmp/pytest-of-matti/pytest-28/zipapp-test0/env/bin/pypy3', '/tmp/pytest-of-matti/pytest-28/zipapp0/virtualenv.pyz', '-vv', '/tmp/pytest-of-matti/pytest-28/test_zipapp_help0/env', '-h']
    executable = b'/tmp/pytest-of-matti/pytest-28/zipapp-test0/env/bin/pypy3', preexec_fn = None, close_fds = True, pass_fds = ()
    cwd = None, env = None, startupinfo = None, creationflags = 0, shell = False, p2cread = -1, p2cwrite = -1, c2pread = -1
    c2pwrite = -1, errread = -1, errwrite = -1, restore_signals = True, start_new_session = False
    
        def _execute_child(self, args, executable, preexec_fn, close_fds,
                           pass_fds, cwd, env,
                           startupinfo, creationflags, shell,
                           p2cread, p2cwrite,
                           c2pread, c2pwrite,
                           errread, errwrite,
                           restore_signals, start_new_session):
            """Execute program (POSIX version)"""
    
            if isinstance(args, (str, bytes)):
                args = [args]
            else:
                args = list(args)
    
            if shell:
                args = ["/bin/sh", "-c"] + args
                if executable:
                    args[0] = executable
    
            if executable is None:
                executable = args[0]
            orig_executable = executable
    
            # For transferring possible exec failure from child to parent.
            # Data format: "exception name:hex errno:description"
            # Pickle is not used; it is complex and involves memory allocation.
            errpipe_read, errpipe_write = os.pipe()
            # errpipe_write must not be in the standard io 0, 1, or 2 fd range.
            low_fds_to_close = []
            while errpipe_write < 3:
                low_fds_to_close.append(errpipe_write)
                errpipe_write = os.dup(errpipe_write)
            for low_fd in low_fds_to_close:
                os.close(low_fd)
            try:
                try:
                    # We must avoid complex work that could involve
                    # malloc or free in the child process to avoid
                    # potential deadlocks, thus we do all this here.
                    # and pass it to fork_exec()
    
                    if env is not None:
                        env_list = []
                        for k, v in env.items():
                            k = os.fsencode(k)
                            if b'=' in k:
                                raise ValueError("illegal environment variable name")
                            env_list.append(k + b'=' + os.fsencode(v))
                    else:
                        env_list = None  # Use execv instead of execve.
                    executable = os.fsencode(executable)
                    if os.path.dirname(executable):
                        executable_list = (executable,)
                    else:
                        # This matches the behavior of os._execvpe().
                        executable_list = tuple(
                            os.path.join(os.fsencode(dir), executable)
                            for dir in os.get_exec_path(env))
                    fds_to_keep = set(pass_fds)
                    fds_to_keep.add(errpipe_write)
                    self.pid = _posixsubprocess.fork_exec(
                            args, executable_list,
                            close_fds, tuple(sorted(map(int, fds_to_keep))),
                            cwd, env_list,
                            p2cread, p2cwrite, c2pread, c2pwrite,
                            errread, errwrite,
                            errpipe_read, errpipe_write,
                            restore_signals, start_new_session, preexec_fn)
                    self._child_created = True
                finally:
                    # be sure the FD is closed no matter what
                    os.close(errpipe_write)
    
                # self._devnull is not always defined.
                devnull_fd = getattr(self, '_devnull', None)
                if p2cread != -1 and p2cwrite != -1 and p2cread != devnull_fd:
                    os.close(p2cread)
                if c2pwrite != -1 and c2pread != -1 and c2pwrite != devnull_fd:
                    os.close(c2pwrite)
                if errwrite != -1 and errread != -1 and errwrite != devnull_fd:
                    os.close(errwrite)
                if devnull_fd is not None:
                    os.close(devnull_fd)
                # Prevent a double close of these fds from __init__ on error.
                self._closed_child_pipe_fds = True
    
                # Wait for exec to fail or succeed; possibly raising an
                # exception (limited in size)
                errpipe_data = bytearray()
                while True:
                    part = os.read(errpipe_read, 50000)
                    errpipe_data += part
                    if not part or len(errpipe_data) > 50000:
                        break
            finally:
                # be sure the FD is closed no matter what
                os.close(errpipe_read)
    
            if errpipe_data:
                try:
                    pid, sts = os.waitpid(self.pid, 0)
                    if pid == self.pid:
                        self._handle_exitstatus(sts)
                    else:
                        self.returncode = sys.maxsize
                except ChildProcessError:
                    pass
    
                try:
                    exception_name, hex_errno, err_msg = (
                            errpipe_data.split(b':', 2))
                    # The encoding here should match the encoding
                    # written in by the subprocess implementations
                    # like _posixsubprocess
                    err_msg = err_msg.decode()
                except ValueError:
                    exception_name = b'SubprocessError'
                    hex_errno = b'0'
                    err_msg = 'Bad exception data from child: {!r}'.format(
                                  bytes(errpipe_data))
                child_exception_type = getattr(
                        builtins, exception_name.decode('ascii'),
                        SubprocessError)
                if issubclass(child_exception_type, OSError) and hex_errno:
                    errno_num = int(hex_errno, 16)
                    child_exec_never_called = (err_msg == "noexec")
                    if child_exec_never_called:
                        err_msg = ""
                        # The error must be from chdir(cwd).
                        err_filename = cwd
                    else:
                        err_filename = orig_executable
                    if errno_num != 0:
                        err_msg = os.strerror(errno_num)
                        if errno_num == errno.ENOENT:
                            err_msg += ': ' + repr(err_filename)
    >               raise child_exception_type(errno_num, err_msg, err_filename)
    E               FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-matti/pytest-28/zipapp-test0/env/bin/pypy3': '/tmp/pytest-of-matti/pytest-28/zipapp-test0/env/bin/pypy3'
    
    args       = ['/tmp/pytest-of-matti/pytest-28/zipapp-test0/env/bin/pypy3', '/tmp/pytest-of-matti/pytest-28/zipapp0/virtualenv.pyz', '-vv', '/tmp/pytest-of-matti/pytest-28/test_zipapp_help0/env', '-h']
    c2pread    = -1
    c2pwrite   = -1
    child_exception_type = <class 'OSError'>
    child_exec_never_called = False
    close_fds  = True
    creationflags = 0
    cwd        = None
    devnull_fd = None
    env        = None
    env_list   = None
    err_filename = '/tmp/pytest-of-matti/pytest-28/zipapp-test0/env/bin/pypy3'
    err_msg    = "No such file or directory: '/tmp/pytest-of-matti/pytest-28/zipapp-test0/env/bin/pypy3'"
    errno_num  = 2
    errpipe_data = bytearray(b'OSError:2:')
    errpipe_read = 10
    errpipe_write = 11
    errread    = -1
    errwrite   = -1
    exception_name = bytearray(b'OSError')
    executable = b'/tmp/pytest-of-matti/pytest-28/zipapp-test0/env/bin/pypy3'
    executable_list = (b'/tmp/pytest-of-matti/pytest-28/zipapp-test0/env/bin/pypy3',)
    fds_to_keep = {11}
    hex_errno  = bytearray(b'2')
    low_fds_to_close = []
    orig_executable = '/tmp/pytest-of-matti/pytest-28/zipapp-test0/env/bin/pypy3'
    p2cread    = -1
    p2cwrite   = -1
    part       = b''
    pass_fds   = ()
    pid        = 20370
    preexec_fn = None
    restore_signals = True
    self       = <subprocess.Popen object at 0x00007fd1ca0f6250>
    shell      = False
    start_new_session = False
    startupinfo = None
    sts        = 65280
    
    ../pypy3.6-7.2.0-linux_x86_64-portable/lib-python/3/subprocess.py:1392: FileNotFoundError
    

  15. Bernat Gabor reporter

    I’m alright with closing this for now 🤔 When I have more free time for this I might come back to it with more details.
