Link-time optimization (LTO) disabled

Issue #2572 new
Armin Rigo
created an issue

Link-time optimization (the gcc option -flto) has been disabled again. With gcc 6.2.0 on Ubuntu 14.04, it produces code that really looks invalid after inspection in objdump. To reproduce, take a recent version of trunk (e.g. 0649d557369f), and translate it on Linux64 with gcc (Ubuntu 6.2.0-3ubuntu11~14.04) 6.2.0 20160901. Then look in objdump at read.constfold.NNNN and its caller. This function contains only a jump to itself, creating an infinite loop.
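The objdump inspection described above can be partly automated. What follows is a minimal sketch, not part of the original report: it scans the text output of `objdump -d` for functions whose only instruction is a direct `jmp` back to their own start address. The helper name `find_self_jumps` is hypothetical, and the regular expressions assume the usual GNU objdump x86-64 layout (address, tab, raw bytes, tab, mnemonic).

```python
import re

# Function header in `objdump -d` output, e.g.
# "0000000000401000 <read.constfold.1234>:"
HEADER = re.compile(r"^([0-9a-f]+) <(.+)>:$")
# Instruction line, e.g. "  401000:\teb fe    \tjmp    401000 <name>"
# (assumed layout: address, tab, raw bytes, tab, mnemonic and operands)
INSN = re.compile(r"^\s*([0-9a-f]+):\t[0-9a-f ]+\t(\S+)\s*(.*)$")


def find_self_jumps(disassembly):
    """Return names of functions that consist only of a jmp to their own start."""
    suspects = []
    name, start, insns = None, None, []

    def flush():
        # Examine the function collected so far, if any.
        if name is not None and len(insns) == 1:
            mnemonic, operands = insns[0]
            target = operands.split()[0] if operands else ""
            # Header addresses are zero-padded, jump targets are not.
            if mnemonic.startswith("jmp") and target.lstrip("0") == start.lstrip("0"):
                suspects.append(name)

    for line in disassembly.splitlines():
        header = HEADER.match(line)
        if header:
            flush()
            start, name, insns = header.group(1), header.group(2), []
            continue
        insn = INSN.match(line)
        if insn and name is not None:
            mnemonic = insn.group(2)
            # Simplification: ignore nop padding emitted after the function body.
            if not mnemonic.startswith("nop"):
                insns.append((mnemonic, insn.group(3)))
    flush()
    return suspects
```

To use it, save the disassembly of the translated binary to a file with `objdump -d`, read the file into a string, and pass that string to `find_self_jumps`. A hit only marks a candidate; the function and its caller still need to be read by hand to confirm a miscompilation.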

We need to investigate which versions of gcc this bug appears on, and possibly report to gcc.

However, LTO introduces another annoying problem: a PyPy compiled with -flto is entirely undebuggable. Indeed, -g -flto works, but gdb takes at least 30 minutes to load (possibly much longer, I interrupted it). As the one who spends at least a couple of days fighting really obscure PyPy issues in gdb every now and again, I will veto -flto as long as this problem exists: otherwise, the next time we have an issue that cannot be reproduced in debug builds, I will have to spend weeks trying to debug without debugging symbols, and I'm really really not looking forward to that.

Comments (5)

  1. Alecsandru Patrascu

    From what I've encountered so far, whenever an LTO issue popped up, it was related to the buildchain configuration. With GCC, the main concern when using LTO is that the compiler, ld, ar, nm, etc. must all be on par with each other. I don't know exactly which versions of the tools are in your buildchain, but I suspect that your environment is not LTO-friendly. Even though you compiled 6.2.0 from scratch (or installed it through apt), the main culprit is ld, as it does not fully understand the data format that GCC 6 is handing it.

    On Ubuntu 14.04 the default ld is 2.24, which works best for LTO with GCC <= 4.9.x; starting with GCC 5 you tend to get these kinds of weird bugs.

    I had some weird issues on CPython with LTO enabled a while ago, when I was using Ubuntu 14.04 and GCC5.1. I am now using Ubuntu 16.04, with GCC 5.4.0 (default), GCC 6.2.0 (compiled by me), ld 2.26.1 and I have no problems with LTO. If I were to use GCC7 I think I'd get other weird bugs (not tested though).
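The version pairing described in this comment can be turned into a quick sanity check. The sketch below is only an illustration: the thresholds (GCC 5 or newer together with ld 2.24 or older) come from the anecdotal experience in this thread, not from any documented compatibility rule, and the function names `parse_version` and `lto_toolchain_suspect` are hypothetical. It takes the text printed by `gcc --version` and `ld --version` as input.

```python
import re


def parse_version(banner):
    """Extract the last dotted version number on the first line of a --version banner.

    Assumes the banner has at least one line containing a dotted version,
    e.g. "gcc (Ubuntu 6.2.0-3ubuntu11~14.04) 6.2.0 20160901" -> (6, 2, 0).
    """
    first_line = banner.splitlines()[0]
    matches = re.findall(r"\d+(?:\.\d+)+", first_line)
    return tuple(int(part) for part in matches[-1].split("."))


def lto_toolchain_suspect(gcc_banner, ld_banner):
    """Heuristic from this thread: GCC >= 5 combined with ld <= 2.24 is suspect for LTO."""
    gcc = parse_version(gcc_banner)
    ld = parse_version(ld_banner)
    # Compare only major.minor for ld so that 2.24.x counts as 2.24.
    return gcc >= (5,) and ld[:2] <= (2, 24)
```

The banners can be captured with, for example, `subprocess.check_output(["gcc", "--version"], text=True)` and the same call for `ld`. A `True` result only suggests that upgrading binutils is worth trying before blaming the compiler.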

    However, the issue you opened underlines quite a hard problem when it comes to working with PyPy: debugging problems in various workloads/scenarios. You see a lot of people reporting issues with almost no information from their dump, mostly because they don't know how to build a PyPy version that is debug-friendly. I don't know if this is feasible, but having two pre-compiled releases, one "production ready" (pgo, lto, O3, etc.) and one "debugging ready" (lldebug, no pgo, no lto), the latter to be used out of the box whenever a user finds a problem, could at least provide more valuable debugging info from users.

  2. Armin Rigo reporter

    On the machine where the miscompilation appears: GNU ld (GNU Binutils for Ubuntu) 2.24. So maybe upgrading ld would help, as you describe.

    However, there is the other blocker too. There are occasionally real bugs that cannot be reproduced in lldebug, or indeed with any PyPy other than the one on which they showed up. As long as -g -flto builds of PyPy cannot be loaded in gdb (and a fortiori rr), we'll end up in a situation where there are rare but real bugs in PyPy and we cannot do anything about them.
