The import sys... code in easy-install.pth puts the paths listed in that file at the very beginning of sys.path, which is surprising, often inappropriate, and leads to many broken installations. Making easy-install.pth a regular .pth file means packages installed into a given site-packages via easy_install get the same import priority as that site-packages dir, just as if they had been installed with pip, rather than jumping ahead of user packages, the stdlib, and even '.'.
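To make the mechanism concrete, here is a simplified, pure-Python model of what the import lines setuptools writes around the entries in easy-install.pth do to sys.path. As I understand it, the first line records len(sys.path) in sys.__plen and the last line moves every entry appended after that point to index sys.__egginsert (0 by default). The function name and example paths are invented for illustration, and the model works on a plain list rather than the real sys.path.

```python
def apply_egg_insert(path, plen, egginsert=0):
    """Simplified model of the easy-install.pth reordering (not setuptools' code).

    path      -- sys.path as it stands after the .pth entries were appended
    plen      -- len(sys.path) recorded by the file's first import line
    egginsert -- insertion index left by earlier files (0 means the front)
    Returns the reordered list and the updated insertion index.
    """
    new = path[plen:]                  # entries appended from easy-install.pth
    result = path[:plen]               # everything that was already on the path
    result[egginsert:egginsert] = new  # splice the eggs in at the insertion point
    return result, egginsert + len(new)


before = [
    "/home/user/project",              # e.g. an entry from PYTHONPATH
    "/usr/lib/python3/lib",            # stdlib
    "/usr/lib/python3/site-packages",  # the site dir holding easy-install.pth
]
after_pth = before + ["/usr/lib/python3/site-packages/foo-1.0.egg"]
reordered, next_index = apply_egg_insert(after_pth, plen=len(before))
print(reordered[0])  # /usr/lib/python3/site-packages/foo-1.0.egg
```

The egg ends up ahead of the PYTHONPATH entry and the stdlib, which is the priority jump described above.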
Since nobody should be using easy_install (or even python setup.py install, as I understand it), I think the only entries in easy-install.pth should be the result of pip install -e. The only difference in behavior between pip install and pip install -e should be that editable installs are 'symlinked' (via .pth), not that they get a wildly different import priority. Making easy-install.pth a vanilla .pth file reduces the difference between editable and non-editable installs, which seems like a good thing.
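For comparison, a plain .pth file with no import line only appends to sys.path. The sketch below builds a throwaway site directory (the directory and file names are hypothetical) and processes it with the standard library's site.addsitedir, the same routine site.py uses for site-packages at startup. This is the behavior the proposal would give easy-install.pth.

```python
import os
import site
import sys
import tempfile

# Build a throwaway "site-packages" containing one plain .pth file.
site_dir = tempfile.mkdtemp()
src_dir = os.path.join(site_dir, "src")  # stands in for an editable checkout
os.mkdir(src_dir)
with open(os.path.join(site_dir, "editable.pth"), "w") as f:
    f.write(src_dir + "\n")              # a bare path line, no "import" code

# Process the directory the way site.py processes site-packages at startup.
site.addsitedir(site_dir)

# Plain .pth entries are appended, right after the site dir itself.
print(sys.path[-2:] == [site_dir, src_dir])
```

Because src_dir lands after site_dir, the editable checkout gets exactly the priority of the site directory that holds the .pth file.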
I'm inclined to accept this change if it can be shown that it won't have serious negative impact on most/all environments.
What are the implications of this change for existing users? What are the implications for environments that still rely on easy_install? What are the use cases that inspired this code, and are they obviated?
Although I take your point that pip is the recommended installer, there are still use cases that pip doesn't solve (primarily resolution of a namespace package from more than one location and dynamic working set resolution such as with setup.py test). As a result, there isn't a working transition story for these cases, so I want to be careful not to break the use of easy_install.
The only reliance on the sys.path priority jumping I am aware of has been the ability to override stdlib modules, which was relied on by the readline package on OS X. When pip started taking over, this stopped working because you can't jump priority with pip installs (except -e). The response of the readline package was to decide that overriding the stdlib was not a good idea in the first place and to rename the package gnureadline. So pip's removal of this behavior was ultimately a boon in the only case I am aware of.
That's an encouraging anecdote, and a hint that the impact of this change may be limited. I think we can accept this PR, but I'm going to leave it on the shelf until the codebase is stable following other pending changes, so that this change can be vetted in isolation.
And it would be great to find other people more experienced with the advanced usage of easy_install, who can speak to any existing use cases for the priority jumping, and we can figure out if there are other ways to accomplish their goals. I'm not aware of any, though.
I did some digging. The code being removed was added in 4396a0b8d222 with the comment:
Added automatic handling of installation conflicts. Eggs are now shifted to the front of sys.path, in an order consistent with where they came from, making EasyInstall seamlessly co-operate with system package managers. The --delete-conflicting and --ignore-conflicts-at-my-risk options are now no longer necessary, and will generate warnings at the end of a run if you use them.
It seems this use case is still relevant. easy_install was adapted to explicitly override packages installed by system package managers. If this functionality is removed, packages installed by system package managers (or by any other installer that merely copies the packages to a site-packages directory) will always take precedence. Although I can see how that might be desirable in some cases, that change is a pretty substantial one, and probably shouldn't be taken lightly.
At the time that was committed, easy_install was the new, emergent packaging tool, intended to replace the status quo, so it made sense for it to take precedence. Today, the landscape is different, with easy_install largely deprecated in favor of other techniques.
What I'd like to do is consider a less aggressive approach, one that's backward compatible, but allows for opt-in, such as through an environment variable. Maybe EASY_INSTALL_PATH_PRECEDENCE, where values of 'internal' and 'system' refer to the current behavior and the proposed behavior, and 'internal' is the default for now.
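A minimal sketch of how the proposed opt-in could be read, assuming the variable name and the 'internal'/'system' values suggested here. Neither exists in setuptools today, and path_precedence is a hypothetical helper name.

```python
import os

# Hypothetical: the variable and its values are only a proposal at this point.
VALID_PRECEDENCE = ("internal", "system")


def path_precedence(environ=None):
    """Return 'internal' (current egg-jumping behavior, the default)
    or 'system' (plain .pth semantics)."""
    environ = os.environ if environ is None else environ
    value = environ.get("EASY_INSTALL_PATH_PRECEDENCE", "internal").strip().lower()
    if value not in VALID_PRECEDENCE:
        raise ValueError(
            f"EASY_INSTALL_PATH_PRECEDENCE must be one of {VALID_PRECEDENCE}, "
            f"got {value!r}"
        )
    return value


print(path_precedence({}))                                          # internal
print(path_precedence({"EASY_INSTALL_PATH_PRECEDENCE": "system"}))  # system
```

Failing loudly on an unknown value seems safer than silently falling back, since a typo would otherwise leave users on the old behavior without noticing.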
@Min RK, can you do that? Also, can you include a note in the changelog explaining the change? Finally, we'll need an update to the documentation explaining this feature and its usage (and possibly updating other documentation expressing the old behavior; I haven't looked). Thanks.
In that case, I would also propose a smaller change: that setup.py [install|develop] use a new setuptools.pth file that lacks the sys.path modifications, rather than taking the easy_install route. I think it is very important that this be the default behavior, since the current one causes so many problems. I will also try to make sure pip never triggers easy_install (currently pip install -e calls setup.py develop, which does), so at least pip users can get consistent behavior.
making EasyInstall seamlessly co-operate with system package managers.
That is perhaps an optimistic statement, since it is exactly this behavior that causes easy_install to have frequent and severe conflicts with system package managers. Better cooperation with system package managers is precisely why I aim to remove it. Are there any more recent anecdotes suggesting this is still the case? It does not appear to be true on any of my Debian-based systems.
I will try to come up with an alternate proposal, thanks. I see three ways to interpret EASY_INSTALL_PATH_PRECEDENCE as you put it:
1. An install-time flag, which toggles whether the sys.path modification is inserted into easy-install.pth (basically a toggle for this PR). This seems problematic, since the most recent install would change the behavior of all previously setuptools-installed packages.
2. A runtime flag, which switches whether the sys.path modification has an effect. This also doesn't seem quite right, since if any packages depend on the old behavior, it has to be all or nothing.
3. An install-time flag, which toggles between easy-install.pth and setuptools.pth (with and without sys.path modifications, respectively). This one is probably my current preference, after thinking on it for a bit.
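To make the third option concrete, here is a hypothetical sketch (record_install and the TARGETS table are invented names, not setuptools code) in which the install-time flag only decides which file receives the new entry. It deliberately omits writing the sys.path-jumping import lines that easy-install.pth would also carry; the file names follow the discussion.

```python
import os
import tempfile

# Hypothetical mapping from the install-time flag to the managed .pth file.
# Only easy-install.pth would carry the sys.path-jumping import lines.
TARGETS = {"internal": "easy-install.pth", "system": "setuptools.pth"}


def record_install(site_dir, entry, precedence="internal"):
    """Append `entry` to whichever .pth file the flag selects; return its path."""
    try:
        name = TARGETS[precedence]
    except KeyError:
        raise ValueError(f"unknown precedence: {precedence!r}") from None
    path = os.path.join(site_dir, name)
    with open(path, "a") as f:  # append the new path entry to the chosen file
        f.write(entry + "\n")
    return path


site_dir = tempfile.mkdtemp()
record_install(site_dir, "./foo-1.0.egg")                       # legacy file
record_install(site_dir, "./bar-2.0.egg", precedence="system")  # plain file
print(sorted(os.listdir(site_dir)))
```

Installing is a simple switch, but the result is two files whose entries upgrade and uninstall would each have to track.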
I worked on proposal #3 yesterday and everything was going great until I got to upgrade/uninstall. Two installation paths work fine and are easily controlled by a simple switch. However, having two separate managed .pth files makes the removal of entries associated with upgrade and uninstall more complicated, and there are assumptions all over the place that there is only one set of files to manage.
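A hypothetical sketch of the extra bookkeeping: with two managed files, every removal has to search both. The file names follow the discussion; remove_entry and the in-memory dict are illustrative stand-ins for the real file handling.

```python
# Hypothetical: both files that upgrade/uninstall would now need to manage.
MANAGED_PTH = ("easy-install.pth", "setuptools.pth")


def remove_entry(pth_contents, entry):
    """pth_contents maps file name -> list of lines.

    Removes `entry` from every managed file that lists it and returns the
    names of the files that changed.
    """
    changed = []
    for name in MANAGED_PTH:
        lines = pth_contents.get(name)
        if lines and entry in lines:
            pth_contents[name] = [line for line in lines if line != entry]
            changed.append(name)
    return changed


files = {
    "easy-install.pth": ["./foo-1.0.egg"],
    "setuptools.pth": ["./bar-2.0.egg"],
}
print(remove_entry(files, "./bar-2.0.egg"))  # ['setuptools.pth']
```

With a single file, this loop collapses to one lookup; every code path that assumes "the" .pth file needs the same widening.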
Adding a second .pth file to manage also means that pip would need an update, since it has some special handling for easy-install.pth that would need extending to setuptools-install.pth.
Having seen that, I'm not sure what to do here. I would very much like to avoid being responsible for increasing the complexity of setuptools, but I don't see how to preserve the old behavior and include a path forward that behaves more sensibly without doing that. I'll keep plugging away there, but I do think a better ultimate plan is to totally isolate the old, recommended-against easy_install behavior, and divorce it from any other setuptools call (e.g. setup.py develop shouldn't inherit from easy_install, setup.py install should call pip instead of easy_install, shouldn't make an egg, etc.).
Do you have a preference among those options, or another interpretation I should put together?
It wasn't an easy choice, which is why I'm still not decisive about it. I lean toward (1), because it's a simpler implementation and because I don't want to fork yet another installer called "setuptools". Your findings seem to reinforce my instinct (I also don't want to be responsible for increasing the complexity of setuptools).
I don't think we want to explore using pip from other setuptools calls. That would add a circular dependency and drastically complicate bootstrapping and vendoring. I'm not quite sure how those choices come into play.
I appreciate the effort you're putting into this, and I'm afraid I'm not able to give it the attention necessary to provide better guidance at this time. I encourage you to keep working through it, and maybe explore how approach (1) might be made to work.
Thanks. I think implementing (1) should be easy, actually. I'm more concerned about the behavior: one install with or without the flag can change how all previously installed packages will be imported (for better or worse). If this is the way to go, it might be useful to ensure there is a call that toggles this behavior without having to install a new package.
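Such a toggle could be as small as rewriting the existing file in place. A minimal sketch, assuming the only executable lines in easy-install.pth are the two that setuptools writes (reproduced approximately from memory below). set_path_jump is a hypothetical name, not an existing setuptools command, and the sketch operates on the file's text rather than touching disk.

```python
# Approximately the lines setuptools wraps around the entries (from memory).
PRELUDE = "import sys; sys.__plen = len(sys.path)"
POSTLUDE = (
    "import sys; new = sys.path[sys.__plen:]; del sys.path[sys.__plen:]; "
    "p = getattr(sys, '__egginsert', 0); sys.path[p:p] = new; "
    "sys.__egginsert = p + len(new)"
)


def set_path_jump(pth_text, enabled):
    """Return easy-install.pth text with or without the sys.path jump.

    Assumes every line starting with "import" belongs to setuptools;
    plain path entries (and comments) are kept in their original order.
    """
    entries = [
        line for line in pth_text.splitlines()
        if line.strip() and not line.startswith(("import ", "import\t"))
    ]
    lines = [PRELUDE, *entries, POSTLUDE] if enabled else entries
    return "\n".join(lines) + "\n"


legacy = "\n".join([PRELUDE, "./foo-1.0.egg", POSTLUDE]) + "\n"
plain = set_path_jump(legacy, enabled=False)
print(plain, end="")  # ./foo-1.0.egg
```

Because the rewrite is idempotent in both directions, users could flip an existing environment back and forth without reinstalling anything.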
I don't think we want to explore using pip from other setuptools calls.
Yeah, that makes sense, though to the world, setuptools+pip are really an inseparable pair. I'm mainly concerned with changing the default behavior of setup.py install and setup.py develop when setuptools is available. These should probably do what pip install . does, rather than eggy things. Avoiding calling pip makes sense, but it should do what pip does instead of what setuptools currently does.
+1 to this change. We spent a couple of days debugging why miniconda/conda would install and completely break all Python installations. Conda's Python was finding half of the Python stdlib from the conda install, and half of the stdlib from the system install.
It was really hard to debug, as all conda invocations were crashing in an incomprehensible manner, and even pip was broken. Import order made no sense at all, and PYTHONPATH and PATH had no effect.
[setup.py install/develop] should probably do what pip install . does, rather than eggy things.
If I understand correctly, pip install -e invokes setup.py develop, and when it comes to installing the dependencies, it's still necessary to have pip install eggs to support developing namespace packages (pip 3).
Have you considered another, more aggressive approach? Rather than trying to manage a whole new set of dependencies, what if you were instead to simply (and aggressively) rewrite easy-install.pth with or without the sys.path manipulation?
Then users could opt in to this behavior in their own time, fight out issues with upgrades, and eventually come to rely on it, at which point adoption might become widespread enough to change the default?
I hope this gets merged. The 'jump silently to the front of the queue' behavior of setuptools that @Min RK is trying to patch here is one of the reasons why, for many years, I have had a hard policy of 'no setuptools over my dead body' in IPython. While that behavior may appear to be a good idea and help users in some immediate scenarios, I have seen it over and over introduce hideous, incomprehensible behavior into the system that is, furthermore, nearly impossible to debug or work against, since it jumps over every mechanism offered by Python: it bypasses PYTHONPATH, site.py and everything else to modify the import path at the very last minute, directly at runtime. So the only way to understand what's going on is to do runtime debugging.
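Since the reordering only shows up at runtime, one low-tech way to find it is to list every executable line in the .pth files Python will process. A rough diagnostic sketch, assuming the standard site.getsitepackages() and site.getusersitepackages() locations (older virtualenv releases shipped a site.py without getsitepackages); find_pth_code is a hypothetical helper name. site.py executes .pth lines that start with "import " or "import\t", which is where easy-install.pth's sys.path rewrite lives.

```python
import os
import site


def find_pth_code(site_dirs=None):
    """Return (pth file, line) pairs for every executable line in .pth files."""
    if site_dirs is None:
        site_dirs = site.getsitepackages() + [site.getusersitepackages()]
    hits = []
    for d in site_dirs:
        if not os.path.isdir(d):
            continue  # e.g. a user site dir that was never created
        for name in sorted(os.listdir(d)):
            if not name.endswith(".pth"):
                continue
            path = os.path.join(d, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    # site.py exec()s these lines instead of treating them as paths
                    if line.startswith(("import ", "import\t")):
                        hits.append((path, line.rstrip()))
    return hits


for path, line in find_pth_code():
    print(f"{path}: {line}")
```

An empty result means no .pth file can reorder sys.path at startup; any hit is worth reading before anything else.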
I consider this behavior of setuptools, for all intents and purposes, indistinguishable from malware, and I never allow it on my systems (I still use manually managed PYTHONPATH for that reason). I have made patches to remove unconditional use of setuptools from other open source projects so other users aren't subjected to this behavior against their will.
While easy_install might have been a necessary evil back in its day, it's high time we clean up this situation with a pip that behaves in a predictable, declarative fashion, respecting the conventions the platform defines (such as PYTHONPATH). Thanks so much for working on this!
Thanks, I think the optional env switch is a good first step. Hopefully I will find some time to make a patch to pip so that pip install -e no longer calls setup.py develop, which is the only call in modern installation commands that still relies on easy-install.pth. That way, users of pip can avoid ever invoking unmodified setuptools commands.