#4 Merged
Repository: embray, Branch: fix-sdist
Repository: tarek, Branch: default

Make sdist work more like distutils wrt package_data

Author
  1. Erik Bray
Reviewers
Description

The sdist command in distribute does not use package_data to determine which files should be included in the source dist. Instead, the only way to include package_data is to ensure that all package_data files are in VCS, or to list them in MANIFEST.in.

While neither of those options is an enormous hardship, it's still different enough from distutils to be annoying. For example, for a project stored in git the sdist won't be created properly *unless* setuptools-git is installed. I think it's wrong to have such a build requirement for such basic functionality as building a source distribution. Furthermore, we have generated files that should also be included in the dist, but won't get picked up unless they're manually listed in MANIFEST.in, meaning they have to be specified in two places (the setup.py *and* MANIFEST.in).

The only other option is to revert to the distutils sdist by way of cmdclass, but that's a little kludgy-looking.

I think the best option is to restore the behavior in which sdist automatically adds any files specified as package_data to the manifest. This behavior can still be disabled, however, by setting include_package_data=True (which we don't use anyway).
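
To make the proposed rule concrete, here is a minimal sketch of it, not the code in this pull request. The function name `package_data_for_manifest` and the `package_dirs` mapping are hypothetical, introduced only for illustration, and the sketch assumes package_data maps a package name to a list of glob patterns relative to that package's directory. It deliberately ignores details such as the `''` wildcard key.

```python
import glob
import os


def package_data_for_manifest(package_dirs, package_data,
                              include_package_data=False):
    """Return the package_data files that would be added to the manifest.

    package_dirs (hypothetical): maps package name -> directory on disk.
    package_data: maps package name -> list of glob patterns.
    """
    if include_package_data:
        # As described above, package_data is not consulted in this mode.
        return []

    found = set()
    for package, patterns in package_data.items():
        base = package_dirs[package]
        for pattern in patterns:
            # Patterns are matched on disk, so untracked or generated
            # files are picked up as well.
            for path in glob.glob(os.path.join(base, pattern)):
                if os.path.isfile(path):
                    found.add(path)
    return sorted(found)
```

With a package directory containing `data/table.dat`, calling the function with `{"mypkg": ["data/*.dat"]}` would return that file whether or not it is under version control, while passing `include_package_data=True` would return an empty list.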

I think this should also resolve the issues raised in #218, though perhaps a documentation update to go along with this is still in order.

Comments (4)

  1. Erik Bray author

    Totally forgot about this. AFAICT this is still an issue that needs fixing. I've included a test, and some slight updates to the docs. This should clarify the issue raised in #218. The situation is now as follows:

    If include_package_data=True, package_data is ignored when generating the manifest. Otherwise, every file matched by package_data is automatically included in the manifest regardless of whether it's tracked by version control.

    1. Chris Jerdonek

      This might be obvious, but am I correct that `include_package_data` seems to do the opposite of what its name suggests? Why is this, or am I misreading? In particular, the previous comment implies (rephrasing), "if `include_package_data=False`, then every file matched by `package_data` is automatically included...." Thanks for the fix, by the way. :)

      1. Erik Bray author

        Yes, include_package_data is confusingly named, but we can't change it either. package_data wasn't added to distutils until Python 2.4, so setuptools' include_package_data actually predates it, and means "automatically scan for any non-py files under a package and include them in the manifest if they're tracked by revision control". Setuptools also added package_data for finer-grained control. But when package_data was added to distutils it did not bring include_package_data with it, due to lack of interest in maintaining support for different VCSs in the stdlib.

        So include_package_data and just package_data are actually two different things, hence the confusion.
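
The include_package_data behavior quoted in this thread can be sketched as follows. This is only an illustration of the description given above, not setuptools code: the function name `tracked_non_py_files` is hypothetical, and the set of tracked files is assumed to be supplied by some VCS plugin as POSIX-style relative paths.

```python
def tracked_non_py_files(package_dir, tracked_files):
    """Model include_package_data as described in the thread.

    Returns the non-.py files under package_dir that appear in
    tracked_files (assumed: paths reported by a VCS plugin).
    """
    prefix = package_dir.rstrip("/") + "/"
    return sorted(
        path for path in tracked_files
        if path.startswith(prefix) and not path.endswith(".py")
    )
```

Under this model a generated file that is not in version control is never returned, which is why package_data patterns, matched against the files on disk, behave as a separate mechanism.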
