# Proposed Changes to Configuration Files

Author: Éric Araujo

Contributors: Carl Meyer, folks at PyCon 2010, people in #distutils

License: PSF

Version: 1.0

Warning

This document is not part of the documentation of Distutils2. It is a design/discussion document that serves to explain directions, collect feedback and votes, and will ultimately be rewritten as proper documentation (without all the explanations about choices) and moved into the relevant doc files.

One goal of Distutils2 is to put all the information required to build and install a distribution into a static configuration file (:file:setup.cfg) instead of Python code (:file:setup.py). This information (i.e. the arguments to distutils.core.setup) is split into metadata, files and customization hooks.

In the olden days, Distutils configuration files were used only to give options to commands. They were also designed to be extensible: Third-party tools relying on Distutils or providing new commands could tell their users to add a section in the distribution’s :file:setup.cfg file or in their user config file to set options. There is a simple API to get these options merged from all configuration files (which will be even simpler in Distutils2).

Moving the distribution configuration from a script to a static file makes it easier for tools to get the information without having to run code (the :file:setup.py script). It will also allow a variety of tools written in any language to work from the same information.

The new sections are different from usual sections that give options to commands because they make no sense in system or user configuration files. While specifying install or sdist options in a system or user configuration file is useful, options like author name or scripts to include in a distribution have to be in the project’s config file only.

In another discussion, it may be good to think about configuration files precedence rules; e.g. if a user specifies an installation directory in their own config file, why is the distribution’s file able to override that choice?

## Metadata

The first kind of :func:setup arguments that should be supported in :file:setup.cfg are the ones that give metadata. Fields are defined in PEP 345 and progress is tracked on Python #8252. These fields are uncontroversial:

[metadata]
name = RestingParrot
version = 0.6.4
author = Carl Meyer
author-email = carl@oddbird.net
summary = A sample project demonstrating distutils2 packaging
classifiers =
    Development Status :: 4 - Beta
    Environment :: Console (Text Based)
    Environment :: X11 Applications :: GTK; python_version < '3'
    License :: OSI Approved :: MIT License
    Programming Language :: Python
    Programming Language :: Python :: 2
    Programming Language :: Python :: 3
requires-dist =
    PetShoppe
    MichaelPalin (> 1.1)
    pywin32; sys.platform == 'win32'
    pysqlite2; python_version < '2.5'
    inotify (0.0.1); sys.platform == 'linux2'
requires-external = libxml2
provides-dist = distutils2-sample-project (0.2)
    unittest2-sample-project
project-url =
    Main repository, http://bitbucket.org/carljm/sample-distutils2-project
    Fork in progress, http://bitbucket.org/Merwok/sample-distutils2-project


Multi-value fields use newline-separated values, since the values themselves may contain spaces. The first (or only) value may be on the same line as the key or on the following one.

The config file parser automatically supports case variation and underscores in place of hyphens in field names. Our own documentation should be consistent and use only lower case and hyphens, for simplicity and non-ugliness.
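The normalization described above could look like the following minimal sketch. The function name normalize_field_name is hypothetical and not part of any Distutils2 API; it only illustrates the rule "lower case, hyphens instead of underscores".

```python
def normalize_field_name(name):
    """Return the canonical spelling of a field name (hypothetical helper).

    Canonical means lower case with hyphens, as recommended for documentation.
    """
    return name.strip().lower().replace("_", "-")


print(normalize_field_name("Author_Email"))   # author-email
print(normalize_field_name("requires-dist"))  # requires-dist
```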

The PEP 345 environment markers used here will be passed to the DistributionMetadata instance without processing, as intended: The class knows which fields are allowed to use a marker and how to interpret them.

Some fields don’t have a specified format or can be improved. Proposals follow.

### Encoding

codename: use-utf8

The encoding of the config file is UTF-8. This encoding enables using Unicode characters in string fields, and is also a superset of ASCII.

### Avoid Metadata-Version

codename: no-metadata-version

This field does not have to be in the file, since the DistributionMetadata class detects the right version from the fields that are present.

### Use CSV for Keywords and Requires-Python

codename: keywords-csv

PEP 345 only says that the Keywords field is “a list”; the example uses space-delimited values, but distutils and distutils2 print out comma-separated values, which allows having keywords with spaces in them (e.g. “version control”). The Requires-Python field is described as comma-separated values.

Since keywords and supported versions are typically much shorter than classifiers or dependencies, I propose that :file:setup.cfg use a comma-separated list of values, with leading and trailing spaces removed for user convenience (i.e. keywords = version control, packaging, testing, unit testing will give the list ['version control', 'packaging', 'testing', 'unit testing']). More examples:

requires-python = 2.6

requires-python = >=2.4, <=3.0
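As an illustration of the comma-separated proposal, here is a small sketch of the splitting rule. The helper name split_csv is an assumption made for this example; dropping empty items (for instance after a trailing comma) is also an assumption, since the proposal does not address it.

```python
def split_csv(value):
    """Split a comma-separated field value (hypothetical helper).

    Leading and trailing spaces are removed from each item; empty items
    are dropped, which is an assumption not stated in the proposal.
    """
    return [item.strip() for item in value.split(",") if item.strip()]


print(split_csv("version control, packaging, testing, unit testing"))
# ['version control', 'packaging', 'testing', 'unit testing']
print(split_csv(">=2.4, <=3.0"))
# ['>=2.4', '<=3.0']
```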

codename: keywords-no-csv

Alternatively, if it is deemed confusing to have two ways of giving multi-value fields, the field can be newline-separated like other fields already defined. Consistency would win over convenience.

### Merge author and author-email

codename: merge-author-email

Merge name and email in a single field for author (and maintainer):

author = Carl Meyer <carl@oddbird.net>
maintainer = Éric Araujo <merwok@netwok.org>


It is a common format and easy to parse (we do not support every valid RFC 2822 address form, only the specific pattern name <email>). PEP 345 :file:METADATA files separate author name and email, but for user-written :file:setup.cfg this format is nicer.
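To show how little parsing the merged form needs, here is a sketch under stated assumptions: the regular expression and the split_author name are hypothetical, and treating a value without angle brackets as a bare name is a choice made for this example, not something the proposal specifies.

```python
import re

# Hypothetical pattern: a display name followed by an address in angle brackets.
_AUTHOR_RE = re.compile(r"^\s*(?P<name>[^<>]+?)\s*<(?P<email>[^<>\s]+)>\s*$")


def split_author(value):
    """Split 'Name <email>' into (name, email); email is None when absent."""
    match = _AUTHOR_RE.match(value)
    if match is None:
        # Assumption: a value without <...> is a name with no address.
        return value.strip(), None
    return match.group("name"), match.group("email")


print(split_author("Carl Meyer <carl@oddbird.net>"))
# ('Carl Meyer', 'carl@oddbird.net')
```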

### Get description from a file

codename: desc-from-file

Use the contents of a file as value for description. Long descriptions typically contain blank lines, which are stripped by our config file parser in Python < 3.2, so including the long description directly in the config file is a non-starter. People often already have the description in a :file:README file or equivalent, which they can edit and check with reST-enabled tools. Thus this proposal replaces a very common idiom in setup scripts, prevents duplication and desynchronisation, and avoids the need for me to touch regular expressions to tweak the parser in unholy ways. The value is a path relative to the directory containing the :file:setup.cfg (.. disallowed). Examples:

description = README

description = lib/python/unicorn/README.rst


The value can be a list of files, to be concatenated in order and used as description. Now that Distutils2 and PyPI allow uploading documentation and adding arbitrary links in a project page, the need for overly long description values is reduced, but some users would still want to concatenate e.g. :file:README and :file:NEWS. Thus, this field should allow a newline-separated list of files (it allows paths with spaces and complies with the already-defined way of giving multi-value fields).

Files listed in that field are automatically added to distributions.
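The following sketch shows one way the description field could be resolved. The function read_description and its arguments are hypothetical; rejecting absolute paths and joining files with a newline are assumptions added for the example, while the ban on .. comes from the proposal.

```python
import os


def read_description(cfg_dir, value):
    """Concatenate the files named in a newline-separated description value.

    cfg_dir is the directory containing setup.cfg (hypothetical signature).
    """
    parts = []
    for name in value.strip().splitlines():
        name = name.strip()
        if not name:
            continue
        # ".." is disallowed by the proposal; rejecting absolute paths is
        # an extra assumption of this sketch.
        if os.path.isabs(name) or ".." in name.replace("\\", "/").split("/"):
            raise ValueError("invalid description path: %r" % name)
        with open(os.path.join(cfg_dir, name), encoding="utf-8") as f:
            parts.append(f.read())
    # Assumption: files are joined with a single newline.
    return "\n".join(parts)
```

A call such as read_description(".", "README\nNEWS") would return the contents of both files, in that order.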

### Fix misnamed fields

The things listed in requires-dist and friends have a name and a version (optional) but no particular distribution format, therefore they’re not distributions but releases (or arguably projects). Editing accepted PEPs is hard, but it’s better to do it now before our tools and terminology are used in the wild. This item is listed only for completeness, not to call a vote; Alexis is the owner of this request, and the field names in :file:setup.cfg will follow the latest version of PEP 345.

## Files

Most other :func:setup arguments have to do with files. Arguments list Python modules, extension modules (written in C or C++), packages, scripts, files related to a package, other files. As specified in Python #8253, a new section is introduced, files; nothing else is already defined.

### Listing modules and packages

codename: merge-mod-pkg

For the new format, it is proposed that the three kinds of modules (Python modules, extension modules and packages) be merged into a single option. Given this file structure:

$ tree
.
├── haven.py
├── pirate.c
├── setup.cfg
└── ship
    ├── cabins
    │   ├── captain.py
    │   └── __init__.py
    ├── hull.c
    └── __init__.py

This configuration is enough to include the two modules, the package, its subpackage and all submodules in the distribution and have them processed by the relevant commands (build_*, sdist, etc.):

[files]
modules = haven
    pirate
    ship

(See below for the rationale to separate with newlines instead of arbitrary whitespace. See appendix for implementation details explaining how this merged list will be easily parsed.)

It is not possible to have a module and a package with the same name. In addition to being documented, this restriction could also be a runtime warning.

#### Naming

Calling packages “modules” may be confusing to some people, e.g. beginners, even if it’s technically correct. Other proposed names include “source” (too vague) and “importables” (unused in the documentation and ugly).

codename: no-merge-mod-pkg

Alternatively, modules and packages could be listed separately. Since it appears that people tend to use either one or the other in their projects, there would be no cognitive overload in defining two fields instead of one:

modules = haven
    pirate

packages = ship
exclude-packages = ship.hull

This may also prevent future problems when namespace packages are supported by Python, or maybe not; I have to read PEP 382 closely to get a better understanding of file layout and possible detection (esp. recursion) issues.

#### Recursion

codename: pkg-recursion

Package listing is recursive; real-world use of :func:setuptools.find_packages shows that this is a useful feature. There is a way to control it:

modules = ship
exclude-modules = ship.hull

codename: [pkg-recursion-boolean]

Additionally, a boolean option could control all-or-nothing recursion:

modules = ship
recursive-modules = 0

It is not clear that this would solve problems, e.g. for the mx project which is developed in one tree but packaged as separate PyPI projects; maybe it’s best not to define this option right now, try to convert complex projects and then revise this proposal.

#### Replacing package_dir

codename: pkg-dir-prefix

Instead of replacing the package_dir argument with a field of the same name, it is proposed to merge it with the packages (or modules) values:

packages = ship
    src:parrot
    src2:thing
exclude-packages = src:parrot.tests

This corresponds to the following file structure:

$ tree
.
├── ship
│   └── __init__.py
├── src
│   └── parrot
│       ├── __init__.py
│       └── tests
│           └── __init__.py
└── src2
    └── thing
        └── __init__.py


Note that these semantics are different from setup(packages=['thing'], package_dir={'thing': 'src2'}): In the new proposal, src2:thing does not mean that the :file:src2 directory is to be renamed :file:thing in the build directory (reference), but that the directory :file:src2 contains another directory named :file:thing (with its :file:__init__.py file and other submodules). The changed semantics are more intuitive.

Using : as a separator forbids using it in directory name, which would not be very sane anyway. Using it instead of a space or a slash also allows putting packages in a deep subdirectory:

packages = client/bindings/python:parrotlib


This source directory syntax is also available for modules (in case the proposal to merge them with packages is rejected):

modules = ham/lib/python:ham
    cheese/lib/python:cheese
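A possible reading of the source directory syntax is sketched below. Both function names are hypothetical, and the sketch assumes that the last colon separates the directory from the dotted name and that paths in :file:setup.cfg always use forward slashes.

```python
import posixpath


def split_source_dir(spec):
    """Split 'dir:dotted.name' into (dir, dotted.name); dir is '' if absent."""
    directory, _, name = spec.rpartition(":")
    return directory, name


def package_path(spec):
    """Return the path of the package or module relative to setup.cfg."""
    directory, name = split_source_dir(spec)
    return posixpath.join(directory, *name.split("."))


print(split_source_dir("client/bindings/python:parrotlib"))
# ('client/bindings/python', 'parrotlib')
print(package_path("src:parrot.tests"))  # src/parrot/tests
print(package_path("ship"))              # ship
```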

codename: pkg-dir-no-prefix

If a study of setup scripts in projects distributed on the Cheeseshop reveals that an overwhelming majority uses only one package_dir or none, this alternate, simpler proposal would be enough:

packages = parrot
exclude-packages = parrot.tests
package-dir = src


#### Replacing conditionals

codename: env-markers-for-files

We can define the packages or modules field to be newline-separated and accept PEP 345 environment markers to support source distributions that contain e.g. code for both 2.x and 3.x, like httplib2 does:

packages =
    python2:httplib2; python_version < '3'
    python3:httplib2; python_version > '2'


(Form using the alternate package_dir proposal:

packages = httplib2
package-dir = python2; python_version < '3'
    python3; python_version > '2'


Using a multi-line value for this field is kind of ugly, though.)

Since there is no else, each condition has to be written twice (once in normal form, once in reverse, which can be tricky and/or tedious), and the values available as EXPR in environment markers (Python version, OS name, etc.) do not provide all that is required. One example from Mercurial that is not trivial to reverse (or maybe I’m just bad at boolean logic):

if sys.platform == 'win32' and sys.version_info < (2, 5, 0, 'final'):
    pymodules.append('mercurial.pure.osutil')


Example that can’t be translated:

if sys.platform == 'linux2' and os.uname()[2] > '2.6':
    # The inotify extension is only usable with Linux 2.6 kernels.
    ...


For such cases, the solution seems to be to use a pre-build hook to edit the lists of modules and packages. For trivial cases, environment markers provide a solution that does not require writing any code, so they’re still useful in the files section.
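For the trivial cases, separating a value from its marker is straightforward. The sketch below uses a hypothetical split_marker helper; it only splits on the first semicolon and does not evaluate the marker, which would be left to code that understands PEP 345.

```python
def split_marker(value):
    """Split 'item; marker' into (item, marker); marker is None if absent."""
    item, _, marker = value.partition(";")
    return item.strip(), marker.strip() or None


packages = """
python2:httplib2; python_version < '3'
python3:httplib2; python_version > '2'
"""
entries = [split_marker(line) for line in packages.strip().splitlines()]
print(entries)
# [('python2:httplib2', "python_version < '3'"), ('python3:httplib2', "python_version > '2'")]
```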

### Extension modules

codename: extensions-section

A new section family is proposed to describe extension modules, to replace instantiation of :class:Extension objects with the right options in :file:setup.py. Each extension module has to be listed in the files field and described in its own section:

[files]
modules = ship.pirate

[extension: ship.pirate]
sources = ship/pirate.c
headers = Python.h pirate.h
include-dirs = include
optional = 1


The section name is the string extension: followed by optional whitespace and the full name of the module. Field names are taken directly from :class:Extension arguments, and values are simple adaptations: string arguments are single values; string lists are multi-value fields (whitespace-separated or newline-separated, to be decided); booleans are :mod:ConfigParser booleans.
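A sketch of how a tool might collect such sections with the standard configparser module follows. The read_extensions function is hypothetical, values are left as raw strings because the list separator is still undecided, and only the optional field is converted to a boolean.

```python
import configparser

SAMPLE = """\
[files]
modules = ship.pirate

[extension: ship.pirate]
sources = ship/pirate.c
headers = Python.h pirate.h
include-dirs = include
optional = 1
"""


def read_extensions(text):
    """Map each extension module name to the options of its section."""
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    extensions = {}
    for section in parser.sections():
        prefix, sep, name = section.partition(":")
        if sep and prefix.strip() == "extension":
            options = dict(parser.items(section))
            if "optional" in options:
                # Booleans use the usual ConfigParser spellings (1/0, yes/no).
                options["optional"] = parser.getboolean(section, "optional")
            extensions[name.strip()] = options
    return extensions


print(sorted(read_extensions(SAMPLE)))  # ['ship.pirate']
```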

codename: [vars-in-extmod]

If deemed useful, simple variables could be added to these sections; see definition.

codename: extensions-section-flat

An alternate proposal that requires only one section but may prove more difficult to write and to parse is derived from the older Setup format, deprecated in Distutils and removed in Distutils2 (see :file:{python3.2}/Lib/distutils/tests/Setup.sample):

[extensions]
pirate = pirate.c
ship.hull = ship/hull.c


The format is module = source files [arguments to the compiler]. More involved example from SDL:

[extensions]
_camera = src/_camera.c src/camera_v4l2.c src/camera_v4l.c $SDL $DEBUG
_numericsurfarray = src/_numericsurfarray.c $SDL $DEBUG
font = src/font.c $SDL $FONT $DEBUG
scrap = src/scrap.c $SDL $SCRAP $DEBUG


This example introduces variables, which can be any string. A simple proposal for the assignment syntax:

$DEBUG =
$GFX = src/SDL_gfx/SDL_gfxPrimitives.c
$SDL = -I/usr/include/SDL -D_REENTRANT -lSDL
$FONT = -lSDL_ttf
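Variable expansion for this flat format could be as simple as the sketch below. The expand function and the choice to raise KeyError for an undefined variable are assumptions made for illustration; the proposal does not define error handling.

```python
import re

# A variable reference is a dollar sign followed by a name.
VAR_RE = re.compile(r"\$(\w+)")


def expand(value, variables):
    """Replace each $NAME by its definition, then split on whitespace.

    Undefined names raise KeyError (an assumption of this sketch).
    """
    return VAR_RE.sub(lambda m: variables[m.group(1)], value).split()


variables = {
    "DEBUG": "",
    "SDL": "-I/usr/include/SDL -D_REENTRANT -lSDL",
    "FONT": "-lSDL_ttf",
}
print(expand("src/font.c $SDL $FONT $DEBUG", variables))
# ['src/font.c', '-I/usr/include/SDL', '-D_REENTRANT', '-lSDL', '-lSDL_ttf']
```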

### Resources

A PEP needs to be written. See the design document.

## Customization hooks

codename: distclass-cmdclass

(This is not related to pre/post-command hooks, which will probably be set in the relevant command sections or in a new one.)

The third kind of :func:setup arguments are customization hooks.

distclass: Specify a class to use instead of :class:distutils2.dist.Distribution.

cmdclass: Mapping of command names to classes, to replace existing commands or provide new ones, e.g. setup(..., cmdclass={'build_py': build_py_2to3, 'lint': LintCommand}).

script_name: Usually sys.argv[0]; used to generate error messages with the correct script name in case it is not :file:setup.py.

script_args: List of arguments to use instead of sys.argv[1:].

The last two arguments are not needed in :file:setup.cfg, whereas the first two are useful and can use this simple syntax:

[global]
distclass = shop.cheese.HamDistribution
cmdclass =
    build_py = distutils2.build_py.build_py_2to3; python_version >= '3'
    test = lib:_buildhelper.TestCommand


As you can see, environment markers and source directory specifiers are allowed. The fields are located in the global section, alongside command-packages.
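Turning a dotted name such as shop.cheese.HamDistribution into a class requires a small resolver. The sketch below is one hypothetical way to write it with importlib; it handles neither the source directory prefix (lib:) nor environment markers, and it can hide an ImportError raised inside the imported module itself.

```python
import importlib


def resolve_name(dotted):
    """Import and return the object named by a dotted path.

    The longest importable module prefix is tried first, then the
    remaining parts are looked up as attributes.
    """
    parts = dotted.split(".")
    for cut in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:cut]))
        except ImportError:
            continue
        for attr in parts[cut:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError("cannot resolve %r" % dotted)


# Standard library example, since the classes named in the text are fictional.
print(resolve_name("collections.OrderedDict"))
```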

codename: [rename-cmdclass]

Additional proposal: Give the field a clearer name, such as command-classes or commands (and change it in the Python code too). If there is a good reason to keep it short, it should at least be a plural form, i.e. cmdclasses.

## Appendix: Making things simple for users

Distutils2 will ship with a little program called mkpkg (which will soon get a better name) that generates a :file:setup.cfg file by asking the user questions (what is the project name, its version, etc.). As much as possible, the program will propose answers (e.g. using the :func:find_packages function to get the list of packages, mocking sys.modules['distutils'] to run :file:setup.py scripts in a sandbox and get information from them, etc.) so that the user just has to press :kbd:Enter to accept, or type the correct value and confirm.

The program will also help people do the right thing, e.g. use a version number compliant with PEP 386, fill the license field only when there is no suitable Trove classifier for the chosen license, in other words give useful hints for people that don’t read PEPs or documentation.

Some values could be specified in the user configuration file:

[mkpkg]
author = John Smith <john@example.org>
project-url-template =
    Code repository, http://example.org/hacking/projects/{name}
    Documentation, http://packages.python.org/{name}


For people wanting to upgrade progressively, Distutils2 includes a lib2to3-based converter to rewrite imports (:mod:distutils2 provides a :func:setup function with a signature compatible with the one from :mod:distutils and a :func:find_packages function similar to the one from :mod:setuptools), to allow projects to keep using a setup script while they transition their practices and installation documentation.

## Appendix: Implementation details

### Multi-value fields

The config file parser strips leading and trailing whitespace for free; we just have to handle the case of the first line being empty (in the line spam =\nham, the config file format considers that there is an empty line after the equals sign). Handling that case is as simple as value.strip().splitlines().
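The idiom above can be checked with the standard configparser module. In this sketch the split_multiline name is hypothetical, and skipping blank lines inside the value is an extra precaution that the text does not require.

```python
import configparser


def split_multiline(value):
    """Return the list of values of a newline-separated field."""
    return [line.strip() for line in value.strip().splitlines() if line.strip()]


parser = configparser.RawConfigParser()
parser.read_string("[files]\nmodules =\n    haven\n    pirate\n    ship\n")

# The first value may be on the line after the key, as here, or on the same line.
print(split_multiline(parser.get("files", "modules")))
# ['haven', 'pirate', 'ship']
```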

### Support code

Helper functions to split the source directory specifier, resolve a dotted name and split an environment marker will be provided in :mod:util and :mod:config for use by third-party tools. :mod:config will also provide higher-level functions to get a :class:DistributionMetadata instance from a :file:setup.cfg file, a list of Python modules, a list of Python packages filtered according to the environment, access a config section defined by a Distutils2 extension, and so on.

If the proposal to merge the lists of modules and packages is accepted, Distutils2 code will have to sort this list into the three lists used by Distribution, following these simple rules:

1. If the name has an extension section (or if it is listed in the extensions section, depending on the proposal that gets accepted), an instance of :class:Extension is created and added to distribution.ext_modules;
2. If the name corresponds to an existing directory which contains an __init__.py file, it is added to distribution.packages;
3. The name is added to distribution.py_modules.
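The three rules above can be sketched as follows. The sort_modules function, its arguments and its return value are hypothetical; in particular it returns plain names for extensions, whereas the real code would create :class:Extension instances.

```python
import os


def sort_modules(names, extension_names, root="."):
    """Split a merged modules list into (ext_modules, packages, py_modules).

    extension_names holds the names that have an extension section.
    """
    ext_modules, packages, py_modules = [], [], []
    for name in names:
        path = os.path.join(root, *name.split("."))
        if name in extension_names:
            # Rule 1: described as an extension module.
            ext_modules.append(name)
        elif os.path.isfile(os.path.join(path, "__init__.py")):
            # Rule 2: a directory containing __init__.py is a package.
            packages.append(name)
        else:
            # Rule 3: everything else is a pure Python module.
            py_modules.append(name)
    return ext_modules, packages, py_modules
```

With the file structure from the section on listing modules and packages, sort_modules(["haven", "pirate", "ship"], {"pirate"}) would be expected to return (["pirate"], ["ship"], ["haven"]).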

## Appendix: Mapping from arguments to fields

| Argument in :file:setup.py | Field in :file:setup.cfg |
| --- | --- |
| description | summary |
| long_description | description (changed meaning) |
| author | author |
| author_email | merged with author |
| maintainer | maintainer |
| maintainer_email | merged with maintainer |
| url | home-page |
| N/A | project-url |
| every other metadata field | unchanged |
| packages | packages or modules |
| py_modules | modules |
| ext_modules | modules (+ extension(s) section) |
| ext_package | unsupported |
| distclass | distclass |
| cmdclass | cmdclasses |
| script_name | N/A |
| script_args | N/A |
| options | fields in sections named after commands |

## Appendix: Rejected ideas

### Get metadata from hooks

Some users would like to specify callback functions instead of writing some values, to avoid repetition. Let’s take version as example. It is very common to have it stored as a tag in version control, and there are a number of helper functions to get this information. We could have this kind of field:

[metadata]
get-version = _buildhelper.get_hg_version


This proposal has to be rejected, since it strongly conflicts with the point of having static metadata. If the metadata is fully defined by a file format, then any tool in any language can follow the specification to implement a parser and do useful things with the values, without depending on Distutils2, setup scripts or Python at all. People who really cannot write the version number in :file:setup.cfg for some reason can still use a :file:setup.py and benefit from fixes and features in Distutils2, but they won’t be static metadata-compliant.

Furthermore, the version example is not a strong argument. When doing a release, updating the version number in :file:setup.cfg is but a minor and quick step. Documentation needs to be checked, translations built, the version number has to be updated in :file:README, :file:NEWS or :file:CHANGES, source code, so using a hook to set version in :file:setup.cfg would remove only one tidbit of work.

Other fields may be duplicated in documentation files and :file:setup.cfg, i.e. author, summary and project URIs, but this duplication has a very small cost. Dependencies, classifiers and keywords are only in :file:setup.cfg, so wouldn’t benefit from hooks at all.

In conclusion, other solutions can be explored. Since the version number is easily retrieved from :file:setup.cfg, a trivial shell function can be written to create the VCS tag from the static metadata. People who love automation typically write a small script or makefile to do all operations related to a release (adjust version numbers in relevant files, run lint tools, run i18n tools, etc.), then check if the result looks good (not always trusting automated tools is sane), commit, tag, push, send announcements, register the release in catalogs and so on.
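As an illustration of how easily the version can be retrieved, here is a Python equivalent of such a helper. The read_version function is hypothetical and relies only on the standard configparser module and on the metadata section described in this document.

```python
import configparser


def read_version(path="setup.cfg"):
    """Return the version field of the [metadata] section of a setup.cfg."""
    parser = configparser.RawConfigParser()
    # The proposal states that the config file is encoded in UTF-8.
    with open(path, encoding="utf-8") as f:
        parser.read_file(f)
    return parser.get("metadata", "version")
```

A release script could pass the returned string to the tagging command of its version control system.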