PEP: 376
Title: Database of Installed Python Distributions
Version: $Revision$
Last-Modified: $Date$
Author: Tarek Ziadé <tarek@ziade.org>
Status: Accepted
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Feb-2009
Python-Version: 2.7, 3.2
Post-History:

Abstract
========

The goal of this PEP is to provide a standard infrastructure to manage
project distributions installed on a system, so all tools that are
installing or removing projects are interoperable.

To achieve this goal, the PEP proposes a new format to describe installed
distributions on a system. It also describes a reference implementation
for the standard library.

In the past an attempt was made to create an installation database (see PEP 262
[#pep262]_).

Combined with PEP 345, the current proposal supersedes PEP 262.


Rationale
=========

There are two problems right now in the way distributions are installed in
Python:

- There are too many ways to do it and this makes interoperation difficult.
- There is no API to get information on installed distributions.

How distributions are installed
-------------------------------

Right now, when a distribution is installed in Python, every element can
be installed in a different directory.

For instance, `Distutils` installs the pure Python code in the `purelib`
directory, which is ``lib/python2.6/site-packages`` for unix-like systems and
Mac OS X, or ``Lib\site-packages`` under Python's installation directory for
Windows.

Additionally, the `install_egg_info` subcommand of the Distutils `install`
command adds an `.egg-info` file for the project into the `purelib`
directory.

For example, for the `docutils` distribution, which contains one package, an
extra module and executable scripts, three elements are installed in
`site-packages`:

- `docutils`: The ``docutils`` package.
- `roman.py`: An extra module used by `docutils`.
- `docutils-0.5-py2.6.egg-info`: A file containing the distribution metadata
  as described in PEP 314 [#pep314]_. This file corresponds to the file
  called `PKG-INFO`, built by the `sdist` command.

Some executable scripts, such as `rst2html.py`, are also added in the
`bin` directory of the Python installation.

Another project called `setuptools` [#setuptools]_ has two other formats
to install distributions, called `EggFormats` [#eggformats]_:

- a self-contained `.egg` directory, that contains all the distribution files
  and the distribution metadata in a file called `PKG-INFO` in a subdirectory
  called `EGG-INFO`. `setuptools` creates other files in that directory that can
  be considered as complementary metadata.

- an `.egg-info` directory installed in `site-packages`, that contains the same
  files `EGG-INFO` has in the `.egg` format.

The first format is automatically used when you install a distribution that
uses the ``setuptools.setup`` function in its setup.py file, instead of
the ``distutils.core.setup`` one.

`setuptools` also adds a reference to the distribution into an
``easy-install.pth`` file.

Last, the `setuptools` project provides an executable script called
`easy_install` [#easyinstall]_ that installs all distributions, including
distutils-based ones, in self-contained `.egg` directories.

If you want to have standalone `.egg-info` directories for your distributions,
i.e. the second `setuptools` format, you have to force it when you work
with a setuptools-based distribution or with the `easy_install` script.
You can force it by using the `--single-version-externally-managed` option
**or** the `--root` option. This will make the `setuptools` project install
the project like distutils does.

This option is used by:

- the `pip` [#pip]_ installer
- the Fedora packagers [#fedora]_
- the Debian packagers [#debian]_

Uninstall information
---------------------

Distutils doesn't provide an `uninstall` command. If you want to uninstall
a distribution, you have to be a power user and remove the various elements
that were installed, and then look over the `.pth` files to clean them up if
necessary.

The process also differs depending on the tools you have used to install the
distribution and on whether the distribution's `setup.py` uses Distutils or
Setuptools.

Under some circumstances, you might not be able to know for sure that you
have removed everything, or that you didn't break another distribution by
removing a file that is shared among several distributions.

But there's a common behavior: when you install a distribution, files are
copied in your system. And it's possible to keep track of these files for
later removal.

Moreover, the Pip project has gained an `uninstall` feature lately. It
records all installed files, using the `record` option of the `install`
command.

What this PEP proposes
----------------------

To address those issues, this PEP proposes a few changes:

- A new `.dist-info` structure using a directory, inspired by one format of
  the `EggFormats` standard from `setuptools`.
- New APIs in `pkgutil` to be able to query the information of installed
  distributions.
- An uninstall function and an uninstall script in Distutils.


One .dist-info directory per installed distribution
===================================================

This PEP proposes an installation format inspired by one of the options in the
`EggFormats` standard, the one that uses a distinct directory located in the
site-packages directory.

This distinct directory is named as follows::

    name + '-' + version + '.dist-info'

This `.dist-info` directory can contain these files:

- `METADATA`: contains metadata, as described in PEP 345, PEP 314 and PEP 241.
- `RECORD`: records the list of installed files
- `INSTALLER`: records the name of the tool used to install the project
- `REQUESTED`: the presence of this file indicates that the project
  installation was explicitly requested (i.e., not installed as a dependency).

The METADATA, RECORD and INSTALLER files are mandatory, while REQUESTED may
be missing.

This proposal will not impact Python itself because the metadata files are not
used anywhere yet in the standard library besides Distutils.

It will impact the `setuptools` and `pip` projects but, given the fact that
they already work with a directory that contains a `PKG-INFO` file, the change
will have no deep consequences.


RECORD
------

A `RECORD` file is added inside the `.dist-info` directory at installation
time when installing a source distribution using the `install` command.
Notice that when installing a binary distribution created with the `bdist`
command or a `bdist`-based command, the `RECORD` file will be installed as well
since these commands use the `install` command to create binary distributions.

The `RECORD` file holds the list of installed files. These correspond
to the files listed by the `record` option of the `install` command, and will
be generated by default. This allows the implementation of an uninstallation
feature, as explained later in this PEP. The `install` command also provides
an option to prevent the `RECORD` file from being written and this option
should be used when creating system packages.

Third-party installation tools also should not overwrite or delete files
that are not in a RECORD file without prompting or warning.

This RECORD file is inspired by PEP 262 FILES [#pep262]_.

The `RECORD` file is a CSV file, composed of records, one line per
installed file. The ``csv`` module is used to read the file, with
these options:

- field delimiter: `,`
- quoting char: `"`
- line terminator: ``os.linesep`` (so ``\r\n`` or ``\n``)

When a distribution is installed, files can be installed under:

- the **base location**: path defined by the ``--install-lib`` option,
  which defaults to the site-packages directory.

- the **installation prefix**: path defined by the ``--prefix`` option, which
  defaults to ``sys.prefix``.

- any other path on the system.


Each record is composed of three elements:

- the file's **path**

  - a '/'-separated path, relative to the **base location**, if the file is
    under the **base location**.

  - a '/'-separated path, relative to the **base location**, if the file
    is under the  **installation prefix** AND if the **base location** is a
    subpath of the **installation prefix**.

  - an absolute path, using the local platform separator

- a hash of the file's contents.
  Notice that `pyc` and `pyo` generated files don't have any hash because
  they are automatically produced from `py` files. So checking the hash
  of the corresponding `py` file is enough to decide if the file and
  its associated `pyc` or `pyo` files have changed.

  The hash is either the empty string, the **MD5** hash of
  the file, encoded in hex, or the hash algorithm as named in
  ``hashlib.algorithms_guaranteed``, followed by the equals character
  ``=``, followed by the urlsafe-base64-nopad encoding of the digest 
  (``base64.urlsafe_b64encode(digest)`` with trailing ``=`` removed).

- the file's size in bytes

The ``csv`` module is used to generate this file, so the field separator is
",". Any "," character found within a field is escaped automatically by
``csv``.

When the file is read, the `U` option is used so the universal newline
support (see PEP 278 [#pep278]_) is activated, avoiding any trouble
reading a file produced on a platform that uses a different new line
terminator.

Here's an example of a RECORD file (extract)::

    lib/python2.6/site-packages/docutils/__init__.py,b690274f621402dda63bf11ba5373bf2,9544
    lib/python2.6/site-packages/docutils/__init__.pyc,,
    lib/python2.6/site-packages/docutils/core.py,9c4b84aff68aa55f2e9bf70481b94333,66188
    lib/python2.6/site-packages/docutils/core.pyc,,
    lib/python2.6/site-packages/roman.py,a4b84aff68aa55f2e9bf70481b943D3,234
    lib/python2.6/site-packages/roman.pyc,,
    /usr/local/bin/rst2html.py,a4b84aff68aa55f2e9bf70481b943D3,234
    /usr/local/bin/rst2html.pyc,,
    lib/python2.6/site-packages/docutils-0.5.dist-info/METADATA,6fe57de576d749536082d8e205b77748,195
    lib/python2.6/site-packages/docutils-0.5.dist-info/RECORD,,

Notice that the `RECORD` file can't contain a hash of itself and is only
listed here.

A project that installs a `config.ini` file in `/etc/myapp` will be added like this::

    /etc/myapp/config.ini,b690274f621402dda63bf11ba5373bf2,9544

On a Windows platform, the drive letter is added for the absolute paths,
so a file that is copied in ``c:\MyApp\`` will be::

    c:\MyApp\config.ini,b690274f621402dda63bf11ba5373bf2,9544

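As a rough illustration of the record format described above, the following
sketch builds ``(path, hash, size)`` rows and serializes them with the ``csv``
module. It is not part of the reference implementation: the helper names
``hash_field``, ``record_row`` and ``format_record`` are hypothetical, the code
assumes Python 3, and it uses the ``algorithm=digest`` form of the hash with
``sha256`` picked arbitrarily from ``hashlib.algorithms_guaranteed``::

    import base64
    import csv
    import hashlib
    import io
    import os

    def hash_field(data, algorithm='sha256'):
        """Return '<algorithm>=<urlsafe-base64-nopad digest>' for bytes."""
        digest = hashlib.new(algorithm, data).digest()
        # Trailing '=' padding is removed, as the text above requires.
        encoded = base64.urlsafe_b64encode(digest).rstrip(b'=')
        return '%s=%s' % (algorithm, encoded.decode('ascii'))

    def record_row(recorded_path, data=None):
        """Build one row; pass data=None for files recorded without a
        hash or size (generated .pyc/.pyo files, RECORD itself)."""
        if data is None:
            return (recorded_path, '', '')
        return (recorded_path, hash_field(data), str(len(data)))

    def format_record(rows):
        """Serialize rows with the csv options listed in this section."""
        out = io.StringIO()
        writer = csv.writer(out, delimiter=',', quotechar='"',
                            lineterminator=os.linesep)
        writer.writerows(rows)
        return out.getvalue()

A path that contains a comma, such as ``a,b.py``, comes out quoted as
``"a,b.py",,`` because the ``csv`` writer applies minimal quoting by default.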

INSTALLER
---------

The `install` command has a new option called `installer`. This option
is the name of the tool used to invoke the installation. It's a normalized
lower-case string matching `[a-z0-9_\-\.]`. For example::

    $ python setup.py install --installer=pkg-system

It defaults to `distutils` if not provided.

When a distribution is installed, the INSTALLER file is generated in the
`.dist-info` directory with this value, to keep track of **who** installed the
distribution. The file is a single-line text file.
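
A minimal sketch of how an install tool might validate the name and write the
file follows. The helper ``write_installer`` is hypothetical, and the sketch
assumes that the character class given above is meant to match the whole name,
one or more characters long::

    import os
    import re

    # Assumption: the whole name must consist of the listed characters.
    _INSTALLER_NAME = re.compile(r'[a-z0-9_\-\.]+')

    def write_installer(distinfo_dir, installer='distutils'):
        """Write a single-line INSTALLER file and return its path."""
        if not _INSTALLER_NAME.fullmatch(installer):
            raise ValueError('invalid installer name: %r' % (installer,))
        path = os.path.join(distinfo_dir, 'INSTALLER')
        with open(path, 'w') as f:
            f.write(installer + '\n')
        return path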


REQUESTED
---------

Some install tools automatically detect unfulfilled dependencies and
install them. In these cases, it is useful to track which
distributions were installed purely as a dependency, so if their
dependent distribution is later uninstalled, the user can be alerted
of the orphaned dependency.

If a distribution is installed by direct user request (the usual
case), a file REQUESTED is added to the .dist-info directory of the
installed distribution. The REQUESTED file may be empty, or may
contain a marker comment line beginning with the "#" character.

If an install tool installs a distribution automatically, as a
dependency of another distribution, the REQUESTED file should not be
created.

The ``install`` command of distutils by default creates the REQUESTED
file. It accepts ``--requested`` and ``--no-requested`` options to explicitly
specify whether the file is created.

If a distribution that was already installed on the system as a dependency
is later installed by name, the distutils ``install`` command will
create the REQUESTED file in the .dist-info directory of the existing
installation.
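
The marker logic is small enough to sketch directly. The helper names
``mark_requested`` and ``is_requested`` are hypothetical and are not part of
any API proposed here::

    import os

    def mark_requested(distinfo_dir):
        """Create an empty REQUESTED file unless one already exists."""
        path = os.path.join(distinfo_dir, 'REQUESTED')
        if not os.path.exists(path):
            open(path, 'w').close()
        return path

    def is_requested(distinfo_dir):
        """True if the distribution was installed by direct user request."""
        return os.path.exists(os.path.join(distinfo_dir, 'REQUESTED'))

Calling ``mark_requested`` on the `.dist-info` directory of a distribution that
was first installed as a dependency corresponds to the later install-by-name
case described above.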


Implementation details
======================

New functions and classes in pkgutil
------------------------------------

To use the `.dist-info` directory content, we need to add in the standard
library a set of APIs. The best place to put these APIs is `pkgutil`.

Functions
~~~~~~~~~

The new functions added in the ``pkgutil`` module are:

- ``distinfo_dirname(name, version)`` -> directory name

  ``name`` is converted to a standard distribution name by replacing any
  runs of non-alphanumeric characters with a single '-'.

  ``version`` is converted to a standard version string. Spaces become
  dots, and all other non-alphanumeric characters (except dots) become
  dashes, with runs of multiple dashes condensed to a single dash.

  Both attributes are then converted into their filename-escaped form,
  i.e. any '-' characters are replaced with '_' other than the one in
  'dist-info' and the one separating the name from the version number.

- ``get_distributions()`` -> iterator of ``Distribution`` instances.

  Provides an iterator that looks for ``.dist-info`` directories in
  ``sys.path`` and returns ``Distribution`` instances for
  each one of them.

- ``get_distribution(name)`` -> ``Distribution`` or None.

  Scans all elements in ``sys.path`` and looks for all directories ending with
  ``.dist-info``. Returns a ``Distribution`` corresponding to the
  ``.dist-info`` directory that contains a METADATA that matches `name`
  for the `name` metadata.

  This function only returns the first result found, since no more than one
  value is expected. If the directory is not found, returns None.

- ``obsoletes_distribution(name, version=None)`` -> iterator of ``Distribution``
  instances.

  Iterates over all distributions to find which distributions *obsolete*
  ``name``. If a ``version`` is provided, it will be used to filter the results.

- ``provides_distribution(name, version=None)`` -> iterator of ``Distribution``
  instances.

  Iterates over all distributions to find which distributions *provide*
  ``name``. If a ``version`` is provided, it will be used to filter the results.

- ``get_file_users(path)`` -> iterator of ``Distribution`` instances.

  Iterates over all distributions to find out which distributions use ``path``.
  ``path`` can be a local absolute path or a relative '/'-separated path.

  A local absolute path is an absolute path in which occurrences of '/'
  have been replaced by the system separator given by ``os.sep``.

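The normalization rules given for ``distinfo_dirname`` can be sketched as
follows. This is an illustration written from the description above, not the
reference implementation, and it assumes that "alphanumeric" means ASCII
letters and digits::

    import re

    def distinfo_dirname(name, version):
        """Return the .dist-info directory name for a name and version."""
        # Runs of non-alphanumeric characters in the name become one '-'.
        name = re.sub(r'[^A-Za-z0-9]+', '-', name)
        # In the version, spaces become dots; runs of any other
        # non-alphanumeric characters (except dots) become one '-'.
        version = version.replace(' ', '.')
        version = re.sub(r'[^A-Za-z0-9.]+', '-', version)
        # Filename-escape both parts so that the only '-' characters left
        # are the separator and the one in 'dist-info'.
        return '%s-%s.dist-info' % (name.replace('-', '_'),
                                    version.replace('-', '_'))

With these rules, ``distinfo_dirname('python-ldap', '2.5 a---5')`` gives
``'python_ldap-2.5.a_5.dist-info'``.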

Distribution class
~~~~~~~~~~~~~~~~~~

A new class called ``Distribution`` is created with the path of the
`.dist-info` directory provided to the constructor. It reads the metadata
contained in `METADATA` when it is instantiated.

``Distribution(path)`` -> instance

  Creates a ``Distribution`` instance for the given ``path``.

``Distribution`` provides the following attributes:

- ``name``: The name of the distribution.

- ``metadata``: A ``DistributionMetadata`` instance loaded with the
  distribution's METADATA file.

- ``requested``: A boolean that indicates whether the REQUESTED
  metadata file is present (in other words, whether the distribution was
  installed by user request).

And the following methods:

- ``get_installed_files(local=False)`` -> iterator of (path, hash, size)

  Iterates over the `RECORD` entries and returns a tuple ``(path, hash, size)``
  for each line. If ``local`` is ``True``, the path is transformed into a
  local absolute path. Otherwise the raw value from `RECORD` is returned.

  A local absolute path is an absolute path in which occurrences of '/'
  have been replaced by the system separator given by ``os.sep``.

- ``uses(path)`` -> Boolean

  Returns ``True`` if ``path`` is listed in `RECORD`. ``path``
  can be a local absolute path or a relative '/'-separated path.

- ``get_distinfo_file(path, binary=False)`` -> file object

  Returns a ``file`` instance for the file pointed to by ``path``, which
  must be located under the `.dist-info` directory.

  ``path`` has to be a '/'-separated path relative to the `.dist-info`
  directory or an absolute path.

  If ``path`` is an absolute path and doesn't start with the `.dist-info`
  directory path, a ``DistutilsError`` is raised.

  If ``binary`` is ``True``, opens the file in read-only binary mode (`rb`),
  otherwise opens it in read-only mode (`r`).

- ``get_distinfo_files(local=False)`` -> iterator of paths

  Iterates over the `RECORD` entries and returns paths for each line if the path
  is pointing to a file located in the `.dist-info` directory or one of its
  subdirectories.

  If ``local`` is ``True``, each path is transformed into a
  local absolute path. Otherwise the raw value from `RECORD` is returned.

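To make the query API more concrete, here is a simplified sketch of part of
it. It is not the prototype implementation: it assumes Python 3, handles only
plain directories (no Zip files), compares ``uses`` arguments against the raw
`RECORD` value only, and adds a hypothetical ``search_path`` argument so that
it can be tried outside ``sys.path``::

    import csv
    import os
    import sys
    from email.parser import Parser

    class Distribution(object):
        """Minimal stand-in for the class described above."""

        def __init__(self, path):
            self.path = path
            # METADATA uses RFC 822 style headers, so the email parser
            # is enough to read it in this sketch.
            with open(os.path.join(path, 'METADATA')) as f:
                self.metadata = Parser().parse(f)
            self.name = self.metadata['Name']
            self.requested = os.path.exists(os.path.join(path, 'REQUESTED'))

        def get_installed_files(self):
            """Yield (path, hash, size) with the raw RECORD values."""
            with open(os.path.join(self.path, 'RECORD'), newline='') as f:
                for row in csv.reader(f, delimiter=',', quotechar='"'):
                    if row:
                        # Pad short rows so every entry has three fields.
                        path, hash_, size = (row + ['', ''])[:3]
                        yield path, hash_, size

        def uses(self, path):
            return any(recorded == path
                       for recorded, _, _ in self.get_installed_files())

    def get_distributions(search_path=None):
        """Yield a Distribution for each .dist-info directory found."""
        for entry in (sys.path if search_path is None else search_path):
            if not os.path.isdir(entry):
                continue
            for item in sorted(os.listdir(entry)):
                full = os.path.join(entry, item)
                if item.endswith('.dist-info') and os.path.isdir(full):
                    yield Distribution(full)

    def get_file_users(path, search_path=None):
        """Yield the distributions whose RECORD lists path."""
        for dist in get_distributions(search_path):
            if dist.uses(path):
                yield dist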

Notice that the API is organized in five classes that work with directories
and Zip files (so it works with files included in Zip files, see PEP 273 for
more details [#pep273]_). These classes are described in the documentation
of the prototype implementation for interested readers [#prototype]_.

Examples
~~~~~~~~

Let's use some of the new APIs with our `docutils` example::

    >>> from pkgutil import get_distribution, get_file_users, distinfo_dirname
    >>> dist = get_distribution('docutils')
    >>> dist.name
    'docutils'
    >>> dist.metadata.version
    '0.5'

    >>> distinfo_dirname('docutils', '0.5')
    'docutils-0.5.dist-info'

    >>> distinfo_dirname('python-ldap', '2.5')
    'python_ldap-2.5.dist-info'

    >>> distinfo_dirname('python-ldap', '2.5 a---5')
    'python_ldap-2.5.a_5.dist-info'

    >>> for path, hash, size in dist.get_installed_files():
    ...     print('%s %s %s' % (path, hash, size))
    ...
    python2.6/site-packages/docutils/__init__.py b690274f621402dda63bf11ba5373bf2 9544
    python2.6/site-packages/docutils/core.py 9c4b84aff68aa55f2e9bf70481b94333 66188
    python2.6/site-packages/roman.py a4b84aff68aa55f2e9bf70481b943D3 234
    /usr/local/bin/rst2html.py a4b84aff68aa55f2e9bf70481b943D3 234
    python2.6/site-packages/docutils-0.5.dist-info/METADATA 6fe57de576d749536082d8e205b77748 195
    python2.6/site-packages/docutils-0.5.dist-info/RECORD

    >>> dist.uses('docutils/core.py')
    True

    >>> dist.uses('/usr/local/bin/rst2html.py')
    True

    >>> dist.get_distinfo_file('METADATA')
    <open file at ...>

    >>> dist.requested
    True


New functions in Distutils
--------------------------

Distutils already provides a very basic way to install a distribution, which
is running the `install` command over the `setup.py` script of the
distribution.

Distutils2 [#distutils2]_ will provide a very basic ``uninstall`` function, that
is added in ``distutils2.util`` and takes the name of the distribution to
uninstall as its argument. ``uninstall`` uses the APIs described earlier and
removes all unique files, as long as their hash didn't change. Then it removes
empty directories left behind.

``uninstall`` returns a list of uninstalled files::

    >>> from distutils2.util import uninstall
    >>> uninstall('docutils')
    ['/opt/local/lib/python2.6/site-packages/docutils/core.py',
     ...
     '/opt/local/lib/python2.6/site-packages/docutils/__init__.py']

If the distribution is not found, a ``DistutilsUninstallError`` is raised.

Filtering
~~~~~~~~~

To make it a reference API for third-party projects that wish to control
how `uninstall` works, a second callable argument can be used. It's
called for each file that is about to be removed. If the callable returns
`True`, the file is removed. If it returns `False`, it's left alone.

Examples::

    >>> import logging
    >>> def _remove_and_log(path):
    ...     logging.info('Removing %s' % path)
    ...     return True
    ...
    >>> uninstall('docutils', _remove_and_log)

    >>> def _dry_run(path):
    ...     logging.info('Removing %s (dry run)' % path)
    ...     return False
    ...
    >>> uninstall('docutils', _dry_run)

Of course, a third-party tool can use lower-level ``pkgutil`` APIs to
implement its own uninstall feature.
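
Putting the rules of this section together, an uninstall routine might look
roughly like the sketch below. It is not the Distutils2 code: ``remove_files``
is a hypothetical helper that takes `RECORD` rows with absolute local paths
plus a collection of paths used by other distributions, and it assumes
hex-encoded MD5 hashes and Python 3::

    import hashlib
    import os

    def _md5_hex(path):
        with open(path, 'rb') as f:
            return hashlib.md5(f.read()).hexdigest()

    def remove_files(records, shared_paths=(), filter=None):
        """Remove the files listed in records; return the removed paths.

        records: iterable of (absolute_path, md5_hex_or_empty, size).
        shared_paths: paths also used by other distributions.
        filter: optional callable; a file is removed only if it
        returns True.
        """
        removed = []
        for path, expected_hash, _size in records:
            if path in shared_paths or not os.path.isfile(path):
                continue    # not unique to this distribution, or gone
            if expected_hash and _md5_hex(path) != expected_hash:
                continue    # modified since installation
            if filter is not None and not filter(path):
                continue    # vetoed by the caller (e.g. a dry run)
            os.remove(path)
            removed.append(path)
        # Remove the parent directories of removed files that are now
        # empty, longest paths first.
        parents = set(os.path.dirname(p) for p in removed)
        for directory in sorted(parents, key=len, reverse=True):
            if os.path.isdir(directory) and not os.listdir(directory):
                os.rmdir(directory)
        return removed

Passing a callable that always returns ``False`` turns the call into a dry run:
nothing is deleted and the returned list is empty.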

Installer marker
~~~~~~~~~~~~~~~~

As explained earlier in this PEP, the `install` command adds an `INSTALLER`
file in the `.dist-info` directory with the name of the installer.

To avoid removing distributions that were installed by another packaging
system, the ``uninstall`` function takes an extra argument ``installer`` which
defaults to ``distutils2``.

When called, ``uninstall`` checks that the ``INSTALLER`` file matches
this argument. If not, it raises a ``DistutilsUninstallError``::

    >>> uninstall('docutils')
    Traceback (most recent call last):
    ...
    DistutilsUninstallError: docutils was installed by 'cool-pkg-manager'

    >>> uninstall('docutils', installer='cool-pkg-manager')

This allows a third-party application to use the ``uninstall`` function
and strongly suggest that no other program remove a distribution it has
previously installed. This is useful when a third-party program that relies
on Distutils APIs performs extra steps on the system at installation time
that it has to undo at uninstallation time.
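
The check described above amounts to comparing one line of text. A minimal
sketch, in which ``check_installer`` and ``UninstallError`` are hypothetical
stand-ins (the latter for ``DistutilsUninstallError``)::

    import os

    class UninstallError(Exception):
        """Stand-in for the DistutilsUninstallError named in this PEP."""

    def check_installer(distinfo_dir, installer='distutils2'):
        """Raise UninstallError unless INSTALLER names the given tool."""
        with open(os.path.join(distinfo_dir, 'INSTALLER')) as f:
            recorded = f.read().strip()
        if recorded != installer:
            raise UninstallError('installed by %r' % (recorded,))
        return recorded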

Adding an Uninstall script
~~~~~~~~~~~~~~~~~~~~~~~~~~

An `uninstall` script is added in Distutils2 and is used like this::

    $ python -m distutils2.uninstall projectname

Notice that the script doesn't check whether the removal of a distribution
breaks another distribution. It does, however, make sure that none of the
files it removes are used by any other distribution, by using the uninstall
function.

Also note that this uninstall script pays no attention to the
REQUESTED metadata; that is provided only for use by external tools to
provide more advanced dependency management.

Backward compatibility and roadmap
==================================

These changes don't introduce any compatibility problems since they
will be implemented in:

- pkgutil in new functions
- distutils2

The plan is to include the functionality outlined in this PEP in pkgutil for
Python 3.2, and in Distutils2.

Distutils2 will also contain a backport of the new pkgutil, and can be used for
Python 2.4 onward.

Distributions installed using existing, pre-standardization formats do not have
the necessary metadata available for the new API, and thus will be
ignored. Third-party tools may of course continue to support previous
formats in addition to the new format, in order to ease the transition.


References
==========

.. [#distutils]
   http://docs.python.org/distutils

.. [#distutils2]
   http://hg.python.org/distutils2

.. [#pep262]
   http://www.python.org/dev/peps/pep-0262

.. [#pep314]
   http://www.python.org/dev/peps/pep-0314

.. [#setuptools]
   http://peak.telecommunity.com/DevCenter/setuptools

.. [#easyinstall]
   http://peak.telecommunity.com/DevCenter/EasyInstall

.. [#pip]
   http://pypi.python.org/pypi/pip

.. [#eggformats]
   http://peak.telecommunity.com/DevCenter/EggFormats

.. [#pep273]
   http://www.python.org/dev/peps/pep-0273

.. [#pep278]
   http://www.python.org/dev/peps/pep-0278

.. [#fedora]
   http://fedoraproject.org/wiki/Packaging/Python/Eggs#Providing_Eggs_using_Setuptools

.. [#debian]
   http://wiki.debian.org/DebianPython/NewPolicy

.. [#prototype]
   http://bitbucket.org/tarek/pep376/

Acknowledgements
================

Jim Fulton, Ian Bicking, Phillip Eby, Rafael Villar Burke, and many people at
PyCon and Distutils-SIG.

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: