python-peps / pep-3147.txt

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
PEP: 3147
Title: PYC Repository Directories
Version: $Revision$
Last-Modified: $Date$
Author: Barry Warsaw <barry@python.org>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 2009-12-16
Python-Version: 3.2
Post-History: 2010-01-30, 2010-02-25, 2010-03-03, 2010-04-12
Resolution: http://mail.python.org/pipermail/python-dev/2010-April/099414.html


Abstract
========

This PEP describes an extension to Python's import mechanism which
improves sharing of Python source code files among multiple installed
different versions of the Python interpreter.  It does this by
allowing more than one byte compilation file (.pyc files) to be
co-located with the Python source file (.py file).  The extension
described here can also be used to support different Python
compilation caches, such as JIT output that may be produced by an
Unladen Swallow [1]_ enabled C Python.


Background
==========

CPython compiles its source code into "byte code", and for performance
reasons, it caches this byte code on the file system whenever the
source file has changes.  This makes loading of Python modules much
faster because the compilation phase can be bypassed.  When your
source file is `foo.py`, CPython caches the byte code in a `foo.pyc`
file right next to the source.

Byte code files contain two 32-bit big-endian numbers followed by the
marshaled [2]_ code object.  The 32-bit numbers represent a magic
number and a timestamp.  The magic number changes whenever Python
changes the byte code format, e.g. by adding new byte codes to its
virtual machine.  This ensures that pyc files built for previous
versions of the VM won't cause problems.  The timestamp is used to
make sure that the pyc file match the py file that was used to create
it.  When either the magic number or timestamp do not match, the py
file is recompiled and a new pyc file is written.

In practice, it is well known that pyc files are not compatible across
Python major releases.  A reading of import.c [3]_ in the Python
source code proves that within recent memory, every new CPython major
release has bumped the pyc magic number.


Rationale
=========

Linux distributions such as Ubuntu [4]_ and Debian [5]_ provide more
than one Python version at the same time to their users.  For example,
Ubuntu 9.10 Karmic Koala users can install Python 2.5, 2.6, and 3.1,
with Python 2.6 being the default.

This causes a conflict for third party Python source files installed
by the system, because you cannot compile a single Python source file
for more than one Python version at a time.  When Python finds a `pyc`
file with a non-matching magic number, it falls back to the slower
process of recompiling the source.  Thus if your system installed a
`/usr/share/python/foo.py`, two different versions of Python would
fight over the `pyc` file and rewrite it each time the source is
compiled.  (The standard library is unaffected by this, since multiple
versions of the stdlib *are* installed on such distributions..)

Furthermore, in order to ease the burden on operating system packagers
for these distributions, the distribution packages do not contain
Python version numbers [6]_; they are shared across all Python
versions installed on the system.  Putting Python version numbers in
the packages would be a maintenance nightmare, since all the packages
- *and their dependencies* - would have to be updated every time a new
Python release was added or removed from the distribution.  Because of
the sheer number of packages available, this amount of work is
infeasible.

(PEP 384 [7]_ has been proposed to address binary compatibility issues
of third party extension modules across different versions of Python.)

Because these distributions cannot share pyc files, elaborate
mechanisms have been developed to put the resulting pyc files in
non-shared locations while the source code is still shared.  Examples
include the symlink-based Debian regimes python-support [8]_ and
python-central [9]_.  These approaches make for much more complicated,
fragile, inscrutable, and fragmented policies for delivering Python
applications to a wide range of users.  Arguably more users get Python
from their operating system vendor than from upstream tarballs.  Thus,
solving this pyc sharing problem for CPython is a high priority for
such vendors.

This PEP proposes a solution to this problem.


Proposal
========

Python's import machinery is extended to write and search for byte
code cache files in a single directory inside every Python package
directory.  This directory will be called `__pycache__`.

Further, pyc file names will contain a magic string (called a "tag")
that differentiates the Python version they were compiled for.  This
allows multiple byte compiled cache files to co-exist for a single
Python source file.

The magic tag is implementation defined, but should contain the
implementation name and a version number shorthand, e.g. `cpython-32`.
It must be unique among all versions of Python, and whenever the magic
number is bumped, a new magic tag must be defined.  An example `pyc`
file for Python 3.2 is thus `foo.cpython-32.pyc`.

The magic tag is available in the `imp` module via the `get_tag()`
function.  This is parallel to the `imp.get_magic()` function.

This scheme has the added benefit of reducing the clutter in a Python
package directory.

When a Python source file is imported for the first time, a
`__pycache__` directory will be created in the package directory, if
one does not already exist.  The pyc file for the imported source will
be written to the `__pycache__` directory, using the magic-tag
formatted name.  If either the creation of the `__pycache__` directory
or the pyc file inside that fails, the import will still succeed, just
as it does in a pre-PEP-3147 world.

If the py source file is missing, the pyc file inside `__pycache__`
will be ignored.  This eliminates the problem of accidental stale pyc
file imports.

For backward compatibility, Python will still support pyc-only
distributions, however it will only do so when the pyc file lives in
the directory where the py file *would* have been, i.e. not in the
`__pycache__` directory.  pyc file outside of `__pycache__` will only
be imported if the py source file is missing.

Tools such as `py_compile` [15]_ and `compileall` [16]_ will be
extended to create PEP 3147 formatted layouts automatically, but will
have an option to create pyc-only distribution layouts.


Examples
---------

What would this look like in practice?

Let's say we have a Python package named `alpha` which contains a
sub-package name `beta`.  The source directory layout before byte
compilation might look like this::

    alpha/
        __init__.py
        one.py
        two.py
        beta/
            __init__.py
            three.py
            four.py

After byte compiling this package with Python 3.2, you would see the
following layout::

    alpha/
        __pycache__/
            __init__.cpython-32.pyc
            one.cpython-32.pyc
            two.cpython-32.pyc
        __init__.py
        one.py
        two.py
        beta/
            __pycache__/
                __init__.cpython-32.pyc
                three.cpython-32.pyc
                four.cpython-32.pyc
            __init__.py
            three.py
            four.py

*Note: listing order may differ depending on the platform.*

Let's say that two new versions of Python are installed, one is Python
3.3 and another is Unladen Swallow.  After byte compilation, the file
system would look like this::

    alpha/
        __pycache__/
            __init__.cpython-32.pyc
            __init__.cpython-33.pyc
            __init__.unladen-10.pyc
            one.cpython-32.pyc
            one.cpython-33.pyc
            one.unladen-10.pyc
            two.cpython-32.pyc
            two.cpython-33.pyc
            two.unladen-10.pyc
        __init__.py
        one.py
        two.py
        beta/
            __pycache__/
                __init__.cpython-32.pyc
                __init__.cpython-33.pyc
                __init__.unladen-10.pyc
                three.cpython-32.pyc
                three.cpython-33.pyc
                three.unladen-10.pyc
                four.cpython-32.pyc
                four.cpython-33.pyc
                four.unladen-10.pyc
            __init__.py
            three.py
            four.py

As you can see, as long as the Python version identifier string is
unique, any number of pyc files can co-exist.  These identifier
strings are described in more detail below.

A nice property of this layout is that the `__pycache__` directories
can generally be ignored, such that a normal directory listing would
show something like this::

    alpha/
        __pycache__/
        __init__.py
        one.py
        two.py
        beta/
            __pycache__/
            __init__.py
            three.py
            four.py

This is much less cluttered than even today's Python.


Python behavior
===============

When Python searches for a module to import (say `foo`), it may find
one of several situations.  As per current Python rules, the term
"matching pyc" means that the magic number matches the current
interpreter's magic number, and the source file's timestamp matches
the timestamp in the `pyc` file exactly.


Case 0: The steady state
------------------------

When Python is asked to import module `foo`, it searches for a
`foo.py` file (or `foo` package, but that's not important for this
discussion) along its `sys.path`.  If found, Python looks to see if
there is a matching `__pycache__/foo.<magic>.pyc` file, and if so,
that `pyc` file is loaded.


Case 1: The first import
------------------------

When Python locates the `foo.py`, if the `__pycache__/foo.<magic>.pyc`
file is missing, Python will create it, also creating the
`__pycache__` directory if necessary.  Python will parse and byte
compile the `foo.py` file and save the byte code in
`__pycache__/foo.<magic>.pyc`.


Case 2: The second import
-------------------------

When Python is asked to import module `foo` a second time (in a
different process of course), it will again search for the `foo.py`
file along its `sys.path`.  When Python locates the `foo.py` file, it
looks for a matching `__pycache__/foo.<magic>.pyc` and finding this,
it reads the byte code and continues as usual.


Case 3: __pycache__/foo.<magic>.pyc with no source
---------------------------------------------------

It's possible that the `foo.py` file somehow got removed, while
leaving the cached pyc file still on the file system.  If the
`__pycache__/foo.<magic>.pyc` file exists, but the `foo.py` file used
to create it does not, Python will raise an `ImportError` when asked
to import foo.  In other words, Python will not import a pyc file from
the cache directory unless the source file exists.


Case 4: legacy pyc files and source-less imports
------------------------------------------------

Python will ignore all legacy pyc files when a source file exists next
to it.  In other words, if a `foo.pyc` file exists next to the
`foo.py` file, the pyc file will be ignored in all cases

In order to continue to support source-less distributions though, if
the source file is missing, Python will import a lone pyc file if it
lives where the source file would have been.


Case 5: read-only file systems
------------------------------

When the source lives on a read-only file system, or the `__pycache__`
directory or pyc file cannot otherwise be written, all the same rules
apply.  This is also the case when `__pycache__` happens to be written
with permissions which do not allow for writing containing pyc files.



Flow chart
==========

Here is a flow chart describing how modules are loaded:

.. image:: pep-3147-1.png
   :scale: 75


Alternative Python implementations
==================================

Alternative Python implementations such as Jython [11]_, IronPython
[12]_, PyPy [13]_, Pynie [14]_, and Unladen Swallow can also use the
`__pycache__` directory to store whatever compilation artifacts make
sense for their platforms.  For example, Jython could store the class
file for the module in `__pycache__/foo.jython-32.class`.


Implementation strategy
=======================

This feature is targeted for Python 3.2, solving the problem for those
and all future versions.  It may be back-ported to Python 2.7.
Vendors are free to backport the changes to earlier distributions as
they see fit.  For backports of this feature to Python 2, when the
`-U` flag is used, a file such as `foo.cpython-27u.pyc` can be
written.


Effects on existing code
========================

Adoption of this PEP will affect existing code and idioms, both inside
Python and outside.  This section enumerates some of these effects.


Detecting PEP 3147 availability
-------------------------------

The easiest way to detect whether your version of Python provides PEP
3147 functionality is to do the following check::

    >>> import imp
    >>> has3147 = hasattr(imp, 'get_tag')


__file__
---------

In Python 3, when you import a module, its `__file__` attribute points
to its source `py` file (in Python 2, it points to the `pyc` file).  A
package's `__file__` points to the `py` file for its `__init__.py`.
E.g.::

    >>> import foo
    >>> foo.__file__
    'foo.py'
    # baz is a package
    >>> import baz
    >>> baz.__file__
    'baz/__init__.py'

Nothing in this PEP would change the semantics of `__file__`.

This PEP proposes the addition of an `__cached__` attribute to
modules, which will always point to the actual `pyc` file that was
read or written.  When the environment variable
`$PYTHONDONTWRITEBYTECODE` is set, or the `-B` option is given, or if
the source lives on a read-only filesystem, then the `__cached__`
attribute will point to the location that the `pyc` file *would* have
been written to if it didn't exist.  This location of course includes
the `__pycache__` subdirectory in its path.

For alternative Python implementations which do not support `pyc`
files, the `__cached__` attribute may point to whatever information
makes sense.  E.g. on Jython, this might be the `.class` file for the
module: `__pycache__/foo.jython-32.class`.  Some implementations may
use multiple compiled files to create the module, in which case
`__cached__` may be a tuple.  The exact contents of `__cached__` are
Python implementation specific.

It is recommended that when nothing sensible can be calculated,
implementations should set the `__cached__` attribute to `None`.


py_compile and compileall
-------------------------

Python comes with two modules, `py_compile` [15]_ and `compileall`
[16]_ which support compiling Python modules external to the built-in
import machinery.  `py_compile` in particular has intimate knowledge
of byte compilation, so these will be updated to understand the new
layout.  The `-b` flag is added to `compileall` for writing legacy
`.pyc` byte-compiled file path names.


bdist_wininst and the Windows installer
---------------------------------------

These tools also compile modules explicitly on installation.  If they
do not use `py_compile` and `compileall`, then they would also have to
be modified to understand the new layout.


File extension checks
---------------------

There exists some code which checks for files ending in `.pyc` and
simply chops off the last character to find the matching `.py` file.
This code will obviously fail once this PEP is implemented.

To support this use case, we'll add two new methods to the `imp`
package [17]_:

 * `imp.cache_from_source(py_path)` -> `pyc_path`
 * `imp.source_from_cache(pyc_path)` -> `py_path`

Alternative implementations are free to override these functions to
return reasonable values based on their own support for this PEP.
These methods are allowed to return `None` when the implementation (or
PEP 302 loader [18]_ in effect) for whatever reason cannot calculate
the appropriate file name.  They should not raise exceptions.


Backports
---------

For versions of Python earlier than 3.2 (and possibly 2.7), it is
possible to backport this PEP.  However, in Python 3.2 (and possibly
2.7), this behavior will be turned on by default, and in fact, it will
replace the old behavior.  Backports will need to support the old
layout by default.  We suggest supporting PEP 3147 through the use of
an environment variable called `$PYTHONENABLECACHEDIR` or the command
line switch `-Xenablecachedir` to enable the feature.


Makefiles and other dependency tools
------------------------------------

Makefiles and other tools which calculate dependencies on `.pyc` files
(e.g. to byte-compile the source if the `.pyc` is missing) will have
to be updated to check the new paths.


Alternatives
============

This section describes some alternative approaches or details that
were considered and rejected during the PEP's development.


Hexadecimal magic tags
----------------------

pyc files inside of the `__pycache__` directories contain a magic tag
in their file names.  These are mnemonic tags for the actual magic
numbers used by the importer.  We could have used the hexadecimal
representation [10]_ of the binary magic number as a unique
identifier.  For example, in Python 3.2::

    >>> from binascii import hexlify
    >>> from imp import get_magic
    >>> 'foo.{}.pyc'.format(hexlify(get_magic()).decode('ascii'))
    'foo.580c0d0a.pyc'

This isn't particularly human friendly though, thus the magic tag
proposed in this PEP.


PEP 304
-------

There is some overlap between the goals of this PEP and PEP 304 [19]_,
which has been withdrawn.  However PEP 304 would allow a user to
create a shadow file system hierarchy in which to store `pyc` files.
This concept of a shadow hierarchy for `pyc` files could be used to
satisfy the aims of this PEP.  Although the PEP 304 does not indicate
why it was withdrawn, shadow directories have a number of problems.
The location of the shadow `pyc` files would not be easily discovered
and would depend on the proper and consistent use of the
`$PYTHONBYTECODE` environment variable both by the system and by end
users.  There are also global implications, meaning that while the
system might want to shadow `pyc` files, users might not want to, but
the PEP defines only an all-or-nothing approach.

As an example of the problem, a common (though fragile) Python idiom
for locating data files is to do something like this::

    from os import dirname, join
    import foo.bar
    data_file = join(dirname(foo.bar.__file__), 'my.dat')

This would be problematic since `foo.bar.__file__` will give the
location of the `pyc` file in the shadow directory, and it may not be
possible to find the `my.dat` file relative to the source directory
from there.


Fat byte compilation files
--------------------------

An earlier version of this PEP described "fat" Python byte code files.
These files would contain the equivalent of multiple `pyc` files in a
single `pyf` file, with a lookup table keyed off the appropriate magic
number.  This was an extensible file format so that the first 5
parallel Python implementations could be supported fairly efficiently,
but with extension lookup tables available to scale `pyf` byte code
objects as large as necessary.

The fat byte compilation files were fairly complex, and inherently
introduced difficult race conditions, so the current simplification of
using directories was suggested.  The same problem applies to using
zip files as the fat pyc file format.


Multiple file extensions
------------------------

The PEP author also considered an approach where multiple thin byte
compiled files lived in the same place, but used different file
extensions to designate the Python version.  E.g. foo.pyc25,
foo.pyc26, foo.pyc31 etc.  This was rejected because of the clutter
involved in writing so many different files.  The multiple extension
approach makes it more difficult (and an ongoing task) to update any
tools that are dependent on the file extension.


.pyc
----

A proposal was floated to call the `__pycache__` directory `.pyc` or
some other dot-file name.  This would have the effect on \*nix systems
of hiding the directory.  There are many reasons why this was
rejected by the BDFL [20]_ including the fact that dot-files are only
special on some platforms, and we actually do *not* want to hide these
completely from users.


Reference implementation
========================

Work on this code is tracked in a Bazaar branch on Launchpad [22]_
until it's ready for merge into Python 3.2.  The work-in-progress diff
can also be viewed [23]_ and is updated automatically as new changes
are uploaded.

A Rietveld code review issue [24]_ has been opened as of 2010-04-01 (no,
this is not an April Fools joke :).


References
==========

.. [1] PEP 3146

.. [2] The marshal module:
   http://www.python.org/doc/current/library/marshal.html

.. [3] import.c:
   http://svn.python.org/view/python/branches/py3k/Python/import.c?view=markup

.. [4] Ubuntu: <http://www.ubuntu.com>

.. [5] Debian: <http://www.debian.org>

.. [6] Debian Python Policy:
   http://www.debian.org/doc/packaging-manuals/python-policy/

.. [7] PEP 384

.. [8] python-support:
   http://wiki.debian.org/DebianPythonFAQ#Whatispython-support.3F

.. [9] python-central:
   http://wiki.debian.org/DebianPythonFAQ#Whatispython-central.3F

.. [10] binascii.hexlify():
   http://www.python.org/doc/current/library/binascii.html#binascii.hexlify

.. [11] Jython: http://www.jython.org/

.. [12] IronPython: http://ironpython.net/

.. [13] PyPy: http://codespeak.net/pypy/dist/pypy/doc/

.. [14] Pynie: http://code.google.com/p/pynie/

.. [15] py_compile: http://docs.python.org/library/py_compile.html

.. [16] compileall: http://docs.python.org/library/compileall.html

.. [17] imp: http://www.python.org/doc/current/library/imp.html

.. [18] PEP 302

.. [19] PEP 304

.. [20] http://www.mail-archive.com/python-dev@python.org/msg45203.html

.. [21] importlib: http://docs.python.org/3.1/library/importlib.html

.. [22] https://code.launchpad.net/~barry/python/pep3147

.. [23] https://code.launchpad.net/~barry/python/pep3147/+merge/22648

.. [24] http://codereview.appspot.com/842043/show


ACKNOWLEDGMENTS
===============

Barry Warsaw's original idea was for fat Python byte code files.
Martin von Loewis reviewed an early draft of the PEP and suggested the
simplification to store traditional `pyc` and `pyo` files in a
directory.  Many other people reviewed early versions of this PEP and
provided useful feedback including but not limited to:

* David Malcolm
* Josselin Mouette
* Matthias Klose
* Michael Hudson
* Michael Vogt
* Piotr Ożarowski
* Scott Kitterman
* Toshio Kuratomi


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.