Source

python-peps / pep-0307.txt

Full commit
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
PEP: 307
Title: Extensions to the pickle protocol
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum, Tim Peters
Status: Final
Type: Standards Track
Content-Type: text/plain
Created: 31-Jan-2003
Post-History: 7-Feb-2003


Introduction

    Pickling new-style objects in Python 2.2 is done somewhat clumsily
    and causes pickle size to bloat compared to classic class
    instances.  This PEP documents a new pickle protocol in Python 2.3
    that takes care of this and many other pickle issues.

    There are two sides to specifying a new pickle protocol: the byte
    stream constituting pickled data must be specified, and the
    interface between objects and the pickling and unpickling engines
    must be specified.  This PEP focuses on API issues, although it
    may occasionally touch on byte stream format details to motivate a
    choice.  The pickle byte stream format is documented formally by
    the standard library module pickletools.py (already checked into
    CVS for Python 2.3).

    This PEP attempts to fully document the interface between pickled
    objects and the pickling process, highlighting additions by
    specifying "new in this PEP".  (The interface to invoke pickling
    or unpickling is not covered fully, except for the changes to the
    API for specifying the pickling protocol to picklers.)


Motivation

    Pickling new-style objects causes serious pickle bloat.  For
    example,

        class C(object): # Omit "(object)" for classic class
            pass
        x = C()
        x.foo = 42
        print len(pickle.dumps(x, 1))

    The binary pickle for the classic object consumed 33 bytes, and for
    the new-style object 86 bytes.

    The reasons for the bloat are complex, but are mostly caused by
    the fact that new-style objects use __reduce__ in order to be
    picklable at all.  After ample consideration we've concluded that
    the only way to reduce pickle sizes for new-style objects is to
    add new opcodes to the pickle protocol.  The net result is that
    with the new protocol, the pickle size in the above example is 35
    (two extra bytes are used at the start to indicate the protocol
    version, although this isn't strictly necessary).


Protocol versions

    Previously, pickling (but not unpickling) distinguished between
    text mode and binary mode.  By design, binary mode is a
    superset of text mode, and unpicklers don't need to know in
    advance whether an incoming pickle uses text mode or binary mode.
    The virtual machine used for unpickling is the same regardless of
    the mode; certain opcodes simply aren't used in text mode.

    Retroactively, text mode is now called protocol 0, and binary mode
    protocol 1.  The new protocol is called protocol 2.  In the
    tradition of pickling protocols, protocol 2 is a superset of
    protocol 1.  But just so that future pickling protocols aren't
    required to be supersets of the oldest protocols, a new opcode is
    inserted at the start of a protocol 2 pickle indicating that it is
    using protocol 2.  To date, each release of Python has been able to
    read pickles written by all previous releases.  Of course pickles
    written under protocol N can't be read by versions of Python
    earlier than the one that introduced protocol N.

    Several functions, methods and constructors used for pickling used
    to take a positional argument named 'bin' which was a flag,
    defaulting to 0, indicating binary mode.  This argument is renamed
    to 'protocol' and now gives the protocol number, still defaulting
    to 0.

    It so happens that passing 2 for the 'bin' argument in previous
    Python versions had the same effect as passing 1.  Nevertheless, a
    special case is added here:  passing a negative number selects the
    highest protocol version supported by a particular implementation.
    This works in previous Python versions, too, and so can be used to
    select the highest protocol available in a way that's both backward
    and forward compatible.  In addition, a new module constant
    HIGHEST_PROTOCOL is supplied by both pickle and cPickle, equal to
    the highest protocol number the module can read.  This is cleaner
    than passing -1, but cannot be used before Python 2.3.

    The pickle.py module has supported passing the 'bin' value as a
    keyword argument rather than a positional argument.  (This is not
    recommended, since cPickle only accepts positional arguments, but
    it works...)  Passing 'bin' as a keyword argument is deprecated,
    and a PendingDeprecationWarning is issued in this case.  You have
    to invoke the Python interpreter with -Wa or a variation on that
    to see PendingDeprecationWarning messages.  In Python 2.4, the
    warning class may be upgraded to DeprecationWarning.


Security issues

    In previous versions of Python, unpickling would do a "safety
    check" on certain operations, refusing to call functions or
    constructors that weren't marked as "safe for unpickling" by
    either having an attribute __safe_for_unpickling__ set to 1, or by
    being registered in a global registry, copy_reg.safe_constructors.

    This feature gives a false sense of security: nobody has ever done
    the necessary, extensive, code audit to prove that unpickling
    untrusted pickles cannot invoke unwanted code, and in fact bugs in
    the Python 2.2 pickle.py module make it easy to circumvent these
    security measures.

    We firmly believe that, on the Internet, it is better to know that
    you are using an insecure protocol than to trust a protocol to be
    secure whose implementation hasn't been thoroughly checked.  Even
    high quality implementations of widely used protocols are
    routinely found flawed; Python's pickle implementation simply
    cannot make such guarantees without a much larger time investment.
    Therefore, as of Python 2.3, all safety checks on unpickling are
    officially removed, and replaced with this warning:

      *** Do not unpickle data received from an untrusted or
          unauthenticated source ***

    The same warning applies to previous Python versions, despite the
    presence of safety checks there.


Extended __reduce__ API

    There are several APIs that a class can use to control pickling.
    Perhaps the most popular of these are __getstate__ and
    __setstate__; but the most powerful one is __reduce__.  (There's
    also __getinitargs__, and we're adding __getnewargs__ below.)

    There are several ways to provide __reduce__ functionality: a
    class can implement a __reduce__ method or a __reduce_ex__ method
    (see next section), or a reduce function can be declared in
    copy_reg (copy_reg.dispatch_table maps classes to functions).  The
    return values are interpreted exactly the same, though, and we'll
    refer to these collectively as __reduce__.

    IMPORTANT: pickling of classic class instances does not look for a
    __reduce__ or __reduce_ex__ method or a reduce function in the
    copy_reg dispatch table, so that a classic class cannot provide
    __reduce__ functionality in the sense intended here.  A classic
    class must use __getinitargs__ and/or __getstate__ to customize
    pickling.  These are described below.

    __reduce__ must return either a string or a tuple.  If it returns
    a string, this is an object whose state is not to be pickled, but
    instead a reference to an equivalent object referenced by name.
    Surprisingly, the string returned by __reduce__ should be the
    object's local name (relative to its module); the pickle module
    searches the module namespace to determine the object's module.

    The rest of this section is concerned with the tuple returned by
    __reduce__.  It is a variable size tuple, of length 2 through 5.
    The first two items (function and arguments) are required.  The
    remaining items are optional and may be left off from the end;
    giving None for the value of an optional item acts the same as
    leaving it off.  The last two items are new in this PEP.  The items
    are, in order:

    function     Required.
                 A callable object (not necessarily a function) called
                 to create the initial version of the object; state
                 may be added to the object later to fully reconstruct
                 the pickled state.  This function must itself be
                 picklable.  See the section about __newobj__ for a
                 special case (new in this PEP) here.

    arguments    Required.
                 A tuple giving the argument list for the function.
                 As a special case, designed for Zope 2's
                 ExtensionClass, this may be None; in that case,
                 function should be a class or type, and
                 function.__basicnew__() is called to create the
                 initial version of the object.  This exception is
                 deprecated.

    Unpickling invokes function(*arguments) to create an initial object,
    called obj below.  If the remaining items are left off, that's the end
    of unpickling for this object and obj is the result.    Else obj is
    modified at unpickling time by each item specified, as follows.

    state        Optional.
                 Additional state.  If this is not None, the state is
                 pickled, and obj.__setstate__(state) will be called
                 when unpickling.  If no __setstate__ method is
                 defined, a default implementation is provided, which
                 assumes that state is a dictionary mapping instance
                 variable names to their values.  The default
                 implementation calls

                     obj.__dict__.update(state)

                 or, if the update() call fails,

                     for k, v in state.items():
                         setattr(obj, k, v)

    listitems    Optional, and new in this PEP.
                 If this is not None, it should be an iterator (not a
                 sequence!) yielding successive list items.  These list
                 items will be pickled, and appended to the object using
                 either obj.append(item) or obj.extend(list_of_items).
                 This is primarily used for list subclasses, but may
                 be used by other classes as long as they have append()
                 and extend() methods with the appropriate signature.
                 (Whether append() or extend() is used depends on which
                 pickle protocol version is used as well as the number
                 of items to append, so both must be supported.)

    dictitems    Optional, and new in this PEP.
                 If this is not None, it should be an iterator (not a
                 sequence!) yielding successive dictionary items, which
                 should be tuples of the form (key, value).  These items
                 will be pickled, and stored to the object using
                 obj[key] = value.  This is primarily used for dict
                 subclasses, but may be used by other classes as long
                 as they implement __setitem__.

    Note: in Python 2.2 and before, when using cPickle, state would be
    pickled if present even if it is None; the only safe way to avoid
    the __setstate__ call was to return a two-tuple from __reduce__.
    (But pickle.py would not pickle state if it was None.)  In Python
    2.3, __setstate__ will never be called at unpickling time when
    __reduce__ returns a state with value None at pickling time.

    A __reduce__ implementation that needs to work both under Python
    2.2 and under Python 2.3 could check the variable
    pickle.format_version to determine whether to use the listitems
    and dictitems features.  If this value is >= "2.0" then they are
    supported.  If not, any list or dict items should be incorporated
    somehow in the 'state' return value, and the __setstate__ method
    should be prepared to accept list or dict items as part of the
    state (how this is done is up to the application).


The __reduce_ex__ API

    It is sometimes useful to know the protocol version when
    implementing __reduce__.  This can be done by implementing a
    method named __reduce_ex__ instead of __reduce__.  __reduce_ex__,
    when it exists, is called in preference over __reduce__ (you may
    still provide __reduce__ for backwards compatibility).  The
    __reduce_ex__ method will be called with a single integer
    argument, the protocol version.

    The 'object' class implements both __reduce__ and __reduce_ex__;
    however, if a subclass overrides __reduce__ but not __reduce_ex__,
    the __reduce_ex__ implementation detects this and calls
    __reduce__.


Customizing pickling absent a __reduce__ implementation

    If no __reduce__ implementation is available for a particular
    class, there are three cases that need to be considered
    separately, because they are handled differently:

    1. classic class instances, all protocols

    2. new-style class instances, protocols 0 and 1

    3. new-style class instances, protocol 2

    Types implemented in C are considered new-style classes.  However,
    except for the common built-in types, these need to provide a
    __reduce__ implementation in order to be picklable with protocols
    0 or 1.  Protocol 2 supports built-in types providing
    __getnewargs__, __getstate__ and __setstate__ as well.


Case 1: pickling classic class instances

    This case is the same for all protocols, and is unchanged from
    Python 2.1.

    For classic classes, __reduce__ is not used.  Instead, classic
    classes can customize their pickling by providing methods named
    __getstate__, __setstate__ and __getinitargs__.  Absent these, a
    default pickling strategy for classic class instances is
    implemented that works as long as all instance variables are
    picklable.  This default strategy is documented in terms of
    default implementations of __getstate__ and __setstate__.

    The primary ways to customize pickling of classic class instances
    is by specifying __getstate__ and/or __setstate__ methods.  It is
    fine if a class implements one of these but not the other, as long
    as it is compatible with the default version.

    The __getstate__ method

      The __getstate__ method should return a picklable value
      representing the object's state without referencing the object
      itself.  If no __getstate__ method exists, a default
      implementation is used that returns self.__dict__.

    The __setstate__ method

      The __setstate__ method should take one argument; it will be
      called with the value returned by __getstate__ (or its default
      implementation).

      If no __setstate__ method exists, a default implementation is
      provided that assumes the state is a dictionary mapping instance
      variable names to values.  The default implementation tries two
      things:

      - First, it tries to call self.__dict__.update(state).

      - If the update() call fails with a RuntimeError exception, it
        calls setattr(self, key, value) for each (key, value) pair in
        the state dictionary.  This only happens when unpickling in
        restricted execution mode (see the rexec standard library
        module).

    The __getinitargs__ method

      The __setstate__ method (or its default implementation) requires
      that a new object already exists so that its __setstate__ method
      can be called.  The point is to create a new object that isn't
      fully initialized; in particular, the class's __init__ method
      should not be called if possible.

      These are the possibilities:

      - Normally, the following trick is used: create an instance of a
        trivial classic class (one without any methods or instance
        variables) and then use __class__ assignment to change its
        class to the desired class.  This creates an instance of the
        desired class with an empty __dict__ whose __init__ has not
        been called.

      - However, if the class has a method named __getinitargs__, the
        above trick is not used, and a class instance is created by
        using the tuple returned by __getinitargs__ as an argument
        list to the class constructor.  This is done even if
        __getinitargs__ returns an empty tuple -- a __getinitargs__
        method that returns () is not equivalent to not having
        __getinitargs__ at all.  __getinitargs__ *must* return a
        tuple.

      - In restricted execution mode, the trick from the first bullet
        doesn't work; in this case, the class constructor is called
        with an empty argument list if no __getinitargs__ method
        exists.  This means that in order for a classic class to be
        unpicklable in restricted execution mode, it must either
        implement __getinitargs__ or its constructor (i.e., its
        __init__ method) must be callable without arguments.


Case 2: pickling new-style class instances using protocols 0 or 1

    This case is unchanged from Python 2.2.  For better pickling of
    new-style class instances when backwards compatibility is not an
    issue, protocol 2 should be used; see case 3 below.

    New-style classes, whether implemented in C or in Python, inherit
    a default __reduce__ implementation from the universal base class
    'object'.

    This default __reduce__ implementation is not used for those
    built-in types for which the pickle module has built-in support.
    Here's a full list of those types:

    - Concrete built-in types: NoneType, bool, int, float, complex,
      str, unicode, tuple, list, dict.  (Complex is supported by
      virtue of a __reduce__ implementation registered in copy_reg.)
      In Jython, PyStringMap is also included in this list.

    - Classic instances.

    - Classic class objects, Python function objects, built-in
      function and method objects, and new-style type objects (==
      new-style class objects).  These are pickled by name, not by
      value: at unpickling time, a reference to an object with the
      same name (the fully qualified module name plus the variable
      name in that module) is substituted.

    The default __reduce__ implementation will fail at pickling time
    for built-in types not mentioned above, and for new-style classes
    implemented in C:  if they want to be picklable, they must supply
    a custom __reduce__ implementation under protocols 0 and 1.

    For new-style classes implemented in Python, the default
    __reduce__ implementation (copy_reg._reduce) works as follows:

    Let D be the class on the object to be pickled.  First, find the
    nearest base class that is implemented in C (either as a
    built-in type or as a type defined by an extension class).  Call
    this base class B, and the class of the object to be pickled D.
    Unless B is the class 'object', instances of class B must be
    picklable, either by having built-in support (as defined in the
    above three bullet points), or by having a non-default
    __reduce__ implementation.  B must not be the same class as D
    (if it were, it would mean that D is not implemented in Python).

    The callable produced by the default __reduce__ is
    copy_reg._reconstructor, and its arguments tuple is
    (D, B, basestate), where basestate is None if B is the builtin
    object class, and basestate is

        basestate = B(obj)

    if B is not the builtin object class.  This is geared toward
    pickling subclasses of builtin types, where, for example,
    list(some_list_subclass_instance) produces "the list part" of
    the list subclass instance.

    The object is recreated at unpickling time by
    copy_reg._reconstructor, like so:

        obj = B.__new__(D, basestate)
        B.__init__(obj, basestate)

    Objects using the default __reduce__ implementation can customize
    it by defining __getstate__ and/or __setstate__ methods.  These
    work almost the same as described for classic classes above, except
    that if __getstate__ returns an object (of any type) whose value is
    considered false (e.g. None, or a number that is zero, or an empty
    sequence or mapping), this state is not pickled and __setstate__
    will not be called at all.  If __getstate__ exists and returns a
    true value, that value becomes the third element of the tuple
    returned by the default __reduce__, and at unpickling time the
    value is passed to __setstate__.  If __getstate__ does not exist,
    but obj.__dict__ exists, then  obj.__dict__ becomes the third
    element of the tuple returned by  __reduce__, and again at
    unpickling time the value is passed to obj.__setstate__.  The
    default __setstate__ is the same as that for classic classes,
    described above.

    Note that this strategy ignores slots.  Instances of new-style
    classes that have slots but no __getstate__ method cannot be
    pickled by protocols 0 and 1; the code explicitly checks for
    this condition.

    Note that pickling new-style class instances ignores __getinitargs__
    if it exists (and under all protocols).  __getinitargs__ is
    useful only for classic classes.


Case 3: pickling new-style class instances using protocol 2

    Under protocol 2, the default __reduce__ implementation inherited
    from the 'object' base class is *ignored*.  Instead, a different
    default implementation is used, which allows more efficient
    pickling of new-style class instances than possible with protocols
    0 or 1, at the cost of backward incompatibility with Python 2.2
    (meaning no more than that a protocol 2 pickle cannot be unpickled
    before Python 2.3).

    The customization uses three special methods: __getstate__,
    __setstate__ and __getnewargs__ (note that __getinitargs__ is again
    ignored).  It is fine if a class implements one or more but not all
    of these, as long as it is compatible with the default
    implementations.

    The __getstate__ method

      The __getstate__ method should return a picklable value
      representing the object's state without referencing the object
      itself.  If no __getstate__ method exists, a default
      implementation is used which is described below.

      There's a subtle difference between classic and new-style
      classes here: if a classic class's __getstate__ returns None,
      self.__setstate__(None) will be called as part of unpickling.
      But if a new-style class's __getstate__ returns None, its
      __setstate__ won't be called at all as part of unpickling.

      If no __getstate__ method exists, a default state is computed.
      There are several cases:

      - For a new-style class that has no instance __dict__ and no
        __slots__, the default state is None.

      - For a new-style class that has an instance __dict__ and no
        __slots__, the default state is self.__dict__.

      - For a new-style class that has an instance __dict__ and
        __slots__, the default state is a tuple consisting of two
        dictionaries:  self.__dict__, and a dictionary mapping slot
        names to slot values.  Only slots that have a value are
        included in the latter.

      - For a new-style class that has __slots__ and no instance
        __dict__, the default state is a tuple whose first item is
        None and whose second item is a dictionary mapping slot names
        to slot values described in the previous bullet.

    The __setstate__ method

      The __setstate__ method should take one argument; it will be
      called with the value returned by __getstate__ or with the
      default state described above if no __getstate__ method is
      defined.

      If no __setstate__ method exists, a default implementation is
      provided that can handle the state returned by the default
      __getstate__, described above.

    The __getnewargs__ method

      Like for classic classes, the __setstate__ method (or its
      default implementation) requires that a new object already
      exists so that its __setstate__ method can be called.

      In protocol 2, a new pickling opcode is used that causes a new
      object to be created as follows:

        obj = C.__new__(C, *args)

      where C is the class of the pickled object, and args is either
      the empty tuple, or the tuple returned by the __getnewargs__
      method, if defined.  __getnewargs__ must return a tuple.  The
      absence of a __getnewargs__ method is equivalent to the existence
      of one that returns ().


The __newobj__ unpickling function

    When the unpickling function returned by __reduce__ (the first
    item of the returned tuple) has the name __newobj__, something
    special happens for pickle protocol 2.  An unpickling function
    named __newobj__ is assumed to have the following semantics:

      def __newobj__(cls, *args):
          return cls.__new__(cls, *args)

    Pickle protocol 2 special-cases an unpickling function with this
    name, and emits a pickling opcode that, given 'cls' and 'args',
    will return cls.__new__(cls, *args) without also pickling a
    reference to __newobj__ (this is the same pickling opcode used by
    protocol 2 for a new-style class instance when no __reduce__
    implementation exists).  This is the main reason why protocol 2
    pickles are much smaller than classic pickles.  Of course, the
    pickling code cannot verify that a function named __newobj__
    actually has the expected semantics.  If you use an unpickling
    function named __newobj__ that returns something different, you
    deserve what you get.

    It is safe to use this feature under Python 2.2; there's nothing
    in the recommended implementation of __newobj__ that depends on
    Python 2.3.


The extension registry

    Protocol 2 supports a new mechanism to reduce the size of pickles.

    When class instances (classic or new-style) are pickled, the full
    name of the class (module name including package name, and class
    name) is included in the pickle.  Especially for applications that
    generate many small pickles, this is a lot of overhead that has to
    be repeated in each pickle.  For large pickles, when using
    protocol 1, repeated references to the same class name are
    compressed using the "memo" feature; but each class name must be
    spelled in full at least once per pickle, and this causes a lot of
    overhead for small pickles.

    The extension registry allows one to represent the most frequently
    used names by small integers, which are pickled very efficiently:
    an extension code in the range 1-255 requires only two bytes
    including the opcode, one in the range 256-65535 requires only
    three bytes including the opcode.

    One of the design goals of the pickle protocol is to make pickles
    "context-free": as long as you have installed the modules
    containing the classes referenced by a pickle, you can unpickle
    it, without needing to import any of those classes ahead of time.

    Unbridled use of extension codes could jeopardize this desirable
    property of pickles.  Therefore, the main use of extension codes
    is reserved for a set of codes to be standardized by some
    standard-setting body.  This being Python, the standard-setting
    body is the PSF.  From time to time, the PSF will decide on a
    table mapping extension codes to class names (or occasionally
    names of other global objects; functions are also eligible).  This
    table will be incorporated in the next Python release(s).

    However, for some applications, like Zope, context-free pickles
    are not a requirement, and waiting for the PSF to standardize
    some codes may not be practical.  Two solutions are offered for
    such applications.

    First, a few ranges of extension codes are reserved for private
    use.  Any application can register codes in these ranges.
    Two applications exchanging pickles using codes in these ranges
    need to have some out-of-band mechanism to agree on the mapping
    between extension codes and names.

    Second, some large Python projects (e.g. Zope) can be assigned a
    range of extension codes outside the "private use" range that they
    can assign as they see fit.

    The extension registry is defined as a mapping between extension
    codes and names.  When an extension code is unpickled, it ends up
    producing an object, but this object is gotten by interpreting the
    name as a module name followed by a class (or function) name.  The
    mapping from names to objects is cached.  It is quite possible
    that certain names cannot be imported; that should not be a
    problem as long as no pickle containing a reference to such names
    has to be unpickled.  (The same issue already exists for direct
    references to such names in pickles that use protocols 0 or 1.)

    Here is the proposed initial assigment of extension code ranges:

      First  Last Count  Purpose

          0     0     1  Reserved -- will never be used
          1   127   127  Reserved for Python standard library
        128   191    64  Reserved for Zope
        192   239    48  Reserved for 3rd parties
        240   255    16  Reserved for private use (will never be assigned)
        256   MAX   MAX  Reserved for future assignment

    MAX stands for 2147483647, or 2**31-1.  This is a hard limitation
    of the protocol as currently defined.

    At the moment, no specific extension codes have been assigned yet.


Extension registry API

    The extension registry is maintained as private global variables
    in the copy_reg module.  The following three functions are defined
    in this module to manipulate the registry:

    add_extension(module, name, code)
        Register an extension code.  The module and name arguments
        must be strings; code must be an int in the inclusive range 1
        through MAX.  This must either register a new (module, name)
        pair to a new code, or be a redundant repeat of a previous
        call that was not canceled by a remove_extension() call; a
        (module, name) pair may not be mapped to more than one code,
        nor may a code be mapped to more than one (module, name)
        pair.  (XXX Aliasing may actually cause a problem for this
        requirement; we'll see as we go.)

    remove_extension(module, name, code)
        Arguments are as for add_extension().  Remove a previously
        registered mapping between (module, name) and code.

    clear_extension_cache()
        The implementation of extension codes may use a cache to speed
        up loading objects that are named frequently.  This cache can
        be emptied (removing references to cached objects) by calling
        this method.

    Note that the API does not enforce the standard range assignments.
    It is up to applications to respect these.


The copy module

    Traditionally, the copy module has supported an extended subset of
    the pickling APIs for customizing the copy() and deepcopy()
    operations.

    In particular, besides checking for a __copy__ or __deepcopy__
    method, copy() and deepcopy() have always looked for __reduce__,
    and for classic classes, have looked for __getinitargs__,
    __getstate__ and __setstate__.

    In Python 2.2, the default __reduce__ inherited from 'object' made
    copying simple new-style classes possible, but slots and various
    other special cases were not covered.

    In Python 2.3, several changes are made to the copy module:

    - __reduce_ex__ is supported (and always called with 2 as the
      protocol version argument).

    - The four- and five-argument return values of __reduce__ are
      supported.

    - Before looking for a __reduce__ method, the
      copy_reg.dispatch_table is consulted, just like for pickling.

    - When the __reduce__ method is inherited from object, it is
      (unconditionally) replaced by a better one that uses the same
      APIs as pickle protocol 2: __getnewargs__, __getstate__, and
      __setstate__, handling list and dict subclasses, and handling
      slots.

    As a consequence of the latter change, certain new-style classes
    that were copyable under Python 2.2 are not copyable under Python
    2.3.  (These classes are also not picklable using pickle protocol
    2.)  A minimal example of such a class:

        class C(object):
            def __new__(cls, a):
                return object.__new__(cls)

    The problem only occurs when __new__ is overridden and has at
    least one mandatory argument in addition to the class argument.

    To fix this, a __getnewargs__ method should be added that returns
    the appropriate argument tuple (excluding the class).


Pickling Python longs

    Pickling and unpickling Python longs takes time quadratic in
    the number of digits, in protocols 0 and 1.  Under protocol 2,
    new opcodes support linear-time pickling and unpickling of longs.


Pickling bools

    Protocol 2 introduces new opcodes for pickling True and False
    directly.  Under protocols 0 and 1, bools are pickled as integers,
    using a trick in the representation of the integer in the pickle
    so that an unpickler can recognize that a bool was intended.  That
    trick consumed 4 bytes per bool pickled.  The new bool opcodes
    consume 1 byte per bool.


Pickling small tuples

    Protocol 2 introduces new opcodes for more-compact pickling of
    tuples of lengths 1, 2 and 3.  Protocol 1 previously introduced
    an opcode for more-compact pickling of empty tuples.


Protocol identification

    Protocol 2 introduces a new opcode, with which all protocol 2
    pickles begin, identifying that the pickle is protocol 2.
    Attempting to unpickle a protocol 2 pickle under older versions
    of Python will therefore raise an "unknown opcode" exception
    immediately.


Pickling of large lists and dicts

    Protocol 1 pickles large lists and dicts "in one piece", which
    minimizes pickle size, but requires that unpickling create a temp
    object as large as the object being unpickled.  Part of the
    protocol 2 changes break large lists and dicts into pieces of no
    more than 1000 elements each, so that unpickling needn't create
    a temp object larger than needed to hold 1000 elements.  This
    isn't part of protocol 2, however:  the opcodes produced are still
    part of protocol 1.  __reduce__ implementations that return the
    optional new listitems or dictitems iterators also benefit from
    this unpickling temp-space optimization.


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: