PEP: 3118
Title: Revising the buffer protocol
Version: $Revision$
Last-Modified: $Date$
Author: Travis Oliphant <oliphant@ee.byu.edu>, Carl Banks <pythondev@aerojockey.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Aug-2006
Python-Version: 3000
Post-History:

Abstract
========

This PEP proposes re-designing the buffer interface (``PyBufferProcs``
function pointers) to improve the way Python allows memory sharing in
Python 3.0.

In particular, it is proposed that the character buffer portion 
of the API be eliminated and the multiple-segment portion be 
re-designed in conjunction with allowing for strided memory
to be shared.   In addition, the new buffer interface will 
allow the sharing of any multi-dimensional nature of the
memory and what data-format the memory contains. 

This interface will allow any extension module to either 
create objects that share memory or create algorithms that
use and manipulate raw memory from arbitrary objects that 
export the interface. 


Rationale
=========

The Python 2.X buffer protocol allows different Python types to
exchange a pointer to a sequence of internal buffers.  This
functionality is *extremely* useful for sharing large segments of
memory between different high-level objects, but it is too limited and
has issues:

1. There is the little-used "sequence-of-segments" option
   (bf_getsegcount) that is not well motivated.

2. There is the apparently redundant character-buffer option
   (bf_getcharbuffer).

3. There is no way for a consumer to tell the buffer-API-exporting
   object it is "finished" with its view of the memory and
   therefore no way for the exporting object to be sure that it is
   safe to reallocate the pointer to the memory that it owns (for
   example, the array object reallocating its memory after sharing
   it with the buffer object which held the original pointer led
   to the infamous buffer-object problem).

4. Memory is just a pointer with a length. There is no way to
   describe what is "in" the memory (float, int, C-structure, etc.)

5. There is no shape information provided for the memory.  But,
   several array-like Python types could make use of a standard
   way to describe the shape-interpretation of the memory
   (wxPython, GTK, pyQT, CVXOPT, PyVox, Audio and Video
   Libraries, ctypes, NumPy, data-base interfaces, etc.)

6. There is no way to share discontiguous memory (except through
   the sequence of segments notion).  

   There are two widely used libraries that use the concept of
   discontiguous memory: PIL and NumPy.  Their view of discontiguous
   arrays is different, though.  The proposed buffer interface allows
   sharing of either memory model.  Exporters will typically use only one        
   approach and consumers may choose to support discontiguous 
   arrays of each type however they choose. 

   NumPy uses the notion of constant striding in each dimension as its
   basic concept of an array. With this concept, a simple sub-region
   of a larger array can be described without copying the data.   
   Thus, stride information is the additional information that must be
   shared. 

   The PIL uses a more opaque memory representation. Sometimes an
   image is contained in a contiguous segment of memory, but sometimes
   it is contained in an array of pointers to the contiguous segments
   (usually lines) of the image.  The PIL is where the idea of multiple
   buffer segments in the original buffer interface came from.   

   NumPy's strided memory model is used more often in computational
   libraries and because it is so simple it makes sense to support
   memory sharing using this model.  The PIL memory model is sometimes 
   used in C-code where a 2-d array can then be accessed using double
   pointer indirection:  e.g. ``image[i][j]``.

   The buffer interface should allow the object to export either of these
   memory models.  Consumers are free to either require contiguous memory
   or write code to handle one or both of these memory models. 
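
As a rough illustration of the strided model described above, the
following Python sketch shows how a sub-region of a larger array can be
described by a new start offset and shape over the same memory, without
copying.  The helper name ``element_offset`` and the sample data are
hypothetical and are not part of the proposal.

```python
# A 3x4 array of unsigned bytes stored C-contiguously in one flat buffer.
data = bytes(range(12))
shape, strides = (3, 4), (4, 1)   # strides are given in bytes


def element_offset(indices, strides, start=0):
    """Byte offset of the element at `indices` in a strided layout."""
    return start + sum(i * s for i, s in zip(indices, strides))


# The 2x2 sub-region covering rows 1..2 and columns 1..2 shares the same
# memory: only the start offset and the shape change, the strides do not.
sub_start = element_offset((1, 1), strides)
sub_shape, sub_strides = (2, 2), strides

sub_region = [
    [data[element_offset((i, j), sub_strides, sub_start)]
     for j in range(sub_shape[1])]
    for i in range(sub_shape[0])
]
```

Because the sub-region's rows are no longer adjacent in memory, a
consumer needs the stride information to walk it correctly.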

Proposal Overview
=================

* Eliminate the char-buffer and multiple-segment sections of the
  buffer-protocol.

* Unify the read/write versions of getting the buffer.

* Add a new function to the interface that should be called when
  the consumer object is "done" with the memory area.  

* Add a new variable to allow the interface to describe what is in
  memory (unifying what is currently done now in struct and
  array)

* Add a new variable to allow the protocol to share shape information

* Add a new variable for sharing stride information

* Add a new mechanism for sharing arrays that must 
  be accessed using pointer indirection. 

* Fix all objects in the core and the standard library to conform
  to the new interface

* Extend the struct module to handle more format specifiers

* Extend the buffer object into a new memory object which places
  a Python veneer around the buffer interface. 

* Add a few functions to make it easy to copy contiguous data
  in and out of object supporting the buffer interface. 

Specification
=============

While the new specification allows for complicated memory sharing,
simple contiguous buffers of bytes can still be obtained from an
object.  In fact, the new protocol allows a standard mechanism for
doing this even if the original object is not represented as a
contiguous chunk of memory.

The easiest way to obtain a simple contiguous chunk of memory is
to use the provided C-API.


Change the ``PyBufferProcs`` structure to ::

    typedef struct {
         getbufferproc bf_getbuffer;
         releasebufferproc bf_releasebuffer;
    } PyBufferProcs;

Both of these routines are optional for a type object.

::

    typedef int (*getbufferproc)(PyObject *obj, Py_buffer *view, int flags)

This function returns ``0`` on success and ``-1`` on failure (and raises an
error). The first argument is the "exporting" object.  The second
argument is the address of a bufferinfo structure.  Neither argument may
be NULL.

The third argument indicates what kind of buffer the consumer is
prepared to deal with and therefore what kind of buffer the exporter
is allowed to return.  The new buffer interface allows for much more
complicated memory sharing possibilities.  Some consumers may not be
able to handle all the complexity but may want to see if the
exporter will let them take a simpler view of its memory.

In addition, some exporters may not be able to share memory in every
possible way and may need to raise errors to signal to some consumers
that something is just not possible.  These errors should be
``PyExc_BufferError`` unless there is another error that is actually
causing the problem. The exporter can use flags information to
simplify how much of the Py_buffer structure is filled in with
non-default values and/or raise an error if the object can't support a
simpler view of its memory.

The exporter should always fill in all elements of the buffer
structure (with defaults or NULLs if nothing else is requested). The
PyBuffer_FillInfo function can be used for simple cases.


Access flags
------------

Some flags are useful for requesting a specific kind of memory
segment, while others indicate to the exporter what kind of
information the consumer can deal with.  If certain information is not
asked for by the consumer, but the exporter cannot share its memory
without that information, then a ``PyExc_BufferError`` should be raised.

``PyBUF_SIMPLE``

   This is the default flag state (0). The returned buffer may or may
   not have writable memory.  The format will be assumed to be
   unsigned bytes.  This is a "stand-alone" flag constant.  It never
   needs to be \|'d to the others.  The exporter will raise an error if
   it cannot provide such a contiguous buffer of bytes.

``PyBUF_WRITABLE``

   The returned buffer must be writable.  If it is not writable,
   then raise an error.

``PyBUF_FORMAT``
        
   The returned buffer must have true format information if this flag
   is provided.  This would be used when the consumer is going to be
   checking for what 'kind' of data is actually stored.  An exporter
   should always be able to provide this information if requested.  If
   format is not explicitly requested then the format must be returned
   as ``NULL`` (which means "B", or unsigned bytes).

``PyBUF_ND``

   The returned buffer must provide shape information. The memory will
   be assumed C-style contiguous (last dimension varies the fastest).
   The exporter may raise an error if it cannot provide this kind of
   contiguous buffer.  If this is not given then shape will be NULL.
   
``PyBUF_STRIDES`` (implies ``PyBUF_ND``)

   The returned buffer must provide strides information (i.e. the
   strides cannot be NULL).  This would be used when the consumer can
   handle strided, discontiguous arrays. Handling strides
   automatically assumes you can handle shape. The exporter may raise
   an error if it cannot provide a strided-only representation of the
   data (i.e. without the suboffsets).

| ``PyBUF_C_CONTIGUOUS``
| ``PyBUF_F_CONTIGUOUS``
| ``PyBUF_ANY_CONTIGUOUS``

   These flags indicate that the returned buffer must be respectively,
   C-contiguous (last dimension varies the fastest), Fortran
   contiguous (first dimension varies the fastest) or either one.  
   All of these flags imply PyBUF_STRIDES and guarantee that the
   strides buffer info structure will be filled in correctly. 

``PyBUF_INDIRECT`` (implies ``PyBUF_STRIDES``)

   The returned buffer must have suboffsets information (which can be
   NULL if no suboffsets are needed).  This would be used when the
   consumer can handle indirect array referencing implied by these
   suboffsets.


Specialized combinations of flags for specific kinds of memory sharing:

  Multi-dimensional (but contiguous)

   | ``PyBUF_CONTIG`` (``PyBUF_ND | PyBUF_WRITABLE``)
   | ``PyBUF_CONTIG_RO`` (``PyBUF_ND``)

  Multi-dimensional using strides but aligned

   | ``PyBUF_STRIDED`` (``PyBUF_STRIDES | PyBUF_WRITABLE``)
   | ``PyBUF_STRIDED_RO`` (``PyBUF_STRIDES``)

  Multi-dimensional using strides and not necessarily aligned

   | ``PyBUF_RECORDS`` (``PyBUF_STRIDES | PyBUF_WRITABLE | PyBUF_FORMAT``)
   | ``PyBUF_RECORDS_RO`` (``PyBUF_STRIDES | PyBUF_FORMAT``)

  Multi-dimensional using sub-offsets

   | ``PyBUF_FULL`` (``PyBUF_INDIRECT | PyBUF_WRITABLE | PyBUF_FORMAT``)
   | ``PyBUF_FULL_RO`` (``PyBUF_INDIRECT | PyBUF_FORMAT``)

Thus, the consumer simply wanting a contiguous chunk of bytes from
the object would use ``PyBUF_SIMPLE``, while a consumer that understands
how to make use of the most complicated cases could use ``PyBUF_FULL``.
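
The way these flags compose can be sketched in Python with plain
integers.  The numeric values below are those used in CPython's headers
and are an assumption here, since this document does not assign them;
the helper ``requests`` is hypothetical.

```python
# Assumed numeric values (as in CPython's headers); only the relationships
# between the flags are taken from the text above.
PyBUF_SIMPLE = 0
PyBUF_WRITABLE = 0x0001
PyBUF_FORMAT = 0x0004
PyBUF_ND = 0x0008
PyBUF_STRIDES = 0x0010 | PyBUF_ND          # implies PyBUF_ND
PyBUF_INDIRECT = 0x0100 | PyBUF_STRIDES    # implies PyBUF_STRIDES

# Specialized combinations, composed exactly as listed above.
PyBUF_CONTIG = PyBUF_ND | PyBUF_WRITABLE
PyBUF_RECORDS = PyBUF_STRIDES | PyBUF_WRITABLE | PyBUF_FORMAT
PyBUF_FULL = PyBUF_INDIRECT | PyBUF_WRITABLE | PyBUF_FORMAT
PyBUF_FULL_RO = PyBUF_INDIRECT | PyBUF_FORMAT


def requests(flags, flag):
    """True if every bit of `flag` is set in `flags`."""
    return (flags & flag) == flag
```

An exporter could use a check of this kind to decide which fields it
must fill in, for example providing strides only when
``requests(flags, PyBUF_STRIDES)`` holds.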

The format information is only guaranteed to be non-NULL if
``PyBUF_FORMAT`` is in the flag argument, otherwise it is expected the
consumer will assume unsigned bytes.

There is a C-API that simple exporting objects can use to fill-in the
buffer info structure correctly according to the provided flags if a
contiguous chunk of "unsigned bytes" is all that can be exported.


The Py_buffer struct
--------------------

The bufferinfo structure is::

  typedef struct bufferinfo {
       void *buf;
       Py_ssize_t len;
       int readonly;
       const char *format;
       int ndim;
       Py_ssize_t *shape;
       Py_ssize_t *strides;
       Py_ssize_t *suboffsets;
       Py_ssize_t itemsize;
       void *internal;
  } Py_buffer;

Before calling the bf_getbuffer function, the bufferinfo structure may
contain arbitrary values, but the ``buf`` field must be NULL when
requesting a new buffer.  Upon return from bf_getbuffer, the
bufferinfo structure is filled in with relevant information about the
buffer.  This same bufferinfo structure must be passed to
bf_releasebuffer (if available) when the consumer is done with the
memory. The caller is responsible for keeping a reference to obj until
releasebuffer is called (i.e. the call to bf_getbuffer does not alter
the reference count of obj).

The members of the bufferinfo structure are:

``buf``
    a pointer to the start of the memory for the object

``len``
    the total bytes of memory the object uses.  This should be the
    same as the product of the shape array multiplied by the number of
    bytes per item of memory.

``readonly``
    an integer variable to hold whether or not the memory is readonly.
    1 means the memory is readonly, zero means the memory is writable.

``format``
    a NULL-terminated format-string (following the struct-style syntax
    including extensions) indicating what is in each element of
    memory.  The number of elements is len / itemsize, where itemsize
    is the number of bytes implied by the format.  This can be NULL which
    implies standard unsigned bytes ("B").

``ndim``
    a variable storing the number of dimensions the memory represents.
    Must be >=0.  A value of 0 means that shape and strides and suboffsets
    must be ``NULL`` (i.e. the memory represents a scalar). 

``shape``
    an array of ``Py_ssize_t`` of length ``ndim`` indicating the
    shape of the memory as an N-D array.  Note that ``shape[0] *
    ... * shape[ndim-1] * itemsize = len``.  If ``ndim`` is 0 (indicating
    a scalar), then this must be ``NULL``.

``strides``
    a pointer to an array of ``Py_ssize_t`` of length ``ndim`` (or ``NULL``
    if ``ndim`` is 0) indicating the number of bytes to skip to get to
    the next element in each dimension.  If this is not requested by
    the caller (``PyBUF_STRIDES`` is not set), then this should be set
    to NULL, which indicates a C-style contiguous array, or a
    PyExc_BufferError should be raised if this is not possible.

``suboffsets``
    a pointer to an array of ``Py_ssize_t`` of length ``ndim``.  If
    these suboffset numbers are >=0, then the value stored along the
    indicated dimension is a pointer and the suboffset value dictates
    how many bytes to add to the pointer after de-referencing.  A
    suboffset value that is negative indicates that no de-referencing
    should occur (striding in a contiguous memory block).  If all
    suboffsets are negative (i.e. no de-referencing is needed), then
    this must be NULL (the default value).  If this is not requested
    by the caller (PyBUF_INDIRECT is not set), then this should be
    set to NULL or a PyExc_BufferError raised if this is not possible.

    For clarity, here is a function that returns a pointer to the
    element in an N-D array pointed to by an N-dimensional index when
    there are both non-NULL strides and suboffsets::

      void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
                             Py_ssize_t *suboffsets, Py_ssize_t *indices) {
          char *pointer = (char*)buf;
          int i;
          for (i = 0; i < ndim; i++) {
              pointer += strides[i] * indices[i];
              if (suboffsets[i] >=0 ) {
                  pointer = *((char**)pointer) + suboffsets[i];
              }
          }
          return (void*)pointer;
      }

    Notice the suboffset is added "after" the dereferencing occurs.
    Thus slicing in the ith dimension would add to the suboffsets in
    the (i-1)st dimension.  Slicing in the first dimension would change
    the location of the starting pointer directly (i.e. buf would
    be modified).  

``itemsize``
    This stores the itemsize (in bytes) of each element of the shared
    memory.  It is technically unnecessary as it can be obtained using
    ``PyBuffer_SizeFromFormat``, however an exporter may know this
    information without parsing the format string and it is necessary
    to know the itemsize for proper interpretation of striding.
    Therefore, storing it is more convenient and faster.

``internal``
    This is for use internally by the exporting object.  For example,
    this might be re-cast as an integer by the exporter and used to 
    store flags about whether or not the shape, strides, and suboffsets
    arrays must be freed when the buffer is released.   The consumer
    should never alter this value. 
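
The pointer walk performed by ``get_item_pointer`` above can be traced
in Python using ``ctypes``.  This is only a sketch under stated
assumptions: the three separately allocated rows stand in for a
PIL-style image, addresses are handled as plain integers, and the names
``rows``, ``row_pointers`` and ``get_item_address`` are hypothetical.

```python
import ctypes

# Three separately allocated rows of four unsigned bytes each.
rows = [(ctypes.c_ubyte * 4)(*range(r * 4, r * 4 + 4)) for r in range(3)]
# An array of pointers to those rows: this is what buf points at.
row_pointers = (ctypes.c_void_p * 3)(*[ctypes.addressof(row) for row in rows])


def get_item_address(buf, strides, suboffsets, indices):
    """Python transcription of get_item_pointer; addresses are plain ints."""
    pointer = buf
    for stride, suboffset, index in zip(strides, suboffsets, indices):
        pointer += stride * index
        if suboffset >= 0:
            # Dereference first, then add the suboffset.
            pointer = ctypes.c_void_p.from_address(pointer).value + suboffset
    return pointer


buf = ctypes.addressof(row_pointers)
strides = (ctypes.sizeof(ctypes.c_void_p), 1)
suboffsets = (0, -1)   # dereference along the first dimension only

address = get_item_address(buf, strides, suboffsets, (2, 3))
value = ctypes.c_ubyte.from_address(address).value
```

Here the stride of the first dimension steps over pointers rather than
over data, which is why a strides-only consumer cannot interpret this
layout.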
    

The exporter is responsible for making sure that any memory pointed to
by buf, format, shape, strides, and suboffsets is valid until
releasebuffer is called.  If the exporter wants to be able to change
an object's shape, strides, and/or suboffsets before releasebuffer is
called then it should allocate those arrays when getbuffer is called
(pointing to them in the buffer-info structure provided) and free them
when releasebuffer is called.


Releasing the buffer
--------------------

The same bufferinfo struct should be used in the release-buffer
interface call.  The caller is responsible for the memory of the
Py_buffer structure itself.

::

    typedef void (*releasebufferproc)(PyObject *obj, Py_buffer *view)

Callers of getbufferproc must make sure that this function is called
when memory previously acquired from the object is no longer needed.
The exporter of the interface must make sure that any memory pointed
to in the bufferinfo structure remains valid until releasebuffer is
called.

If the bf_releasebuffer function is not provided (i.e. it is NULL),
then it does not ever need to be called.
    
Exporters will need to define a bf_releasebuffer function if they can
re-allocate their memory, strides, shape, suboffsets, or format
variables which they might share through the struct bufferinfo.
Several mechanisms could be used to keep track of how many getbuffer
calls have been made and shared.  Either a single variable could be
used to keep track of how many "views" have been exported, or a
linked-list of bufferinfo structures filled in could be maintained in
each object.  

All that is specifically required by the exporter, however, is to
ensure that any memory shared through the bufferinfo structure remains
valid until releasebuffer is called on the bufferinfo structure
exporting that memory.
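
The effect of tracking exported views can be observed from Python with
the built-in ``bytearray`` and ``memoryview`` types.  This sketch shows
CPython's behaviour for one particular exporter and is meant as an
illustration only; how an exporter counts its views is its own choice.

```python
data = bytearray(b"abcd")

view = memoryview(data)       # acquires a buffer from the exporter
try:
    data.extend(b"ef")        # would reallocate memory that is still shared
    resize_blocked = False
except BufferError:
    resize_blocked = True

view.release()                # the consumer is done with the memory
data.extend(b"ef")            # now the exporter is free to reallocate
```

While a view is outstanding, the exporter refuses to resize, so the
pointer held by the consumer remains valid until the release.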


New C-API calls are proposed
============================

::

    int PyObject_CheckBuffer(PyObject *obj)

Return 1 if the getbuffer function is available otherwise 0.

::

    int PyObject_GetBuffer(PyObject *obj, Py_buffer *view, 
                           int flags)  

This is a C-API version of the getbuffer function call.  It checks to
make sure object has the required function pointer and issues the
call.  Returns -1 and raises an error on failure and returns 0 on
success.

::

    void PyBuffer_Release(PyObject *obj, Py_buffer *view)

This is a C-API version of the releasebuffer function call.  It checks
to make sure the object has the required function pointer and issues
the call.  This function always succeeds even if there is no releasebuffer
function for the object.

::

    PyObject *PyObject_GetMemoryView(PyObject *obj)

Return a memory-view object from an object that defines the buffer interface.

A memory-view object is an extended buffer object that could replace
the buffer object (but doesn't have to as that could be kept as a
simple 1-d memory-view object).  Its C-structure is ::

  typedef struct {
      PyObject_HEAD
      PyObject *base;
      Py_buffer view;
  } PyMemoryViewObject;

This is functionally similar to the current buffer object except a
reference to base is kept and the memory view is not re-grabbed.
Thus, this memory view object holds on to the memory of base until it
is deleted.

This memory-view object will support multi-dimensional slicing and be
the first object provided with Python to do so.  Slices of the
memory-view object are other memory-view objects with the same base
but with a different view of the base object.

When an "element" from the memory-view is returned it is always a
bytes object whose format should be interpreted by the format
attribute of the memoryview object.  The struct module can be used to
"decode" the bytes in Python if desired.  Or the contents can be
passed to a NumPy array or other object consuming the buffer protocol.

The Python name will be 

``__builtin__.memoryview``

Methods:

|  ``__getitem__``  (will support multi-dimensional slicing)
|  ``__setitem__``  (will support multi-dimensional slicing)  
|  ``tobytes``      (obtain a new bytes-object of a copy of the memory).
|  ``tolist``       (obtain a "nested" list of the memory.  Everything
                    is interpreted into standard Python objects
                    as the struct module unpack would do -- in fact
                    it uses struct.unpack to accomplish it). 

Attributes (taken from the memory of the base object):

* ``format``
* ``itemsize``
* ``shape``
* ``strides``
* ``suboffsets``
* ``readonly``
* ``ndim``
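
As an illustration, these attributes can be inspected on the built-in
``memoryview`` type as it exists in current CPython, which may differ
in detail from the object proposed here.  The choice of ``array.array``
as the exporter is just an example.

```python
import array

values = array.array("d", [1.0, 2.0, 3.0])
view = memoryview(values)

info = {
    "format": view.format,
    "itemsize": view.itemsize,
    "ndim": view.ndim,
    "shape": view.shape,
    "strides": view.strides,
    "readonly": view.readonly,
}

# The total byte length equals the product of the shape times the item size.
total_bytes = view.shape[0] * view.itemsize
as_list = view.tolist()   # elements converted to standard Python objects
raw = view.tobytes()      # a new bytes object holding a copy of the memory
```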


::

    Py_ssize_t PyBuffer_SizeFromFormat(const char *)

Return the implied itemsize of the data-format area from a struct-style
description.
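
At the Python level, the existing ``struct.calcsize`` function performs
an analogous computation for format strings the struct module already
understands.  It is shown here as an analogy only, not as the proposed
C function.

```python
import struct

# Standard sizes (the "<" prefix) are independent of the platform.
size_double = struct.calcsize("<d")     # one double
size_pixel = struct.calcsize("<BBB")    # three unsigned bytes
size_record = struct.calcsize("<i2H")   # one int followed by two shorts
```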

::

    PyObject * PyMemoryView_GetContiguous(PyObject *obj,  int buffertype, 
                                          char fortran)

Return a memoryview object to a contiguous chunk of memory represented
by obj. If a copy must be made (because the memory pointed to by obj
is not contiguous), then a new bytes object will be created and become
the base object for the returned memory view object. 

The buffertype argument can be PyBUF_READ, PyBUF_WRITE,
PyBUF_UPDATEIFCOPY to determine whether the returned buffer should be
readable, writable, or set to update the original buffer if a copy
must be made.  If buffertype is PyBUF_WRITE and the buffer is not
contiguous an error will be raised.  In this circumstance, the user
can use PyBUF_UPDATEIFCOPY to ensure that a writable temporary
contiguous buffer is returned.  The contents of this contiguous buffer
will be copied back into the original object after the memoryview
object is deleted as long as the original object is writable.  If this
is not allowed by the original object, then a BufferError is raised.

If the object is multi-dimensional, then if fortran is 'F', the first
dimension of the underlying array will vary the fastest in the buffer.
If fortran is 'C', then the last dimension will vary the fastest
(C-style contiguous). If fortran is 'A', then it does not matter and
you will get whatever the object decides is more efficient.  If a copy
is made, then the memory must be freed by calling ``PyMem_Free``.

You receive a new reference to the memoryview object.
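
The read-only case, where a copy becomes the base of the returned view,
can be approximated in Python with the built-in ``memoryview``.  This is
a sketch of the idea using today's Python API, not the proposed C
function; the variable names are hypothetical.

```python
data = bytearray(range(10))
every_other = memoryview(data)[::2]   # stride of 2 bytes: not contiguous

was_contiguous = every_other.contiguous
copy = every_other.tobytes()          # new contiguous bytes object
contiguous_view = memoryview(copy)    # the copy is the base of this view
```

Since the new view is backed by the copy, writes to ``data`` made
afterwards are not visible through ``contiguous_view``.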

:: 

    int PyObject_CopyToObject(PyObject *obj, void *buf, Py_ssize_t len,
                              char fortran)

Copy ``len`` bytes of data pointed to by the contiguous chunk of
memory pointed to by ``buf`` into the buffer exported by obj.  Return
0 on success and return -1 and raise an error on failure.  If the
object does not have a writable buffer, then an error is raised.  If
fortran is 'F', then if the object is multi-dimensional, then the data
will be copied into the array in Fortran-style (first dimension varies
the fastest).  If fortran is 'C', then the data will be copied into
the array in C-style (last dimension varies the fastest).  If fortran
is 'A', then it does not matter and the copy will be made in whatever
way is more efficient.
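
For the one-dimensional case, the behaviour described above can be
sketched in Python.  The function ``copy_into`` is a hypothetical helper
built on the existing ``memoryview`` type; it handles only flat buffers
of unsigned bytes and ignores the ``fortran`` argument.

```python
def copy_into(target, payload):
    """Copy the bytes in `payload` into the writable buffer exported by `target`."""
    with memoryview(target) as view:
        if view.readonly:
            raise BufferError("target does not export a writable buffer")
        # Slice assignment fails if the lengths do not match.
        view[:] = payload


target = bytearray(4)
copy_into(target, b"\x01\x02\x03\x04")

try:
    copy_into(b"abcd", b"wxyz")       # bytes objects are read-only
    rejected = False
except BufferError:
    rejected = True
```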

::

     int PyObject_CopyData(PyObject *dest, PyObject *src)

These last three C-API calls allow a standard way of getting data in and
out of Python objects into contiguous memory areas no matter how it is
actually stored.  These calls use the extended buffer interface to perform
their work. 

::

    int PyBuffer_IsContiguous(Py_buffer *view, char fortran)

Return 1 if the memory defined by the view object is C-style (fortran
= 'C') or Fortran-style (fortran = 'F') contiguous or either one
(fortran = 'A').  Return 0 otherwise.
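
One way such a check could work is sketched below in Python, operating
on shape and strides tuples rather than on a ``Py_buffer``.  The function
name and its treatment of empty and length-1 dimensions are assumptions
made for illustration.

```python
def is_contiguous(shape, strides, itemsize, order):
    """Return True if the strided layout is contiguous in the given order.

    `order` is 'C', 'F', or 'A' (either one).
    """
    if 0 in shape:
        return True   # an empty array is treated as contiguous

    def check(dims):
        expected = itemsize
        for extent, stride in dims:
            # Dimensions of length 1 never constrain the layout.
            if extent > 1 and stride != expected:
                return False
            expected *= extent
        return True

    pairs = list(zip(shape, strides))
    c_ok = check(reversed(pairs))   # last dimension varies fastest
    f_ok = check(pairs)             # first dimension varies fastest
    return {"C": c_ok, "F": f_ok, "A": c_ok or f_ok}[order]
```

For a 3x4 array of 8-byte items, strides of ``(32, 8)`` pass the 'C'
check and strides of ``(8, 24)`` pass the 'F' check.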

::

    void PyBuffer_FillContiguousStrides(int ndim, Py_ssize_t *shape,
                                        Py_ssize_t *strides, Py_ssize_t itemsize,
                                        char fortran)

Fill the strides array with byte-strides of a contiguous (C-style if
fortran is 'C' or Fortran-style if fortran is 'F') array of the given
shape with the given number of bytes per element.
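
The computation amounts to a running product over the shape.  A minimal
Python sketch, with the hypothetical name ``contiguous_strides`` and
returning a tuple instead of filling a caller-supplied array:

```python
def contiguous_strides(shape, itemsize, order="C"):
    """Byte strides of a contiguous array with the given shape."""
    strides = [0] * len(shape)
    step = itemsize
    dims = range(len(shape))
    if order == "C":
        dims = reversed(dims)   # last dimension varies fastest
    for k in dims:
        strides[k] = step
        step *= shape[k]
    return tuple(strides)
```

For example, a shape of ``(3, 4)`` with 8-byte items gives ``(32, 8)``
in C order and ``(8, 24)`` in Fortran order.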

::

    int PyBuffer_FillInfo(Py_buffer *view, void *buf, 
                          Py_ssize_t len, int readonly, int infoflags)

Fills in a buffer-info structure correctly for an exporter that can
only share a contiguous chunk of memory of "unsigned bytes" of the
given length.  Returns 0 on success and -1 (raising an error) on
failure.

::

    PyExc_BufferError

A new error object for returning buffer errors which arise because an
exporter cannot provide the kind of buffer that a consumer expects.
This will also be raised when a consumer requests a buffer from an
object that does not provide the protocol.


Additions to the struct string-syntax
=====================================

The struct string-syntax is missing some characters to fully
implement data-format descriptions already available elsewhere (in
ctypes and NumPy for example).  The Python 2.5 specification is 
at http://docs.python.org/library/struct.html.

Here are the proposed additions:


================  ===========
Character         Description
================  ===========
't'               bit (number before states how many bits)
'?'               platform _Bool type
'g'               long double  
'c'               ucs-1 (latin-1) encoding 
'u'               ucs-2 
'w'               ucs-4 
'O'               pointer to Python Object 
'Z'               complex (whatever the next specifier is)
'&'               specific pointer (prefix before another character) 
'T{}'             structure (detailed layout inside {}) 
'(k1,k2,...,kn)'  multi-dimensional array of whatever follows 
':name:'          optional name of the preceding element 
'X{}'             pointer to a function (optional function 
                    signature inside {} with any return value
                    preceded by -> and placed at the end)
================  ===========

The struct module will be changed to understand these as well and
return appropriate Python objects on unpacking.  Unpacking a
long-double will return a decimal object or a ctypes long-double.
Unpacking 'u' or 'w' will return Python unicode.  Unpacking a
multi-dimensional array will return a list (of lists if >1d).
Unpacking a pointer will return a ctypes pointer object. Unpacking a
function pointer will return a ctypes call-object (perhaps). Unpacking
a bit will return a Python Bool.  White-space in the struct-string
syntax will be ignored if it isn't already.  Unpacking a named-object
will return some kind of named-tuple-like object that acts like a
tuple but whose entries can also be accessed by name. Unpacking a
nested structure will return a nested tuple.
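
The named-tuple-like result can be approximated today by combining the
existing struct module with ``collections.namedtuple``.  The current
struct module does not understand the ``:name:`` syntax, so in this
sketch the field names are supplied separately; ``Pixel`` is a
hypothetical name.

```python
import struct
from collections import namedtuple

# Field names that the ':r:', ':g:' and ':b:' annotations would carry.
Pixel = namedtuple("Pixel", ["r", "g", "b"])

raw = bytes([255, 128, 0])
pixel = Pixel._make(struct.unpack("BBB", raw))
```

The result behaves like a tuple, so ``pixel[1]`` works, while
``pixel.g`` gives access to the same entry by name.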

Endian-specification ('!', '@','=','>','<', '^') is also allowed
inside the string so that it can change if needed.  The
previously-specified endian string is in force until changed.  The
default endian is '@' which means native data-types and alignment.  If
un-aligned, native data-types are requested, then the endian
specification is '^'.
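
The effect of the byte-order prefixes can be seen with the struct module
as it exists today.  Note the assumptions: the current module accepts a
prefix only at the start of a format string and does not know ``'^'``,
so this sketch packs each field of a mixed record separately.

```python
import struct

big = struct.pack(">i", 1)       # most significant byte first
little = struct.pack("<i", 1)    # least significant byte first

# A record holding one big-endian and one little-endian int.
mixed = struct.pack(">i", 1) + struct.pack("<i", 1)
first = struct.unpack(">i", mixed[:4])[0]
second = struct.unpack("<i", mixed[4:])[0]

# "=" uses native byte order with standard sizes and no alignment padding;
# "@" (the default) may insert padding to align the int.
packed_size = struct.calcsize("=ci")
native_size = struct.calcsize("@ci")
```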

According to the struct-module, a number can precede a character
code to specify how many of that type there are.  The
``(k1,k2,...,kn)`` extension also allows specifying if the data is
supposed to be viewed as a (C-style contiguous, last-dimension
varies the fastest) multi-dimensional array of a particular format.

Functions should be added to ctypes to create a ctypes object from
a struct description, and add long-double, and ucs-2 to ctypes.

Examples of Data-Format Descriptions
====================================

Here are some examples of C-structures and how they would be
represented using the struct-style syntax. 

<named> is the constructor for a named-tuple (not-specified yet).

float
    ``'d'`` <--> Python float
complex double
    ``'Zd'`` <--> Python complex
RGB Pixel data
    ``'BBB'`` <--> (int, int, int)
    ``'B:r: B:g: B:b:'`` <--> <named>((int, int, int), ('r','g','b'))
    
Mixed endian (weird but possible)
    ``'>i:big: <i:little:'`` <--> <named>((int, int), ('big', 'little'))

Nested structure
    ::

        struct {
             int ival;
             struct {
                 unsigned short sval;
                 unsigned char bval;
                 unsigned char cval;
             } sub;
        }
        """i:ival: 
           T{
              H:sval: 
              B:bval: 
              B:cval:
            }:sub:
        """
Nested array
    ::

        struct {
             int ival;
             double data[16*4];
        }
        """i:ival: 
           (16,4)d:data:
        """

Note that in the last example, the C-structure compared against is
intentionally a 1-d array and not a 2-d array data[16][4].  The reason
for this is to avoid confusion between static multi-dimensional
arrays in C (which are laid out contiguously) and dynamic
multi-dimensional arrays which use the same syntax to access elements,
data[0][1], but whose memory is not necessarily contiguous.  The
struct-syntax *always* uses contiguous memory and the
multi-dimensional character is information about the memory to be
communicated by the exporter.  

In other words, the struct-syntax description does not have to match
the C-syntax exactly as long as it describes the same memory layout.
The fact that a C-compiler would think of the memory as a 1-d array of
doubles is irrelevant to the fact that the exporter wanted to
communicate to the consumer that this field of the memory should be
thought of as a 2-d array where a new dimension is considered after
every 4 elements. 
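
The same point can be seen from Python with ``memoryview.cast``, which reinterprets one contiguous block under different shapes.  This is a small sketch using a (2, 4) array rather than (16, 4) to keep it short, and it assumes a native double of 8 bytes.

```python
import struct

# 8 doubles stored contiguously, as a C 'double data[8]' would be.
raw = bytearray(struct.pack("8d", *range(8)))

flat = memoryview(raw).cast("d")                # viewed as 1-d
grid = memoryview(raw).cast("d", shape=[2, 4])  # same bytes viewed as 2-d

# Row i, column j of the 2-d view is element i*4 + j of the 1-d view.
rows = grid.tolist()
```

Neither view copies the data; only the shape information communicated to the consumer differs.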


Code to be affected
===================

All objects and modules in Python that export or consume the old
buffer interface will be modified.  Here is a partial list.

* buffer object
* bytes object
* string object
* unicode object
* array module
* struct module
* mmap module
* ctypes module

* anything else using the buffer API


Issues and Details
==================

It is intended that this PEP will be back-ported to Python 2.6 by
adding the C-API and the two functions to the existing buffer
protocol.

Previous versions of this PEP proposed a read/write locking scheme,
but it was later perceived as a) too complicated for common simple use
cases that do not require any locking and b) too simple for use cases
that required concurrent read/write access to a buffer with changing,
short-lived locks.  It is therefore left to users to implement their
own specific locking scheme around buffer objects if they require
consistent views across concurrent read/write access.  A future PEP
may be proposed which includes a separate locking API after some
experience with these user-schemes is obtained.
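
One possible user-level scheme, shown only as a sketch, is to guard every access to the shared buffer with a single lock.  The ``LockedBuffer`` class is hypothetical and is not part of this proposal.

```python
import threading


class LockedBuffer:
    """Hypothetical wrapper: one lock guarding every access to a buffer."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._lock = threading.Lock()

    def write(self, offset, payload):
        # Hold the lock for the whole time the view is open.
        with self._lock, memoryview(self._data) as view:
            view[offset:offset + len(payload)] = payload

    def snapshot(self):
        # Copy under the lock so the caller sees a consistent state.
        with self._lock:
            return bytes(self._data)


shared = LockedBuffer(8)
workers = [threading.Thread(target=shared.write, args=(i * 2, bytes([i, i])))
           for i in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
result = shared.snapshot()
```

A single coarse lock is the simplest choice; applications with many readers might prefer a finer-grained scheme.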

The sharing of strided memory and suboffsets is new and can be seen as
a modification of the multiple-segment interface.  It is motivated by
NumPy and the PIL.  NumPy objects should be able to share their
strided memory with code that understands how to manage strided memory
because strided memory is very common when interfacing with compute
libraries.

Also, with this approach it should be possible to write generic code
that works with both kinds of memory without copying. 
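
As a small illustration at the Python level, a sliced ``memoryview`` is a strided view onto the same memory, and code that only indexes the view works on both kinds without copying.  The function ``total`` is just an example consumer.

```python
data = bytearray(range(10))
whole = memoryview(data)

# Every second byte: a strided view onto the same memory, no copy made.
evens = whole[::2]


def total(view):
    """Example consumer that works for contiguous and strided views alike."""
    return sum(view[i] for i in range(len(view)))


# A write through the owner is visible through the strided view.
data[4] = 100
```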

Memory management of the format string, the shape array, the strides
array, and the suboffsets array in the bufferinfo structure is always
the responsibility of the exporting object.  The consumer should not
set these pointers to any other memory or try to free them. 

Several ideas were discussed and rejected: 

    Having a "releaser" object whose release-buffer was called.  This
    was deemed unacceptable because it caused the protocol to be
    asymmetric (you called release on something different than you
    "got" the buffer from).  It also complicated the protocol without
    providing a real benefit.

    Passing all the struct variables separately into the function.
    This had the advantage that it allowed one to set NULL to
    variables that were not of interest, but it also made the function
    call more difficult.  The flags variable gives consumers the same
    ability to be "simple" in how they call the protocol.


Code
====

The authors of the PEP promise to contribute and maintain the code for
this proposal but will welcome any help.


Examples
========

Ex. 1
-----------

This example shows how an image object that uses contiguous lines might expose its buffer::

  struct rgba {
      unsigned char r, g, b, a;
  };

  struct ImageObject {
      PyObject_HEAD
      ...
      struct rgba** lines;
      Py_ssize_t height;
      Py_ssize_t width;
      Py_ssize_t shape_array[2];
      Py_ssize_t stride_array[2];
      Py_ssize_t view_count;
  };

"lines" points to a malloced 1-D array of ``(struct rgba*)``.  Each pointer
in THAT block points to a separately malloced array of ``(struct rgba)``.

In order to access, say, the red value of the pixel at x=30, y=50, you'd use "lines[50][30].r".

So what does ImageObject's getbuffer do?  Leaving error checking out::

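  int Image_getbuffer(PyObject *obj, Py_buffer *view, int flags) {

      /* Dimension 0 holds pointers that must be dereferenced (suboffset 0);
         dimension 1 is ordinary strided memory (suboffset -1). */
      static Py_ssize_t suboffsets[2] = { 0, -1};
      struct ImageObject *self = (struct ImageObject *)obj;

      view->buf = self->lines;
      /* Total size of the pixel data in bytes. */
      view->len = self->height*self->width*sizeof(struct rgba);
      view->readonly = 0;
      view->ndim = 2;
      self->shape_array[0] = self->height;
      self->shape_array[1] = self->width;
      view->shape = self->shape_array;
      self->stride_array[0] = sizeof(struct rgba*);
      self->stride_array[1] = sizeof(struct rgba);
      view->strides = self->stride_array;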
      view->suboffsets = suboffsets;

      self->view_count ++;

      return 0;
  } 


  int Image_releasebuffer(PyObject *obj, Py_buffer *view) {
      struct ImageObject *self = (struct ImageObject *)obj;

      self->view_count--;
      return 0;
  }
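
The ``view_count`` bookkeeping above lets an exporter refuse operations that would invalidate outstanding views.  CPython's ``bytearray`` behaves this way, which gives a convenient Python-level illustration of why the get/release pairing matters.

```python
data = bytearray(b"abcd")
view = memoryview(data)      # the export is counted by the bytearray

try:
    data.extend(b"ef")       # resizing would invalidate the view's pointer
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

view.release()               # the counterpart of releasebuffer
data.extend(b"ef")           # allowed again once no views remain
```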


Ex. 2
-----------

This example shows how an object that wants to expose a contiguous
chunk of memory (which will never be re-allocated while the object is
alive) would do that.

::

  int myobject_getbuffer(PyObject *self, Py_buffer *view, int flags) {

      void *buf;
      Py_ssize_t len;
      int readonly=0;
        
      buf = /* Point to buffer */
      len = /* Set to size of buffer */
      readonly = /* Set to 1 if readonly */
  
      return PyObject_FillBufferInfo(view, buf, len, readonly, flags);    
  }

  /* No releasebuffer is necessary because the memory will never 
     be re-allocated
  */

Ex. 3
-----------

A consumer that wants to get only a simple contiguous chunk of bytes
from a Python object, obj, would do the following:

::

  Py_buffer view;

  if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0) {
       /* error return */
  }

  /* Now, view.buf is the pointer to memory
          view.len is the length
          view.readonly is whether or not the memory is read-only.
   */
  

  /* After using the information and you don't need it anymore */
  
  if (PyBuffer_Release(obj, &view) < 0) {
          /* error return */
  }
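
For comparison, the same acquire, use, release pattern can be sketched from Python with ``memoryview``, whose context manager releases the buffer on exit.  The function ``describe`` is a hypothetical example consumer and assumes an exporter of plain bytes.

```python
def describe(obj):
    """Return (length in bytes, read-only flag, first byte) for an object
    that exports a simple contiguous buffer of bytes."""
    with memoryview(obj) as view:   # acquire; released on leaving the block
        return view.nbytes, view.readonly, view[0]


info_bytes = describe(b"spam")
info_array = describe(bytearray(b"eggs"))
```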
  

Ex. 4
-----------

A consumer that wants to be able to use any object's memory but is
writing an algorithm that only handles contiguous memory could do the following:

::

    void *buf;
    Py_ssize_t len;
    char *format;
    int copy;

    copy = PyObject_GetContiguous(obj, &buf, &len, &format, 0, 'A');
    if (copy < 0) {
       /* error return */
    }

    /* process memory pointed to by buffer if format is correct */

    /* Optional: 
    
       if, after processing, we want to copy data from buffer back
       into the object  
 
       we could do
       */

    if (PyObject_CopyToObject(obj, buf, len, 'A') < 0) {
           /*        error return */
    }

    /* Make sure that if a copy was made, the memory is freed */
    if (copy == 1) PyMem_Free(buf);
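
The copy-if-needed pattern can also be sketched at the Python level.  The helper ``as_contiguous`` is hypothetical and only loosely mirrors the C calls above: it returns the view itself when the memory is already contiguous, and a writable copy otherwise.

```python
def as_contiguous(view):
    """Return (data, copied): the view itself if already contiguous,
    otherwise a contiguous, writable copy of its bytes."""
    if view.contiguous:
        return view, False
    return bytearray(view.tobytes()), True


owner = bytearray(b"a-b-c-")
strided = memoryview(owner)[::2]   # 'a', 'b', 'c' with gaps between them

work, copied = as_contiguous(strided)
work[:] = bytes(work).upper()      # process the contiguous data

if copied:
    strided[:] = work              # copy the result back into the object
```

No explicit free is needed here because the temporary copy is an ordinary Python object.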
   

Copyright
=========

This PEP is placed in the public domain.