Source

python-peps / pep-0335.txt

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
PEP: 335
Title: Overloadable Boolean Operators
Version: $Revision$
Last-Modified: $Date$
Author: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Aug-2004
Python-Version: 3.3
Post-History: 05-Sep-2004, 30-Sep-2011, 25-Oct-2011

Rejection Notice
================

This PEP was rejected.
See http://mail.python.org/pipermail/python-dev/2012-March/117510.html

Abstract
========

This PEP proposes an extension to permit objects to define their own
meanings for the boolean operators 'and', 'or' and 'not', and suggests
an efficient strategy for implementation.  A prototype of this
implementation is available for download.


Background
==========

Python does not currently provide any '__xxx__' special methods
corresponding to the 'and', 'or' and 'not' boolean operators.  In the
case of 'and' and 'or', the most likely reason is that these operators
have short-circuiting semantics, i.e. the second operand is not
evaluated if the result can be determined from the first operand.  The
usual technique of providing special methods for these operators
therefore would not work.

There is no such difficulty in the case of 'not', however, and it
would be straightforward to provide a special method for this
operator.  The rest of this proposal will therefore concentrate mainly
on providing a way to overload 'and' and 'or'.


Motivation
==========

There are many applications in which it is natural to provide custom
meanings for Python operators, and in some of these, having boolean
operators excluded from those able to be customised can be
inconvenient.  Examples include:

1. NumPy, in which almost all the operators are defined on
   arrays so as to perform the appropriate operation between
   corresponding elements, and return an array of the results.  For
   consistency, one would expect a boolean operation between two
   arrays to return an array of booleans, but this is not currently
   possible.

   There is a precedent for an extension of this kind: comparison
   operators were originally restricted to returning boolean results,
   and rich comparisons were added so that comparisons of NumPy
   arrays could return arrays of booleans.

2. A symbolic algebra system, in which a Python expression is
   evaluated in an environment which results in it constructing a tree
   of objects corresponding to the structure of the expression.

3. A relational database interface, in which a Python expression is
   used to construct an SQL query.

A workaround often suggested is to use the bitwise operators '&', '|'
and '~' in place of 'and', 'or' and 'not', but this has some
drawbacks:

* The precedence of these is different in relation to the other operators,
  and they may already be in use for other purposes (as in example 1).

* It is aesthetically displeasing to force users to use something other
  than the most obvious syntax for what they are trying to express.  This
  would be particularly acute in the case of example 3, considering that
  boolean operations are a staple of SQL queries.

* Bitwise operators do not provide a solution to the problem of
  chained comparisons such as 'a < b < c' which involve an implicit
  'and' operation. Such expressions currently cannot be used at all
  on data types such as NumPy arrays where the result of a comparison
  cannot be treated as having normal boolean semantics; they must be
  expanded into something like (a < b) & (b < c), losing a considerable
  amount of clarity.


Rationale
=========

The requirements for a successful solution to the problem of allowing
boolean operators to be customised are:

1. In the default case (where there is no customisation), the existing
   short-circuiting semantics must be preserved.

2. There must not be any appreciable loss of speed in the default
   case.

3. Ideally, the customisation mechanism should allow the object to
   provide either short-circuiting or non-short-circuiting semantics,
   at its discretion.

One obvious strategy, that has been previously suggested, is to pass
into the special method the first argument and a function for
evaluating the second argument.  This would satisfy requirements 1 and
3, but not requirement 2, since it would incur the overhead of
constructing a function object and possibly a Python function call on
every boolean operation.  Therefore, it will not be considered further
here.

The following section proposes a strategy that addresses all three
requirements.  A `prototype implementation`_ of this strategy is
available for download.

.. _prototype implementation:
   http://www.cosc.canterbury.ac.nz/~greg/python/obo//Python_OBO.tar.gz


Specification
=============

Special Methods
---------------

At the Python level, objects may define the following special methods.

===============  =================  ========================
Unary            Binary, phase 1    Binary, phase 2
===============  =================  ========================
* __not__(self)  * __and1__(self)   * __and2__(self, other)
                 * __or1__(self)    * __or2__(self, other)
                                    * __rand2__(self, other)
                                    * __ror2__(self, other)
===============  =================  ========================

The __not__ method, if defined, implements the 'not' operator.  If it
is not defined, or it returns NotImplemented, existing semantics are
used.

To permit short-circuiting, processing of the 'and' and 'or' operators
is split into two phases.  Phase 1 occurs after evaluation of the first
operand but before the second.  If the first operand defines the
relevant phase 1 method, it is called with the first operand as
argument.  If that method can determine the result without needing the
second operand, it returns the result, and further processing is
skipped.

If the phase 1 method determines that the second operand is needed, it
returns the special value NeedOtherOperand.  This triggers the
evaluation of the second operand, and the calling of a relevant
phase 2 method. During phase 2, the __and2__/__rand2__ and
__or2__/__ror2__ method pairs work as for other binary operators.

Processing falls back to existing semantics if at any stage a relevant
special method is not found or returns NotImplemented.

As a special case, if the first operand defines a phase 2 method but
no corresponding phase 1 method, the second operand is always
evaluated and the phase 2 method called.  This allows an object which
does not want short-circuiting semantics to simply implement the
phase 2 methods and ignore phase 1.


Bytecodes
---------

The patch adds four new bytecodes, LOGICAL_AND_1, LOGICAL_AND_2,
LOGICAL_OR_1 and LOGICAL_OR_2.  As an example of their use, the
bytecode generated for an 'and' expression looks like this::

            .
            .
            .
            evaluate first operand
            LOGICAL_AND_1  L
            evaluate second operand
            LOGICAL_AND_2
       L:   .
            .
            .

The LOGICAL_AND_1 bytecode performs phase 1 processing.  If it
determines that the second operand is needed, it leaves the first
operand on the stack and continues with the following code.  Otherwise
it pops the first operand, pushes the result and branches to L.

The LOGICAL_AND_2 bytecode performs phase 2 processing, popping both
operands and pushing the result.


Type Slots
----------

At the C level, the new special methods are manifested as five new
slots in the type object.  In the patch, they are added to the
tp_as_number substructure, since this allows making use of some
existing code for dealing with unary and binary operators.  Their
existence is signalled by a new type flag,
Py_TPFLAGS_HAVE_BOOLEAN_OVERLOAD.

The new type slots are::

    unaryfunc nb_logical_not;
    unaryfunc nb_logical_and_1;
    unaryfunc nb_logical_or_1;
    binaryfunc nb_logical_and_2;
    binaryfunc nb_logical_or_2;


Python/C API Functions
----------------------

There are also five new Python/C API functions corresponding to the
new operations::

    PyObject *PyObject_LogicalNot(PyObject *);
    PyObject *PyObject_LogicalAnd1(PyObject *);
    PyObject *PyObject_LogicalOr1(PyObject *);
    PyObject *PyObject_LogicalAnd2(PyObject *, PyObject *);
    PyObject *PyObject_LogicalOr2(PyObject *, PyObject *);


Alternatives and Optimisations
==============================

This section discusses some possible variations on the proposal,
and ways in which the bytecode sequences generated for boolean
expressions could be optimised.

Reduced special method set
--------------------------

For completeness, the full version of this proposal includes a
mechanism for types to define their own customised short-circuiting
behaviour. However, the full mechanism is not needed to address the
main use cases put forward here, and it would be possible to
define a simplified version that only includes the phase 2
methods. There would then only be 5 new special methods (__and2__,
__rand2__, __or2__, __ror2__, __not__) with 3 associated type slots
and 3 API functions.

This simplified version could be expanded to the full version
later if desired.

Additional bytecodes
--------------------

As defined here, the bytecode sequence for code that branches on
the result of a boolean expression would be slightly longer than
it currently is. For example, in Python 2.7,

::

    if a and b:
        statement1
    else:
        statement2

generates

::

        LOAD_GLOBAL         a
        POP_JUMP_IF_FALSE   false_branch
        LOAD_GLOBAL         b
        POP_JUMP_IF_FALSE   false_branch
        <code for statement1>
        JUMP_FORWARD        end_branch
    false_branch:
        <code for statement2>
    end_branch:

Under this proposal as described so far, it would become something like

::

        LOAD_GLOBAL         a
        LOGICAL_AND_1       test
        LOAD_GLOBAL         b
        LOGICAL_AND_2
    test:
        POP_JUMP_IF_FALSE   false_branch
        <code for statement1>
        JUMP_FORWARD        end_branch
    false_branch:
        <code for statement2>
    end_branch:

This involves executing one extra bytecode in the short-circuiting
case and two extra bytecodes in the non-short-circuiting case.

However, by introducing extra bytecodes that combine the logical
operations with testing and branching on the result, it can be
reduced to the same number of bytecodes as the original:

::

        LOAD_GLOBAL         a
        AND1_JUMP           true_branch, false_branch
        LOAD_GLOBAL         b
        AND2_JUMP_IF_FALSE  false_branch
    true_branch:
        <code for statement1>
        JUMP_FORWARD        end_branch
    false_branch:
        <code for statement2>
    end_branch:

Here, AND1_JUMP performs phase 1 processing as above,
and then examines the result. If there is a result, it is popped
from the stack, its truth value is tested and a branch taken to
one of two locations.

Otherwise, the first operand is left on the stack and execution
continues to the next bytecode. The AND2_JUMP_IF_FALSE bytecode
performs phase 2 processing, pops the result and branches if
it tests false

For the 'or' operator, there would be corresponding OR1_JUMP
and OR2_JUMP_IF_TRUE bytecodes.

If the simplified version without phase 1 methods is used, then
early exiting can only occur if the first operand is false for
'and' and true for 'or'. Consequently, the two-target AND1_JUMP and
OR1_JUMP bytecodes can be replaced with AND1_JUMP_IF_FALSE and
OR1_JUMP_IF_TRUE, these being ordinary branch instructions with
only one target.

Optimisation of 'not'
---------------------

Recent versions of Python implement a simple optimisation in
which branching on a negated boolean expression is implemented
by reversing the sense of the branch, saving a UNARY_NOT opcode.

Taking a strict view, this optimisation should no longer be
performed, because the 'not' operator may be overridden to produce
quite different results from usual. However, in typical use cases,
it is not envisaged that expressions involving customised boolean
operations will be used for branching -- it is much more likely
that the result will be used in some other way.

Therefore, it would probably do little harm to specify that the
compiler is allowed to use the laws of boolean algebra to
simplify any expression that appears directly in a boolean
context. If this is inconvenient, the result can always be assigned
to a temporary name first.

This would allow the existing 'not' optimisation to remain, and
would permit future extensions of it such as using De Morgan's laws
to extend it deeper into the expression.


Usage Examples
==============

Example 1: NumPy Arrays
-----------------------

::

    #-----------------------------------------------------------------
    #
    #   This example creates a subclass of numpy array to which
    #   'and', 'or' and 'not' can be applied, producing an array
    #   of booleans.
    #
    #-----------------------------------------------------------------
    
    from numpy import array, ndarray
    
    class BArray(ndarray):
    
        def __str__(self):
            return "barray(%s)" % ndarray.__str__(self)
    
        def __and2__(self, other):
            return (self & other)
    
        def __or2__(self, other):
            return (self & other)
    
        def __not__(self):
            return (self == 0)
    
    def barray(*args, **kwds):
        return array(*args, **kwds).view(type = BArray)
    
    a0 = barray([0, 1, 2, 4])
    a1 = barray([1, 2, 3, 4])
    a2 = barray([5, 6, 3, 4])
    a3 = barray([5, 1, 2, 4])
    
    print "a0:", a0
    print "a1:", a1
    print "a2:", a2
    print "a3:", a3
    print "not a0:", not a0
    print "a0 == a1 and a2 == a3:", a0 == a1 and a2 == a3
    print "a0 == a1 or a2 == a3:", a0 == a1 or a2 == a3

Example 1 Output
----------------

::

    a0: barray([0 1 2 4])
    a1: barray([1 2 3 4])
    a2: barray([5 6 3 4])
    a3: barray([5 1 2 4])
    not a0: barray([ True False False False])
    a0 == a1 and a2 == a3: barray([False False False  True])
    a0 == a1 or a2 == a3: barray([False False False  True])


Example 2: Database Queries
---------------------------

::

    #-----------------------------------------------------------------
    #
    #   This example demonstrates the creation of a DSL for database
    #   queries allowing 'and' and 'or' operators to be used to
    #   formulate the query.
    #
    #-----------------------------------------------------------------
    
    class SQLNode(object):
        
        def __and2__(self, other):
            return SQLBinop("and", self, other)
    
        def __rand2__(self, other):
            return SQLBinop("and", other, self)
    
        def __eq__(self, other):
            return SQLBinop("=", self, other)
    
    
    class Table(SQLNode):
    
        def __init__(self, name):
            self.__tablename__ = name
            
        def __getattr__(self, name):
            return SQLAttr(self, name)
        
        def __sql__(self):
            return self.__tablename__
    
    
    class SQLBinop(SQLNode):
    
        def __init__(self, op, opnd1, opnd2):
            self.op = op.upper()
            self.opnd1 = opnd1
            self.opnd2 = opnd2
    
        def __sql__(self):
            return "(%s %s %s)" % (sql(self.opnd1), self.op, sql(self.opnd2))
    
    
    class SQLAttr(SQLNode):
    
        def __init__(self, table, name):
            self.table = table
            self.name = name
        
        def __sql__(self):
            return "%s.%s" % (sql(self.table), self.name)
    
    
    class SQLSelect(SQLNode):
    
        def __init__(self, targets):
            self.targets = targets
            self.where_clause = None
    
        def where(self, expr):
            self.where_clause = expr
            return self
        
        def __sql__(self):
            result = "SELECT %s" % ", ".join([sql(target) for target in self.targets])
            if self.where_clause:
                result = "%s WHERE %s" % (result, sql(self.where_clause))
            return result
    
    
    def sql(expr):
        if isinstance(expr, SQLNode):
            return expr.__sql__()
        elif isinstance(expr, str):
            return "'%s'" % expr.replace("'", "''")
        else:
            return str(expr)
    
    
    def select(*targets):
        return SQLSelect(targets)
    
    #-----------------------------------------------------------------
    
    dishes = Table("dishes")
    customers = Table("customers")
    orders = Table("orders")
    
    query = select(customers.name, dishes.price, orders.amount).where(
        customers.cust_id == orders.cust_id and orders.dish_id == dishes.dish_id
        and dishes.name == "Spam, Eggs, Sausages and Spam")
    
    print repr(query)
    print sql(query)

Example 2 Output
----------------

::

    <__main__.SQLSelect object at 0x1cc830>
    SELECT customers.name, dishes.price, orders.amount WHERE
    (((customers.cust_id = orders.cust_id) AND (orders.dish_id =
    dishes.dish_id)) AND (dishes.name = 'Spam, Eggs, Sausages and Spam'))


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: