StatementError exceptions un-pickleable.

Issue #2371 resolved
Lorenzo Bolla created an issue

All StatementError exceptions (and exceptions derived from it) are not pickleable. This means that they cannot be used, for example, in the callback of the multiprocessing Pool.map_async function (which uses cPickle to communicate between the master and the workers).

See discussion here: http://stackoverflow.com/a/8786557/1063605

This is probably a Python bug, and it has already been reported: http://bugs.python.org/issue13751 but it would be nice to work around it in SQLAlchemy, too.
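The failure can be reproduced without SQLAlchemy: any exception whose __init__ takes required extra arguments without forwarding them all to Exception.__init__ breaks the default pickle protocol. A minimal sketch (in Python 3 syntax; FragileError is a hypothetical stand-in for StatementError, not SQLAlchemy code):

```python
import pickle

class FragileError(Exception):
    # Mirrors the problematic pattern: extra required arguments that are
    # not forwarded to Exception.__init__, so self.args holds only one of them.
    def __init__(self, message, statement, params):
        Exception.__init__(self, message)
        self.statement = statement
        self.params = params

err = FragileError("boom", "SELECT 1", ())
try:
    pickle.loads(pickle.dumps(err))   # unpickling re-calls FragileError(*err.args)
except TypeError as exc:
    print("round-trip failed:", exc)
```

Unpickling re-invokes the class with self.args, here just ("boom",), so the three-argument constructor raises TypeError.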

Comments (19)

  1. Mike Bayer repo owner

    this is tested/fixed on the SQLAlchemy side in d78d2d60aa30b0b6c3c230ddf3cafda2529e6409. However, at least psycopg2 doesn't support pickling of its exceptions, since it packages the cursor within them, and probably other DBAPIs don't either - so using multiprocessing still requires you to re-raise exceptions as something harmless. You'll need to post on their tracker as well http://initd.org/psycopg/tracker/ to get it fixed on their end.
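    The fix referenced in that commit can be sketched with __reduce__, which tells pickle exactly how to rebuild the instance (a simplified illustration, not the actual SQLAlchemy code; PicklableError is a hypothetical class):

```python
import pickle

class PicklableError(Exception):
    def __init__(self, message, statement, params):
        Exception.__init__(self, message)
        self.statement = statement
        self.params = params

    def __reduce__(self):
        # Return the callable and the full argument tuple pickle should
        # use to reconstruct the exception on the other side.
        return self.__class__, (self.args[0], self.statement, self.params)

copy = pickle.loads(pickle.dumps(PicklableError("boom", "SELECT 1", ("x",))))
print(copy.statement, copy.params)
```

    Because __reduce__ supplies all three constructor arguments, the round-trip succeeds even though __init__ has no optional parameters.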

  2. Faheem Mitha

    Hi Mike,

    Thanks for the quick fix. However, wouldn't it be more straightforward to just pass the arguments down to the .args attribute of Exception (which is what pickle looks at)?

    In any case, don't you need to define __reduce__ on the other exceptions in "exc.py"? For example, you have

    In [1]: n = exc.NoReferencedColumnError("foo", "tname", "cname")

    In [2]: from cPickle import loads, dumps

    In [3]: loads(dumps(n))
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)

    /usr/local/src/sqlalchemy/sqlalchemy/lib/sqlalchemy/<ipython console> in <module>()

    TypeError: ('__init__() takes exactly 4 arguments (2 given)', <class 'exc.NoReferencedColumnError'>, ('foo',))

    This corresponds to

    class NoReferencedColumnError(NoReferenceError):
        """Raised by ForeignKey when the referred Column cannot be located."""

        def __init__(self, message, tname, cname):
            NoReferenceError.__init__(self, message)
            self.table_name = tname
            self.column_name = cname
    

    but you have not added a __reduce__ function for that class. There are a couple of other instances, like NoReferencedTableError and CircularDependencyError.

    Additionally, I'm unclear about what to do about psycopg2, as you suggest. I think that just writing "make your exceptions pickleable" on their site would be short on detail. Can you provide a usage example or more information about the issue?

    BTW, on the subject of how to temporarily work around it, do you recommend adding __reduce__ functions using copy_reg or otherwise, or reraising the exception as lbolla does? I prefer the former, but apparently there is the issue of psycopg2 exceptions that you raised... Thanks.

  3. Faheem Mitha

    I came up with the following simple example

    import psycopg2
    conn = psycopg2.connect("dbname='illumina_faheem'")
    cur = conn.cursor()
    try:
        cur.execute("SELECT * FROM barf")
    except Exception, e:
        print e
        pass

    import cPickle
    a = cPickle.dumps(e)
    l = cPickle.loads(a)

    Which gives:

    relation "barf" does not exist
    LINE 1: SELECT * FROM barf
                          ^
    Traceback (most recent call last):
      File "<stdin>", line 11, in <module>
      File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
        raise TypeError, "can't pickle %s objects" % base.__name__
    TypeError: can't pickle cursor objects

    Is this convincing, or should I give an example using multiprocessing?

  4. Mike Bayer repo owner
    • removed status
    • changed status to open

    Replying to faheem:

    Hi Mike,

    Thanks for the quick fix. However, wouldn't it be more straightforward to just pass the arguments down to the .args attribute of Exception (which is what pickle looks at)?

    not sure what you mean here. I basically didn't want to make the arguments in __init__ optional, so __reduce__ is a way to control exactly how __init__ is called when unpickling.

    In any case, don't you need to define __reduce__ on other exception in "exc.py"? For example. you have

    In [1]: n = exc.NoReferencedColumnError("foo", "tname", "cname")

    In [2]: from cPickle import loads, dumps

    In [3]: loads(dumps(n))
    TypeError                                 Traceback (most recent call last)

    oh yes, missed those. reopening. though those are typically raised at configuration time.

    Additionally, I'm unclear about what to do about psycopg2 as you suggest. I think that writing on their site "make your exceptions pickleable" would be short of detail. Can you provide a usage example or more information about the issue?

    BTW, on the subject of how to temporarily work around it, do you recommend adding __reduce__ functions using copy_reg or otherwise, or reraising the exception as lbolla does? I prefer the former, but apparently there is the issue of psycopg2 exceptions that you raised... Thanks.

    as a workaround I'd not be allowing exceptions to propagate outside of multiprocessing subprocesses. I'd reraise them or handle them.
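    That workaround can be sketched as follows (an illustration only; CursorError is a hypothetical stand-in for a DBAPI exception that embeds an unpicklable cursor object, with a lambda playing the cursor's role):

```python
import pickle

class CursorError(Exception):
    # Hypothetical DBAPI-style exception that embeds an unpicklable
    # object (a lambda here, standing in for a database cursor).
    def __init__(self, message, cursor):
        Exception.__init__(self, message)
        self.cursor = cursor

def worker():
    # Like a Pool worker: re-raise as a plain Exception so that only a
    # string, not the cursor, has to cross the process boundary.
    try:
        raise CursorError('relation "barf" does not exist', lambda: None)
    except Exception as e:
        raise Exception(repr(e))

try:
    worker()
except Exception as caught:
    restored = pickle.loads(pickle.dumps(caught))  # plain Exception pickles fine
    print(restored)
```

    Pickling the CursorError directly would fail on the embedded object; pickling the repr-wrapped plain Exception always works.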

  5. Mike Bayer repo owner

    Replying to faheem:

    I came up with the following simple example

    import psycopg2
    conn = psycopg2.connect("dbname='illumina_faheem'")
    cur = conn.cursor()
    try:
        cur.execute("SELECT * FROM barf")
    except Exception, e:
        print e
        pass

    import cPickle
    a = cPickle.dumps(e)
    l = cPickle.loads(a)

    Which gives:

    relation "barf" does not exist
    LINE 1: SELECT * FROM barf
                          ^
    Traceback (most recent call last):
      File "<stdin>", line 11, in <module>
      File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
        raise TypeError, "can't pickle %s objects" % base.__name__
    TypeError: can't pickle cursor objects

    Is this convincing, or should I give an example using multiprocessing?

    It's convincing, but you do need to mention that multiprocessing pickles exception objects. Otherwise they might wonder why you need this feature.

  6. Faheem Mitha

    Replying to zzzeek:

    Replying to faheem:

    Hi Mike,

    Thanks for the quick fix. However, wouldn't it be more straightforward to just pass the argments down to the .args attribute of Exception (which is what pickle looks at)?

    not sure what you mean here. I basically didn't want to make the arguments in __init__ optional, so __reduce__ is a way to control exactly how __init__ is called when unpickling.

    I didn't mean to cause confusion. I just meant that there are other ways one can do this. Per sbt in

    http://stackoverflow.com/questions/8785899/hang-in-python-script-using-sqlalchemy-and-multiprocessing e.g.

    class GoodExc1(Exception):
        def __init__(self, a):
            # Non-optional param in the constructor.
            Exception.__init__(self, a)
            self.a = a
    

    or

    class GoodExc2(Exception):
        def __init__(self, a):
            # Non-optional param in the constructor.
            self.args = (a,)
            self.a = a
    

    But maybe these are no better than the way you are doing it.
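    Both variants work for the same reason: the default BaseException.__reduce__ re-creates the instance as cls(*self.args), so pickling succeeds whenever .args matches the constructor signature. A quick check of that assumption (in Python 3 syntax):

```python
import pickle

class GoodExc2(Exception):
    def __init__(self, a):
        # Keep .args in sync with the constructor signature;
        # pickle will re-create the instance as GoodExc2(*self.args).
        self.args = (a,)
        self.a = a

e = pickle.loads(pickle.dumps(GoodExc2("payload")))
print(e.a)  # -> payload
```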

    as a workaround I'd not be allowing exceptions to propagate outside of multiprocessing subprocesses. I'd reraise them or handle them.

    So something like what lbolla is doing in that question looks reasonable to you? I.e.

    def do(kwargs):
        try:
            i = kwargs['i']
            print i
            raise BadExc('a')
            return i
        except Exception as e:
            raise Exception(repr(e))
    

    I don't understand how this avoids the problem, but it seems to work.

    Thanks, Faheem

  7. Faheem Mitha

    Replying to zzzeek:

    It's convincing, but you do need to mention that multiprocessing pickles exception objects. Otherwise they might wonder why you need this feature.

    I've reported it here. The developers seem to be considering it, but I'm not really following the discussion. I did try to produce an example that would hang like the SQLA one, but was not able to. I did the following, but it doesn't hang as I would have expected. Any idea why?

    import multiprocessing, psycopg2

    def do(kwargs):
        i = kwargs['i']
        conn = psycopg2.connect("dbname='illumina_faheem'")
        cur = conn.cursor()
        cur.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")

    pool = multiprocessing.Pool(processes=5)               # start 5 worker processes
    results = []
    arglist = []
    for i in range(10):
        arglist.append({'i': i})
    r = pool.map_async(do, arglist, callback=results.append)
    r.get()
    r.wait()
    pool.close()
    pool.join()
    
  8. Mike Bayer repo owner

    Seems like they've accepted it. See how quick they were to close it at first ? Yeah, we all have itchy trigger fingers in this business :).

  9. Faheem Mitha

    I just checked your changes. How about CircularDependencyError in lib/sqlalchemy/exc.py? This doesn't have a __reduce__ defined. Otherwise looks good.

    I got an email notification of the last comment.

    What is Jenkins?

    Faheem.

  10. Faheem Mitha

    I've never heard of OAuth before. One learns something new every day. By hopeless do you mean insecure, or do you have other objections? I've used openID for awhile. I don't know about security, but I like not having to keep track of 10 million passwords.
