StatementError exceptions un-pickleable.

Issue #2371 resolved
Lorenzo Bolla created an issue

All StatementError exceptions (and exceptions derived from it) are not pickleable. This means that they cannot be used, for example, in the callback of the multiprocessing Pool.map_async function (which uses cPickle to communicate between the master and the workers).

See discussion here: http://stackoverflow.com/a/8786557/1063605

This is probably a Python bug, and it has already been reported: http://bugs.python.org/issue13751 but it would be nice to work around it in SQLAlchemy, too.
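The failure can be reproduced without SQLAlchemy: any exception whose __init__ takes required extra arguments without forwarding them all to Exception.__init__ breaks the default pickle protocol. A minimal sketch (in Python 3 syntax; FragileError is a hypothetical stand-in for StatementError, not SQLAlchemy code):

```python
import pickle

class FragileError(Exception):
    # Mirrors the problematic pattern: extra required arguments that are
    # not forwarded to Exception.__init__, so self.args holds only one of them.
    def __init__(self, message, statement, params):
        Exception.__init__(self, message)
        self.statement = statement
        self.params = params

err = FragileError("boom", "SELECT 1", ())
try:
    pickle.loads(pickle.dumps(err))   # unpickling re-calls FragileError(*err.args)
except TypeError as exc:
    print("round-trip failed:", exc)
```

Unpickling re-invokes the class with self.args, here just ("boom",), so the three-argument constructor raises TypeError.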

Comments (19)

  1. Mike Bayer repo owner

    this is tested/fixed on the SQLAlchemy side in d78d2d60aa30b0b6c3c230ddf3cafda2529e6409. However, at least psycopg2 doesn't support pickling of its exceptions, since it packages the cursor within them, and probably other DBAPIs don't either - so using multiprocessing still requires you to re-raise exceptions as something harmless. You'll need to post on their tracker as well http://initd.org/psycopg/tracker/ to get it fixed on their end.
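    The fix referenced in that commit can be sketched with __reduce__, which tells pickle exactly how to rebuild the instance (a simplified illustration, not the actual SQLAlchemy code; PicklableError is a hypothetical class):

```python
import pickle

class PicklableError(Exception):
    def __init__(self, message, statement, params):
        Exception.__init__(self, message)
        self.statement = statement
        self.params = params

    def __reduce__(self):
        # Return the callable and the full argument tuple pickle should
        # use to reconstruct the exception on the other side.
        return self.__class__, (self.args[0], self.statement, self.params)

copy = pickle.loads(pickle.dumps(PicklableError("boom", "SELECT 1", ("x",))))
print(copy.statement, copy.params)
```

    Because __reduce__ supplies all three constructor arguments, the round-trip succeeds even though __init__ has no optional parameters.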

  2. Faheem Mitha

    Hi Mike,

    Thanks for the quick fix. However, wouldn't it be more straightforward to just pass the arguments down to the .args attribute of Exception (which is what pickle looks at)?

    In any case, don't you need to define __reduce__ on the other exceptions in "exc.py"? For example, you have

    In [1]: n = exc.NoReferencedColumnError("foo", "tname", "cname")

    In [2]: from cPickle import loads, dumps

    In [3]: loads(dumps(n))
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)

    /usr/local/src/sqlalchemy/sqlalchemy/lib/sqlalchemy/<ipython console> in <module>()

    TypeError: ('__init__() takes exactly 4 arguments (2 given)', <class 'exc.NoReferencedColumnError'>, ('foo',))

    This corresponds to

    class NoReferencedColumnError(NoReferenceError):
        """Raised by ForeignKey when the referred Column cannot be located."""

        def __init__(self, message, tname, cname):
            NoReferenceError.__init__(self, message)
            self.table_name = tname
            self.column_name = cname
    

    but you have not added a __reduce__ function for that class. There are a couple of other instances, like NoReferencedTableError and CircularDependencyError.

    Additionally, I'm unclear about what to do about psycopg2, as you suggest. I think that just writing "make your exceptions pickleable" on their site would be short on detail. Can you provide a usage example or more information about the issue?

    BTW, on the subject of how to temporarily work around it, do you recommend adding __reduce__ functions using copy_reg or otherwise, or reraising the exception as lbolla does? I prefer the former, but apparently there is the issue of psycopg2 exceptions that you raised... Thanks.

  3. Faheem Mitha

    I came up with the following simple example

    import psycopg2
    conn = psycopg2.connect("dbname='illumina_faheem'")
    cur = conn.cursor()
    try:
        cur.execute("SELECT * FROM barf")
    except Exception, e:
        print e
        pass

    import cPickle
    a = cPickle.dumps(e)
    l = cPickle.loads(a)

    Which gives:

    relation "barf" does not exist
    LINE 1: SELECT * FROM barf
                          ^
    Traceback (most recent call last):
      File "<stdin>", line 11, in <module>
      File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
        raise TypeError, "can't pickle %s objects" % base.__name__
    TypeError: can't pickle cursor objects

    Is this convincing, or should I give an example using multiprocessing?

  4. Mike Bayer repo owner
    • removed status
    • changed status to open

    Replying to faheem:

    Hi Mike,

    Thanks for the quick fix. However, wouldn't it be more straightforward to just pass the arguments down to the .args attribute of Exception (which is what pickle looks at)?

    not sure what you mean here. I basically didn't want to make the arguments in __init__ optional, so __reduce__ is a way to control exactly how __init__ is called when unpickling.

    In any case, don't you need to define __reduce__ on other exception in "exc.py"? For example. you have

    In [1]: n = exc.NoReferencedColumnError("foo", "tname", "cname")

    In [2]: from cPickle import loads, dumps

    In [3]: loads(dumps(n))
    TypeError                                 Traceback (most recent call last)

    oh yes, missed those. reopening. though those are typically raised at configuration time.

    Additionally, I'm unclear about what to do about psycopg2 as you suggest. I think that writing on their site "make your exceptions pickleable" would be short of detail. Can you provide a usage example or more information about the issue?

    BTW, on the subject of how to temporarily work around it, do you recommend adding __reduce__ functions using copy_reg or otherwise, or reraising the exception as lbolla does? I prefer the former, but apparently there is the issue of psycopg2 exceptions that you raised... Thanks.

    as a workaround I'd not be allowing exceptions to propagate outside of multiprocessing subprocesses. I'd reraise them or handle them.
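    That workaround can be sketched as follows (an illustration only; CursorError is a hypothetical stand-in for a DBAPI exception that embeds an unpicklable cursor object, with a lambda playing the cursor's role):

```python
import pickle

class CursorError(Exception):
    # Hypothetical DBAPI-style exception that embeds an unpicklable
    # object (a lambda here, standing in for a database cursor).
    def __init__(self, message, cursor):
        Exception.__init__(self, message)
        self.cursor = cursor

def worker():
    # Like a Pool worker: re-raise as a plain Exception so that only a
    # string, not the cursor, has to cross the process boundary.
    try:
        raise CursorError('relation "barf" does not exist', lambda: None)
    except Exception as e:
        raise Exception(repr(e))

try:
    worker()
except Exception as caught:
    restored = pickle.loads(pickle.dumps(caught))  # plain Exception pickles fine
    print(restored)
```

    Pickling the CursorError directly would fail on the embedded object; pickling the repr-wrapped plain Exception always works.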

  5. Mike Bayer repo owner

    Replying to faheem:

    I came up with the following simple example

    import psycopg2
    conn = psycopg2.connect("dbname='illumina_faheem'")
    cur = conn.cursor()
    try:
        cur.execute("SELECT * FROM barf")
    except Exception, e:
        print e
        pass

    import cPickle
    a = cPickle.dumps(e)
    l = cPickle.loads(a)

    Which gives:

    relation "barf" does not exist
    LINE 1: SELECT * FROM barf
                          ^
    Traceback (most recent call last):
      File "<stdin>", line 11, in <module>
      File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
        raise TypeError, "can't pickle %s objects" % base.__name__
    TypeError: can't pickle cursor objects

    Is this convincing, or should I give an example using multiprocessing?

    It's convincing, but you do need to mention that multiprocessing pickles exception objects. Otherwise they might wonder why you need this feature.

  6. Faheem Mitha

    Replying to zzzeek:

    Replying to faheem:

    Hi Mike,

    Thanks for the quick fix. However, wouldn't it be more straightforward to just pass the argments down to the .args attribute of Exception (which is what pickle looks at)?

    not sure what you mean here. I basically didn't want to make the arguments in __init__ optional, so __reduce__ is a way to control exactly how __init__ is called when unpickling.

    I didn't mean to cause confusion. I just meant that there are other ways one can do this. Per sbt in

    http://stackoverflow.com/questions/8785899/hang-in-python-script-using-sqlalchemy-and-multiprocessing e.g.

    class GoodExc1(Exception):
        def __init__(self, a):
            # Non-optional param in the constructor.
            Exception.__init__(self, a)
            self.a = a
    

    or

    class GoodExc2(Exception):
        def __init__(self, a):
            # Non-optional param in the constructor.
            self.args = (a,)
            self.a = a
    

    But maybe these are no better than the way you are doing it.
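    Both variants work for the same reason: the default BaseException.__reduce__ re-creates the instance as cls(*self.args), so pickling succeeds whenever .args matches the constructor signature. A quick check of that assumption (in Python 3 syntax):

```python
import pickle

class GoodExc2(Exception):
    def __init__(self, a):
        # Keep .args in sync with the constructor signature;
        # pickle will re-create the instance as GoodExc2(*self.args).
        self.args = (a,)
        self.a = a

e = pickle.loads(pickle.dumps(GoodExc2("payload")))
print(e.a)  # -> payload
```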

    as a workaround I'd not be allowing exceptions to propagate outside of multiprocessing subprocesses. I'd reraise them or handle them.

    So something like what lbolla is doing in that question looks reasonable to you? I.e.

    def do(kwargs):
        try:
            i = kwargs['i']
            print i
            raise BadExc('a')
            return i
        except Exception as e:
            raise Exception(repr(e))
    

    I don't understand how this avoids the problem, but it seems to work.

    Thanks, Faheem

  7. Faheem Mitha

    Replying to zzzeek:

    It's convincing, but you do need to mention that multiprocessing pickles exception objects. Otherwise they might wonder why you need this feature.

    I've reported it here. The developers seem to be considering it, but I'm not really following the discussion. I did try to produce an example that would hang like the SQLA one, but was not able to. I did the following, but it doesn't hang as I would have expected. Any idea why?

    import multiprocessing, psycopg2

    def do(kwargs):
        i = kwargs['i']
        conn = psycopg2.connect("dbname='illumina_faheem'")
        cur = conn.cursor()
        cur.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")

    pool = multiprocessing.Pool(processes=5)               # start 5 worker processes
    results = []
    arglist = []
    for i in range(10):
        arglist.append({'i': i})
    r = pool.map_async(do, arglist, callback=results.append)
    r.get()
    r.wait()
    pool.close()
    pool.join()
    
  8. Mike Bayer repo owner

    Seems like they've accepted it. See how quick they were to close it at first ? Yeah, we all have itchy trigger fingers in this business :).

  9. Faheem Mitha

    I just checked your changes. How about CircularDependencyError in lib/sqlalchemy/exc.py? This doesn't have a __reduce__ defined. Otherwise looks good.

    I got an email notification of the last comment.

    What is Jenkins?

    Faheem.

  10. Faheem Mitha

    I've never heard of OAuth before. One learns something new every day. By hopeless do you mean insecure, or do you have other objections? I've used openID for awhile. I don't know about security, but I like not having to keep track of 10 million passwords.
