Query._join reuses joinpath node with stale 'prev' pointer

Issue #2247 resolved
Former user created an issue

Query._join can reuse a joinpath node with stale 'prev' pointer. This can cause it to revert to an earlier incarnation of the _joinpath structure, thus forgetting some joinpaths. You can trigger the bug with a sequence of joins such as:

q.join('a').join('b').join('a', 'c').join('b', 'd').all()

(ProgrammingError) table name "B" specified more than once

The query will end up with two separate join clauses for 'B' instead of reusing the first one.

The bug occurs when the third join discards the joinpath from the second join when it reuses the joinpath from the first join. This happens because only the 'prev' references along the spine of the tree from _joinpath to _joinpoint are kept up to date.

I have attached a patch that fixes the bug.

I'm using v0.7.2 and this code looks about the same in hg.

Comments (7)

  1. Mike Bayer repo owner
    • changed milestone to 0.7.3

    now that is what I'm talking about ! thanks for proving that I'm not the only one who can understand that code ! Patch with a test is attached (was able to get the existing ORM fixtures to imitate your a/b/ac/bd structure exactly), who can I give credit towards ?

  2. Former user Account Deleted

    Dave Vitek (dvitek@grammatech.com)

    I was upgrading the version of sqlalchemy we use in our CodeSonar tool from 0.5 to 0.7 yesterday. We had implemented a hackier version of joinpath caching on our end a while back, so had to reconcile the two. The upgrade was prompted by a race inside the weakref _cleanup code with the old session_id stuff -- happily fixed in 0.7. Anyway, thanks for the great software and all the hard work. The upgrade has been relatively painless.

    PS We have a few trivial modifications to sqlalchemy if there's anything you want to pick over: 1) Use itertools.count instead of a random int in the psycopg2 dialect for server side cursor names; had random collisions on occasion. Safe with cpython, but you should probably use a lock and a global to be portable to other pythons. 2) Added Query.pop_joinpoint; you can guess what this does 3) A convenience flag you can pass to join so it applies "contains_eager" to the joinee 4) Some code for logging fully bound sql queries (ala cursor.last_query) using python's 'logging' module 5) Curtailed size_growth in engine/base.py to just the first two entries -- on some pathological inputs the python process was OOMing with the larger numbers. No silver bullet on this one, I suspect.

  3. Mike Bayer repo owner

    Hmmm is itertools.count threadsafe ? We have another place that we're doing a "global count" which currently uses a mutex. If I can just stick a count there and not mutex it that would be handy.

  4. Mike Bayer repo owner

    can you show me the use case for Query.pop_joinpoint() ? I never even need reset_joinpoint() in my own code.

  5. Log in to comment