Stale data from with_for_update query on postgres backend

Issue #3917 resolved
Jochen Tuchbreiter created an issue

When selecting data from postgres engine using with_for_update() we sometimes get stale data.

Script to reproduce is attached. Usage:

1) Put your data into connection string in line 27

2) Prepare db:

$ python bug.py prep

3) Run tester processes:

$ python bug.py run

Expected result: Counter gets decreased for every line in output like this:

pid 9937 | Counter 20
pid 9942 | Counter 19
pid 9938 | Counter 18
pid 9943 | Counter 17
....

Actual result: We see the same counter value multiple times. This means that one process did get stale data even though the row was locked using with_for_update. It looks like this:

pid 9937 | Counter 20
pid 9942 | Counter 20
pid 9938 | Counter 20
pid 9943 | Counter 20
pid 9936 | Counter 20
pid 9941 | Counter 19
pid 9940 | Counter 19
.....

If you comment out line 52 and comment in line 54 it works as expected. This is why I assume that I am using with_for_update correctly.

If you turn on query logging you can see a SELECT FROM sp and then another one just like it but this time containing the FOR UPDATE. My wild guess is that the data from the first SELECT is used instead of data from the second.

I can reproduce it with SQLAlchemy==1.1.6.dev0 from today as well as 1.1.5.

Please reproduce with local db as this behaviour is very timing sensitive.

Let me know if I can help in any way.

Comments (4)

  1. Mike Bayer repo owner

    hi there -

    thanks for posting a great test case. The behavior here is not unexpected from an API perspective and SQL logging would illustrate what's happening:

     BEGIN (implicit)
    SELECT vh.id AS vh_id, vh.sp_id AS vh_sp_id 
    FROM vh 
    WHERE vh.id = %(id_1)s
    {'id_1': 1}
     SELECT sp.id AS sp_id, sp.counter AS sp_counter 
    FROM sp 
    WHERE sp.id = %(param_1)s
     {'param_1': 1L}
     SELECT sp.id AS sp_id, sp.counter AS sp_counter 
    FROM sp 
    WHERE sp.id = %(id_1)s FOR UPDATE
     {'id_1': 1L}
    pid 7414 | Counter 0
     UPDATE sp SET counter=-1 WHERE id=1
     {}
     COMMIT
    

    note that there are two SELECT statements against "sp", and only the second has FOR UPDATE. The first SELECT is a result of the expression:

    vh.sp
    

    which emits a lazy load for the "sp" value. Once sp with identity one is in the identity map, its contents are no longer overwritten (background background ) , so this is the source of the "stale" data.

    Changing the line to:

    sp= session.query(SP).filter(SP.id == vh.sp.id).with_for_update().populate_existing().one()
    

    resolves.

  2. Jochen Tuchbreiter reporter

    Thanks a lot for your detailed explanation - very helpful!

    It seems that it is quite easy to get this wrong from a user perspective. Maybe including a hint in the docs in the with_for_update() section might prevent some future issue reports like this one.

    I will probably end up refreshing the data in almost all cases where I currently use with_for_update()

    Resolving as this is expected/documented behaviour.

  3. Log in to comment