make_transient causes field values to disappear

Issue #3640 resolved
zestyping created an issue

My intended use case is to retrieve a row from a table and add a modified copy of the row. Using SQLAlchemy, I'm expressing this as a query, followed by make_transient() on the retrieved instance, modifying some fields on the instance, and then committing the new instance.

After this point the instance is in a mysterious state, in which calling make_transient() on it makes the values of its fields disappear. There are at least 3 invariants violated here:

  1. After make_transient(), fields that should have values become None. I would expect make_transient() to never change any fields of the instance.

  2. The persistent instance after add() and commit() does not have behaviour identical to that of an instance loaded by a query() that retrieves the same row. I would expect such instances to be indistinguishable in all respects.

  3. Merely accessing a field after commit() causes it to no longer become None after make_transient(). I would expect that merely reading a field never has detectable side effects.

The following script demonstrates the bug.

import sys
import sqlalchemy
import sqlalchemy.orm
import sqlalchemy.ext.declarative
from sqlalchemy import Table, Column, Integer, Sequence, MetaData

print('sys.version = %s' % sys.version)
print('sqlalchemy.__version__ = %s' % sqlalchemy.__version__)
print('')

Base = sqlalchemy.ext.declarative.declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    value = Column(Integer)

engine = sqlalchemy.create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)

s = sqlalchemy.orm.sessionmaker()
s.configure(bind=engine)
db = s()

# Insert one row.
x = Foo(id=1, value=1)
db.add(x)
db.commit()
print('x.value = %r' % x.value)  # prints 1

# Fetch the row and use it as a template for a second row to insert.
y = db.query(Foo).first()
sqlalchemy.orm.make_transient(y)

y.id = 2
print('y.value = %r' % y.value)  # prints 1
db.add(y)
db.commit()

# Merely accessing y.value here will change the output printed below!
# y.value

sqlalchemy.orm.make_transient(y)
print('y.value = %r' % y.value)  # prints None (should print 1)

Expected output is:

x.value = 1
y.value = 1
y.value = 1

Actual output in Python3:

sys.version = 3.5.0 (default, Sep 23 2015, 04:42:00) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)]
sqlalchemy.__version__ = 1.0.11

x.value = 1
y.value = 1
y.value = None

Actual output in Python 2.7:

sys.version = 2.7.10 (default, Jul 14 2015, 19:46:27) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)]
sqlalchemy.__version__ = 1.0.11

x.value = 1
y.value = 1
y.value = None

Comments (23)

  1. Mike Bayer repo owner

    Hi there!

    Please note that session.commit() by default expires all the attributes on objects. This is documented at http://docs.sqlalchemy.org/en/rel_1_0/orm/session_basics.html#committing.

    Below, the moment you call db.commit(), your "y" object is empty of all direct attribute state, which will be automatically re-fetched once you access the attributes:

    y.id = 2
    print('y.value = %r' % y.value)  # prints 1
    db.add(y)
    db.commit()
    
    # y is now *empty* of everything except the _sa_instance_state tracking 
    # object, which tells the Session how to refresh its attributes once they 
    # are re-accessed
    
    # hence calling this reloads y.value.  turn on echo=True on the engine
    # to see this.
    # y.value
    

    therefore here we have an essentially empty object, with just its primary key stored in _sa_instance_state. But then we call make_transient(), which erases that key as well:

     sqlalchemy.orm.make_transient(y)
    

    therefore we have an empty, transient object now. the behavior is expected and documented.

  2. zestyping reporter

    Hi Mike,

    Thanks for your quick response and explanation!

    The behaviour I'm seeing is extremely surprising. Although I think I understand what you are saying, it seems really strange to me. Are you sure that you want all three of the invariants I mentioned to be wrong? Most SQLAlchemy users, if asked, would probably believe those same three invariants. I would also be surprised if most (or even 10% of) SQLAlchemy users would be able to correctly predict the behaviour seen here. Do you have a different view of what your users believe?

    the moment you call db.commit(), your "y" object is empty of all direct attribute state

    But db.commit() does not in general clear the instance fields. For example, after db.add(x); db.commit() in this script, x.value is still 1, not None.

    here we have an essentially empty object, with just its primary key stored in _sa_instance_state. But then we call make_transient(), which erases that key as well

    That isn't what I'm seeing; after the first call to make_transient(y), y.id is still 1, not None.

    So, I'm finding it hard to construct a mental model that gives me the right intuitions about this. Thanks again for taking the time to help me out.

  3. Mike Bayer repo owner

    Hello -

    It seems perhaps you're not fully understanding what "expire" means, or alternatively the concept of lazy loading. Expiration means that the database-persisted values of the object are discarded. If you look inside of object.__dict__, all the keys like 'id' are gone. The second part is that when you access these attributes normally, their current database value is lazy loaded, which re-instates the value as freshly retrieved from the database . The documentation tries to make the reader familiar with this mostly within the object relational tutorial which illustrates the SQL being emitted as expired attributes are being re-accessed, as well as from the glossary entry which also includes links to further documentation.

    I can also respond to all your points individually but I'm hoping that just these two concepts will make it clear.

  4. zestyping reporter

    You're right, we have different expectations about what expiration should do in this case.

    As far as I know, the common understanding of lazy loading is that it's supposed to be transparent, as with most operations called "lazy"—in other words, the client doesn't have to know or care whether the loading is lazy. The laziness doesn't affect the semantics of the operation; whether the data is lazily loaded or eagerly loaded, the same data is there when it is observed.

    The inconsistent behaviour is the real issue here, which is why I've been trying to describe this in terms of invariants rather than definitions of terms. Do you see why it is surprising to get a different result based only on whether something was observed? Do you think most SQLAlchemy users get this? Or perhaps is it just that make_transient() a wizard-level operation, below the level of abstraction that you would recommend most users should work at, and thus only to be used by people who understand the inner workings of SQLAlchemy in unusual detail?

  5. Mike Bayer repo owner

    re: the "invariants". #1 is inaccurate because make_transient does not change the state of any attributes; The None is present because make_transient does not assume you'd like to unexpire any existing expired attributes. The other "invariants" make a similar assumption. There are many methods that do not assume a "pre-unexpire" action and will deliver a detached object with unloaded state. It would be completely impractical to pre-unexpire in many cases, and the only alternative would be to not provide such methods at all, including session.close(), session.expunge(), session.expunge_all(), make_transient(), and others. Unexpiration impies not only SQL-per-object but also a flush(), these are the most heavyweight operations the ORM does and bundling them into every state-management method would be very undesirable.

    If all methods were to require a fully stringent, completely airtight form of "consistency", the library would be forced to forego a tremendous amount of flexibility and functionality. SQLAlchemy was designed to avoid this issue by purposely embracing the inherent "leakiness" of non-trivial abstractions, instead of struggling to hide all leaks, it makes them part of the public API. This might sound strange but it is the fundamental reason for SQLAlchemy's success, a "practicality beats purity", if you will.

    For an ORM with a much more stringent sense of "consistency" in this regard you might look at the Storm ORM. It's "Store" object for example does not have any such methods as "expunge()" or similar, nor the ability to disable autoflush, etc. Because they are impossible without exposing the mechanism of the abstration.

    In this case, a simple documentation note in make_transient(), not to mention on other methods that don't "auto-un-expire" before producing a detached object, would alleviate future confusion in this area.

  6. zestyping reporter

    Yes, documentation would be one way to address this. This would have to be a big bold warning, and even then it's still going to be a stumbling block for a lot of people.

    You understand that you're violating a contract of the Python language here, right? The thing you're calling "a fully stringent, completely airtight form of consistency" is one of the most basic and simple expectations of a programmer: that merely reading a variable x will not change the observed value of some other variable y. If you were to change 1 + 1 so that it gives 3, not many people are going to read the docs for + and notice the warning that says "this returns the wrong result when one of the inputs is 1".

    I'm not saying you can't do this, but the level of communication required has to be commensurate with the extreme level of strangeness, e.g. "Warning! Methods in the following category will break your fundamental expectations about the Python language; use with great care."

  7. Mike Bayer repo owner

    How about I remove "make_transient", and the problem is gone, would that help? If it is in fact such a heinous crime to introduce a function to accomplish a task that nonetheless might be confusing. Why do you need make_transient()? Why can't you use the Storm ORM which as I said prides itself on making no such sins?

    The answer is, you need the functionality. SQLAlchemy is an open source, non commerical, written-for-free-by-very-few-people project. Actionable suggestions, not to mention pull requests with actual code, are welcome so feel free to propose some.

  8. Christopher Toth

    @zestyping Hopping in here to try and resolve this usefully. Do you think a documentation update would be sufficient to address this? I don't expect any API changes at this point, and so docs are the obvious low-hanging fruit to target here.

  9. Mike Bayer repo owner

    " that merely reading a variable x will not change the observed value of some other variable y. "

    what specifically are your referring to here? make_transient() is not a "read" operation. Lazy-loads, e.g. "some_object.x" loads, is, and to my knowledge that's the only one - 1.0 repaired one major such "modify on read" behavior that was outside of this category.

    Certainly changing state on reads is not something to undertake casually. But lazy loading of collections and attributes is a major pattern throughout SQLAlchemy ORM and it does have that behavior. This aspect of the ORM, as well as others, are naturally controversial (for example is entirely incompatible with explicit asynchronous programming), which is why the ORM of SQLAlchemy itself is entirely optional.

    Are you suggesting that the basic concept of lazy loading itself, requires a huge red warning, and if so where would this warning be?

  10. Mike Bayer repo owner
    • fully hyperlink the docstring for make_transient
    • establish make_transient and make_transient_to_detached as special-use, advanced use only functions
    • list all conditions under make_transient() under which an attribute will not be loaded and establish that make_transient() does not attempt to load all attributes before detaching the object from its session, fixes #3640

    → <<cset 7eff4e8f3e39>>

  11. Mike Bayer repo owner
    • fully hyperlink the docstring for make_transient
    • establish make_transient and make_transient_to_detached as special-use, advanced use only functions
    • list all conditions under make_transient() under which an attribute will not be loaded and establish that make_transient() does not attempt to load all attributes before detaching the object from its session, fixes #3640

    (cherry picked from commit 7eff4e8f3e3999d9eb914647d8776e6e5b7ee88e)

    → <<cset d3eef9551030>>

  12. Mike Bayer repo owner
    • fully hyperlink the docstring for make_transient
    • establish make_transient and make_transient_to_detached as special-use, advanced use only functions
    • list all conditions under make_transient() under which an attribute will not be loaded and establish that make_transient() does not attempt to load all attributes before detaching the object from its session, fixes #3640

    (cherry picked from commit 7eff4e8f3e3999d9eb914647d8776e6e5b7ee88e)

    → <<cset 13473af1b78f>>

  13. zestyping reporter

    Hi Mike. I agree, updating the documentation is a good course of action and would have been my first proposal. Thanks for making this change.

    To answer your question from a few comments up, "merely reading a variable x will not change the observed value of some other variable y" refers to the commented line in the example script — the third-last line, y.value. If I uncomment that line, the output changes from None to 1 because, as you explained, accessing y.value has the side-effect of loading the value from the database before make_transient, so the value remains visible after make_transient. Because accessing y.value loads all the attributes of the instance, it has the even more surprising effect that if y had some more attributes foo and bar, accessing y.foo would change the later observed value of y.bar.

    Lazy-loading is not itself evil! :) It's the combination of lazy-loading, expiry, and make_transient()'s current functionality that produces the surprising effect in this case. The best idea I have at the moment for how to produce consistent behaviour is to make make_transient() expire all attributes, so that it always makes all the attributes appear to become None, instead of sometimes appearing to keep their values and sometimes appearing to become None. This would achieve the goal that merely accessing any attribute has no visible effect on outcomes, without the cost of pre-unexpiring all the attributes.

    There are many existing users of SQLAlchemy, and I respect that the disruption caused by a change to the API could be hard to justify. I don't know if this change would cause a lot of breakage. I guess I do wonder, if anyone is relying on values to still be present after make_transient(), in what circumstances is that actually useful behaviour? Do you consider it an intentional part of the design that the attribute values remain present if unexpired, or is that somewhat of an accident? Clearly make_transient() is not for the purpose that I had incorrectly assumed (I wanted to turn a just-committed instance into a plain data container that doesn't have side effects on the database), and I'm not understanding its actual intended use cases—sorry about that.

    In any case, the documentation changes you've made are very clear and would have been great at helping me avoid encountering this. Thank you.

  14. Mike Bayer repo owner

    The best idea I have at the moment for how to produce consistent behaviour is to make make_transient() expire all attributes, so that it always makes all the attributes appear to become None, instead of sometimes appearing to keep their values and sometimes appearing to become None.

    but that goes against the reason I wrote make_transient() for myself, which was that i wanted to take an object and copy it to a new session as a new row.

    I don't know if this change would cause a lot of breakage. I guess I do wonder, if anyone is relying on values to still be present after make_transient(), in what circumstances is that actually useful behaviour?

    just to give some perspective I can't imagine what circumstance you would not want the values to be present :). If I need a new, blank object I'd just call the constructor? why would i need make_transient()?

    Clearly make_transient() is not for the purpose that I had incorrectly assumed (I wanted to turn a just-committed instance into a plain data container that doesn't have side effects on the database),

    Hmm but you must have wanted to retain some kind of state from that object? otherwise why not just make a new object.

    In any case, the documentation changes you've made are very clear and would have been great at helping me avoid encountering this. Thank you.

    OK glad we could get through this thanks for your contribution!

  15. zestyping reporter

    but that goes against the reason I wrote make_transient() for myself, which was that i wanted to take an object and copy it to a new session as a new row.

    That's exactly what I wanted to use it for too.

    just to give some perspective I can't imagine what circumstance you would not want the values to be present

    I very much do want the values to be present. Now it seems like you're arguing my case. :) The only reason I wanted to use it was to keep those values, and you've just been explaining to me that it cannot be relied upon for that and there is no practical way it can be made to do that.

    Okay, let's suppose, based on our sample size of 2 (you and me), that what is really needed is a way to take an object and copy it to a new session as a new row. We've established that make_transient() doesn't work for that. How would you meet that need?

  16. Mike Bayer repo owner

    you've just been explaining to me that it cannot be relied upon for that and there is no practical way it can be made to do that.

    It seems you're regarding whether or not attributes are loaded as some inaccessible magic, I don't see it that way. The APIs that control whether or not attributes are present are explicit, e.g. you queried for the whole entity, you didnt map with deferred(), you didn't call commit() first, etc.

    In my own use case I simply loaded the object normally, made sure i didn't expire any of its attributes, and made the copy. You can iterate through all attributes and access them if you're looking for a generalized system of ensuring this.

  17. Mike Bayer repo owner

    also note, for making a copy of a row, assuming you aren't doing it recursively to other rows as well, you only need to ensure column attributes are loaded. relationships don't need to be present.

  18. Log in to comment