Inconsistencies in documentation and code regarding strong referencing guidance

Issue #3517 resolved
Allan Crooks created an issue

My use case: I recently upgraded a project, which was built using SQLAlchemy 0.3, to SQLAlchemy 1.0. I went from version to version making changes where required. One of the features that the project made use of from SQLA 0.3 was the "objectstore" extension, that provided (amongst other things) object caching.

As part of the update process, I moved to using sessions, and explicitly disabled the weak reference identity map to keep my code compatible. When I started to get warnings because I was passing weak_identity_map=False, I stopped passing that parameter. I then realised that this was causing performance issues: multiple calls were now being made to the database to retrieve the same object, whereas previously only one call was made.
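To make the difference concrete, here is a minimal sketch of weak versus strong referencing using only the standard library. It is not SQLAlchemy's implementation: `Record`, `weak_map` and `strong_map` are hypothetical names, with a `weakref.WeakValueDictionary` standing in for the default identity map and a plain dict standing in for the weak_identity_map=False behaviour.

```python
import gc
import weakref


class Record:
    """Hypothetical stand-in for a mapped object."""

    def __init__(self, pk):
        self.pk = pk


# Weak map: entries vanish once nothing else references the value.
weak_map = weakref.WeakValueDictionary()
# Strong map: entries stay until they are removed explicitly.
strong_map = {}

a = Record(1)
b = Record(2)
weak_map[a.pk] = a
strong_map[b.pk] = b

# Drop the application's own references to both objects.
del a, b
gc.collect()

weak_entry_survived = 1 in weak_map
strong_entry_survived = 2 in strong_map
print(weak_entry_survived, strong_entry_survived)
```

With the weak map, an object that nothing else references disappears, so the next lookup has to go back to the database; with the strong map it stays until it is removed explicitly.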

So, I want to be able to have sessions which will persist objects until they are explicitly cleared. The problem is, I see inconsistent advice offered when it comes to object caching.


  • The documentation here explicitly suggests how you should disable weak references if that's your preferred option: "To disable the weak referencing behavior and force all objects within the session to remain until explicitly expunged, configure sessionmaker with the weak_identity_map=False setting."

The use of it is not discouraged at all.

  • The documentation here indicates that, although the Session object is capable of being used as a cache, it's not designed for it.

This gives the impression that, while SQLAlchemy does provide an implementation of a cache, alternative options may work better. However, it does not discourage the use of it.

  • This indicates that the option is obsolete.

"Obsolete" would indicate that, regardless of the value passed to the flag, it would result in the same behaviour. This isn't true in my case.

As a message, it's unclear what it is saying. Is it saying that SQLAlchemy doesn't need that feature? Perhaps. Is it implying that I don't need that feature? That would be incorrect.

  • This discussion thread contains more enthusiastic advice that people should avoid using strong identity maps, and maintain a local cache of objects themselves.

This strongly suggests that we should move away from using strong identity maps.

  • There is issue #1473, which specifies the deprecation of StrongInstanceDict.

However, there aren't any deprecation warnings in the direct usage of StrongInstanceDict itself - so I might infer from that that the classes in sqlalchemy.orm.identity are intended for internal use only, and that the weak_identity_map flag is meant to be the only way to make use of them.


These inconsistencies make it unclear how to proceed (especially in terms of expectations of how future versions will change the API):

  • Is the caching feature of SQLAlchemy going to be removed altogether?

  • Am I allowed to use StrongInstanceDict, or write my own IdentityMap class?

  • Is SQLAlchemy correct in asserting that the flag is obsolete?

  • Is SQLAlchemy indicating that the feature is unnecessary for its own internal use, or for my intended use?

  • Shouldn't SQLAlchemy generate a warning which suggests to clients how they should make changes to achieve the same effect for future versions?

My ideal solution would be that I could continue using StrongInstanceDict - even if future versions of SQLAlchemy only supported this by allowing a class or factory function to return an instance of an object to use as an identity map (so allow any identity map implementation to be used, rather than only talking in terms of weak vs strong references).

Comments (9)

  1. Mike Bayer repo owner

    Hello - thanks for reporting.

    The overarching theme of this critique is that you are in the very unfortunate position of being one of the exceedingly few users that actually did things with version 0.3. Thanks for being with the project this long. However, as 0.3 is approaching ten years old, the documentation and the library itself are not oriented towards the "user upgrading from 0.3 to 1.0". It would be very confusing and misleading if the documentation were written with extensive references to 0.3-style patterns and how they have been superseded; the overwhelming majority of SQLAlchemy users only started using the library in version 0.6 or 0.7. While I appreciate your frustration, once you get around having to use a strong referencing session, you will, like all our other users, never notice the "weak_identity_map" flag again :).

    I can answer your questions directly:

    Is the caching feature of SQLAlchemy going to be removed altogether?

    I assume you mean the StrongInstanceDict itself. I have wanted to entirely remove it for many years; however, it remains in place both because it is not a maintenance burden (though bug reports like this definitely change that calculus) and because there are a handful of users from the 0.3 days who really, really like it, and there's not been any compelling reason to take it away from them. SQLAlchemy has no issue keeping around legacy patterns of use as long as they are pushed outside of the library's internals when not being used (for example, the MapperExtension / SessionExtension classes).

    Am I allowed to use StrongInstanceDict, or write my own IdentityMap class?

    You can use the StrongInstanceDict itself though I'd recommend just using the weak_identity_map flag to turn it on. Warnings don't impact the operation of the program and they can also be suppressed. Writing your own identity map is not generally a public API path and would be very risky for you to maintain as I can't guarantee subtle changes to the Session wouldn't break it.
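As a rough illustration of suppressing such a warning, here is a sketch using the standard `warnings` module. `make_session` is a hypothetical stand-in that warns the way a legacy flag might; the message text and the `DeprecationWarning` category are assumptions for the example, so in a real application you would match whatever message and category the library actually emits.

```python
import warnings


def make_session(weak_identity_map=True):
    """Hypothetical stand-in for a session factory that warns on a legacy flag."""
    if not weak_identity_map:
        warnings.warn(
            "weak_identity_map=False is a legacy option",
            DeprecationWarning,
            stacklevel=2,
        )
    return {"weak_identity_map": weak_identity_map}


# record=True collects any warning that is not filtered out.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Filters added later take precedence, so this one silences
    # only the warning whose message matches the pattern.
    warnings.filterwarnings(
        "ignore",
        message=".*weak_identity_map.*",
        category=DeprecationWarning,
    )
    session = make_session(weak_identity_map=False)

print(len(caught), session["weak_identity_map"])
```

The filter is deliberately narrow, so unrelated warnings still surface while the program behaves exactly as before.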

    Is SQLAlchemy correct in asserting that the flag is obsolete?

    I will remove that word but "obsolete" has been relevant for many years, because when the weak-referencing feature was first introduced, it didn't work in all cases. It had a lot of exception cases where it failed to do the right thing, so its advantages were not clear. So the old 0.3 behavior was kept around, in case the new behavior was a failure. But around 0.5 or so additional architectural changes were made that finally allowed the weak referencing map to work fully in all cases, finally solving the issue of people's applications running out of memory when they did too much work with a single Session. The strong identity map was finally obsolete.

    Is SQLAlchemy indicating that the feature is unnecessary for its own internal use, or for my intended use?

    It's unnecessary across the board. That's not to say it isn't serving a purpose in your 0.3 design; however, it is much better for you to explicitly handle those objects which you'd like to strong reference within your own systems. That way you aren't surprised by objects that were loaded somewhere and are now stuck in your Session permanently until you call expunge_all() on it. The Session is better focused on the one job it has to do, not other jobs that can be more accurately maintained externally to it.

    Shouldn't SQLAlchemy generate a warning which suggests to clients how they should make changes to achieve the same effect for future versions?

    What effect would that be, exactly? Which objects are the ones that you actually care about? There's an enormous variety of ways in Python to store references to objects so that they don't fall out of scope. Dictionaries, sets and lists are good starts. More complex caching systems are possible as well but these are for specific kinds of cases. The option does create a warning, which I will of course add some extra words to, but if it didn't, you probably wouldn't be posting this bug report.
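A minimal sketch of that idea, assuming nothing about SQLAlchemy internals: the application keeps its own dictionary of the objects it cares about, while everything else remains weakly referenced. `User`, `identity_map`, `pinned` and `load_user` are hypothetical names, and the `WeakValueDictionary` only stands in for the Session's default behaviour.

```python
import gc
import weakref


class User:
    """Hypothetical stand-in for a mapped class."""

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name


# Stand-in for a weak-referencing identity map.
identity_map = weakref.WeakValueDictionary()

# Application-owned strong references: only the objects we choose to keep.
pinned = {}


def load_user(pk, name, pin=False):
    """Pretend to load a row; optionally keep a strong reference to it."""
    user = User(pk, name)
    identity_map[(User, pk)] = user
    if pin:
        pinned[(User, pk)] = user
    return user


load_user(1, "ed", pin=True)   # kept alive by `pinned`
load_user(2, "wendy")          # nothing holds on to this one
gc.collect()
still_mapped = sorted(pk for (_cls, pk) in identity_map.keys())

# Releasing the application's references lets the rest fall away too.
pinned.clear()
gc.collect()
after_clear = sorted(pk for (_cls, pk) in identity_map.keys())
print(still_mapped, after_clear)
```

The point of the design is that the application decides which objects are held and when they are released, rather than every loaded object being retained implicitly.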

    But to my original point, the question of "how do I make sure my objects don't vanish" is one that is simply never asked at all by anyone, except users that wrote their applications under 0.3. For everyone else, it's simply not a problem - the Session never held onto their objects like a cache in the first place and their applications never built a dependence on this effect. This is the strongest evidence that the "strong identity map" is not a good avenue to travel; it's brittle and conceptually burdensome. However, your app is already on it. So I'd not insist you change anything, that's why the option is there.

    My ideal solution would be that I could continue using StrongInstanceDict - even if future versions of SQLAlchemy only supported this by allowing a class or factory function to return an instance of an object to use as an identity map (so allow any identity map implementation to be used, rather than only talking in terms of weak vs strong references).

    Well, the flag isn't going anywhere too soon, but unfortunately your ideal solution is the opposite of what I'd like to see. The identity map is too embedded in the internal operation of the Session to allow pluggable implementations.

    Instead, if you want to automatically track objects that are placed in the session, you can make a much more maintainable and future-proof design by using the after_attach event. Here is an extensive example illustrating how to use this event to build your own session-scoped cache, though it is more complex than what you need. To mimic the identity map, just put the objects inside of a dictionary, and place that dictionary inside of Session.info.

  2. Mike Bayer repo owner
    • use consistent and descriptive language in all cases where we refer to the "weak_identity_map" option, and add additional exposition in the session documentation which refers to it. fixes #3517

    → <<cset 956907a4b15f>>

  3. Mike Bayer repo owner

    In #2677 we will seek to improve the event model so that an exact mirror of strong identity mapping can be maintained externally to the Session using events. For now this is possible only by using after_attach with various hacks.

  4. Mike Bayer repo owner

    The after_attach event is not invoked for objects loaded from a query, so I will revise the documentation in progress here to reflect all currently known techniques for tracking objects, and add a recipe that provides the same behavior as the strong identity map without digging into internals. The "detachment" part of it, however, will have to wait until 1.1 because a new event is needed.

  5. Mike Bayer repo owner

    new hooks will be added in 1.1 to help with this. in the meantime, here is a recipe that will strong reference all objects as they move to the persistent state:

    from sqlalchemy import event
    from sqlalchemy import inspect
    from sqlalchemy.orm import Session  # needed by the listeners below
    
    
    def _add_strong_ref(object_, session):
        if 'refs' not in session.info:
            refs = session.info['refs'] = set()
        else:
            refs = session.info['refs']
        refs.add(object_)
    
    
    @event.listens_for(Session, "after_attach")
    def add_persistent(session, object_):
        if inspect(object_).persistent:
            _add_strong_ref(object_, session)
    
    
    @event.listens_for(Session, "after_flush")
    def add_newly_persistent(session, ctx):
        for object_ in session.new:
            _add_strong_ref(object_, session)
    
    # assume Base = declarative_base()
    @event.listens_for(Base, "load", propagate=True)
    def add_loaded(object_, context):
        session = context.session
        _add_strong_ref(object_, session)
    
  6. Mike Bayer repo owner
    • The :class:`.SessionEvents` suite now includes events to allow unambiguous tracking of all object lifecycle state transitions in terms of the :class:`.Session` itself, e.g. pending, transient, persistent, detached. The state of the object within each event is also defined. fixes #2677
    • Added a new session lifecycle state :term:`deleted`. This new state represents an object that has been deleted from the :term:`persistent` state and will move to the :term:`detached` state once the transaction is committed. This resolves the long-standing issue that objects which were deleted existed in a gray area between persistent and detached. The :attr:`.InstanceState.persistent` accessor will no longer report on a deleted object as persistent; the :attr:`.InstanceState.deleted` accessor will instead be True for these objects, until they become detached.
    • The :paramref:`.Session.weak_identity_map` parameter is deprecated. See the new recipe at :ref:`session_referencing_behavior` for an event-based approach to maintaining strong identity map behavior. references #3517

    → <<cset 108c60f460c7>>
