Regression in lazy loading strategy for sharded setups

Issue #4228 resolved
Rishi Sharma
created an issue

SQLAlchemy >= 1.2

In sharded setups, the lazy loading strategy no longer fetches instances available in the identity map; instead, emits a query.

The following test passes in 1.1 but fails in 1.2

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.horizontal_shard import ShardedSession
from sqlalchemy.testing import fixtures
from sqlalchemy.testing import eq_
from sqlalchemy import testing

class LazyLoadFromIdentityMapTest(fixtures.DeclarativeMappedTest):
    @classmethod
    def setup_classes(cls):
        Base = cls.DeclarativeBasic

        class Book(Base):
            __tablename__ = 'book'
            id = Column(Integer, primary_key=True)
            pages = relationship('Page', backref='book')

        class Page(Base):
            __tablename__ = 'page'
            id = Column(Integer, primary_key=True)
            book_id = Column(ForeignKey('book.id'))

    def test_lazy_load_from_identity_map(self):
        session = ShardedSession(
            shards={"test": testing.db},
            shard_chooser=lambda *args: 'test',
            id_chooser=lambda *args: None,
            query_chooser=lambda *args: ['test']
        )

        Book, Page = self.classes("Book", "Page")
        book = Book()
        book.pages.append(Page())

        session.add(book)
        session.commit()

        book = session.query(Book).first()
        page = session.query(Page).first()

        def go():
            eq_(page.book, book)

        # doesn't emit SQL
        self.assert_sql_count(
            testing.db,
            go,
            0
        )

Comments (22)

  1. Rishi Sharma reporter

    Hi Michael, I imagine you are really busy. Is there something with the patch that needs changing? I can try to help. We are eager to upgrade to 1.2. Thanks! -Rishi

  2. Michael Bayer repo owner

    I've been mentally linking this to another sharding related issue but this is more of a bug than the other one which is more of a feature request, so let me take a look

  3. Michael Bayer repo owner

    Refactor "get" to allow for pluggable identity token schemes

    Fixed regression in 1.2 within sharded query feature where the new "identity_token" element was not being correctly considered within the scope of a lazy load operation, when searching the identity map for a related many-to-one element. The new behavior will allow for making use of the "id_chooser" in order to determine the best identity key to retrieve from the identity map. In order to achieve this, some refactoring of 1.2's "identity_token" approach has made some slight changes to the implementation of ShardedQuery which should be noted for other derivations of this class.

    Change-Id: I04fa60535deec2d0cdec89f602935dfebeb9eb9d Fixes: #4228

    → <<cset 43f278356d94>>

  4. Michael Bayer repo owner

    Refactor "get" to allow for pluggable identity token schemes

    Fixed regression in 1.2 within sharded query feature where the new "identity_token" element was not being correctly considered within the scope of a lazy load operation, when searching the identity map for a related many-to-one element. The new behavior will allow for making use of the "id_chooser" in order to determine the best identity key to retrieve from the identity map. In order to achieve this, some refactoring of 1.2's "identity_token" approach has made some slight changes to the implementation of ShardedQuery which should be noted for other derivations of this class.

    Change-Id: I04fa60535deec2d0cdec89f602935dfebeb9eb9d Fixes: #4228 (cherry picked from commit 43f278356d94b5342a1020a9a97feea0bb7cd88f)

    → <<cset eee88f5fb5fb>>

  5. Rishi Sharma reporter

    Tested on 1.2.7 locally and seeing the following error:

    AssertionError: SELECT ...(redacted query)... does not have a unique shard to bind to: set([])
    

    The reason being is that we have never implemented "id_chooser" (because we don't use "get" to fetch a single record, instead apply a filter on the primary key, then use "one" or "first").

    I suppose the recent change in 1.2.7 now requires "id_chooser" to be implemented?

  6. Michael Bayer repo owner

    yeah the horizontal shard extension assumes you've implemented all three. in your case, this is easy, the "ident" passed has the existing shard as ident[-1], just return that in a list:

    def id_chooser(query, ident):
        return [ident[-1]]
    
  7. Rishi Sharma reporter

    Using your implementation in my case, the ident is, for example, [1, 1] representing a composite primary key. The shard_id is not present in that list.

  8. Michael Bayer repo owner

    or you can search for that identity in the identity map locally to get it

    def id_chooser(query, ident):
        target_cls = query.column_descriptions[0].['type']
        for key in query.session.identity_map.keys():
            if key[0] is target_cls and key[1] == ident:
                return key[2]
    

    the id_chooser() function should be enhanced so that it is given context for the identity it's being asked for. or a fourth, optional callable can be added to ShardedSession that is specifically for lazy loads. The latter is possibly more doable in a point release.

  9. Michael Bayer repo owner

    yeah that id chooser won't really work that great, because if the object is not locally present then you don't know the shard id unless we send you more information.

  10. Michael Bayer repo owner

    but....it should get you to where you were before? would be fully:

    def id_chooser(query, ident):
        target_cls = query.column_descriptions[0].['type']
        for key in query.session.identity_map.keys():
            if key[0] is target_cls and key[1] == ident:
                return [key[2]]
        else:
             return [<all shard ids>]
    
  11. Rishi Sharma reporter

    With a minor modification of your id_chooser we get back to where we were:

    def id_chooser(query, ident):
        ident_tuple = tuple(ident)
        target_cls = query.column_descriptions[0]['type']
    
        for model_cls, primary_key_ident, shard_id in query.session.identity_map.keys():
            if model_cls is target_cls and primary_key_ident == ident_tuple:
                return [shard_id]
    
        return []
    

    I wonder if there is a small performance hit in scanning the identity_map keys for 100s of lookups, in our case.

    Your suggestion of an "optional callable can be added to ShardedSession that is specifically for lazy loads" seems reasonable. Although we are eager, the upgrade is not urgent, so please feel to take the time to do what you think makes sense.

  12. Log in to comment