lazyload parent state context for sharding

Issue #4243 resolved
Michael Bayer
repo owner created an issue

continuing from #4228, information needs to be present in the Query regarding what state we are lazy loading from. Additionally, when a lazyload does hit the DB we have no chance to know what the parent shard we are querying from is which could help when the assumption is that we lazy-load from the same shard.

Additionally, a basic refresh of an object uses id_chooser() and we should be able to have the token available as well though this might be a separate issue

We'd like id_chooser to be writable as:

def id_chooser(query, ident):
    if query.lazy_loaded_from:
        return [query.lazy_loaded_from.identity_token]
    else:
        return [< all shard ids>]

this needs to happen both for identity map lookup as well as a real load, and since BakedQuery does its own version of "get()" we need to fill through:

diff --git a/lib/sqlalchemy/ext/baked.py b/lib/sqlalchemy/ext/baked.py
index 13ad4cf7c..fa63d35d9 100644
--- a/lib/sqlalchemy/ext/baked.py
+++ b/lib/sqlalchemy/ext/baked.py
@@ -154,6 +154,11 @@ class BakedQuery(object):
         self._spoiled = True
         return self

+    def _add_lazyloaded_state(self, state):
+        self.add_criteria(
+            lambda q: q._set_lazyload_from(state)
+        )
+
     def _add_lazyload_options(self, options, effective_path, cache_path=None):
         """Used by per-state lazy loaders to add options to the
         "lazy load" query from a parent query.
diff --git a/lib/sqlalchemy/ext/horizontal_shard.py b/lib/sqlalchemy/ext/horizontal_shard.py
index 266bd784e..a612e6aca 100644
--- a/lib/sqlalchemy/ext/horizontal_shard.py
+++ b/lib/sqlalchemy/ext/horizontal_shard.py
@@ -65,7 +65,7 @@ class ShardedQuery(Query):
     @classmethod
     def _identity_lookup(
             cls, session, mapper, primary_key_identity, identity_token=None,
-            **kw):
+            lazyload_from=None, **kw):
         """override the default Query._identity_lookup method so that we
         search for a given non-token primary key identity across all
         possible identity tokens (e.g. shard ids).
@@ -80,6 +80,8 @@ class ShardedQuery(Query):
             )
         else:
             q = cls([mapper], session)
+            if lazyload_from:
+                q = q._set_lazyload_from(lazyload_from)
             for shard_id in q.id_chooser(q, primary_key_identity):
                 obj = super(ShardedQuery, cls)._identity_lookup(
                     session, mapper, primary_key_identity,
diff --git a/lib/sqlalchemy/orm/query.py b/lib/sqlalchemy/orm/query.py
index a17f590e9..44749cac8 100644
--- a/lib/sqlalchemy/orm/query.py
+++ b/lib/sqlalchemy/orm/query.py
@@ -110,6 +110,7 @@ class Query(object):
     _orm_only_from_obj_alias = True
     _current_path = _path_registry
     _has_mapper_entities = False
+    _lazyload_from = None

     def __init__(self, entities, session=None):
         """Construct a :class:`.Query` directly.
@@ -258,6 +259,10 @@ class Query(object):
             for o in cols
         ]

+    @_generative()
+    def _set_lazyload_from(self, state):
+        self._lazyload_from = state
+
     @_generative()
     def _adapt_all_clauses(self):
         self._orm_only_adapt = False
@@ -884,7 +889,7 @@ class Query(object):
     @classmethod
     def _identity_lookup(
             cls, session, mapper, primary_key_identity, identity_token=None,
-            passive=attributes.PASSIVE_OFF):
+            passive=attributes.PASSIVE_OFF, on_behalf_of=None):
         """Locate an object in the identity map.

         Given a primary key identity, constructs an identity key and then
@@ -904,6 +909,13 @@ class Query(object):
          :func:`.loading.get_from_identity`, which impacts the behavior if
          the object is found; the object may be validated and/or unexpired
          if the flag allows for SQL to be emitted.
+        :param on_behalf_of: an :class:`.InstanceState` that is specifically
+         asking for this identity as a related identity.  Used for sharding
+         schemes where there is a correspondence between an object and
+         a related object being lazy-loaded (or otherwise relationship-loaded).
+
+         .. versionadded:: 1.2.8
+
         :return: None if the object is not found in the identity map, *or*
          if the object was unexpired and found to have been deleted.
          if passive flags disallow SQL and the object is expired, returns
diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py
index 9bce5593d..6241d4840 100644
--- a/lib/sqlalchemy/orm/strategies.py
+++ b/lib/sqlalchemy/orm/strategies.py
@@ -618,7 +618,7 @@ class LazyLoader(AbstractRelationshipLoader, util.MemoizedSlots):
             # identity_token would be for this identity
             instance = session._query_cls._identity_lookup(
                 session, self.mapper, primary_key_identity,
-                passive=passive
+                passive=passive, lazyload_from=state
             )

             if instance is not None:
@@ -709,14 +709,17 @@ class LazyLoader(AbstractRelationshipLoader, util.MemoizedSlots):
             # is usually a throwaway object.
             effective_path = state.load_path[self.parent_property]
             q._add_lazyload_options(
-                state.load_options, effective_path
+                state, state.load_options, effective_path
             )

+        q._add_lazyloaded_state(state)
+
         if self.use_get:
             if self._raise_on_sql:
                 self._invoke_raise_load(state, passive, "raise_on_sql")
             return q(session)._load_on_pk_identity(
-                session.query(self.mapper), primary_key_identity)
+                session.query(self.mapper),
+                primary_key_identity)

         if self.parent_property.order_by:
             q.add_criteria(
diff --git a/test/ext/test_horizontal_shard.py b/test/ext/test_horizontal_shard.py
index 0bcacad37..29cd7c30d 100644
--- a/test/ext/test_horizontal_shard.py
+++ b/test/ext/test_horizontal_shard.py
@@ -390,11 +390,23 @@ class LazyLoadFromIdentityMapTest(fixtures.DeclarativeMappedTest):
             book_id = Column(ForeignKey('book.id'))

     def test_lazy_load_from_identity_map(self):
+        def id_chooser(query, ident):
+            assert query._lazyload_from and \
+                query._lazyload_from.identity_token == 'test'
+            return ['test']
+
+        def no_query_chooser(query):
+            if query._lazyload_from is None:
+                assert query.column_descriptions[0]['type'] is Page
+            else:
+                assert query._lazyload_from.identity_token == 'test'
+            return ['test']
+
         session = ShardedSession(
             shards={"test": testing.db},
             shard_chooser=lambda *args: 'test',
-            id_chooser=lambda *args: ['test'],
-            query_chooser=lambda *args: ['test']
+            id_chooser=id_chooser,
+            query_chooser=no_query_chooser
         )

         Book, Page = self.classes("Book", "Page")
@@ -402,11 +414,11 @@ class LazyLoadFromIdentityMapTest(fixtures.DeclarativeMappedTest):
         book.pages.append(Page())

         session.add(book)
-        session.commit()
+        session.flush()

-        book = session.query(Book).first()
         page = session.query(Page).first()

+        session.expire(page, ['book'])
         def go():
             eq_(page.book, book)

@@ -415,3 +427,44 @@ class LazyLoadFromIdentityMapTest(fixtures.DeclarativeMappedTest):
             testing.db,
             go,
             0)
+
+    def test_lazy_load_from_db(self):
+        def id_chooser(query, ident):
+            assert query._lazyload_from and \
+                query._lazyload_from.identity_token == 'test'
+            return ['test']
+
+        def no_query_chooser(query):
+            if query._lazyload_from is None:
+                assert query.column_descriptions[0]['type'] is Page
+            else:
+                assert query._lazyload_from.identity_token == 'test'
+            return ['test']
+
+        session = ShardedSession(
+            shards={"test": testing.db},
+            shard_chooser=lambda *args: 'test',
+            id_chooser=id_chooser,
+            query_chooser=no_query_chooser
+        )
+
+        Book, Page = self.classes("Book", "Page")
+        book = Book()
+        book.pages.append(Page())
+
+        session.add(book)
+        session.flush()
+        book_id = inspect(book).identity_key
+        session.expunge(book)
+
+        page = session.query(Page).first()
+        session.expire(page, ['book'])
+
+        def go():
+            eq_(inspect(page.book).identity_key, book_id)
+
+        # emits one query
+        self.assert_sql_count(
+            testing.db,
+            go,
+            1)

this is compatible w/ a 1.2.8 release as this is more regression stuff due to #4137

Comments (9)

  1. Rishi Sharma

    Does it make sense for BakedQuery to lazyload from state by default, instead of id_chooser needing to implement that? That might preserve the pre-1.2 behaviour.

  2. Michael Bayer reporter

    that assumption was always a bug in 1.1 and earlier, that the related item has to be in the same shard.

    here, we have the ability to let the id choosers figure this out now, that opens up the possibility to automate this particular decision as an option, e.g. lazyloads would assume the shard of the parent and then your chooser functions don't get called directly. but anyway this is the start of how that would work. if you have time to maybe try this out that would be helpful #4228 ironically went out too fast :)

  3. Rishi Sharma

    Makes sense.

    I applied the patch locally, ran our test suite and got this error:

    ...my source...
      File "/env/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 242, in __get__
        return self.impl.get(instance_state(instance), dict_)
      File "/env/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 599, in get
        value = self.callable_(state, passive)
      File "/env/lib/python2.7/site-packages/sqlalchemy/orm/strategies.py", line 631, in _load_for_state
        session, state, primary_key_identity, passive)
      File "<string>", line 1, in <lambda>
      File "/env/lib/python2.7/site-packages/sqlalchemy/orm/strategies.py", line 712, in _emit_lazyload
        state, state.load_options, effective_path
      File "/env/lib/python2.7/site-packages/sqlalchemy/ext/baked.py", line 182, in _add_lazyload_options
        for opt in options:
    TypeError: 'InstanceState' object is not iterable
    
  4. Michael Bayer reporter

    that's this, can you revert this line?

    @@ -709,14 +709,17 @@ class LazyLoader(AbstractRelationshipLoader, util.MemoizedSlots):
                 # is usually a throwaway object.
                 effective_path = state.load_path[self.parent_property]
                 q._add_lazyload_options(
    -                state.load_options, effective_path
    +                state, state.load_options, effective_path
                 )
    
  5. Michael Bayer reporter

    OK just FYI the patch above is still pretty wrong, in that it probably fails for non-sharded queries because I mixed up the names "on_behalf_of" / "lazyload_from", as well as that the baked query is going to cache incorrectly.

    I am overdue for 1.2.8 release, the WIP for what I have here is https://gerrit.sqlalchemy.org/#/c/zzzeek/sqlalchemy/+/763 . this patch is more correct but I'm out of time to get the tests to assert everything I want to check for, so this is still unfortunately pending for now.

  6. Michael Bayer reporter

    Add Query.lazy_load_from attribute for sharding

    Added new attribute :attr:.Query.lazy_loaded_from which is populated with an :class:.InstanceState that is using this :class:.Query in order to lazy load a relationship. The rationale for this is that it serves as a hint for the horizontal sharding feature to use, such that the identity token of the state can be used as the default identity token to use for the query within id_chooser().

    Also repaired an issue in the :meth:.Result.with_post_criteria method added in I899808734458e25a023142c2c5bb37cbed869479 for 🎫4128 where the "unbake subquery loaders" version was calling the post crtieria functions given the :class:.Result as the argument rather than applying them to the :class:.Query.

    Change-Id: I3c0919ce7fd151b80fe2f9b5f99f60df31c2d73d Fixes: #4243

    → <<cset a574b409296e>>

  7. Michael Bayer reporter

    Add Query.lazy_load_from attribute for sharding

    Added new attribute :attr:.Query.lazy_loaded_from which is populated with an :class:.InstanceState that is using this :class:.Query in order to lazy load a relationship. The rationale for this is that it serves as a hint for the horizontal sharding feature to use, such that the identity token of the state can be used as the default identity token to use for the query within id_chooser().

    Also repaired an issue in the :meth:.Result.with_post_criteria method added in I899808734458e25a023142c2c5bb37cbed869479 for 🎫4128 where the "unbake subquery loaders" version was calling the post crtieria functions given the :class:.Result as the argument rather than applying them to the :class:.Query.

    Change-Id: I3c0919ce7fd151b80fe2f9b5f99f60df31c2d73d Fixes: #4243 (cherry picked from commit a574b409296ef793cec8e1d00f1f7be48f15325e)

    → <<cset 2eeab327b184>>

  8. Log in to comment