query heuristic inspection system

Issue #3225 new
Mike Bayer repo owner created an issue

given issues like #3223 and the openstack issues with model_query(), here's a sketch of an idea:

@event.listens_for(EventedQuery, "before_compile", retval=True)
def append_order_by(new_query):

    if inspect(new_query).selects_against(MyClass).has_any():
        for insp in inspect(new_query).selects_against(MyClass).all_aliases():
            new_query = new_query.order_by(insp.entity.whatever)
    return new_query

with this extension we'd need to identify specific use cases (since we see them all the time anyway) and add them to the docs of this extension as covered use cases. So the docs here need lots of recipes and they need to be tested.

Official response

  • Mike Bayer reporter

    OK I think event-wise, we only should have "before_compile". This is the only place that we know what we're otherwise working with.

    then we need a query language to get at information within the query. Here is the QueryInspector:

    insp = inspect(some_query)
    

    this can then tell us about what's contained within, with the QueriesAgainst object:

    some_entity = insp.selects_against(SomeClass)  # returns unconditionally
    some_entity.has()  # do we select against this at all (not including aliases)?
    some_entity.has_any() # do we select against this or any aliases of SomeClass?
    some_entity.joined() # is this entity assembled into a join or is it implicitly loaded from?
    some_entity.joined_any() # etc
    
    zero_entity = inspect(query).selects_against(0)  # give us the zeroth entity
    
    zero_entity.entity # give us the class or alias that we are
    zero_entity.mapper # the mapper
    zero_entity.is_aliased_class  # familiar
    zero_entity.expression  # give us the expression that we are selecting from
    
    some_entity = insp.selects_against(SomeClass)
    for against in some_entity.all_aliases():
       # iterate through a QueriesAgainst object for all aliases of SomeClass
    

    how about a from_self() query? (or that we get from a union, etc)

    # we have to modify query.statement so that an annotation is added which includes the originating query.
    # then this system can perform a lookup of selectables which contain this annotation.
    # derived from should only return *one level* of queries, that is, traversal stops with those queries which 
    # contain the annotation.
    insp = inspect(some_query)
    for embedded in insp.derived_from:
       # ...
    

Comments (12)

  1. Mike Bayer reporter

    OK I think event-wise, we only should have "before_compile". This is the only place that we know what we're otherwise working with.

    then we need a query language to get at information within the query. Here is the QueryInspector:

    insp = inspect(some_query)
    

    this can then tell us about what's contained within, with the QueriesAgainst object:

    some_entity = insp.selects_against(SomeClass)  # returns unconditionally
    some_entity.has()  # do we select against this at all (not including aliases)?
    some_entity.has_any() # do we select against this or any aliases of SomeClass?
    some_entity.joined() # is this entity assembled into a join or is it implicitly loaded from?
    some_entity.joined_any() # etc
    
    zero_entity = inspect(query).selects_against(0)  # give us the zeroth entity
    
    zero_entity.entity # give us the class or alias that we are
    zero_entity.mapper # the mapper
    zero_entity.is_aliased_class  # familiar
    zero_entity.expression  # give us the expression that we are selecting from
    
    some_entity = insp.selects_against(SomeClass)
    for against in some_entity.all_aliases():
       # iterate through a QueriesAgainst object for all aliases of SomeClass
    

    how about a from_self() query? (or that we get from a union, etc)

    # we have to modify query.statement so that an annotation is added which includes the originating query.
    # then this system can perform a lookup of selectables which contain this annotation.
    # derived from should only return *one level* of queries, that is, traversal stops with those queries which 
    # contain the annotation.
    insp = inspect(some_query)
    for embedded in insp.derived_from:
       # ...
    
  2. Mike Bayer reporter

    OK next modification. we need to get at eager joins as well. so I think we should not use inspect() here, and instead, pass either the QueryContext itself or some inspection version of that directly to the event. The API here needs to allow us to get at joins that have been created so that we can add criteria to the ON clause of joins. Some system of affecting entities at the FROM level transparently, e.g. if the FROM element is the right side of a JOIN the criteria goes into the ON else it goes into the WHERE, is needed.

    @event.listens_for(Query, "compile")
    def join_w_criteria(query, context):
        if context.selects_against(MyClass).has_any(include_eager=True):
            for insp in inspector.selects_against(MyClass).all_aliases(include_eager=True):
                # works at the level of "this entity, as selected from the FROM "
                insp.filter_from(insp.entity.deleted == False)  
    
                # would replace the ON clause entirely?
                # insp.replace_filter_from(...)
    
    @event.listens_for(Query, "compile")
    def append_order_by(query, context):
        if context.selects_against(MyClass).has_any():
            for insp in context.selects_against(MyClass).all_aliases():
                context.append_order_by(insp.entity.whatever)
    
  3. Mike Bayer reporter

    use cases:

    1. the query deals with certain entities at the column/criteria level, and we have to ensure that the query selects FROM these entities in a certain way, e.g. we need to add a join() / outerjoin().

    2. the query selects FROM certain entities, and we need to change how the set of these entities is delivered, e.g. filter / ON clause

    3. the query selects FROM certain entities and we need to add ordering criteria

    the use case for #1 has to be handled differently than the use case for #2, #3. I'm trying to find a simple, one-way-to-do-it-all approach here, but im not sure that's possible. if we already select from Foo and say WHERE RelatedBar == 5, we need to add .join(Foo.related_bars), and we may need the full capabilities of query.join() for that.

    For the vast majority of cases here, it's much easier to give the user a Query that they can just add things on towards. However, for eager joins, none of that works. eager joins are produced using Core-only techniques.

    I don't like giving the user two APIs, for the entity over here, you can use a Query, but then when it's joinedloaded, you have to use some totally different Core-only thing.

    but let's consider it's not avoidable. It would be:

    diff --git a/lib/sqlalchemy/orm/query.py b/lib/sqlalchemy/orm/query.py
    index f070608..b465577 100644
    --- a/lib/sqlalchemy/orm/query.py
    +++ b/lib/sqlalchemy/orm/query.py
    @@ -2850,6 +2850,8 @@ class Query(object):
             return update_op.rowcount
    
         def _compile_context(self, labels=True):
    +        self = self.dispatch.compile_before_setup(...)
    +
             context = QueryContext(self)
    
             if context.statement is not None:
    @@ -2866,17 +2868,17 @@ class Query(object):
                 strategy = rec[0]
                 strategy(*rec[1:])
    
    +        if self._enable_single_crit:
    +            self._adjust_for_single_inheritance(context)
    +
    +        self.dispatch.compile_after_setup(context, ...)
    +
             if context.from_clause:
                 # "load from explicit FROMs" mode,
                 # i.e. when select_from() or join() is used
                 context.froms = list(context.from_clause)
    -        else:
    -            # "load from discrete FROMs" mode,
    -            # i.e. when each _MappedEntity has its own FROM
    -            context.froms = context.froms
    -
    -        if self._enable_single_crit:
    -            self._adjust_for_single_inheritance(context)
    +        # else "load from discrete FROMs" mode,
    +        # i.e. when each _MappedEntity has its own FROM
    
             if not context.primary_columns:
                 if self._only_load_props:
    

    so here's how we'd have to document this. Use compile_before_setup to make the query deal with additional entities that aren't currently there, as well as to add details about how primary entities are loaded. Use compile_after_setup to modify the criteria at which individual entities are selected.

    so #1 is compile_before_setup, #2 is compile_after_setup, #3 is compile_before_setup.

    what is the inspection system used in either of these systems? I think there have to be two of them. The questions we ask of these inspection systems have to work against a Query and QueryContext at the same time. Maybe when we say, "include_eager=True", we know to look at the QueryContext if its present (throw an error if that flag is used in the before_setup event).

  4. Mike Bayer reporter

    if we need to alter the way a joined eager load works, that should be done with options. we can alter the joinedload() option to provide for additional ON criteria. The inspector within the "before" event can be used to provide information about what joinedloads are present and can add these options.

  5. Log in to comment