Add ability to query column sets as one entity

Issue #2824 resolved
Vladimir Magamedov created an issue

In our company we have a tool called {{{construct}}}, which is doing next things:

  1. You declare what you want to show in your template (we use Mako, but we want to make templates as dumb as possible):

    product_struct = Construct(dict( name=Product.name, url=apply_( get_product_url, args=Product.name, ), image_url=if_( Product.main_image_id, then_=apply_(get_image_url, args=Image.file_name, Image.store_type, 100, 100), else_=None, ), ))

  2. You query this structure using {{{session.query}}}

    products = ( db.session.query(product_struct) .outerjoin(Product.main_image) .limit(10) .all() )

  3. And you get these results:

    [ Object(name=u'Foo', url=u'/p1-foo.html', image_url=u'http://images.example.st/123-foo-100x100.jpg'), Object(name=u'Bar', url=u'/p2-bar.html', image_url=None),

    # ...
    
    Object(name=u'Baz',
           url=u'/p10-baz.html',
           image_url=u'http://images.example.st/789-baz-100x100.jpg'),
    

    ]( )

    At this time, construct requires custom {{{Query}}} subclass, which overrides {{{_set_entities}}} and {{{instances}}} methods (plus some other methods). I find this hacky and I can't find any other solution in how to make construct and {{{Query}}} work better together.

    I've tried to subclass {{{ClauseList}}} and {{{ColumnElement}}} classes, to provide necessary columns to query (this works), but I can't figure out how to gather these columns as one entity/structure in the resulting row. Looks like {{{_ColumnEntity}}} wrapper doesn't support this functionality or implementation would be also very hacky, {{{_MapperEntity}}} is tied to mapped classes and {{{Query}}} doesn't support any other option.

    Is it possible to support so weird option?

Comments (17)

  1. Mike Bayer repo owner

    this seems like just a simple filter on top of column-based results. I'd think you could just hit Session.query_cls, and iter(), and be done with it, proof of concept:

    from sqlalchemy.orm import Query
    
    class Thing(Query):
        _special = False
    
        @classmethod
        def factory(cls, entities, session=None):
            if isinstance(entities[0](0), EntityThing):
                ent = entities[0](0)
                query = Thing(ent._expand(), session)
                query._special = ent
                return query
            else:
                return Query(entities, session)
    
        def __iter__(self):
            if self._special:
                return self._special._iter(Query.__iter__(self))
            else:
                return Query.__iter__(self)
    
    class EntityThing(object):
        def __init__(self, expr):
            self.expr = expr
    
        def _expand(self):
            return [           e.label(key)
                for key, e in self.expr.items()
            ](
    )
    
        def _iter(self, rows):
            for row in rows:
                yield self._process(row)
    
        def _process(self, row):
            return dict((key, col) for key, col in zip(self.expr, row))
    
    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    
    Base = declarative_base()
    
    class A(Base):
        __tablename__ = 'a'
    
        id = Column(Integer, primary_key=True)
        data = Column(String)
    
    e = create_engine("sqlite://", echo=True)
    Base.metadata.create_all(e)
    
    sess = Session(e, query_cls=Thing.factory)
    sess.add(A(data='a1'), A(data='a2'))
    
    et = EntityThing({"id": A.id, "data": A.data})
    print sess.query(et).all()
    

    the only feature I can see here is maybe a more idiomatic way of filtering the output of a Query, we've just had people overriding __iter__() but we'd want to revisit several existing examples/recipes that feature this usage.

  2. Vladimir Magamedov reporter

    This is what I'm doing right now:) I've just implemented this in more general way (not only for the case when we query single entity, for example {{{session.query(Product.id, product_struct, ...)}}}).

    Here is my implementation:

    def _obj_getter(struct, i, count):
        return lambda row: struct.from_row(row[i:i+count](i:i+count))
    
    
    def _label_getter(i):
        return lambda row: row._labels[i](i)
    
    
    _none_getter = lambda row: None
    
    
    def _expand_entities(entity):
        return entity.columns if isinstance(entity, Construct) else (entity,)
    
    
    class Query(orm.Query):
    
        def _add_value_extractors(self, start_from, entities):
            i = start_from
            for entity in entities:
                if isinstance(entity, Construct):
                    count = len(entity.columns)
                    self._value_extractors.append(_obj_getter(entity, i, count))
                    self._label_extractors.append(_none_getter)
                    i += count
                else:
                    self._value_extractors.append(itemgetter(i))
                    self._label_extractors.append(_label_getter(i))
                    i += 1
    
        def _set_entities(self, entities, entity_wrapper=None):
            self._value_extractors = [       self._label_extractors = [](]
    )
            expanded_entities = tuple(chain(*(imap(_expand_entities, entities))))
            super(Query, self)._set_entities(expanded_entities, entity_wrapper)
            self._add_value_extractors(0, entities)
    
        def add_struct(self, struct):
            query = super(Query, self).add_columns(*struct.columns)
            query._add_value_extractors(len(self._entities), [struct](struct))
            return query
    
        def add_entity(self, entity, alias=None):
            query = super(Query, self).add_entity(entity, alias)
            query._add_value_extractors(len(self._entities), [entity](entity))
            return query
    
        def add_columns(self, *columns):
            query = super(Query, self).add_columns(*columns)
            query._add_value_extractors(len(self._entities), columns)
            return query
    
        def instances(self, *args, **kwargs):
            contains_struct = _none_getter in self._label_extractors
            single_struct = contains_struct and len(self._value_extractors) == 1
    
            gen = super(Query, self).instances(*args, **kwargs)
    
            if single_struct:
                for row in gen:
                    yield self._value_extractors[0](0)(row)
            elif contains_struct:
                for row in gen:
                    yield NamedTuple([for ext in self._value_extractors](ext(row)),
                                     [for ext in self._label_extractors](ext(row)))
            else:
                for row in gen:
                    yield row
    

    But how I can convince someone else to use this approach if it requires these quirks? An ideal way is to be able to query such structures with original Query and Session, without extra setup. I hope this are not very selfish statements:)

  3. Mike Bayer repo owner

    you're overriding twoHHH four methods, that's not much "quirk".

    the feature that you're doing is interesting but not something I have any notion of an API that would do exactly that, it's better that there are enough expansion points that it can be implemented via custom code as you're doing. Wrapping columns in an ad-hoc python function on return, that could be useful, that would be more of a Core feature and perhaps something that builds on type_coerce(). I can see helpers here, and maybe a "filter" extension. What "quirks" are bothering you exactly?

  4. Mike Bayer repo owner

    Another way to go here would be just to implement your own _QueryEntity subclass. At the moment I'd not encourage this use as the _QueryEntity hasn't been tested and fleshed out for end-user production (it's one thing to make a base class, another to make one that users can subclass). But that could be the expansion point too perhaps.

  5. Vladimir Magamedov reporter

    you're overriding twoHHH four methods, that's not much "quirk".

    I'm overriding non-public method in the {{{Query}}} class and reading non-public attribute of the {{{NamedTuple}}} class.

    It would be great if I could provide custom {{{_QueryEntity}}} subclass with simple API to somehow wrap columns and process them as one thing. Currently {{{_MapperEntity}}} and {{{_ColumnEntity}}} looks very complex for me, so I even didn't tried to write custom wrapper.

  6. Mike Bayer repo owner

    this is a pretty lame proof of concept but im not very comfortable with it, as it only would address your immediate concern which seems to be not a broad enough case:

    diff --git a/lib/sqlalchemy/orm/loading.py b/lib/sqlalchemy/orm/loading.py
    index 512a07d..5dc1184 100644
    --- a/lib/sqlalchemy/orm/loading.py
    +++ b/lib/sqlalchemy/orm/loading.py
    @@ -42,8 +42,8 @@ def instances(query, cursor, context):
                 def filter_fn(row):
                     return tuple(fn(x) for x, fn in zip(row, filter_fns))
    
    -    custom_rows = single_entity and \
    -                    query._entities[0](0).mapper.dispatch.append_result
    +    custom_rows = False #single_entity and \
    +                    #query._entities[0](0).mapper.dispatch.append_result
    
         (process, labels) = \
                     list(zip(*[
    diff --git a/lib/sqlalchemy/orm/query.py b/lib/sqlalchemy/orm/query.py
    index d64575a..01a7238 100644
    --- a/lib/sqlalchemy/orm/query.py
    +++ b/lib/sqlalchemy/orm/query.py
    @@ -2890,6 +2890,8 @@ class _QueryEntity(object):
                 if not isinstance(entity, util.string_types) and \
                             _is_mapped_class(entity):
                     cls = _MapperEntity
    +            elif isinstance(entity, UserDefinedEntity):
    +                cls = _UserDefinedEntity
                 else:
                     cls = _ColumnEntity
             return object.__new__(cls)
    @@ -2900,6 +2902,35 @@ class _QueryEntity(object):
             return q
    
    
    +class UserDefinedEntity(object):
    +    def setup_columns(self, column_collection, context):
    +        pass
    +
    +    def process_rows(self, context):
    +        pass
    +
    +
    +class _UserDefinedEntity(_QueryEntity):
    +    """User defined entity!"""
    +
    +    filter_fn = id
    +
    +    entities = ()
    +
    +    def __init__(self, query, entity):
    +        query._entities.append(self)
    +        self.entity = entity
    +
    +    def setup_entity(self, ext_info, aliased_adapter):
    +        raise NotImplementedError()
    +
    +    def setup_context(self, query, context):
    +        self.entity.setup_columns(context.primary_columns, context)
    +
    +    def row_processor(self, query, context, custom_rows):
    +        return self.entity.process_rows(context)
    +
    +
     class _MapperEntity(_QueryEntity):
         """mapper/class/AliasedClass entity"""
    
  7. Vladimir Magamedov reporter

    It works!:) Thank you, this is much better, now {{{Query}}} subclass (or mixin) will be simpler.

    Another way to go here would be just to implement your own _QueryEntity subclass. At the moment I'd not encourage this use as the _QueryEntity hasn't been tested and fleshed out for end-user production (it's one thing to make a base class, another to make one that users can subclass). But that could be the expansion point too perhaps.

    I'm understand that my case isn't broad enough, so you can close this ticket if you wish or leave it open to publish later more expansion points to support custom entity wrappers (as you said above) when their implementation would be stabilised/tested.

  8. Mike Bayer repo owner

    I have a need for something like this specific to composites, so I've implemented an experimental version of a new feature here in the ticket_2824 branch, if you want to check that out. Basically you override the Bundle class, implement the create_row_processor method as you wish and you can do whatever you want with a grouping of columns.

    I need to test out the composite functionality here in terms of my app over here before I merge this in.

  9. Vladimir Magamedov reporter

    I'm published my library on GitHub: https://github.com/vmagamedov/sqlconstruct - it is not finished/polished/documented yet, but I'm already wrote some tests and they can help.

    Here is my Bundle subclass:

    class Construct(Bundle):
    
        def __init__(self, spec):
            self._spec = OrderedDict(spec)
            self._columns = tuple(set(chain(*map(_yield_columns, spec.values()))))
            super(Construct, self).__init__(None, *self._columns)
    
        def from_row(self, row):
            values_map = dict(zip(self._columns, row))
            get_value = partial(_get_value_from_map, values_map)
            return Object(zip(
                self._spec.keys(),
                map(get_value, self._spec.values()),
            ))
    
        def create_row_processor(self, query, procs, labels):
            def proc(row, result):
                return self.from_row([None) for proc in procs](proc(row,))
            return proc
    

    Some errors which are speaks for themselves:

    ======================================================================
    ERROR: test_query_count (test_sqlconstruct.TestConstruct)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/vm/ws/sqlconstruct/test_sqlconstruct.py", line 336, in test_query_count
        self.assertEqual(query.count(), 2)
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 2539, in count
        return self.from_self(col).scalar()
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 952, in from_self
        q = self._from_selectable(fromclause)
      File "<string>", line 1, in <lambda>
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 53, in generate
        fn(self, *args[1:](1:), **kw)
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 981, in _from_selectable
        e.adapt_to_selectable(self, self._from_obj[0](0))
    AttributeError: '_BundleEntity' object has no attribute 'adapt_to_selectable'
    
    ======================================================================
    ERROR: test_query_with_explicit_join (test_sqlconstruct.TestConstruct)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/vm/ws/sqlconstruct/test_sqlconstruct.py", line 444, in test_query_with_explicit_join
        .join(self.b_cls.a)
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 1673, in join
        from_joinpoint=from_joinpoint)
      File "<string>", line 1, in <lambda>
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 53, in generate
        fn(self, *args[1:](1:), **kw)
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 1806, in _join
        outerjoin, create_aliases, prop)
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 1854, in _join_left_to_right
        self._join_to_left(l_info, left, right, onclause, outerjoin)
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 1967, in _join_to_left
        if ent.corresponds_to(left):
    AttributeError: '_BundleEntity' object has no attribute 'corresponds_to'
    
    ======================================================================
    ERROR: test_query_with_implicit_join_ge_08 (test_sqlconstruct.TestConstruct)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/vm/ws/sqlconstruct/test_sqlconstruct.py", line 503, in test_query_with_implicit_join_ge_08
        .join(self.a_cls)
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 1673, in join
        from_joinpoint=from_joinpoint)
      File "<string>", line 1, in <lambda>
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 53, in generate
        fn(self, *args[1:](1:), **kw)
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 1806, in _join
        outerjoin, create_aliases, prop)
      File "/Users/vm/ws/sqlconstruct/.tox/py27sqla0X/src/sqlalchemy/lib/sqlalchemy/orm/query.py", line 1819, in _join_left_to_right
        left = self._entities[0](0).entity_zero_or_selectable
    AttributeError: '_BundleEntity' object has no attribute 'entity_zero_or_selectable'
    

    Other tests failed because bundles can't be selected as "single entity". I think that bundles can work like models: when you querying only one bundle, query should yield bundles and not keyed tuples with one bundle.

    Here is my custom entity wrapper with some hacks to fix tests (very specific, supports only Construct entities for simplicity):

    class _ConstructEntity(_QueryEntity):
        """Queryable construct entities
    
        Adapted from: http://www.sqlalchemy.org/trac/ticket/2824
        """
        filter_fn = id
    
        entities = ()
        entity_zero_or_selectable = None
    
        # hack for sqlalchemy.orm.query:Query class
        class mapper:
            class dispatch:
                append_result = False
    
        def __init__(self, query, struct):
            query._entities.append(self)
            self.struct = struct
    
        def corresponds_to(self, entity):
            return False
    
        def adapt_to_selectable(self, query, sel):
            query._entities.append(self)
    
        #def setup_entity(self, *args, **kwargs):
        #    raise NotImplementedError
    
        def setup_context(self, query, context):
            context.primary_columns.extend(self.struct.columns)
    
        def row_processor(self, query, context, custom_rows):
            def processor(row, result):
                struct_row = [row[c](row[c) for c in self.struct.columns]
                return self.struct.from_row(struct_row)
            return processor, None
    
  10. Vladimir Magamedov reporter

    Other tests failed because bundles can't be selected as "single entity". I think that bundles can work like models: when you querying only one bundle, query should yield bundles and not keyed tuples with one bundle.

    points = db.session.query(Bundle(Point.x, Point.y)).all()
    for point in points:
        print point.x, point.y
    

    Is this possible to implement? At least optionally (overridable in Bundle subclass)?

  11. Mike Bayer repo owner
    • changed status to open
    • removed status

    I thought about that. The single model thing has always bothered me in its inconsistency. but i guess people will probably expect it to work that way, will look into it (it's just a conditional somewhere).

  12. Log in to comment