Metaprogramming with instrumented attributes is awkward

Issue #2235 resolved
Former user created an issue

I'm building a search doo-dad and trying to describe criteria with chains of attributes, like might appear in a loading option: Song.album, Album.artist, Artist.birth_year.

* Some tables need to be joined to multiple times, so I have to alias every table involved to be safe.  If all I have is `attr = Song.album`, the best way I can find to create a corresponding alias is `getattr(aliased(attr.property.argument), attr.key)` -- which is ugly, not really documented, and doesn't work with association proxies and other special relationships that have no `key` attribute.

* After getting some result rows, I often want to retrieve the column data corresponding to some criteria; e.g. if the user searched for a range, it's helpful to show the actual values.  How do I get `some_song.album` given `Song.album`?  I've only come up with the `key` line above, or the slightly more "correct" `attr.__get__(type(some_song), some_song)`.

I could just use string names of properties, I suppose, but I like the compile-time checking I get by using the actual objects.

A more general comment: some of the recipes and examples are absolutely fabulous, but I'd never have guessed that SQLAlchemy was even capable of such things, let alone figured out how to implement them. More documentation on extending SQLAlchemy itself would be phenomenally helpful. Note, for example, that the docs for ConnectionEvents don't describe the events' arguments.

Comments (6)

  1. Mike Bayer repo owner

    Replying to guest:

    * Some tables need to be joined to multiple times, so I have to alias every table involved to be safe.  If all I have is `attr = Song.album`, the best way I can find to create a corresponding alias is `getattr(aliased(attr.property.argument),
    

    Assuming you mean Song.album == attr and this is a SQL expression, what is the context by which you are arriving here ? I don't actually understand what you're trying to achieve here or what you are starting from - "argument" is never used by RelationshipProperty except during initialization ? Is this to provide filter() expressions ? Need much much more detail, really don't know what the context is here.

    * After getting some result rows, I often want to retrieve the column data corresponding to some criteria; e.g. if the user searched for a range, it's helpful to show the actual values.  How do I get `some_song.album` given `Song.album`?
    

    meaning what, session.query(Song.album) ? It would be the Nth element in the tuple corresponding to the Nth argument passed to query(). It is also named "album", and you can also get full metadata about the results that will be returned using http://www.sqlalchemy.org/docs/orm/query.html#sqlalchemy.orm.query.Query.column_descriptions .

    I've only come up with the key line above, or the slightly more "correct" attr.__get__(type(some_song), some_song).

    really no clue what "attr" is here, why on earth you'd need to dig into descriptor methods, need full working examples please !

    A more general comment: some of the recipes and examples are absolutely fabulous, but I'd never have guessed that SQLAlchemy was even capable of such things, let alone figured out how to implement them. More documentation on extending SQLAlchemy itself would be phenomenally helpful.

    Extensions like these tend to be documented through recipes at http://www.sqlalchemy.org/trac/wiki/UsageRecipes as well as /examples. Also, a SQLAlchemy book would be very nice. Unfortunately I have been under contract with Prentice to write a SQLAlchemy book for about five years now and the project has been stalled for most of this time as there is nobody able to help write it, which for me alone would be a full time job. The SQLAlchemy project overall right now has extremely few development/documentation resources with no relief in sight.

    Note, for example, that the docs for ConnectionEvents don't describe the events' arguments.

    We have several tickets for docstrings, including that we should have code examples for virtually all functions, :param: for all arguments, sure. Feel free to pick up an editor and start sending patches :)

  2. Former user Account Deleted

    Sorry, my brain is a bit fried today. Let me try this more concretely.

    A user searches for songs whose album artists were born in the 1980s. Doing that manually (with paranoid amounts of aliasing) might look like this:

        q = session.query(Song)
        song_album = aliased(Album)
        q = q.join(song_album, Song.album)
        song_album_artist = aliased(Artist)
        q = q.join(song_album_artist, song_album.artist)
    
        q = q.filter(song_album_artist.birth_year.between(1980, 1989))
    
        q = q.options(
            contains_eager(Song.album, alias=song_album),
            contains_eager(Song.album, Album.artist, alias=song_album_artist),
        )
    
        for song in q:
            print song.title, song.album.artist.birth_year
    

    But this search gadget has a lot of possible criteria, so I want to reuse that logic for various other chains of Song → relation → relation → ... → column. That is, I want the above to be a function I can just call like this:

        do_magic(
            root=Song,
            relationships=(Song.album, Album.artist),
            column=Artist.birth_year,
            argument=(1980, 1989),
        )
    

    With the original version, an alias for Album.artist is obviously written something like aliased(Album).artist, because I can read the class and property names from the code. The problem I'm running into is that, given an opaque reference to an ''arbitrary'' ORM property on a mapped class, it's difficult to get at the corresponding property on an alias or an instance.

    Is that more, er, lucid?

    Assuming you mean Song.album == attr and this is a SQL expression...

    attr was intended to be the actual value of Song.album. So what I want to do is: given ''only'' the RelationshipProperty itself, find its owning table, alias that table, and then refer to the same relationship on the alias. (And it might not be a relationship; could also be a column, or whatever.)

    meaning what, session.query(Song.album) ? It would be the Nth element in the tuple corresponding to the Nth argument passed to query().

    Similar to above: I only have the property object, and I want to "evaluate" it against an instance to get that instance's album.

    Some criteria will involve one-to-many relationships, so I can't just pass them all to query(), either.

    Also, a SQLAlchemy book would be very nice. Unfortunately I have been under contract with Prentice to write a SQLAlchemy book for about five years now and the project has been stalled for most of this time as there is nobody able to help write it, which for me alone would be a full time job. The SQLAlchemy project overall right now has extremely few development/documentation resources with no relief in sight.

    Ah, what a shame! I know free project documentation is a pain, and I don't mean to heckle. SQLAlchemy is a fantastic project; I just wish I knew better how to use it :)

    We have several tickets for docstrings, including that we should have code examples for virtually all functions, :param: for all arguments, sure. Feel free to pick up an editor and start sending patches :)

    Blast, you got me there. I'll see if I have anything coherent left in me once I'm done with this search thing...

  3. Mike Bayer repo owner

    Replying to guest:

    {{{ do_magic( root=Song, relationships=(Song.album, Album.artist), column=Artist.birth_year, argument=(1980, 1989), ) }}}

    With the original version, an alias for Album.artist is obviously written something like aliased(Album).artist, because I can read the class and property names from the code. The problem I'm running into is that, given an opaque reference to an ''arbitrary'' ORM property on a mapped class, it's difficult to get at the corresponding property on an alias or an instance.

    If you're aliasing everything in this way, join() supports an aliased=True flag. After the join() with this flag, whatever references to the target entity will be automatically aliased. So your above can be written as:

    def do_magic(root, relationships, column, argument):
        return s.query(root).\
                    join(aliased=True, *relationships).\
                    filter(column.between(*argument)).\
                    all()
    

    At the end, the call to filter() which involves Artist will adapt the criteria to the anonymous aliases formed in the join(). What you're trying to do with aliased() is pretty much the same thing. Though this doesn't do the contains eager part and I'm not sure if that's really straightforward - since what is happening here is essentially an extension to Query.

    The reason the method you have isn't a common use case is because that kind of approach is a bit brittle and can only handle the specific use cases you've come up with. Really the Query object itself is supposed to be the starting point for writing queries - its common for a "service layer" to provide coarse grained access to queries, like:

    def search_for_songs_with_artist(artist_name=None, birth_start=None, birth_end=None):
       pass
    

    and in my experience those are the two levels that are practical - both operate from a position of concrete knowledge about the entities being manipulated. The "I'm going to wrap around Query in an agnostic way" at the end of the day leads to you go full circle, and have essentially come up with a new way to write a Query that's just more difficult to make it do something.

    We do have some "metaprogramming" examples in /examples/, notably the horizontal sharding example as well as the XML traversal example. Traditionally, this kind of thing works with SQL constructs, i.e. Table and such, not the ORM-level constructs. Which is because those constructs existed first, not like doing it more with ORM entities is a bad thing.

    Anyway as far as the aliased/attribute access, I'm only seeing the need for relationship.property.mapper and attr.key, so this is not very intricate access:

    def do_magic(root, relationships, column, argument):
        q = s.query(root)
    
        prev = None
        opt = []
        for r in relationships:
            # note: this isn't working code, just illustrates
            # how to go from Cls.attr to alias(cls).attr
            target = aliased(r.property.mapper)
            if prev is not None:
                onclause = getattr(prev, r.key)
            else:
                onclause = r
            prev = onclause
            q = q.join(target, onclause)
            opt.append((r, target))
    
        # do similar with opt to generate the contains_eager
        # ...
    
        # then filter
        q = q.filter(getattr(target, column.key))
    

    I mentioned earlier to someone that it is planned for the small handful of important relationship() attributes to be sphinx-doced, we've already added them in for Mapper.

  4. Former user Account Deleted

    If you're aliasing everything in this way, join() supports an aliased=True flag. After the join() with this flag, whatever references to the target entity will be automatically aliased.

    Aha! So are you saying that this works as commented? (I just tried it, and it seems to.)

        q = session.query(Song)
    
        q = q.join(Song.album, Album.artist, aliased=True)
        # This refers to Song.album.artist
        q = q.filter(Artist.birth_year == 1980)
    
        q = q.join(Song.artist, aliased=True)
        # This refers to Song.artist
        q = q.filter(Artist.birth_year == 1980)
    

    That's very helpful; I had no idea how aliased=True actually worked with multiple joins to the same table, and didn't expect it to impact future filter() calls this way. Even if I alias manually, it looks like hooking into this behavior is as easy as assigning a new ORMAdapter to the query's _filter_aliases. That makes my life ''much'' easier.

    The reason the method you have isn't a common use case is because that kind of approach is a bit brittle and can only handle the specific use cases you've come up with. Really the Query object itself is supposed to be the starting point for writing queries...

    I have a version of this search already written which does the querying "by hand". I'm trying to expand it to also function as a JSON API, and I was hoping to reuse more code since many of the criteria behave very similarly -- a numeric range always needs the same HTML form control, the same parsing, the same query operation, the same output, etc. If it were merely a Python interface then I'd just have added little methods like yours here, but the similar UI and parsing make for a whole lot of duplication.

    Anyway as far as the aliased/attribute access, I'm only seeing the need for relationship.property.mapper and attr.key, so this is not very intricate access...

    This is surprisingly similar to what I have now, yes; it just took me rather longer to come up with (although I didn't know aliased also works on mappers). I wasn't sure if key were meant to be reliable, or if a relationship should have some other way to "evaluate itself" against a row, or what.

    Looks like I have some ideas for doc patches now :) Thanks!

  5. Log in to comment