marking subclass of declared_attr with cascading ignores subclass

Issue #4082 open
David Lord created an issue

Before a class is mapped, I want to check if it defines a primary key. If the column is defined in a declared_attr, I can't know if it's primary without evaluating the attr, which SQLAlchemy correctly warns about, since the model is not mapped yet.

To solve this, I thought I'd require marking declared attrs that define a primary key with a declared_pkey subclass. However, this breaks when using declarded_pkey.cascading, since internally _stateful_declared_attr creates a declared_attr instead of knowing about the subclass.

I realize that cascading is not necessary in this contrived example, but it still demonstrates the issue. Running the following code should print 'pkey named id' but does not.

class declared_pkey(declared_attr):
    primary_key = True

class IDMixin:
    @declared_pkey.cascading
    def id(cls):
        return Column(Integer, primary_key=True)

class ExampleMeta(DeclarativeMeta):
    def __init__(cls, name, bases, d):
        for base in cls.__mro__:
            for key, value in base.__dict__.items():
                if (
                    isinstance(value, (declared_attr, sa.Column, InstrumentedAttribute))
                    and getattr(value, 'primary_key', False)
                ):
                    print(f'pkey named {key}')

        super(ExampleMeta, cls).__init__(name, bases, d)

Base = declarative_base(metaclass=ExampleMeta)

class User(IDMixin, Base):
    __tablename__ = 'user'

For now my solution has been to document that this doesn't work and tell people to do id.primary_key = True manually, but it would be nice if subclasses were preserved.

class IDMixin:
    @declared_attr.cascading
    def id(cls):
        return Column(Integer, primary_key=True)

    id.primary_key = True

To solve this, _stateful_declared_attr could be modifed to take the class as well.

class _stateful_declared_attr(declared_attr):
    def __init__(self, cls, **kw):
        self.cls = cls
        self.kw = kw

    def _stateful(self, **kw):
        new_kw = self.kw.copy()
        new_kw.update(kw)
        return _stateful_declared_attr(self.cls, **new_kw)

    def __call__(self, fn):
        return self.cls(fn, **self.kw)

class declared_attr(...):
    ...
    def _stateful(cls, **kw):
        return _stateful_declared_attr(cls, **kw)
    ...

Comments (31)

  1. David Lord reporter

    Additionally, it would be nice if declared_attr.__init__ was modified to accept any keyword arguments, so that we can add other instance attributes besides cascading, although it's not necessary for this issue.

  2. Mike Bayer repo owner

    Subclassing of declared_attr is not supported and I don't see how it's necessary? I don't understand what:

    class declared_pkey(declared_attr):
        primary_key = True
    

    is intended to achieve.

    Adding arbitrary attributes to the InstrumentedAttribute is also not supported (use .info), and I don't see how a "MyClass.foo.primary_key" is useful considering that all the other columns / relationships / whatever will not have a "primary_key" attirbute, how would you check that?

  3. Mike Bayer repo owner

    Before a class is mapped, I want to check if it defines a primary key. If the column is defined in a declared_attr, I can't know if it's primary without evaluating the attr

    why would this class use the "IDMixin" if it already defines the thing that IDMixin provides? Why not a simple "defines_primary_key=True" class attribute ?

  4. Mike Bayer repo owner

    I see you have the getattr() in that metaclass and all that. basically we need to step back and talk about what you actually need to achieve b.c. custom metaclasses and subclassing things that dont want to support subclassing is not how to do it.

  5. David Lord reporter

    This is in support of Flask-SQLAlchemy, which generates tablenames automatically, but currently has some bugs / deficiencies. I understand you're not the biggest fan of that feature, but it is something I want to continue to support. Part of that support is attempting to handle as many of SQLAlchemy's options for defining models as possible.

    Part of the process is detecting if the model defines a primary key, or if one of its MRO defines a primary key as a declared attr. However, accessing a declared_attr on a mixin (unmapped) class causes SQLAlchemy to emit a warning. The solution to that was to look at the value in the __dict__, which doesn't execute the property and so doesn't emit the warning. However, now we can't tell if the property defines the primary key.

    This subclass was the most obvious way to support this case to me. Now the property has a primary_key attribute like regular columns, and so the process works correctly. However, if the user then decides to use the cascading option, it breaks the subclass.

    It doesn't seem like there's a reason that subclasses aren't supported here, and the change seems minimal, which is why I decided to report it. As I pointed out, I can also document that the attribute should be set manually, I just figured a subclass used in the same way as declared_attr would be easier to understand for most users.

  6. David Lord reporter

    I would like to be able to detect that the following three examples all define a primary key. The first two are easy. The last one is what I'm trying to solve. This has to happen before the model is mapped because the tablename must be set before the model is mapped.

    class DirectModel(Base):
        # isinstance(value, InstrumentedAttribute); value.primary_key == True
        id = Column(Integer, primary_key=True)
    
    class DirectMixin:
        # isinstance(value, Column); value.primary_key == True
        id = Column(Integer, primary_key=True)
    
    class DeclMixin:
        # isinstance(value, declared_attr); value.primary_key == unknown without access, access emits warning
        @declared_attr
        def id(cls):
            return Column(Integer, primary_key=True)
    
  7. Mike Bayer repo owner

    This is in support of Flask-SQLAlchemy, which generates tablenames automatically, but currently has some bugs / deficiencies. I understand you're not the biggest fan of that feature, but it is something I want to continue to support. Part of that support is attempting to handle as many of SQLAlchemy's options for defining models as possible.

    I'm totally cool with that! I've blogged about auto-tablename patterns and openstack uses that kind of thing like crazy. That's the whole point of __tablename__ and all that.

    Part of the process is detecting if the model defines a primary key, or if one of its MRO defines a primary key as a declared attr. However, accessing a declared_attr on a mixin (unmapped) class causes SQLAlchemy to emit a warning. The solution to that was to look at the value in the dict, which doesn't execute the property and so doesn't emit the warning. However, now we can't tell if the property defines the primary key.

    why must the class be unmapped ?

  8. Mike Bayer repo owner

    also, the warning is to prevent people from making mistakes. in this case, if you want to inspect a mixin, you are not making a mistake. So the real bug is, the warning not having any way of skipping it (besides filtering it, which nobody ever seems to want to do).

  9. Mike Bayer repo owner

    OK so i think what we are warning here on is that when you call upon the @declared_attr without the mapping, you are creating a brand new object and then throwing it away. That is, if you said "MyMixin.some_attr.foo = 'bar'", that "foo" is now gone. Also if there are side effects from the method, like creating a table or constraint that modifies the metadata, this is a problem also.

    would need to see the way you are using this but a "safe" means might be a sub-method ".invoke".

  10. David Lord reporter

    I'm totally cool with that!

    Ha, for some reason I thought you didn't like it, not sure where I got that. :-)

    why must the class be unmapped

    A mixin class is just a plain class, it will never be mapped. We traverse the MRO, so we'd be accessing the attribute on the mixin directly.


    I think the warning is correct, we should not be accessing the value outside a real, mapped model. I can think of plenty of contrived examples that won't work if we access them. If I can come up with contrived examples, I don't want to accidentally break a real use case that I couldn't think of. For example:

    class Mixin:
        @declared_attr
        def id(cls):
            return Column(Integer, primary_key=cls.__name__.lower()[0] == 'a')
    
    class Aardvark(Mixin, Base):
        pass
    
    # If Mixin.id is accessed, we would determine that the column is not a primary key
    # but it will end up on Aardvark and so *should* be a primary key
    

    Again, this is contrived, but I bet there are real cases where this would cause problems.

  11. Mike Bayer repo owner

    this is the thing. if you are making a decision about whether or not there is a primary key, that means you have the table to be mapped. There are events where you can make decisions about the table before the mapper() is applied - such as, "if not table.primary_key: table.append_column(some_primary_key)".

    Since this is an extension aiming to be very generic you might want to break into using events for things. they are more flexible than metaclasses.

  12. Mike Bayer repo owner

    here's what that looks like:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base, declared_attr
    from sqlalchemy import event
    
    Base = declarative_base()
    
    
    @event.listens_for(Base, "instrument_class", propagate=True)
    def _ensure_pk(mapper, class_):
        table = mapper.local_table
        if table is not None and not table.primary_key:
            table.append_column(Column('id', Integer, primary_key=True))
    
    
    class DeclMixin:
        @declared_attr
        def id(cls):
            return Column(Integer, primary_key=True)
    
    
    class NoPk(Base):
    
        __tablename__ = 'foobar'
    
    
    class HasPk(DeclMixin, Base):
        __tablename__ = 'batho'
    
    
    class SingleInhSubcl(HasPk):
        pass
    
    
    print NoPk.id == 5
    print HasPk.id == 10
    print SingleInhSubcl.id == 20
    

    that way you don't have to tell your users to use a special decorator

  13. David Lord reporter

    I've pushed an initial patch to fix tablename generation in Flask-SQLAlchemy. I'm pretty sure I tried moving the tablename detection to a declared_attr or events before and couldn't get it to work in all cases, but I don't remember clearly, it was months ago that I worked on that. If you can offer any insight in the PR, that would be much appreciated. https://github.com/mitsuhiko/flask-sqlalchemy/pull/541

  14. Mike Bayer repo owner

    that approach looks very complicated. did you consider setting up the primary key contract as a simple mediation between the Table and the mapper() as in my example above? Declarative is all about bolting Python class construction patterns into a system that creates mappings - but trying to ensure a mapping has certain attributes set up, that seems like it can happen at a more fundamental level. Obviously for buliding up the name of the table itself, the usual table / tablename hooks need to be employed since you can't create Table without a name (but without a PK column, no problem).

  15. David Lord reporter

    What are "the usual tablename hooks"? A declared attr for tablename doesn't propagate. If an intermediate class sets a tablename directly, subclasses of it need to re-inherit the declared attr or it's not called.

    class NameMixin:
        @declared_attr  # doesn't matter if .cascading is used here
        def __tablename__(cls):
            return cls.__name__.lower()
    
    class User(NameMixin, Base):
        __tablename__ = 'USER'  # interrupts mixin tablename, no longer inherited
        id = Column(Integer, primary_key=True)
    
    class Employee(User):
        id = Column(ForeignKey(User.id), primary_key=True)
        # tablename is not set by attr, even though that's the most likely desired outcome in this case
    

    instrument_class is called too late, if the tablename is not already set, SQLAlchemy errors before the event is triggered. before_ and after_configured do not receive information about the class being configured. mapper_configured is not called consistently (and is also called too late), for example I called metadata.create_all() and mapper_configured wasn't triggered.

  16. Mike Bayer repo owner

    OK I am not doing a good job with communication today, elsewhere as well. The __table__() __tablename__() are "hooks" because they are methods one can implement and they are called when declarative wants to know about the table it is creating. But in your case, you are making use of __table__ and __tablename__ "hooks" in your approach at https://github.com/mitsuhiko/flask-sqlalchemy/pull/541/files#diff-9697d7f9c915064678df24a3479e4972R579 . I tried to state that the approach of using the "instrument_class" event would not be appropriate for this part of what you are trying to do. However the "create a primary key" part of it can.

    Here's a demo of automatic tablename and pk, is this simpler ?

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base, declared_attr
    from sqlalchemy import event
    import re
    
    
    def camel_to_snake_case(name):
        return (
            name[0].lower() +
            re.sub(r'([A-Z])', lambda m: "_" + m.group(0).lower(), name[1:])
        )
    
    
    class SetsTablename(object):
        @declared_attr
        def __tablename__(cls):
            return camel_to_snake_case(cls.__name__)
    
    Base = declarative_base(cls=SetsTablename)
    
    
    @event.listens_for(Base, "instrument_class", propagate=True)
    def _ensure_pk(mapper, class_):
        table = mapper.local_table
        if table is not None and not table.primary_key:
            table.append_column(Column('id', Integer, primary_key=True))
    
    
    class DeclMixin:
        @declared_attr
        def id(cls):
            return Column(Integer, primary_key=True)
    
    
    class NoPkNoTable(Base):
        pass
    
    
    class NoPkHasTable(Base):
    
        __tablename__ = 'foobar'
    
    
    class Abstract(Base):
        __abstract__ = True
    
    
    class HasPkNoTable(DeclMixin, Base):
        pass
    
    
    class HasPkHasTable(DeclMixin, Base):
        __tablename__ = 'batbar'
    
    
    class SingleInhSubcl(HasPkNoTable):
        __tablename__ = None
    
    
    for cls in NoPkHasTable, NoPkNoTable, HasPkNoTable, HasPkHasTable, SingleInhSubcl:
        print cls.__table__.name
        print cls.__table__.primary_key
    
  17. David Lord reporter

    Thanks for continuing to work through this with me. Unfortunately, that example fails to work with the example I showed above; a manually set tablename stops the declared tablename from continuing on to later subclasses.

    What I've found over time with improving the method we use now is that there's always another corner case, which keeps making the declared attr tablename less viable, at least given the attempts I've made in the past. Let me try to explain the "rules" that we've been using for tablename generation.

    • I want the user to specify (or not) the primary key.
      • If they set a primary key, that means they either have a basic table or want joined table inheritance, and need a name set.
      • If they don't set a primary key, that means they want single table inheritance, so a name should not be set.
    • If the user specifies the tablename themselves, this should take priority over generation.
      • If they've used a declared attr, that should be used going forward for the subclass branch.
      • Otherwise, generation should resume in subclasses. This doesn't work with declared_attr; once a class sets it manually it doesn't carry to subclasses unless the attr is mixed in again.
    • Abstract models, mixins, and declared (cascading) attrs should be usable for both the tablename and primary key, in any combination and hierarchy.

    This set of rules is something that I've accumulated over time as the behavior that requires the least manual intervention. Unless the user decides to use single-table instead of joined inheritance, they hopefully never need to worry about setting the tablename manually.

    The metaclass, while perhaps not the "natural" SQLAlchemy way to do things, works, and while I'd love to figure out a better way, I'm ok supporting it as-is. The only issue that comes up is that we can't see into declared attrs safely using this method (at least as far as I can tell). Hence the initial bug report as I tried to find a solution to that.

  18. Mike Bayer repo owner

    So another thing I was trying to indicate in my earlier comment is that your current metaclass approach is doing the tablename thing that you want already, so there is no reason to change it, either. Using the approach you already have does not preclude handling this new primary key case using an event listener. you can use the metaclass to decide tablename and an event listener to find the primary key column if any.

    I am still optimistic that straight usage of the tablename method can achieve all your use cases but this is a separate concern from the primary key issue.

    as far as the bug here, I'd want to know if using an event listener as I've illustrated, just to handle assignment of primary key but not tablename, is sufficient to solve for the use case first linked to the bug. It's true thi is more of a mailing list issue at this point but we're here now and thankfully you are very responsive so it's fine.

  19. Mike Bayer repo owner

    oh. here's the problem:

    "If they don't set a primary key, that means they want single table inheritance, so a name should not be set."

    I remember this issue, and this is unfortunately the source of all your problems. If it were me, I'd simply not have it work this way. Single inheritance can be indicated by any other means, including a flag like "single_inheritance = True", or "tablename = None", etc. The original proposal here included that you were trying to add a new @declared_attr "declared_pkey" in any case, so already this would be new API for Flask-SQLAlchemy. It's a lot simpler IMO to just have a simple flag or mixin on the class.

  20. Mike Bayer repo owner

    But , this case as written still may be doable without the mixin. not sure yet. need to play with it.

  21. David Lord reporter

    Assigning primary keys is not the goal of this bug, that sneaked in as a side effect of one of the examples I gave, sorry for the confusion.

    The goal is to give the user a way to indicate that a cascading declared attr will return a primary key, without needing to evaluate the attr. Alternatively, if I can have a way to evaluate the attr on the class being mapped (value.__get__(cls)) without raising a warning, that may be better. That goes back to a comment you made earlier:

    OK so i think what we are warning here on is that when you call upon the @declared_attr without the mapping, you are creating a brand new object and then throwing it away. That is, if you said "MyMixin.some_attr.foo = 'bar'", that "foo" is now gone. Also if there are side effects from the method, like creating a table or constraint that modifies the metadata, this is a problem also.

    would need to see the way you are using this but a "safe" means might be a sub-method ".invoke".

    I would want to be sure that doing this wouldn't cause unintended side effects.

  22. David Lord reporter

    Hmm, yes, I may be able to work this a different way, by always assuming joined inheritance and generating a tablename, even if no primary key is set, and requiring the user to set __tablename__ = None for single table inheritance.

  23. Mike Bayer repo owner

    well, if im understanding it this time around, you want to create a Table only if a primary key was somehow set up by the mapping, through whatever means, and if there is no primary key, then use single table inheritance. your earlier examples seemed to indicate you wanted to do the standard "create a pk if one is not present" thing but IIUC that is the opposite of what you want to do here.

    This is conditional creation of __table__ and sort of cooincidentally there is a hook already that does this. It is completely undocumented and not part of public API, only used by an internal test fixture. But, it makes what you have described, if im not missing anything, pretty simple. How about this:

    from sqlalchemy import Integer, Column, Table, ForeignKey
    from sqlalchemy.orm import clear_mappers
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.ext.declarative import has_inherited_table
    from sqlalchemy.ext.declarative import declared_attr
    import re
    import unittest
    
    
    def camel_to_snake_case(name):
        return (
            name[0].lower() +
            re.sub(r'([A-Z])', lambda m: "_" + m.group(0).lower(), name[1:])
        )
    
    
    class SetsTablename(object):
        @declared_attr
        def __tablename__(cls):
            return camel_to_snake_case(cls.__name__)
    
        @classmethod
        def __table_cls__(cls, *arg, **kw):
            for col in arg:
                if isinstance(col, Column) and col.primary_key:
                    return Table(*arg, **kw)
            else:
                return None
    
    
    def auto_base():
        return declarative_base(cls=SetsTablename)
    
    
    # that's it.
    
    # ########################################################################33
    
    # test suite.    Can you add new tests here that illustrate cases that
    # still aren't working?
    
    class GenTableTest(unittest.TestCase):
        def setUp(self):
            self.Base = auto_base()
    
        def tearDown(self):
            self.Base = None
            clear_mappers()
    
        def test_spec_pk_direct_noname(self):
            """If they set a primary key, that means they either have a
            basic table or want joined table inheritance, and need a name set."""
    
            class MyClass(self.Base):
                id = Column(Integer, primary_key=True)
    
            self.assertEqual("my_class", MyClass.__table__.name)
    
        def test_spec_pk_abstract_noname(self):
            """If they set a primary key, that means they either have a
            basic table or want joined table inheritance, and need a name set."""
    
            class MyAbstract(self.Base):
                __abstract__ = True
    
                @declared_attr
                def id(cls):
                    return Column(Integer, primary_key=True)
    
            class MyClass(MyAbstract):
                pass
    
            self.assertEqual("my_class", MyClass.__table__.name)
    
        def test_spec_pk_mixin_noname(self):
            """If they set a primary key, that means they either have a
            basic table or want joined table inheritance, and need a name set."""
    
            class MyMixin(object):
                @declared_attr
                def id(cls):
                    return Column(Integer, primary_key=True)
    
            class MyClass(MyMixin, self.Base):
                pass
    
            self.assertEqual("my_class", MyClass.__table__.name)
    
        def test_spec_pk_cascading_noname(self):
            """If they set a primary key, that means they either have a
            basic table or want joined table inheritance, and need a name set."""
    
            class MyMixin(object):
                @declared_attr.cascading
                def id(cls):
                    if has_inherited_table(cls):
                        return Column(ForeignKey('my_class.id'), primary_key=True)
                    else:
                        return Column(Integer, primary_key=True)
    
            class MyClass(MyMixin, self.Base):
                pass
    
            class MySubClass(MyClass):
                pass
    
            self.assertEqual("my_class", MyClass.__table__.name)
            self.assertEqual("my_sub_class", MySubClass.__table__.name)
    
    
    if __name__ == '__main__':
        unittest.main()
    
  24. Mike Bayer repo owner

    here's the test for single, passes:

        def test_spec_pk_mixin_single_inh(self):
    
            class MyMixin(object):
                @declared_attr
                def id(cls):
                    return Column(Integer, primary_key=True)
    
            class MyClass(MyMixin, self.Base):
                pass
    
            class MySubClass(MyClass):
                pass
    
            self.assertEqual("my_class", MyClass.__table__.name)
            self.assertIsNone(MySubClass.__table__)
            self.assertTrue(MyClass.__table__ is MySubClass.__mapper__.local_table)
    

    is that it?

  25. David Lord reporter

    I pushed a solution that simplifies how the metaclass decides to generate the tablename, then uses __table_cls__ to create the table or clear the name. Seems to cover all test cases and open issues about it, I'm just waiting on some confirmation from a couple users.

    I just want to confirm that __table_cls__ won't go anywhere even though it's undocumented. Is it safe to use that in Flask-SQLAlchemy?

    At this point the original issue isn't relevant, so I'll close it. Thanks for your help working through this!

  26. Mike Bayer repo owner

    a test case should be added on the SQLA side to maintain the behavioral contract of this attribute, and there should be a bit of a docstring somewhere. it's currently used as part of test cases within fixtures.

  27. Log in to comment