association object can be inserted twice

Issue #134 resolved
Mike Bayer (repo owner) created an issue

the ZBlog demo can illustrate this. consider this relationship in mapper.py:

Topic.mapper = mapper(Topic, tables.topics)

TopicAssociation.mapper = mapper(TopicAssociation, tables.topic_xref,
                primary_key=[tables.topic_xref.c.post_id, tables.topic_xref.c.topic_id],
                properties={
                    'topic':relation(Topic.mapper, lazy=False),
                })

Post.mapper = mapper(Post, posts_with_ccount, properties={
    'id':posts_with_ccount.c.post_id,
    'topics':relation(TopicAssociation, lazy=False, private=True, association=Topic)
}, is_primary=True, order_by=[desc(posts_with_ccount.c.datetime)])

this creates TopicAssociation as a many-to-many association object. if you go to the interface, enter a post, and put the same topic name in twice, it saves. this is the bug: the list of TopicAssociations should somehow be unique based on the Topic object it's attached to, just like any other collection. then if you go to update that post, the update on the TopicAssociations returns 2 rows instead of 1 and the concurrency detector goes off.

Comments (6)

  1. Former user Account Deleted

    Does the following test case correctly reproduce the issue? If so, then I don't think this is a bug in SA. Rather, it's a misconfiguration of the tables. There should be a UNIQUE constraint on the "enrol" table (note that I had to remove the primary_key=True directives on enrolTbl to reproduce the problem). This is a constraint issue, which we probably want to have the database enforce rather than SA.

    The topic_xref table in zblog should be configured with primary_key=True on its *_id columns. That is, unless it should allow multiple references to the same topic from a given post, in which case it will need a separate primary key column. If that's the case, then Topics and Posts don't really have a true many-to-many relationship.
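
    For illustration, the kind of composite primary key being suggested might look like the sketch below (the column names come from the zblog mapping above; the column types, foreign key targets, and metadata variable are assumptions):

        # sketch: composite primary key on the association table, so the database
        # itself rejects a duplicate (post, topic) pair
        topic_xref = Table('topic_xref', metadata,
            Column('post_id', Integer, ForeignKey('posts.post_id'), primary_key=True),
            Column('topic_id', Integer, ForeignKey('topics.topic_id'), primary_key=True),
        )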

    The other possible way to address this situation is to have ListAttribute or HistoryArraySet check the PKs of inserted/appended objects against other objects already in the list (i.e. enforce the set property more strictly). This leads to other questions though. For example: should ListAttribute verify that its objects can actually be inserted into the table to which it (ListAttribute) is mapped?

    This brings up another issue though: could the *Set types in SA be eliminated in favor of the built-in set type in Python?

    class M2MTest4(testbase.AssertMixin):
        def setUpAll(self):
            self.install_threadlocal()
            metadata = testbase.metadata
            global studentTbl
            studentTbl = Table('student', metadata, Column('name', String(20), primary_key=True))
            global courseTbl
            courseTbl = Table('course', metadata, Column('name', String(20), primary_key=True))
            global enrolTbl
            enrolTbl = Table('enrol', metadata,
                Column(
                    'student_id',
                    String(20),
                    ForeignKey('student.name'),
                    #primary_key=True, # uncomment for correct behavior
                ),
                Column(
                    'course_id',
                    String(20),
                    ForeignKey('course.name'),
                    #primary_key=True, # uncomment for correct behavior
                )
            )
    
            studentTbl.create()
            courseTbl.create()
            enrolTbl.create()
    
        def tearDownAll(self):
            enrolTbl.drop()
            studentTbl.drop()
            courseTbl.drop()
            #testbase.db.tables.clear()
            self.uninstall_threadlocal()
    
        def setUp(self):
            objectstore.clear()
            clear_mappers()
    
        def tearDown(self):
            enrolTbl.delete().execute()
            courseTbl.delete().execute()
            studentTbl.delete().execute()
    
        def test_multiple_relations_to_same_entity_arcane(self): 
            class Student(object):
                def __init__(self, name=''):
                    self.name = name
            class Course(object):
                def __init__(self, name=''):
                    self.name = name
            class Enrol(object):
                def __init__(self, student_id='', course_id=''):
                    self.student_id = student_id
                    self.course_id = course_id
            Student.mapper = mapper(Student, studentTbl)
            Enrol.mapper = mapper(Enrol, enrolTbl,
            primary_key=[enrolTbl.c.student_id, enrolTbl.c.course_id],
                properties={'student':relation(Student.mapper, lazy=False)}
            )
            Course.mapper = mapper(Course, courseTbl,
                properties={
                    'students': relation(Enrol, lazy=False, private=True, association=Student)
                }
            )
            c1 = Course('Course1')
            s1 = Student('Student1')
            c1.students.append(Enrol(s1.name, c1.name))
            c1.students.append(Enrol(s1.name, c1.name))
            self.assert_(len(c1.students) == 2)
            objectstore.flush() # this will produce a constraint violation if the
                                # "enrol" table has a unique constraint on its PK
            objectstore.clear()
            c = Course.mapper.get_by(name='Course1')
            self.assert_(len(c.students) == 1) # PASS (SA consolidates the two rows on select)
            enrol_count = Enrol.mapper.count(course_id='Course1', student_id='Student1')
            self.assert_(enrol_count == 1) # FAIL
    
  2. Mike Bayer reporter

    maybe it's not a "critical" bug....but it's less than ideal that regular one-to-many and many-to-many relationships automatically enforce uniqueness regardless of the presence of "unique" constraints on columns (since they use a "Set" object, and the mapper loads rows into lists uniquely), but then if you use the association object mapping, this behavior goes away. also i am not sure how well databases like SQLite can handle a "composite primary key" on a many-to-many table.

    While i think constraints should be used to enforce things like this, I have always liked that if you have a database without constraints (which is the case not only for databases like sqlite and mysql which have poor support for constraints, but also in certain shops that just turn off constraints for a big performance gain), SA is at least consistent in preventing the most simple of constraint violations on its own; this is an example of where it's not consistently predictable.

    as far as Set types, in many cases I am using sets.Set for set functionality (still trying to keep a modicum of 2.3-ness in the core). however, HashSet does something that Set does not, which is that it maintains insert ordering. i've no problem with replacing HashSet with something more directly Set-derived if someone wants to provide one; it just needs to maintain insert order. i would love to use base python types instead of HashSet and OrderedDict if they were available.
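
    for illustration, a minimal sketch of the kind of Set-derived, insert-order-preserving structure being described (hypothetical code, not part of SA):

        # hypothetical sketch: a set that also remembers insertion order
        class OrderedSet(object):
            def __init__(self, iterable=None):
                self._data = {}      # membership lookup
                self._order = []     # explicit insert order
                for item in (iterable or []):
                    self.add(item)
            def add(self, item):
                if item not in self._data:
                    self._data[item] = None
                    self._order.append(item)
            def remove(self, item):
                del self._data[item]
                self._order.remove(item)
            def __contains__(self, item):
                return item in self._data
            def __iter__(self):
                return iter(self._order)
            def __len__(self):
                return len(self._order)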

    There is also a possible need for a "weak referencing" set, in #130, which would be used specifically for storing "transient" objects in the Session. although now that the threadlocal behavior is non-default maybe the need for that is less urgent, since people can now be explicit about which instances go into a session and which ones do not.

    as far as HistoryArraySet goes, I have mentioned it just now in #186; that object does a whole lot more than a Set does, including maintaining insert ordering as well as keeping track of added/removed items, and it is meant to masquerade as a regular list object. so while i think that object needs some work as well as a name change, something like it is still going to be needed.

    we also can't have HistoryArraySet or ListAttribute have any awareness of PKs...for one thing, it represents them "reaching up" into a higher level of abstraction, and also objects can be inserted into these lists that are unsaved and therefore have no primary keys.

    (free-thinking:) we might be able to achieve some of this via playing around with __cmp__(), although i'd rather not enforce a __cmp__() requirement on objects. there could maybe be a wrapper used internally, which takes a strategy driven by the kind of relationship used:

        class ObjWrapper(object):
            def __init__(self, instance, strategy):
                # ....
            def __cmp__(self, other):
                return self.strategy.cmp(self.instance, other)
    

    then an association relationship would use a strategy that makes the jump from the "association" object to the "associated" object, which is what actually counts for the uniqueness test (assume a more dynamic version of this code):

        class AssociationComparisonStrategy(ComparisonStrategy):
            def cmp(self, instance, other):
                return instance.associated_object.__cmp__(other.associated_object)
    

    i'm just dreaming up random things on this one; not sure if that would really work.
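
    for illustration, a slightly fuller (still hypothetical) sketch of how the wrapper and strategy could fit together; every name below is an assumption:

        # hypothetical expansion of the wrapper/strategy idea above
        class ComparisonStrategy(object):
            def cmp(self, instance, other):
                # default: compare the instances directly
                return cmp(instance, other)

        class AssociationComparisonStrategy(ComparisonStrategy):
            def __init__(self, attrname):
                # attrname is the attribute on the association object that points
                # to the "associated" child, e.g. 'topic'
                self.attrname = attrname
            def cmp(self, instance, other):
                return cmp(getattr(instance, self.attrname), getattr(other, self.attrname))

        class ObjWrapper(object):
            def __init__(self, instance, strategy):
                self.instance = instance
                self.strategy = strategy
            def __cmp__(self, other):
                # unwrap the other side if it is also wrapped
                other = getattr(other, 'instance', other)
                return self.strategy.cmp(self.instance, other)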

  3. Former user Account Deleted

    Mike wrote:

    we also can't have HistoryArraySet or ListAttribute have any awareness of PKs...for one thing, it represents them "reaching up" into a higher level of abstraction, and also objects can be inserted into these lists that are unsaved and therefore have no primary keys.

    But that's what a set does; it enforces a "unique" constraint on a bunch of objects. The current semantics of HistoryArraySet/ListAttribute are confusing:

    >>> foo = Foo(id=1)
    >>> bar = Bar(id=1)
    >>> foo.bars.append(bar)
    >>> foo.bars.append(bar)
    >>> len(foo.bars)
    1
    >>> foo.bars.append(Bar(id=1))
    >>> len(foo.bars)
    2
    

    I think appending a second instance with the same primary key should raise an exception. That would take care of the original issue in this ticket.

    As for dealing with new objects, it should fall back on the python id(obj) function when a primary key has not been assigned. I'm taking this idea straight from Hibernate, which provides several configurable ways to detect if an object is "unsaved", the default being when the PK is None.
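
    For illustration, the uniqueness key that idea implies might look like this (a hypothetical helper; it assumes a single-column primary key attribute named id):

        # hypothetical helper: the key used to decide whether two objects are "the same"
        def uniqueness_key(obj):
            pk = getattr(obj, 'id', None)       # assumes a single-column PK attribute named 'id'
            if pk is not None:
                return ('pk', pk)               # PK assigned: unique per primary key
            return ('transient', id(obj))       # unsaved: fall back to object identity

        # appending a second object with the same key raises instead of silently adding
        def checked_append(items, seen, obj):
            key = uniqueness_key(obj)
            if key in seen:
                raise ValueError("duplicate object for key %r" % (key,))
            seen.add(key)
            items.append(obj)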

    In the case where SA is managing a list of objects that are not necessarily unique, I would say it should use a list-based collection, since set semantics do not fit that scenario anyway.

    I propose that SA should change to use a list when it may return non-unique results and a set when the results are guaranteed to be unique (i.e. if a key or association has been defined on the collection). Maybe we can make a generic History type that can be backed by either a set or a list. If that's not possible, I suppose we will need both a HistoryList and a HistorySet implementation.

    You're right, we definitely need a SortedSet implementation. The unordered sets provided by Python will not work for database results, which are assumed to at least retain their original order from one iteration to the next. I have noticed that you are using your own implementation of OrderedDict to implement the various Set types in the util package. Would you be interested in using the OrderedDict implementation written by Nicola Larosa and Michael Foord (http://www.voidspace.org.uk/python/odict.html)? It has good documentation and a very well-defined interface. Using it would save a lot of work in creating and maintaining basic data structures in SA. I believe the licensing would also work out since it's BSD.

  4. Mike Bayer reporter

    But that's what a set does; it enforces a "unique" constraint on a bunch of objects. The current semantics of HistoryArraySet/ListAttribute are confusing: I think appending a second instance with the same primary key should raise an exception. That would take care of the original issue in this ticket.

    Well, it's acting exactly like a Set right now in that regard, essentially using only id(). Since the identity map keeps things unique based on primary key, the id() of primary-key-valued objects is properly unique.

    The condition you mention will eventually raise an error right now, because session semantics will raise an exception that either there's a Bar(id=1) already in the session, or that there's a Bar attached to Foo which is not attached to the session of Foo.

    The problem here isn't so hard to fix; HistoryArraySet just needs a little extra grease in the case of the association mapping, to wrap instances in an object that overrides __cmp__() to look at the child object of the association.

    i'm not opposed to explicitly checking primary key values (although i wouldn't want an instant exception to be raised...), but that means more complexity introduced, and i think session semantics take care of it for the moment. HistoryArraySet already introduces a performance overhead; i'd like to keep it as lightweight as possible. also, when i said "HistoryArraySet should have no awareness of PKs" i meant that the primary-key comparison logic would not be within HistoryArraySet itself; it would have to be made available through some layer of abstraction...most likely, the same wrapper with __cmp__() i am mentioning.

    As for dealing with new objects, it should fall back on the python id(obj) function when a primary key has not been assigned. I'm taking this idea straight from Hibernate, which provides several configurable ways to detect if an object is "unsaved", the default being when the PK is None.

    SA does this essentially by checking for the presence of _instance_key, which is the primary key plus the associated mapper (which is important in the case where class Bar is stored in two different tables by two different mappers).

    In the case where SA is managing a list of objects that are not necessarily unique I would say it should use a list-based collection since set semantics do not fit that scenario anyway.

    There's no case where SA's mapper manages a list of non-unique objects; it's essentially not supported (except in this ticket, where it produces erroneous results).

    Would you be interested in using the OrderedDict implementation written by Nicola Larosa and Michael Foord (http://www.voidspace.org.uk/python/odict.html)?

    Well, the .py module here would have to be included with the dist (no external downloads).....while it seems nice, i'm not sure of the advantages to it? this would only replace SA's OrderedDict; there's still no Set, SortedSet here, etc. SA's OrderedDict is like 8 lines of code and has a very well-defined interface.....it's just a dict. odict.py is...1541 lines? seems pretty chunky. it's already been observed that SA's attribute-managing logic is a big speed hit for creating large numbers of objects....so i like that the data structures behind this logic are at least as small and lightweight as possible (which is why a Python-included SortedSet would be great, since it could possibly be natively implemented....)
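
    for reference, an insertion-ordered dict really can be about that small (a sketch, not SA's actual OrderedDict):

        # sketch of a tiny insertion-ordered dict (not SA's actual implementation);
        # only item assignment/deletion is covered, update() etc. are omitted
        class TinyOrderedDict(dict):
            def __init__(self):
                dict.__init__(self)
                self._keys = []
            def __setitem__(self, key, value):
                if key not in self:
                    self._keys.append(key)
                dict.__setitem__(self, key, value)
            def __delitem__(self, key):
                dict.__delitem__(self, key)
                self._keys.remove(key)
            def keys(self):
                return list(self._keys)
            def __iter__(self):
                return iter(self._keys)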

    I do want to give HistoryArraySet an overhaul at some point, including support for "extra lazy loading", which would require triggering callables for individual and sliced attribute access. that task is a little daunting, considering all the "added"/"deleted" histories it has to maintain, as well as the total ordering.
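
    "extra lazy" access might look roughly like the following sketch (hypothetical; the loader callable and its signature are assumptions, and history tracking and negative indexes are omitted):

        # hypothetical sketch of "extra lazy" item/slice access: nothing is loaded
        # until a particular index or slice is asked for
        class ExtraLazyList(object):
            def __init__(self, loader):
                # loader(start, stop) is assumed to fetch just that range of rows
                self.loader = loader
            def __getitem__(self, index):
                if isinstance(index, slice):
                    return self.loader(index.start, index.stop)
                return self.loader(index, index + 1)[0]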

  5. Mike Bayer reporter

    you know, i just had an "aha" (or maybe a "duh") moment, which is that my whole idea of looking at the "associated" object instead of the "association" object is wrong; the whole point of an "association" object is that you are adding information to an association, and therefore the same child object can be associated more than once with different accompanying information. so it would seem that, yes, this whole thing is not much of a "bug", and database constraints are the only way to really check on this.

    Also, I should clarify/correct myself on the earlier example; this:

       list.append(Bar(id=1))
       list.append(Bar(id=1))
    

    is actually going to fail when the two Bar objects are inserted, because it will use "1" for the primary key of both and fail on a PK constraint.

    i'm generally very hesitant to add database-constraint checking logic at the object manipulation level, because it highly complicates the code, impacting its stability and performance, and also becomes a wild goose chase where you eventually are duplicating an entire RDBMS in memory. if HistoryArraySet were looking at the "id=1" primary key part of the data, it seems to me that we are getting into runtime constraint checking....and it's true that i would like SA to be as minimal as possible, and i don't want to get into constraint checking on the object side, because then people start wanting the full scope of everything (people have asked for object-side foreign key checking, for example....a pretty tall order, and a real reinvention of the wheel). Perhaps someday there will be an extension that addresses this kind of thing in some way.

    so i am leaning towards relying on plain constraint checking here on this one now, your first suggestion. i suppose if you created tables with no primary key constraints, inserted two Bar(id=1) objects, and then selected them back and updated again, you'd get the same buggy behavior.

    i'm going to try going into the ZBlog demo, sticking an appropriate __cmp__() method on my TopicAssociation objects, and seeing if that can be the "lightweight" way to work around this issue, because i'm really trying to keep SA as lightweight as I can and as much out of the "manages how you create/connect your objects together" space as possible.
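
    such a __cmp__() might look roughly like this (a sketch on the application side, using the 'topic' relation from the mapping above; the rest of the class is omitted):

        # sketch: comparing TopicAssociation objects based on the associated Topic
        class TopicAssociation(object):
            def __cmp__(self, other):
                if not isinstance(other, TopicAssociation):
                    return -1
                return cmp(self.topic, other.topic)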

  6. Mike Bayer reporter

    this problem got worse with recent improvements to the unit of work, so i have made another pass through the "association relation" code and made it simpler; it also resolves this problem: changeset:1523
