maintain duplicates in history lists

Issue #1232 resolved
Former user created an issue

This problem occurs in sqlalchemy 0.4.7.1, using oracle 9.

Case scenario: I had a job table, a task table, and a jobs2tasks table to create a many-to-many relation. I created a job and a task, and added the job twice to the task (using a relation that has a secondary). len(task.jobs) returned 2. Then I saved the task and committed. I was expecting an exception to be thrown, but none was. When I looked at jobs2tasks in SQLNavigator, I saw that only one entry had been entered.

The problem is that this behaviour is not what I expected. When I try to enter, for example, a task that already exists in the db (with the same task_id), sqlalchemy throws a constraint exception. I think this is exactly the same case, only with a composite primary key.

Reported by Kobi Perl.

Comments (6)

  1. Former user Account Deleted

    There is no "second case". What I tried to say is that if sqlalchemy throws a constraint exception when I enter two tasks with the same id, it should do the same when I enter two "jobs2tasks" entries with the same composite primary key - THROW AN EXCEPTION that a constraint was broken.

    I can use collection_class=set to ensure that I'm really working with a set. This is a fine solution (and, BTW, I think set should be the default rather than list).
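    For example, a mapping along these lines (a hypothetical sketch reusing the job/task names from the description above, not the actual code) silently absorbs the duplicate:

    # with collection_class=set the relation holds an instrumented set, so
    # adding the same Job twice leaves only one entry in task.jobs
    mapper(Task, tasks_table, properties={
        'jobs': relation(Job, secondary=jobs2tasks_table, collection_class=set)
    })

    j = Job()
    t = Task()
    t.jobs.add(j)        # sets use add() rather than append()
    t.jobs.add(j)
    # Print 1
    print len(t.jobs)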

    But still, it is logical to throw an exception when I do choose to use a list and a task ends up with two jobs with the same primary key in its jobs list...

    When is the current behaviour useful? When does it make sense? Why should one be able to do something like:

    j = Job()
    t = Task()
    t.jobs.append(j)
    t.jobs.append(j)
    # Print 2
    print len(t.jobs)
    session.save(t)
    session.commit()
    t_id = t.task_id
    session.clear()  # start with a fresh identity map so the task is re-loaded from the db
    t = session.query(Task).get(t_id)
    # Print 1
    print len(t.jobs)
    
  2. Mike Bayer repo owner

    sqlalchemy relies upon the database to enforce constraints - we don't duplicate that behavior because (a) the database does it much better, (b) we try not to intrusively get in the way of the relational dialogue the database produces, and (c) it would impact performance negatively for all instrumented lists to have a backing set with constraint-checking logic going on, especially when that logic would be completely redundant.

    If your jobs2task table specifies the primary key of each related table as a composite primary key, or otherwise has a UNIQUE constraint for those two columns, a constraint exception will be thrown.
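    For instance, an association table declared along these lines (hypothetical names mirroring the reporter's jobs/tasks schema) lets the database reject a second identical (job_id, task_id) pair:

    jobs2tasks = Table('jobs2tasks', metadata,
        # the two foreign key columns together form a composite primary key,
        # so a second INSERT of the same pair violates it
        Column('job_id', Integer, ForeignKey('jobs.job_id'), primary_key=True),
        Column('task_id', Integer, ForeignKey('tasks.task_id'), primary_key=True),
    )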

    as far as set being the default, it should have been, but the original behavior of relation() was that collections were automatically ordered, and "set" was also not as widely used in the Python 2.3 series, so there is some history here. a better solution, perhaps one for 0.6, would be for collection_class to be a required argument for all collection-holding relations.

  3. Mike Bayer repo owner

    My apologies - the many-to-many table itself only encounters one insert() statement, so the constraint violation is not raised.

    (if you had attached a test case, that would have communicated the issue more clearly. this is why it's in big red letters...)

    Here's a patch which maintains duplicates in the history list:

    Index: lib/sqlalchemy/orm/attributes.py
    ===================================================================
    --- lib/sqlalchemy/orm/attributes.py    (revision 5330)
    +++ lib/sqlalchemy/orm/attributes.py    (working copy)
    @@ -1427,11 +1427,18 @@
                 elif original is NEVER_SET:
                     return cls((), list(current), ())
                 else:
    -                collection = util.OrderedIdentitySet(current)
    -                s = util.OrderedIdentitySet(original)
    -                return cls(list(collection.difference(s)),
    -                           list(collection.intersection(s)),
    -                           list(s.difference(collection)))
    +                collection = util.IdentitySet(current)
    +                s = util.IdentitySet(original)
    +
    +                added = collection.difference(s)
    +                unchanged = collection.intersection(s)
    +                deleted = s.difference(collection)
    +                # maintain duplicates
    +                return cls(
    +                    [x for x in current if x in added],
    +                    [x for x in current if x in unchanged],
    +                    [x for x in original if x in deleted]
    +                )
             else:
                 if current is NO_VALUE:
                     if original not in (None, NEVER_SET, NO_VALUE):
    

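    The idea, in a minimal standalone sketch (a plain Python set stands in for util.IdentitySet here): membership is still decided via set difference/intersection, but the returned lists are rebuilt by filtering the ordered current/original sequences, so an object appended twice shows up twice in the "added" history instead of being collapsed:

    class Obj(object):
        pass

    o = Obj()
    current, original = [o, o], []
    added = set(current).difference(original)
    # filtering the ordered sequence keeps the duplicate append;
    # list(added) would have collapsed it to a single entry
    print [x for x in current if x in added]   # -> [o, o]
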
    test case

    from sqlalchemy import *
    from sqlalchemy.orm import *
    
    engine = create_engine('sqlite://', echo=True)
    metadata = MetaData(engine)
    
    t1 = Table('t1', metadata,
        Column('id', Integer, primary_key=True),
        Column('data', String)
    )
    
    t2 = Table('t2', metadata,
        Column('id', Integer, primary_key=True),
        Column('data', String)
    )
    
    t3 = Table('t3', metadata,
        Column('t1id', Integer, ForeignKey('t1.id'), primary_key=True),
        Column('t2id', Integer, ForeignKey('t2.id'), primary_key=True),
    )
    
    metadata.create_all()
    
    class T1(object):
        pass
    
    class T2(object):
        pass
    
    mapper(T1, t1, properties={
        't2s':relation(T2, secondary=t3)
    })
    mapper(T2, t2)
    
    a = T1()
    b = T2()
    
    sess = create_session()
    a.t2s.append(b)
    a.t2s.append(b)
    sess.add(a)
    sess.flush()
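    # with the duplicate preserved in the collection history, the flush above
    # is expected to emit two identical INSERTs into t3, so the composite
    # primary key on t3 raises an integrity error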
    

    I'm not sure if I want to add this into 0.5 as we were done with release candidates and this could be a surprise for some. will have to test some more.
