mapped_collection should honor explicit dictionary key (or at least raise an error if a mismatch is present)

Issue #886 resolved
Mike Bayer repo owner created an issue

the example illustrates using attribute_mapped_collection and assignment of a full dictionary. the value in the dictionary does not yet have the "key" attribute assigned, but shouldn't be needed since we are passing in the full dict (or if this is not supported should raise an error. this might be a "quick fix" for now at least if the error says 'not supported yet').

later, the mismatched dictionary creates a KeyError inside of attributes.CollectionAttributeImpl.set() when the "remove" event fires on the old dictionary. we should at least detect this condition if not fix it altogether.

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.orm.collections import attribute_mapped_collection

engine = create_engine('sqlite://')

class Base(object):
    def __init__(self, **kwargs):
        for k in kwargs:
            setattr(self, k, kwargs[k](k))

class Product(Base):
    pass

class ProductDescription(Base):
    pass

class Language(Base):
    pass

meta = MetaData(engine)

products = Table('products', meta, 
    Column('id', Integer, primary_key=True))

languages = Table('languages', meta, 
    Column('id', Integer, primary_key=True),
    Column('name', String(50))
    )

product_descriptions = Table('product_descriptions', meta, 
    Column('product_id', Integer, ForeignKey('products.id'), primary_key=True),
    Column('language_id', Integer, ForeignKey('languages.id'), primary_key=True),
    Column('description', String(255)),
    )

meta.create_all()

languages.insert().execute(id=1, name='english')

mapper(Product, products, properties={
    'descriptions':relation(ProductDescription, 
    collection_class=attribute_mapped_collection('language_id'), cascade="all, delete-orphan")
})
mapper(ProductDescription, product_descriptions, properties={
    'language':relation(Language)
})
mapper(Language, languages)


sess = create_session()
english = sess.query(Language).filter(Language.name=='english').one()

p1 = Product(descriptions={english.id:ProductDescription(language=english, description='description #1')})

# this is the source of the bug.  ProductDescription doesn't have a language_id yet, so it gets placed 
# under the key "None".  if you set language_id=english.id on ProductDescription, then it works
try:
    assert english.id in p1.descriptions
except AssertionError:
    print "assertion failed: english.id is not the key"

try:
    assert None not in p1.descriptions
except AssertionError:
    print "assertion failed: the ProductDescription is keyed underneath 'None'"

sess.save(p1)
sess.flush()

# later, when we want to change things, attributes.set() gets screwed up as well
p1.descriptions = {english.id:ProductDescription(language=english, description='description #2')}  # <-- KeyError

Comments (12)

  1. jek

    Yep- the keying function for mapped collections must be immutable. The error here is using the session-scope mutable 'language_id' as the mapping attribute- change it to 'language' and it will work as indended.

  2. jek

    the collection protocol could be extended with a 'construct by bulk assignment' type of hook- that would allow a dict-based collection to inspect the incoming key/value pairs and assert their accuracy and uniqueness. such a hook might also improve the semantics of bulk assignment for object-based collections.

  3. Mike Bayer reporter

    so it would appear that saying myobject.collection = {'foo':someinstance} should not be allowed at all, since we never look at the keys in the dictionary...you should instead assign using a regular iterable, and it pulls the keys using the function. at any rate this prevents the error from occuring at all. whatever we do, we definitely can't leave it the way it is now, i shouldn't be allowed to set invalid data.

  4. jek

    also, the bulk assignment hook is probably super simple, 15 min plus test updates. i'd prefer that to changing the external assignment api at this late date.

  5. Mike Bayer reporter

    yeah, bulk assignment hook would be best here to check incoming data for accuracy.

    heres a crazy shot in the dark....how come the keyfunc is even needed at all when assigning a complete dict ?

  6. jek

    collections doesn't touch the user's collection- the only available interface for mutation is .append(scalar) and .remove(scalar)

  7. Mike Bayer reporter

    oh and as far as why, because its not like "some_integer='ABCD'". if you do that, the database gets a hold of it and complains clearly (or youre using sqlite, and it just comes back the same way you set it). here, its more like "some_integer='ABCD'" and it silently assigns "some_integer=None" instead, and then some other crazy side effects crash the application a few steps later, telling you pretty much nothing about what you did wrong.

  8. jek

    digging in to this, it looks like what's needed is a function that accepts an incoming collection and returns an iterable of stuff that can be appended. validation can happen in there if the collection wants to. a naive dict implementation would be something like 'checker = lambda incoming: incoming.itervalues()'.

    the fallout from ticket #834 pretty much rules out the super-simple approach, but this isn't too bad. it would also possibly push all of the __set__ policy out of attributes.py- currently the orm decrees that only list-like objects can be bulk-assigned to list-like collection attributes. this hook would allow that to be overridden.

  9. Log in to comment