load_only omits primary keys when not specified

Issue #3080 resolved
jvanasco created an issue

example attached below; this happens in 0.9.4

if trying to load_only a column of the child relationship and not passing in the primary_key, sqlalchemy isn't happy

the sql generated is fine and runs -- it pulls the specified column and uses query string as an anonymous column ( sidenote: is that really necessary ?)

but the ORM is unhappy and raises an error.

adding the primary key works.

using a joinedload instead of subqueryload works as well.

and hey, i made a proper test case!

CATCH_KNOWN_EXCEPTION = False

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
import sqlalchemy.exc

Base = declarative_base()

class TableA(Base):
    __tablename__ = 'table_a'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

    bs = relationship("TableB", primaryjoin="TableB.table_a_id==TableA.id")
    cs = relationship("TableC", primaryjoin="TableC.table_a_id==TableA.id")

class TableB(Base):
    __tablename__ = 'table_b'
    id = Column(Integer, primary_key=True)
    name = Column(String(30))
    table_a_id = Column(Integer, ForeignKey('table_a.id'), nullable=False)

class TableC(Base):
    __tablename__ = 'table_c'
    id = Column(Integer, primary_key=True)
    id_alt = Column(Integer, primary_key=True)
    name = Column(String(30))
    table_a_id = Column(Integer, ForeignKey('table_a.id'), nullable=False)


e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

s = Session(e)

s.add_all([
    TableA(id=1, name='a',),
    TableA(id=2, name='aa',),
    TableA(id=3, name='aaa',),
    TableA(id=4, name='aaaa',),
    TableB(id=1, name='b', table_a_id=1),
    TableB(id=2, name='bb', table_a_id=2),
    TableB(id=3, name='bbb', table_a_id=3),
    TableB(id=4, name='bbbbb', table_a_id=4),
    TableC(id=1, id_alt=10, name='c', table_a_id=1),
    TableC(id=2, id_alt=11, name='cc', table_a_id=2),
    TableC(id=3, id_alt=12, name='ccc', table_a_id=3),
    TableC(id=4, id_alt=13, name='cccc', table_a_id=4),
])

s.commit()


q = s.query( TableA ).filter( TableA.id == 2 )

## this passes
try:
    q_good = q.options( subqueryload('bs').load_only('id', 'name') )
    result = q_good.first()
    print "expected - can load `bs` with primary_key + desired column"
except:
    raise

try: 
    q_bad = q.options( subqueryload('bs').load_only('name') )
    result = q_bad.first()
except sqlalchemy.exc.NoSuchColumnError, e:
    print "expected - CAN NOT LOAD `bs` with only desired column"
    if not CATCH_KNOWN_EXCEPTION:
        raise

try:
    joinedload_works = q.options( joinedload('bs').load_only('name') )
    result = joinedload_works.first()
    print "expected - can load `bs` with joinedload only using the target column"
except:
    raise

try: 
    q_bad = q.options( subqueryload('cs').load_only('name') )
    result = q_bad.first()
except sqlalchemy.exc.NoSuchColumnError, e:
    print "expected - CAN NOT LOAD `cs` with only desired column"
    if not CATCH_KNOWN_EXCEPTION:
        raise

## this also fails, only one key
try: 
    q_bad = q.options( subqueryload('cs').load_only('id', 'name') )
    result = q_bad.first()
except sqlalchemy.exc.NoSuchColumnError, e:
    print "expected - CAN NOT LOAD `cs` with only 1/2 primary keys + desired column"
    if not CATCH_KNOWN_EXCEPTION:
        raise

try:
    q_good = q.options( subqueryload('cs').load_only('id', 'id_alt', 'name'))
    result = q_good.first()
    print "expected - can load `cs` with subqueryload using the target column + BOTH primary keys"
except:
    raise



s.close()

Comments (9)

  1. jvanasco reporter

    oh wait, that's not in the testcase I uploaded. but it's in mine

    joinedload_works = q.options( joinedload('bs').load_only('name') )
    print "trying joinedload_works"
    result = joinedload_works.first()
    
  2. jvanasco reporter
    • edited description

    edited the testcase : * show ALL primary keys are required for subqueryload * added CATCH_KNOWN_EXCEPTION boolean to disable known exceptions (default is False)

  3. Mike Bayer repo owner
    • Modified the behavior of :func:.orm.load_only such that primary key columns are always added to the list of columns to be "undeferred"; otherwise, the ORM can't load the row's identity. Apparently, one can defer the mapped primary keys and the ORM will fail, that hasn't been changed. But as load_only is essentially saying "defer all but X", it's more critical that PK cols not be part of this deferral. fixes #3080

    → <<cset 33cb84945f52>>

  4. Mike Bayer repo owner
    • Modified the behavior of :func:.orm.load_only such that primary key columns are always added to the list of columns to be "undeferred"; otherwise, the ORM can't load the row's identity. Apparently, one can defer the mapped primary keys and the ORM will fail, that hasn't been changed. But as load_only is essentially saying "defer all but X", it's more critical that PK cols not be part of this deferral. fixes #3080

    → <<cset 023d818f21ad>>

  5. John Prior

    Hello - I am trying to use the load_only functionality, but my primary keys have the same name across tables.

    from sqlalchemy import create_engine
    from sqlalchemy import Column, Integer, String, ForeignKey
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import Load, relationship, load_only
    
    Base = declarative_base()
    
    class Table1(Base):
        __tablename__ = 'table1'
        global_id = Column(Integer, primary_key=True)
        table1_val = Column(String)
        r1 = relationship('Table2', backref = 'r2')
    
    
    class Table2(Base):
        __tablename__ = 'table2'
        global_id = Column(Integer, ForeignKey('table1.global_id'), primary_key=True)
        table2_val = Column(String)
    
    from sqlalchemy.orm import sessionmaker
    some_engine = create_engine('postgresql://scott:tiger@localhost/')
    Session = sessionmaker(bind=some_engine)
    session = Session()
    
    query = session.query(Table1, Table2).join(Table2).options(Load(Table1).load_only('table1_val'), Load(Table2).load_only('table2_val'))
    
    import pandas as pd
    df = pd.read_sql(query.statement, query.session.bind)
    

    Results in this error: ValueError: Encountered duplicate field name: 'global_id'

    More info can be found at (http://stackoverflow.com/questions/35966988/load-subset-of-joined-columns-in-sqlalchemy)

  6. Mike Bayer repo owner

    hello -

    please post your issue either on the mailing list or if you can produce a complete test case as a new issue. Please provide a complete test that does not require pandas using the guidelines at http://stackoverflow.com/help/mcve.

    Below, your code with a simple persist and a query produces no error:

    from sqlalchemy import create_engine
    from sqlalchemy import Column, Integer, String, ForeignKey
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import Load, relationship, load_only
    
    Base = declarative_base()
    
    class Table1(Base):
        __tablename__ = 'table1'
        global_id = Column(Integer, primary_key=True)
        table1_val = Column(String)
        r1 = relationship('Table2', backref = 'r2')
    
    
    class Table2(Base):
        __tablename__ = 'table2'
        global_id = Column(Integer, ForeignKey('table1.global_id'), primary_key=True)
        table2_val = Column(String)
    
    from sqlalchemy.orm import sessionmaker
    some_engine = create_engine('sqlite://')
    Base.metadata.create_all(some_engine)
    Session = sessionmaker(bind=some_engine)
    session = Session()
    
    session.add(Table1(r1=[Table2()]))
    query = session.query(Table1, Table2).join(Table2).options(
        Load(Table1).load_only('table1_val'), Load(Table2).load_only('table2_val'))
    
    print query.all()
    

    output:

    [(<__main__.Table1 object at 0x7f98c2271ad0>, <__main__.Table2 object at 0x7f98c2271f10>)]
    
  7. Log in to comment