Using aliases only in func.substr() leads to extra table in select()'s join clause

Issue #1411 resolved
Former user created an issue

I'm constructing a join() object that includes two alias()'es to the same table, and then using that as the from_obj= parameter to select(). This select() also has a complex expression as one of the columns being selected, which involves the column aliases. select() then appears to helpfully decide that it needs to join that same (gigantic) table back to the join object with no additional conditions...

    def __init__(self, chado, organism, id=None, name=None):
        **Must supply either ID or uniquename.**
        self._chado = chado
        feature = self._chado.tables["feature"]("feature")
        featureloc = self._chado.tables["featureloc"]("featureloc")
        feat1 = feature.alias()
        feat2 = feature.alias()
        jn = feat1.join(
            featureloc, feat1.c.feature_id == featureloc.c.feature_id).join(
            feat2, feat2.c.feature_id == featureloc.c.srcfeature_id)
        sel = select([featureloc, func.substr(feat2.c.residues, featureloc.c.fmin+1, featureloc.c.fmax-featureloc.c.fmin).label("sequence")](feat1,),
                     and_(feat1.c.organism_id == organism.organism_id, feat1.c.type_id == self._chado.get_cv_term("sequence", "gene")),
                     from_obj=jn)
        if id is not None: sel = sel.where(feature.c.feature_id == id)
        if name is not None: sel = sel.where(feature.c.uniquename == name)
        print str(sel) % sel.compile().params

This produces the following SQL query:

SELECT feature_1.feature_id, feature_1.dbxref_id, feature_1.organism_id, feature_1.name, feature_1.uniquename, feature_1.residues, feature_1.seqlen, feature_1.md5checksum, feature_1.type_id, feature_1.is_analysis, feature_1.is_obsolete, feature_1.timeaccessioned, feature_1.timelastmodified, featureloc.featureloc_id, featureloc.feature_id, featureloc.srcfeature_id, featureloc.fmin, featureloc.is_fmin_partial, featureloc.fmax, featureloc.is_fmax_partial, featureloc.strand, featureloc.phase, featureloc.residue_info, featureloc.locgroup, featureloc.rank, substr(feature_2.residues, featureloc.fmin + 1, featureloc.fmax - featureloc.fmin) AS sequence FROM feature AS feature_1 JOIN featureloc ON feature_1.feature_id = featureloc.feature_id JOIN feature AS feature_2 ON feature_2.feature_id = featureloc.srcfeature_id, feature WHERE feature_1.organism_id = 6 AND feature_1.type_id = 790 AND feature.uniquename = 'AT1G01010-TAIR-G'

The thing to note is the extra ", feature" just before the WHERE clause. It's not coming from the join() object:

>>> str(jn)
'feature AS feature_1 JOIN featureloc ON feature_1.feature_id = featureloc.feature_id JOIN feature AS feature_2 ON feature_2.feature_id = featureloc.srcfeature_id'

So I can only conclude select() must be helping me out here. The result is catestrophic -- feature has millions of rows.

Interestingly, if I comment out the ".alias()" for feat1, this problem goes away -- this is why I suspect it's select() that's getting confused:

        feat1 = feature #.alias() -- this fixes the problem

Comments (2)

  1. Mike Bayer repo owner

    as far as i can tell, "feature" is added to the FROM clause because you are adding it in a WHERE criterion:

    sel.where(feature.c.uniquename == name)
    

    no bug here. please confirm and we can close this ticket.

  2. Log in to comment