database.export_exemplars selects exemplars by taking, from the set of transcripts that share the same gene ID, the transcript with the largest RSEMeval confidence score. However, gene IDs by themselves are not unique across species, and the way the SQL query is written causes it to select a single exemplar across the transcripts of ALL species that share a gene ID. For example, if species A and species B both have a gene with the ID DN2_c0_g1 (following Trinity's naming convention), export_exemplars returns only one exemplar instead of two (one for species A and one for species B).
The problem is the clause GROUP BY genes.gene, which groups on the non-unique gene IDs alone. The query should instead group on the species ID (or a proxy for it, such as agalma_models.catalog_id) together with the gene ID. This bug has been present since at least agalma/1.0.0. An attached sqlite output file demonstrates the bug; the agalma database in it is from the regression test.
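The grouping problem can be reproduced in isolation. The sketch below is illustrative only: it uses Python's built-in sqlite3 module and a hypothetical, simplified table `transcripts(catalog_id, gene, transcript, score)` standing in for the real agalma tables, whose schema and joins are not reproduced here. It also relies on SQLite's documented handling of bare columns in a query with a single max() aggregate (the bare columns come from the row with the maximum). That the real export_exemplars query picks its row this way is an assumption.

```python
import sqlite3

# Hypothetical, simplified data: two species (catalog_id 1 and 2) that both
# contain a Trinity gene named DN2_c0_g1, each with two isoforms.
ROWS = [
    # (catalog_id, gene, transcript, score)
    (1, "DN2_c0_g1", "DN2_c0_g1_i1", 0.9),
    (1, "DN2_c0_g1", "DN2_c0_g1_i2", 0.4),
    (2, "DN2_c0_g1", "DN2_c0_g1_i1", 0.7),
    (2, "DN2_c0_g1", "DN2_c0_g1_i2", 0.8),
]


def make_db() -> sqlite3.Connection:
    """Build an in-memory database holding the hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transcripts ("
        " catalog_id INTEGER, gene TEXT, transcript TEXT, score REAL)"
    )
    conn.executemany("INSERT INTO transcripts VALUES (?, ?, ?, ?)", ROWS)
    return conn


def exemplars(conn: sqlite3.Connection, group_by: str) -> list:
    """Return one highest-scoring transcript per group.

    SQLite-specific: with a single MAX() aggregate, the bare columns
    (catalog_id, gene, transcript) are taken from the row holding the
    maximum score in each group. group_by is a fixed, trusted string here.
    """
    sql = (
        "SELECT catalog_id, gene, transcript, MAX(score) "
        f"FROM transcripts GROUP BY {group_by} ORDER BY catalog_id"
    )
    return conn.execute(sql).fetchall()


conn = make_db()
buggy = exemplars(conn, "gene")              # groups on gene ID only
fixed = exemplars(conn, "catalog_id, gene")  # groups on species + gene ID
print(buggy)
print(fixed)
```

Grouping on gene alone collapses both species into a single row (the species 1 isoform with score 0.9), while grouping on catalog_id and gene returns one exemplar per species.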