Known Issues

We are working on addressing the following issues in a future release:

* MACSE marks stop codons in amino-acid (AA) sequences with '*', but it does
  not mark stop codons in DNA sequences. From the MACSE paper:

  "... at the nucleotide level, MACSE uses the symbol '!' to represent
  deletions of one or two nucleotides that induce frameshifts and it uses no
  special representation for the stop codon."

  In multalign.remove_frameshifts, we can truncate sequences at the first stop
  codon found in the AA sequence, but doing so for DNA sequences is harder:
  with no stop-codon marker, it requires considering all of the possible
  reading frames. Currently, we do not truncate DNA sequences at stop codons.

* Annotated contigs from the transcriptome pipeline are sometimes misformatted
  and missing the sequence.

* Some SwissProt descriptions are messy and contain multiple '>' characters,
  which can confuse programs that look for '>' as the ID-line delimiter in
  FASTA format. We plan to replace these extra '>' characters with another
  character in the Agalma-optimized SwissProt database.
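For the MACSE stop-codon issue, the Python sketch below illustrates why AA
truncation is simple while DNA truncation is not. It is not the code in
multalign.remove_frameshifts; the helper names truncate_aa_at_stop and
first_stop_by_frame are hypothetical, and the sketch assumes ungapped DNA with
MACSE's '!' symbols already removed, the standard genetic code (TAA, TAG,
TGA), and forward frames only.

```python
# Hypothetical helpers for illustration only; not part of agalma's multalign.
# Assumes the standard genetic code's stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def truncate_aa_at_stop(aa):
    """Cut an AA sequence just before the first '*' (MACSE's stop marker)."""
    idx = aa.find("*")
    return aa if idx == -1 else aa[:idx]


def first_stop_by_frame(dna):
    """Return {frame: index of first stop codon, or None} for frames 0-2.

    Assumes an ungapped DNA string with no '!' frameshift symbols; only
    the three forward frames are scanned, and trailing partial codons
    are ignored.
    """
    dna = dna.upper()
    result = {}
    for frame in range(3):
        result[frame] = None
        # Step through complete codons starting at this frame offset.
        for i in range(frame, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                result[frame] = i
                break
    return result
```

For "ATGGTAGCA", frames 0 and 2 contain no stop codon while frame 1 has TAG at
index 4, so where (or whether) to truncate depends on knowing the correct
frame, and MACSE frameshifts can change that frame partway through a sequence.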
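For the SwissProt issue, here is a minimal Python sketch of the planned
replacement, assuming each FASTA header is a single line that begins with '>'.
The replacement character has not been chosen; the '_' default and the helper
names sanitize_fasta_header and sanitize_fasta are hypothetical.

```python
# Hypothetical sketch of the planned '>' cleanup; the '_' replacement
# character is an assumption, not a decision recorded in this file.


def sanitize_fasta_header(line, replacement="_"):
    """Keep the leading '>' of a header line; replace any later '>'."""
    if not line.startswith(">"):
        return line  # sequence lines are left untouched
    return ">" + line[1:].replace(">", replacement)


def sanitize_fasta(lines, replacement="_"):
    """Apply sanitize_fasta_header to every line of a FASTA file."""
    return [sanitize_fasta_header(line, replacement) for line in lines]
```

Keeping only the leading '>' means a naive parser that splits the file text on
'>' sees one record per header again: ">seq1 A>B\nMKV".split(">") yields three
pieces, while the sanitized ">seq1 A_B\nMKV" yields two (an empty string and
the record).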