* Hotfixes to correct errors in TUTORIAL and append the supermatrix FASTA file
to the 'multalign' report.
* New 'supermatrix' pipeline can construct supermatrices by occupancy
* New 'multalign' pipeline uses MAFFT instead of MACSE for multiple alignment
of translated protein sequences. The simultaneous alignment and translation
approach originally implemented in Agalma can improve translations by
accommodating frameshifts; however, mistakenly including fairly distant
homologs or erroneous transcripts within clusters can result in overall poor
translations and alignment of clusters. The old multalign pipeline was
renamed 'multalignx' where the 'x' stands for translated multiple alignment,
since MACSE uses nucleotide alignments to infer translations. (#79)
* Improved the linkage between the phylogeny pipelines, so that the most recent
previous run of the correct type is identified by default. A previous run
can be explicitly chosen with the --previous argument (now consistent across
all the pipelines). (#85)
* Rewrote the 'assemble' pipeline to subsume the Trinity.pl wrapper script, and
run the various components of Trinity as separate stages within the pipeline.
This provides finer-grained resource usage and fixes some problems with
robustness and memory use we were experiencing on our compute cluster. GNU
parallel replaces ParaFly for both the quantify_graph and butterfly stages.
Oases is no longer supported in 'assemble', but additional assemblers could
be added in the future as variants on the 'assemble' pipeline, e.g.
* The report for the 'supermatrix' pipeline now includes a table of the
percentage of genes present for each taxon. (#82)
* The regression tests are taking longer to run (30-40 minutes) and have been
divided up into different levels. The default level (1) now runs in about
(16-cores). Higher levels (2 or 3) provide more complete tests and are
selected with 'agalma test X'. (#92)
* Added a histogram of mean quality scores to the 'sanitize' report. (#90)
* Improved parallelization of the blastx annotation in 'postassemble'. (#53)
* 'homologize' has a new mode for seeding the homology search with an existing
set of genes, such as CEGMA or a previously computed supermatrix. Instead of
performing an all-by-all homology search, transcripts are only aligned
against the seed genes. (#56, #59)
* New parameter in 'genetree' to disable bootstrapping or change the threshold
for filtering by mean bootstrap support. (#60)
* Added multi-node parallelism to 'multalign' and 'genetree' using GNU
parallel. (#58, #61)
* 'postassemble' now performs protein translation (largest open reading frame
with Transdecoder) and transcript quantification (with RSEM). The schema for
the 'sequences' table was updated so that exemplars are now selected as the
transcript with highest abundance in a locus, rather than by the earlier
ad-hoc selection of the longest transcript in the locus. Exemplars are now
chosen in 'homologize' (via 'database.load_seqs') and not in 'postassemble'.
* New 'orthologize' pipeline provides an alternative phylogeny pipeline that
directly infers orthologs using OMA. (#64)
* Sequence reduction plot in the phylogeny report has more detail: added
sequence counts before and after 'homologize.mcl_cluster' and for each filter
applied in 'multalign.refine_clusters'. (#70, #71)
* Fixed a mis-calculation in the overlap threshold applied in
* Added bootstrapping to RAxML calls in the 'genetree.genetrees' stage, and a
subsequent filtering stage that removes trees with low mean bootstrap
* Removed the auto-generated report at the end of 'transcriptome' and put the
appropriate report commands in the TUTORIAL. (#51)
* Added report commands to the phylogeny section of the TUTORIAL. (#50)
* Fixed problems with 'tabular_report' that caused unnecessary rows and empty
table cells. (#52)
* A new option '--nreads' for reducing the number of reads that 'sanitize'
* Modified 'load' to correctly validate external assemblies with IUPAC
ambiguity codes. (#41)
* Added 'resource_report' and 'phylogeny_report' utilities.
* Additional reporting for phylogeny pipelines:
o 'genetree' reports maximum likelihood tree when run on a supermatrix.
o supermatrix image in 'multalign', ordered by most complete taxon and gene.
o some histograms were changed to tables for small numbers of taxa.
* Updates to README and TUTORIAL:
o Clarified that the Agalma-bundled SwissProt database only includes Metazoa.
o Fixed overwrite of 'BIOLITE_RESOURCES' variable in TUTORIAL. (#24)
* 'homologize' now ignores bad BLAST hits that seem to occur for query
sequences longer than 10Kb and in which the original query id is lost in
* Fixed bug with passing flags through to RAxML in 'genetree'. (#19)
* Removed a hard-coded minimum cluster size of 3 from 'multalign' and replaced
with the 'min_taxa' value (which should never be less than 4).
* New mechanism to break up the expensive all-by-all tblastx in 'homologize',
so that many smaller chunks can be run externally/concurrently, and read
back into the pipeline. This feature is not yet tested and we plan to finish
it in the 0.3.3 release.
* Fixed default RAxML model in genetree. (#9)
* New regression test feature 'agalma test' downloads and runs a small
transcriptome and phylogeny example to verify correct installation and
validate changes to the code base.
* Phylogeny pipelines can now pass a common ID with --id and they will
intelligently find the appropriate output from earlier pipelines. Previously,
numeric run IDs had to be passed between pipelines. This is demonstrated in
* Split off part of the 'assemble' pipeline into a new 'postassemble' pipeline
that performs all post-assembly filtering, coverage analysis, and annotation.
It can be run on external (non-Agalma) assemblies prior to load, although
the exemplars stage needs to be skipped if the assembly does not have
* Removed the annotation stage from the 'load' pipeline, since this is now
provided by 'postassemble' for external assemblies.
* The updated TUTORIAL now has a more complete phylogeny section and includes
estimates of resource requirements.
* bugfix: typo in 'agalma_database' key in default agalma.cfg
* bugfix: missing 'cd' command in ubuntu install script
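
Illustration of the exemplar selection change described in the 'postassemble'
entry above (highest abundance in a locus, instead of longest transcript).
This is only a minimal sketch, not the code in 'database.load_seqs': the
function name, the tuple layout, and the tie-break on length are all
assumptions made for the example.

```python
def select_exemplars(transcripts):
    """Pick one exemplar transcript per locus by highest abundance.

    `transcripts` is an iterable of (locus_id, transcript_id, abundance,
    length) tuples. This layout is hypothetical and does not mirror the
    Agalma 'sequences' table.
    """
    best = {}
    for locus_id, transcript_id, abundance, length in transcripts:
        # Rank by abundance first; falling back to length on ties is an
        # assumption of this sketch, not documented behavior.
        key = (abundance, length)
        if locus_id not in best or key > best[locus_id][0]:
            best[locus_id] = (key, transcript_id)
    # Drop the ranking keys and return locus -> exemplar transcript id.
    return {locus_id: tid for locus_id, (_, tid) in best.items()}


if __name__ == "__main__":
    example = [
        ("locus1", "t1", 5.0, 1200),  # longest, but less abundant
        ("locus1", "t2", 9.0, 800),   # most abundant in locus1
        ("locus2", "t3", 1.0, 300),
    ]
    print(select_exemplars(example))
```

Under the earlier longest-transcript rule, 't1' would have been chosen for
'locus1' in this example; ranking by abundance selects 't2' instead.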