Issue #20 resolved

Multalign - align_sequences runs too long on real data

Sergey Naumenko
created an issue

Hi!

I started the multalign pipeline on 4 transcriptomes, and it has been running for more than 4 days on 5 cores for 1600 genes. How long will it take with 30 species?

I think this is because the MacSE aligner is written in Java. Muscle does these alignments in about 2 hours using 1 core.

How can I speed up the computation? Can I run Muscle manually between stages instead of MacSE? Or does MacSE also find ORFs? Can MacSE itself be sped up? Maybe I could use Muscle for alignment and MacSE only for ORF finding?

Thanks! SN

Comments (4)

  1. Casey Dunn repo owner

    We are relying on MacSE for both alignment and translation, as we found simultaneous alignment and translation to give more reliable protein translations than translation prior to alignment. Running MacSE is certainly a bottleneck... would be great if there were some other options. Unfortunately simultaneous alignment and translation will always be slower than alignment alone, but it would be great if it weren't this much slower.
