Structure-based sequence alignment

I think it would be a good idea to integrate programs/methods that use the protein structures when generating sequence alignments. Alignment with muscle fails in many low-sequence-identity scenarios, which makes both PCA and NMA comparisons difficult.

I now have quick implementations of tmalign() and mustang() ready, but neither is optimal. mustang is very time-consuming and sensitive to non-standard amino acids (it generates annoying errors), while tmalign itself takes only a pair of proteins. To overcome the latter problem, I've written a function that combines the pairwise sequence alignments from tmalign into a multiple sequence alignment (based on one reference sequence/structure).
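To make the "combine pairwise alignments on one reference" step concrete, here is a minimal sketch in plain R. It is not the actual implementation behind tmalign(). The name merge_pairwise() and the input format (a list of 2-row character matrices, row 1 being the gapped reference and row 2 the gapped target) are assumptions made for illustration only. Residues inserted relative to the reference are simply placed in their own columns and are not aligned to each other across targets.

```r
# Hypothetical helper: merge reference-anchored pairwise alignments into one MSA.
# 'pairs' is a list of 2-row character matrices (row 1 = reference, row 2 = target),
# one residue or "-" per cell. The input format is assumed for this sketch.
merge_pairwise <- function(pairs) {
  # Ungapped reference sequence, taken from the first pairwise alignment
  ref <- unname(pairs[[1]][1, ])
  ref <- ref[ref != "-"]
  n <- length(ref)

  parsed <- lapply(pairs, function(aln) {
    aln <- unname(aln)
    is_ref <- aln[1, ] != "-"
    # Every pairwise alignment must contain the same reference sequence
    stopifnot(sum(is_ref) == n, all(aln[1, is_ref] == ref))
    k <- cumsum(is_ref)  # reference residues seen up to each column
    list(match = aln[2, is_ref],  # target residue (or gap) under each reference residue
         ins = split(aln[2, !is_ref],  # target insertions after reference position k
                     factor(k[!is_ref], levels = 0:n)))
  })

  # Widest insertion after each reference position (0..n) over all targets
  max_ins <- Reduce(pmax, lapply(parsed, function(p) lengths(p$ins)))

  cols <- list()
  for (k in 0:n) {
    if (k > 0) {
      # Column holding reference residue k and what each target has there
      cols[[length(cols) + 1]] <-
        c(ref[k], vapply(parsed, function(p) p$match[k], "", USE.NAMES = FALSE))
    }
    for (j in seq_len(max_ins[k + 1])) {
      # Insertion column: gap in the reference, gap-padded in shorter targets
      cols[[length(cols) + 1]] <-
        c("-", vapply(parsed, function(p) {
          x <- p$ins[[k + 1]]
          if (j <= length(x)) x[j] else "-"
        }, "", USE.NAMES = FALSE))
    }
  }
  do.call(cbind, cols)  # rows: reference first, then targets in input order
}

# Small made-up example with reference sequence "ACDE"
to_aln <- function(a, b) rbind(strsplit(a, "")[[1]], strsplit(b, "")[[1]])
pairs <- list(to_aln("AC-DE", "ACGDE"),
              to_aln("ACDE",  "A-DE"),
              to_aln("-ACDE", "MACD-"))
msa <- merge_pairwise(pairs)
print(apply(msa, 1, paste, collapse = ""))
# "-AC-DE" "-ACGDE" "-A--DE" "MAC-D-"
```

Anchoring everything on one reference keeps the merge simple, but the quality of the result then depends on how well each structure aligns to that particular reference.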

The idea is to have a substitute for seqaln()/pdbaln():

# tmalign.pair()
aln.pair <- tmalign.pair(pdb.a, pdb.b)

# tmalign() 
aln <- tmalign(files, ref=1)
pdbs <- read.fasta.pdb(aln)


# mustang
aln <- mustang(files)
pdbs <- read.fasta.pdb(aln)

Here, files is a character vector of file names obtained, e.g., through pdbsplit(). Obviously, the functions above are just wrappers that call the corresponding executables (equivalent to dssp() or seqaln()).
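For reference, the wrapper pattern meant here could look roughly like the sketch below. The function name run_pairwise_tool() and its arguments are hypothetical, and the sketch only assumes that the external program is on the PATH and takes two structure files as positional arguments. Parsing the program's output into an alignment is deliberately left out.

```r
# Hypothetical wrapper: run an external pairwise tool on two files and
# return its raw output lines. Names and arguments are assumptions.
run_pairwise_tool <- function(exe, file.a, file.b) {
  path <- unname(Sys.which(exe))
  if (!nzchar(path))
    stop("executable not found on PATH: ", exe)
  stopifnot(file.exists(file.a), file.exists(file.b))

  # Capture stdout and stderr as a character vector, one element per line
  out <- system2(path, args = shQuote(c(file.a, file.b)),
                 stdout = TRUE, stderr = TRUE)

  # system2() attaches a "status" attribute only on non-zero exit
  status <- attr(out, "status")
  if (!is.null(status) && status != 0)
    stop(exe, " exited with status ", status)
  out
}
```

Failing early when the executable is missing or exits with an error gives a clearer message than a parsing failure further down.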

What do you think? Are there other options that would be better?
