Structure-based sequence alignment

Issue #81 resolved
Lars Skjærven created an issue

I think it would be a good idea to have some integration with programs / methods for using the protein structures in generating sequence alignments. muscle fails in many of the low sequence identity scenarios, making both PCA and NMA comparison difficult.

I now have quick implementations of tmalign() and mustang() ready, but none of them are optimal. mustang is very time consuming and sensitive to non-standard amino acids (generating annoying errors), while tmalign() takes only a pair of proteins. to overcome the latter problem I've made a function to combine the sequence alignments from tmalign into a multiple sequence alignment (based on one reference sequence/structure).

the idea is to have a substitute for seqaln/pdbaln():

# tmalign.pair()
aln.pair <- tmalign.pair(pdb.a, pdb.b)

# tmalign() 
aln <- tmalign(files, ref=1)
pdbs <- read.fasta.pdb(aln)


# mustang
aln <- mustang(files)
pdbs <- read.fasta.pdb(aln)

here, files is a character vector of file names obtained e.g through pdbsplit(). obviously my functions above are just wrappers for calling the corresponding executables (equivalent to dssp() or seqaln()).

what do you think? are there other options that would be better?

Comments (6)

  1. Xinqiu Yao

    Hi,

    It is a bit pitty that tmalign doesn't support multiple alignment directly. But I think it will be still very useful when comparing two protein families that in each family sequence alignment is available but the joint of two multiple alignments need to be based on structures (e.g. the comparison between Ras and heterotrimeric G proteins). The question is: How often do we really need a purely structure-based multiple sequence alignment? The current solution with one reference structure should be a good start. Can we also think about writing our own progressive alignment alogrithm as implemented in MUSCLE or CLUSTAL based on tmalign results?

    By the way, I have written a couple very preliminary structural alignment functions (straln() and straln.pair() under new_funs/), although the objective of the function is a bit different from here.

  2. Lars Skjærven reporter

    Hey, Thanks. Testing it now, but this seems to give some odd results..?

    a= read.pdb("3drc")
    a=trim.pdb(a, atom.select(a, chain="A"))
    b= read.pdb("1aoe")
    b=trim.pdb(b, atom.select(b, chain="A"))
    aln <- straln.pair(a,b)
    
  3. Lars Skjærven reporter

    Pushed my temporary version of tmalign() to new_funs/.

    Example usage:

    ids <- c("3DRC_A", "3DRC_B", "1DF7_A", "1DG5_A", 
                 "1AI9_A", "1AOE_B", "1DHF_B", "2DHF_A")
    raw.files <- get.pdb(ids, path = "raw_pdbs", gzip=TRUE)
    files <- pdbsplit(raw.files, ids = ids, path = "raw_pdbs/split_chain")
    
    aln <- tmalign(files)
    pdbs <- read.fasta.pdb(aln)
    modes <- nma.pdbs(pdbs)
    
  4. Xinqiu Yao

    Thanks! I will have a try. The straln.pair() assumes you have pre-fitted structures and does simply dynamic programming based on RMSD. The original task is to align a group of peptides that are anchored to MHC receptor (So no need to fit). It may not be directly related here...

  5. Barry Grant

    Sorry if I am a bit late onto this thread.

    I wanted Hongyang to implement an SSM function, which could both search the PDB database and do small scale multiple structure alignments using EBI services. See: http://www.ebi.ac.uk/msd-srv/ssm/ssmlink.html

    I think Hongyang had some issues with this and he might comment here. However, I think it is still a good idea to implement warpers for these kinds of 'services' similar to blast.pdb() and hmmer(). This could be in addition to mustang() and suitable for folks who have trouble setting up the executables.

    I also put this on the ToDo list I think but am waiting for someone to bite.

  6. Log in to comment