Alignment for remote homologues

Issue #148 new
Xinqiu Yao created an issue

Hi guys,

We probably need a function for mixed sequence and structural alignment. For example, given two protein families, we would call seqaln() within each family and mustang() for the inter-family alignment. The main motivation is that mustang() does a good job on low-similarity sequences but sometimes makes obvious mistakes on very similar sequences, which seqaln() handles well.

The question is how to group sequences into families. One way is hierarchical clustering based on similarity scores from an initial alignment (obtained by calling seqaln()).
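A minimal sketch of that grouping step, written in Python rather than R and assuming the initial seqaln() alignment is available as a dict of equal-length gapped strings. Percent identity is computed over columns where neither sequence has a gap (one of several common definitions), and families are formed by average-linkage agglomerative clustering on 100 minus identity, stopping once the closest pair of clusters is below a hypothetical 40% identity cutoff. The names `percent_identity` and `cluster_families` are illustrative only, not bio3d functions.

```python
from itertools import combinations
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap ('-').

    Assumption: identity is normalised by the number of gap-free columns;
    other definitions (e.g. by shorter sequence length) are equally common.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def cluster_families(aln: Dict[str, str], min_identity: float = 40.0) -> List[List[str]]:
    """Group aligned sequences into families by average-linkage clustering.

    Distance is 100 - percent identity. Clusters are merged closest-first and
    merging stops when the closest pair is farther apart than 100 - min_identity.
    The 40% default is an illustrative cutoff, not a recommended value.
    """
    ids = list(aln)
    dist = {}
    for i, j in combinations(ids, 2):
        d = 100.0 - percent_identity(aln[i], aln[j])
        dist[(i, j)] = dist[(j, i)] = d

    def avg_link(c1: List[str], c2: List[str]) -> float:
        # Mean of all cross-cluster pairwise distances (average linkage).
        return sum(dist[(p, q)] for p in c1 for q in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in ids]
    max_dist = 100.0 - min_identity
    while len(clusters) > 1:
        # Closest pair of clusters; ties resolve to the lowest indices.
        d, x, y = min(
            (avg_link(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if d > max_dist:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters


if __name__ == "__main__":
    toy = {
        "a": "ACDEFGHIKL",
        "b": "ACDEFGHIKM",
        "c": "WYVTSRQPNM",
        "d": "WYVTSRQPNL",
    }
    print(cluster_families(toy))
```

Run as a script, the toy example should print `[['a', 'b'], ['c', 'd']]`: the two near-identical pairs merge first, and the two pairs stay apart because their average identity (5%) is below the cutoff.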

Does this make sense? Let me know if you have any ideas!

Comments (2)

  1. Barry Grant

    On a somewhat related note: how about functionality for calculating a structure-based quality score for every aligned position in any 'pdbs' object? This would highlight regions where refinement might be necessary. For example, the output of mustang() could be refined in certain regions by calling seqaln() with suitable refinement parameters (I think this is already available). The idea is that structurally diverse (i.e. flexible) regions, which may be easy to align by sequence but are challenging for structure-based approaches, could be improved this way.
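One possible per-position score, sketched in Python under stated assumptions: the structures are already superposed, their C-alpha coordinates are laid out per alignment column (None at gaps), and a column's score is the mean pairwise C-alpha distance across the structures that have a residue there. Columns above a cutoff are flagged as refinement candidates. The 3.0 Å default is an arbitrary placeholder, and `column_scores`/`flag_regions` are hypothetical names, not part of the 'pdbs' object or any bio3d API.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def column_scores(xyz: Sequence[Sequence[Optional[Coord]]]) -> List[Optional[float]]:
    """Mean pairwise C-alpha distance per alignment column.

    Assumptions (hypothetical layout): xyz[s][k] holds the already-superposed
    C-alpha coordinate of structure s at alignment column k, or None where
    that structure has a gap. Columns with fewer than two residues get None
    because no pairwise distance exists.
    """
    n_cols = len(xyz[0]) if xyz else 0
    scores: List[Optional[float]] = []
    for k in range(n_cols):
        pts = [row[k] for row in xyz if row[k] is not None]
        if len(pts) < 2:
            scores.append(None)
            continue
        dists = [math.dist(p, q) for p, q in combinations(pts, 2)]
        scores.append(sum(dists) / len(dists))
    return scores


def flag_regions(scores: Sequence[Optional[float]], cutoff: float = 3.0) -> List[int]:
    """Column indices whose score exceeds the cutoff (placeholder value, in Å)."""
    return [k for k, s in enumerate(scores) if s is not None and s > cutoff]


if __name__ == "__main__":
    xyz = [
        [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), None],
        [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    ]
    scores = column_scores(xyz)
    print(scores, flag_regions(scores))
```

In the toy input, column 1 (a 5 Å spread) is flagged, column 0 (1 Å) is not, and column 2 is skipped because only one structure has a residue there.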

    What you are suggesting is a strategy for combined sequence and structure alignment. This could entail pairwise sequence-based clustering followed by structure alignment of selected cluster representatives. You might also want to look at the more recent T-Coffee approaches for combining sequence and structure information.
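For the cluster-representatives step, a small sketch (a hypothetical helper, not a bio3d function) that picks one medoid per sequence cluster: the member with the smallest total distance to the rest of its cluster. It assumes the clusters and a symmetric pairwise distance dict (both key orderings present, e.g. 100 minus percent identity) come from an earlier clustering step. Only the representatives would then go to a structure aligner such as mustang(); that hand-off is not shown.

```python
from typing import Dict, List, Tuple


def pick_representatives(
    clusters: List[List[str]],
    dist: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return one medoid per cluster.

    The medoid is the member with the smallest summed distance to the other
    members; ties go to the member listed first. `dist` is assumed to be
    symmetric with both (i, j) and (j, i) keys present. Singleton clusters
    simply return their only member.
    """
    reps = []
    for members in clusters:
        reps.append(
            min(members, key=lambda m: sum(dist[(m, o)] for o in members if o != m))
        )
    return reps


if __name__ == "__main__":
    d = {("a", "b"): 10.0, ("a", "c"): 30.0, ("b", "c"): 15.0}
    d.update({(j, i): v for (i, j), v in list(d.items())})  # make symmetric
    print(pick_representatives([["a", "b", "c"], ["d"]], d))
```

A medoid is used rather than a centroid because aligned sequences have no meaningful average; here "b" is chosen since its summed distance (25) is the smallest in its cluster.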
