Sequence alignment of several structures
Dear members of the Grant Lab,
I'm trying to perform a structural PCA of around 8000 different structures from the same protein family, but I am having trouble with the very first steps: the alignment step within Bio3D seems to get stuck at some point. Would it be possible to align the sequences of these proteins outside Bio3D and then use the MSA to make the structural alignment around a specified core? I haven't found a way to do this so far.
Thank you very much in advance for your attention.
Comments (6)
-
Dear Xinqiu,
Thank you very much for your reply! I was able to do the structural alignment using a combination of read.fasta() and read.fasta.pdb().
I have just one more question: when I try to find a core for aligning my proteins, I get a message (for one of my larger MSA files) saying that the function could not find a single non-gap position in my alignment. However, I'm sure that some positions are conserved in all sequences (I checked these columns for “-” with awk, and all of them are completely filled). Could I be missing something in my file, or does Bio3D have a special definition of a non-gap position?
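For reference, the awk check described above can be sketched in Python. This is a minimal sketch, assuming a plain aligned FASTA file and treating both "-" and "." as gap characters (an assumption; Bio3D may define gaps differently). The function names `read_aligned_fasta` and `non_gap_columns` are hypothetical. The sketch only inspects the sequence text, not any structure files, so it cannot tell whether every structure actually has coordinates for the residues in those columns.

```python
from typing import List, Tuple

# Characters treated as gaps in this sketch (an assumption; adjust as needed).
GAP_CHARS = frozenset("-.")


def read_aligned_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse aligned FASTA text into a list of (header, aligned sequence) pairs.

    Sequences may be wrapped over several lines; they are concatenated.
    """
    records: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()  # also removes stray '\r' from Windows line endings
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:].strip(), ""))
        elif records:
            header, seq = records[-1]
            records[-1] = (header, seq + line)
    return records


def non_gap_columns(seqs: List[str]) -> List[int]:
    """Return 1-based indices of alignment columns with no gap in any sequence."""
    if not seqs:
        return []
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("sequences differ in length; this is not a valid alignment")
    return [i + 1 for i in range(width)
            if all(s[i] not in GAP_CHARS for s in seqs)]


if __name__ == "__main__":
    example = """>seqA
MK-LV.
>seqB
MKALVE
>seqC
MKAL-E
"""
    records = read_aligned_fasta(example)
    print(non_gap_columns([seq for _, seq in records]))  # → [1, 2, 4]
```

Counting "." as a gap is the main difference from a check that only looks for "-"; the length check also catches a file whose sequences were not all padded to the same alignment width.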
-
Could you provide a short example for us to look at?
-
Sure! Thank you very much. How should I send it to you?
-
You can send it to xinqiu.yao@gmail.com or attach the files here if they are not too large.
-
Ok, I’ll send it to your e-mail now, thanks!
Yes, you can use pre-aligned sequences and load them into R to do other analyses with bio3d. The alignment should be in FASTA format. Then, look at functions including read.fasta() and read.fasta.pdb(). Check the documentation and also the tutorials on basic sequence/structure analysis and PCA, all available from http://thegrantlab.org/bio3d/. Let me know if you still have problems.