PCA analysis for MUSCLE aligned multiple protein sequences?

Issue #479 resolved
Yang Cheng created an issue

Hi,

I'm very new to this and was wondering whether I can run bio3d's PCA on multiple sequences (up to 1000) that I have already aligned elsewhere with MUSCLE. Alternatively, can bio3d's PCA take a Newick-format tree as input, or would I need to modify the source code? Thanks so much.

Yang

Comments (5)

  1. Xinqiu Yao

    Hi Yang,

    That's an interesting question. The 'pca' functionality in bio3d is only for protein structures. Are you going to do PCA on sequences themselves? How will you define the coordinates of each sequence that are to be used to calculate the covariance?

  2. Yang Cheng reporter

    Hi Xin-Qiu,

    Thanks for the reply. Yes, I want to use dimensionality reduction to see whether I can cluster groups of sequences that share a particular motif. My approach is to align the sequences with MUSCLE and then run PCA, hoping it will cluster them based on the alignment or on the phylogenetic tree built from it. Is this possible with bio3d? Thanks.

  3. Xinqiu Yao

    Bio3d does not directly support sequence PCA, but I believe it should be doable once you find a proper way to "encode" the sequences as numeric vectors. I would recommend doing a literature search first to find the encoding that previous studies used, and then reproducing it with bio3d or R.
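
To illustrate what "encoding" the sequences could look like, here is a minimal sketch in Python with NumPy. It does not use bio3d and is not a method taken from this thread. It assumes one-hot encoding of each alignment column (20 amino acids plus the gap character), which is only one of several possible encodings, and it runs PCA through a singular value decomposition of the centered matrix. The function names `one_hot_encode` and `pca` and the toy alignment are made up for the example; real input would be the MUSCLE alignment read from file.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus the alignment gap.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot_encode(aligned_seqs):
    """Turn equal-length aligned sequences into a binary feature matrix.

    Each alignment column becomes len(ALPHABET) features, so the result
    has shape (n_sequences, alignment_length * len(ALPHABET)).
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    X = np.zeros((len(aligned_seqs), length * len(ALPHABET)))
    for row, seq in enumerate(aligned_seqs):
        for col, aa in enumerate(seq.upper()):
            k = index.get(aa)  # symbols outside ALPHABET stay all-zero
            if k is not None:
                X[row, col * len(ALPHABET) + k] = 1.0
    return X


def pca(X, n_components=2):
    """PCA via SVD of the column-centered matrix.

    Returns the per-sequence scores and the fraction of variance
    explained by each retained component.
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Hypothetical toy alignment: two visibly different groups of sequences.
toy_alignment = [
    "MKT-LLV",
    "MKTALLV",
    "MRT-LLV",
    "GHSWQ-P",
    "GHSWQAP",
]

X = one_hot_encode(toy_alignment)
scores, explained = pca(X, n_components=2)
```

Plotting the two columns of `scores` against each other would show whether sequences sharing a motif fall into the same cluster. Whether one-hot encoding suits a given data set is a judgment call; encodings based on physicochemical properties or substitution-matrix scores are alternatives worth checking in the literature.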
