pca.project doesn't work if structures to project are smaller than those used to make the pca

Issue #795 closed
Former user created an issue

Hello,

I have made a PCA using several pdb structures of my protein of interest and I would now like to project frames from an MD simulation atop (as done here http://thegrantlab.org/bio3d/articles/online/enma_vignettes/Bio3D_nma-dhfr-partI.html). The PCA has been calculated with 2 domains of my protein but one of my simulations only has one of the domains. When I try to project this simulation onto the PCA I get this error:

Error in project.pca(closed_tBamA_trj.fit[, closed_tBamA_inds$b$xyz], : Dimensionality mismatch: ncol(data)!=ncol(pca$U)

Is there any way to avoid this error, or is it impossible to project coordinates of a part of a protein onto a PCA calculated with the whole protein?

Comments (17)

  1. Xinqiu Yao

    It is impossible to project part of a protein to a PCA using the entire protein. The dimension must match.

  2. Matthew Watson

    But why can you build the initial PCA from a list of PDBs that come from different organisms and thus have different lengths, yet not project frames of an MD simulation onto it? Is there no way for the function to do whatever the original PCA did to keep only the common parts of the structures?

  3. Xinqiu Yao

    The example in the tutorial you mentioned used the same dimension (i.e., the aligned C-alpha atoms) for both PDB and MD. In your case, you want to project a single-domain MD trajectory onto a PCA done on two domains, which is impossible. You have to redo the PCA with just one domain and then project your one-domain MD trajectory. If you also have two-domain MD trajectories, you can project those too by picking the residues from the MD that match the domain used in the PCA.

  4. Xinqiu Yao

    In other words, the coordinates used in PCA can be a subset of all data (trajectories) you want to project, but not the opposite.
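
    The subset rule above can be illustrated with a toy sketch in base R. It uses `prcomp()` and a hand-written `project()` helper rather than Bio3D's `project.pca()`; the matrix, the `domain_cols` index, and the helper are all hypothetical stand-ins, not part of the original workflow.

```r
# Toy data: 20 "frames" with 12 coordinates each (think 4 atoms x 3 xyz values).
set.seed(1)
full <- matrix(rnorm(20 * 12), nrow = 20, ncol = 12)

# Hypothetical column subset standing in for one domain (2 atoms = 6 columns).
domain_cols <- 1:6

# PCA on the subset only; prcomp() centers the data by default.
pc <- prcomp(full[, domain_cols])

# Project by hand: subtract the PCA mean, then multiply by the eigenvectors.
project <- function(data, pc) {
  stopifnot(ncol(data) == nrow(pc$rotation))  # dimensions must match
  sweep(data, 2, pc$center) %*% pc$rotation
}

# Works: the larger data set is trimmed to the columns used in the PCA.
scores <- project(full[, domain_cols], pc)

# Fails: a PCA built on all 12 columns cannot accept 6-column data.
pc_full <- prcomp(full)
res <- try(project(full[, domain_cols], pc_full), silent = TRUE)
```

    The first call succeeds because the data are cut down to the PCA's columns before projecting; the second is caught by the dimension check, which mirrors the mismatch reported in this issue.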

  5. Matthew Watson

    In my case the PCA was made from a curated list of PDBs that did not contain some accessory proteins, while the MD I wish to project onto the PCA includes these accessory proteins. The problem seems to be at the step where the trajectory is fitted to its own first frame, though.

  6. Matthew Watson

    It worked for one trajectory, but now I have another issue when trying to add a second trajectory. At the point where I’m aligning the first-frame PDB of the trajectory onto the curated set of PDBs, I use

    apo_inds <- pdb2aln.ind(pdbs, pdb_apo, gaps.res$f.inds)
    

    and it returns the error

    In pdb2aln.ind(pdbs, pdb_apo) :Gaps are found in equivalent positions in PDB
    

    What can I do to make it ignore the gaps? Setting rm.gaps=TRUE does not work.

  7. Xinqiu Yao

    That message means that one or more aligned positions (residues) in the original alignment used to calculate the PCA have no equivalent residues in the trajectory PDB.

    There are two possible ways to solve it:

    • Check the alignment manually (open the 'pdb2aln.fa' file using SEAVIEW, for example) and see if you can fix the gap problem by adjusting the alignment based on, e.g., structures.

    Or

    • Redo PCA using aligned (non-gap) positions that all have equivalent residues in the trajectory PDB.

  8. Matthew Watson

    I’m confused since the PDB in question is so similar to the other larger one which didn’t return this issue…

  9. Matthew Watson

    I also tried remaking the PCA with the PDB included, and aligning the PDB to that PCA still failed for the same reason.

  10. Xinqiu Yao

    If you included the PDB, then there shouldn’t be a problem. Can you provide a short example to reproduce the error?

  11. Matthew Watson

    sure.

    pdb_apo <- read.pdb("path to file/apo.pdb")  #works
    dcd_apo <- read.dcd("path to file/apo.dcd")   #works
    pdbs <- pdbaln(files, fit=TRUE)                #works
    gaps.pos <- gap.inspect(pdbs$xyz)              #works
    gaps.res <- gap.inspect(pdbs$ali)              #works
    pc.xray <- pca(pdbs, core.find=TRUE)           #works
    
    
    apo_inds <- pdb2aln.ind(pdbs, pdb_apo, gaps.res$f.inds)   #fails
    

    Plotting PCA components 1 and 2 also works, as does projecting a different trajectory on top of it.

    How can I send you the files?

  12. Matthew Watson

    I'm just sending them now, all of the ones with ‘bilayer’ in the name were used to make the PCA

  13. Matthew Watson

    I'm trying various things now. Remaking the initial list of PDBs returns a new error:

    Error in read.fasta.pdb(s, prefix = "", pdbext = "", pdblist = files, :
    No corresponding PDB files found

    but the files are the same as before, I haven’t changed them

  14. Xinqiu Yao

    Closed because a solution was provided by email but the user has not responded. Can be reopened if the issue persists.
