mean RMSF calculation for protein ensembles

Issue #833 new
Former user created an issue

Hi all, I have a couple of protein ensemble conformers in .pdb format (10 conformers per each ensemble) from Markov state modeling calculation (MSM). I would like to know how protein conformers are changed between different MSM states, like comparison of average RMSF for state-1 (ensemble conformers, selected residues) to state-2 as the reference and etc. So the differentiate RMSD and RMSF modes are needed. I was used these command lines for a normal RMSF of independent simulation:


dcd <- read.dcd(“model.dcd") pdb <- read.pdb("model.pdb") ca.inds <- atom.select(pdb, elety = 'CA', resno = c(1:25,74:99,109:115,185:209,258:283,293:299)) xyz <- fit.xyz(fixed=pdb$xyz,mobile=trj,fixed.inds=ca.inds$xyz,mobile.inds=ca.inds$xyz) rmdf_result <- rmsf(xyz [,ca.inds$xyz])


I have used the following command lines for protein ensemble conformers:

pdb = read.pdb("model.pdb", multi=TRUE) sele <- atom.select(pdb, resno = c(1:25,74:99,109:115,185:209,258:283,293:299), elety = "CA") xyz <- pdb$xyz rmsd <- rmsd(pdb, fit=TRUE) print(rmsd) xyz <- fit.xyz(xyz[1, ], xyz) resno1 <- pdb$atom$resno[sele$atom] r <- rmsf(xyz[, sele$xyz], grpby=resno) plot.bio3d(r, resno=pdb, col="#E16A86", ylab="RMSF (state-4)", typ="l")

it shows RMSF for all residues in the protein (1 to 300) not the selected ones in sele, and there is a warning message for RMSD part:

Warning message: In fit.xyz(xyz[1, ], xyz) : No fitting indices provided, using the 5886 non NA positions

Any help will be appreciated Bests Zahra

Comments (11)

  1. Xinqiu Yao

    Hi,

    Can you reformat your post by putting code lines in a “code block”? It is hard to read.

  2. Mahnaz Tehrani

    thank you, here are code lines:

    pdb = read.pdb("model.pdb", multi=TRUE)
    sele <- atom.select(pdb, resno = c(1:25,74:99,109:115,185:209,258:283,293:299), elety = "CA")
    xyz <- pdb$xyz rmsd <- rmsd(pdb, fit=TRUE) print(rmsd)
    xyz <- fit.xyz(xyz[1, ], xyz)
    resno1 <- pdb$atom$resno[sele$atom]
    r <- rmsf(xyz[, sele$xyz], grpby=resno)
    plot.bio3d(r, resno=pdb, col="#E16A86", ylab="RMSF (state-4)", typ="l")

    it shows RMSF for all residues in the protein (1 to 300) not the selected ones in sele,
    and there is a warning message for RMSD part:
    Warning message: In fit.xyz(xyz[1, ], xyz) :
    No fitting indices provided, using the 5886 non NA positions

  3. Xinqiu Yao

    The warning message is fine. But remember that it tells that you are using all atoms (5886) to do the fitting (is it what you expect?)

    Can you check the dimension of the rmsf result (the variable 'r') since it more directly tells how many residues/atoms are calculated. For example, length(r). Does the result match the number of selected atoms/residues?

    The x-axis of the plot may not tell which residues are plotted/skipped because the range of the residue number is kept the same/similar, i.e., 1-300, after the atom select. Because you use resno=pdb, it will label the axis by PDB residue number instead of the actual index of the residues used for the RMSF calculation (and so it appears all 300 residues are calculated but may not be true).

  4. Mahnaz Tehrani

    Dear Xinqiu,

    Thank you, I have updated the code and shared the .out and plot in pdf format in attachment:

    setwd(".")
    library(bio3d.core)

    pdb <- read.pdb("model-state-4.pdb", multi=TRUE)
    sele <- atom.select(pdb, resno = c(1:34,56:64,74:99,109:115,185:218,240:248,258:283,293:299), elety = "CA")
    xyz <- pdb$xyz
    rmsd <- rmsd(pdb, fit=TRUE)
    print(rmsd)
    xyz <- fit.xyz(xyz[1, ], xyz)
    resno <- pdb$atom$resno[sele$atom]
    r <- rmsf(xyz[, sele$xyz], grpby=resno)
    length(r)
    plot.bio3d(r, resno = c(1:34,56:64,74:99,109:115,185:218,240:248,258:283,293:299), col="blue", ylab="RMSF", typ="l", sub="model-state-4")
    #geostas
    library(bio3d.geostas)
    gs <- geostas(pdb, fit=TRUE)
    plot(gs, contour=TRUE, main="AMSM,model-state-4")

    __________

    I think RMSF looks fine by updating the code. Actually as I mentioned I would like to compare structural parameters (RMSD, RMSF, NMA and domain analysis/geostas) for different protein conformational states of Markov state modeling calculation, in which each state includes ensemble of 10 PDB files. To aim this, I think the mean or average properties of each structural parameters states should be calculated (i.e., RMSF, NMA and GS if its applicable).

    For RMSD only C alpha atoms of all residues in the protein is enough.

    Bests

    Zahra

  5. Mahnaz Tehrani

    Yes but the previou post was about instalation.. This post has been overlooked since last week .. Please check the comments in the current post. You asked me to test command line and I shared the results. Thanks

  6. Xinqiu Yao

    I was mentioning this post. Your original problem is

    it shows RMSF for all residues in the protein (1 to 300) not the selected ones in sele

    Based on the ‘md.Rout’ file you have attached, it seems the problem has been resolved: The command length(r)returns ‘152’, which is the correct length of ‘sele’. I don’t know what your new problem/question is.

  7. Xinqiu Yao

    Ok. If you want to show the x-axis to be 1:152, simply remove ‘resno=XXX’ in the plot.bio3d().

  8. Mahnaz Tehrani

    @Xinqiu Yao

    Thank you for the comment. I have a question about “Domain analysis with GeoStaS”, can you please let me know how the colors on the residue number axis is defined on Geostas (blue, red, gray and etc.), I have tested it on ensemble of Markov state model modeling structures but need more information for Interpreting the results. I think it might be related to defined thresholds on the code.

    Bests

    Zahra

  9. Log in to comment