RNA structures with core.find()

Issue #125 resolved
Barry Grant created an issue

More of a question but want to check if others have feedback on making bio3d play nicely with non protein structures....

Dear Prof. Grant,

Hello! I am a graduate student at Yale university and I am sorry for bothering you with a question about bio3d.

Bio3d is really a great and handy package to analyze structures and structural trajectories. It is really helpful and I appreciate your works. However, I am currently working on RNA structures, and I know bio3d is designed for proteins. Here I have a specific question. I am now trying to identify the structural core of a RNA structure based two structures in which the RNA is crystallized at different conformations. I just aligned the backbone phosphate of the two structures using fit.xyz, and then feed the aligned position matrices into core.find. However, the output PDB files are automatically labeled as Aln. I am just wondering whether this is simply a symbol and the atom numbers are still valid for my analysis. I think the above statement is probably true, but I would like to be cautious and confirm with you.

Thank you so much for your time!

Sincerely, Chen

Comments (8)

  1. Barry Grant reporter

    Section of my reply...

    You are correct, the indices should be good and the pdb output from the core.find() function does indeed use dummy fields for residue names as well as other non xyz fields (these reflect our current protein bias). Unfortunately, we generally don't have much experience using bio3d (i.e. testing) with non protein structures but we initially designed it in such a way as to be agnostic of input structure type. However, as always, I would encourage you to check output at each sage to make sure things are as you expect.

  2. Lars Skjærven

    jepp.. compatibility with nucleic structures is something we need to address.

    As a start I added option 'nucleic' and 'nonnucleic' to atom.select() in commit f5d5e02.

    summary.pdb() now provides:

     Call:  read.pdb(file = "1L2X")
    
       Total Models#: 1
         Total Atoms#: 803,  XYZs#: 2409  Chains#: 1  (values: A)
    
         Protein Atoms#: 0  (residues/Calpha atoms#: 0)
         Nucleic acid Atoms#: 582  (residues/phosphate atoms#: 27)
    
         Non-protein/nucleic Atoms#: 221  (residues: 190)
         Non-protein/nucleic resid values: [GTP (1), HOH (179), K (1), MG (6), NA (3) ]
    
       Nucleic acid sequence:
          GCGCGGCACCGUCCGCGGAACAAACGG
    
    + attr: atom, helix, sheet, seqres, xyz,
            calpha, call
    
  3. Lars Skjærven

    In addition, we could have

    atom.select(pdb, "phosphates")
    

    which would fetch "P" atoms in nucleic acid residues (as a counterpart to string="calpha"). We could also provide attribute "phosphate" for nucleic acids (corresponding to the "calpha" for proteins).

    For the question above, the user could then do:

    pdb    <- read.pdb("2MFD", multi=TRUE)
    p.inds <- atom.select(pdb, "phosphates")
    core   <- core.find(pdb$xyz[, p.inds$xyz])
    
  4. Barry Grant reporter

    Great, I like it. Adding these types of things to atom.select() is very sensible. By adding the "phosphate attribute" do you mean to the output of read.pdb() like the $calpha?

    I will mention in passing that I added a note on the wiki ToDo page a while back that pointed to VMD's selection keywords. See: http://www.ks.uiuc.edu/Research/vmd/vmd-1.3/ug/node133.html

    In VMD they apparently use a check on the actual atom names within a residue to find protein and nucleic acid residue types. Currently we do the simpler check on resid name, which is fine for now but means we may have to update the list of protein residues every time we find a new non-standard protein resid. Probably not a big deal though.

  5. Lars Skjærven

    good point with with atom name check. we should definitively have that.

    jepp- adding pdb$phosphate. probably doesn't make sense, but the RNAs feel unfairly treated

  6. Barry Grant reporter

    The disadvantage of the VMD way is that you may need all those predefined atoms there to be considered protein or nucleic acid and we often work with fragments of structures (calpha only and missing atoms etc.). Would perhaps be a bigger change than at first sight.

  7. Lars Skjærven

    proposing to close this issue with commit b4118d9.

    a user can now do

    # with RNA
    pdb <- read.pdb('2MFD', multi=TRUE)
    core <- core.find(pdb)
    
    # with protein
    pdb <- read.pdb('1d1d', multi=TRUE)
    core <- core.find(pdb)
    
  8. Log in to comment