deuterium atom names missing from bio3d::atom.index

Issue #521
Former user created an issue

Hi

In previous versions of bio3d (March 2017 and earlier), deuterium atom names were accepted by bio3d::atom2ele() and converted to D. I used this function to convert atom names to elements for PDB files without an element column (elesy).

Are there plans to add the deuterium atom names to bio3d::atom.index? Or was I just lucky in the past, and the deuterium atom names were simply skipped?

Please let me know if you need additional information.

Thank you,
Emilio

    sessionInfo()
    R version 3.3.3 (2017-03-06)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X Yosemite 10.10.5

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] magrittr_1.5

    loaded via a namespace (and not attached):
     [1] Rcpp_0.12.12 bio3d_2.3-3 grid_3.3.3 plyr_1.8.4 gtable_0.2.0 scales_0.4.1 ggplot2_2.2.1 stringi_1.1.5 rlang_0.1.2 reshape2_1.4.2 lazyeval_0.2.0 fastcluster_1.1.22
    [13] cowplot_0.8.0 openxlsx_4.0.17 tools_3.3.3 stringr_1.2.0 munsell_0.4.3 yaml_2.1.14 parallel_3.3.3 colorspace_1.3-2 tibble_1.3.4

Comments (8)

  1. Lars Skjærven

    Hi, we can certainly add more specific atoms here. Can you provide a reproducible example?

    Note that you can always add your own mapping with the elety.custom argument.

    library(bio3d)
    # lig: a pdb object (e.g. from read.pdb()) whose atom names include CL2
    myelety <- data.frame(name = "CL2", symb = "Cl")
    atom2ele(lig, elety.custom = myelety)
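    To illustrate the idea behind elety.custom, here is a minimal base-R sketch of a name-to-element lookup in which user-supplied rows take precedence over a default table. It is not bio3d's implementation: `default_index` (a stand-in for atom.index with a few illustrative rows) and `lookup_element()` are hypothetical names.

```r
# Hypothetical stand-in for bio3d's atom.index (a few illustrative rows only)
default_index <- data.frame(name = c("N", "CA", "CB", "O"),
                            symb = c("N", "C", "C", "O"),
                            stringsAsFactors = FALSE)

# Hypothetical helper: map atom names to element symbols, letting rows in
# `custom` override rows in `default` that have the same atom name
lookup_element <- function(atom_names, default = default_index, custom = NULL) {
  index <- default
  if (!is.null(custom)) {
    custom <- data.frame(name = as.character(custom$name),
                         symb = as.character(custom$symb),
                         stringsAsFactors = FALSE)
    # custom rows first, then the default rows they do not redefine
    index <- rbind(custom, default[!default$name %in% custom$name, ])
  }
  symb <- index$symb[match(atom_names, index$name)]
  unknown <- unique(atom_names[is.na(symb)])
  if (length(unknown) > 0) {
    stop("elements could not be determined for: ",
         paste(unknown, collapse = ", "))
  }
  symb
}

myelety <- data.frame(name = "CL2", symb = "Cl", stringsAsFactors = FALSE)
lookup_element(c("N", "CA", "CL2"), custom = myelety)
# returns "N" "C" "Cl"
```

    Putting the custom rows first and dropping the default rows they redefine means a user mapping always wins, while unmapped names still fail loudly instead of being silently skipped.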
    
  2. Emilio Xavier Esposito

    Hi Lars, thank you for the suggestion. I have switched to using the elety.custom option.

    Here is an example using PDB 5pti. The same error occurs with 5rsa.

    pdb.5pti <- bio3d::read.pdb2("5pti")
    #  HEADER    PROTEINASE INHIBITOR (TRYPSIN)          05-OCT-84   5PTI
    #  PDB has ALT records, taking A only, rm.alt=TRUE
    bio3d::atom2ele.pdb(pdb = pdb.5pti)
    # Error in atom2ele.default(atom.names, ...) :
    #   elements could not be determined for: D1, D2, D3, DE, DH11, DH12, DH21, DH22, D, DH, DG1, DZ1, DZ2, DZ3, DD21, DD22, DE21, DE22, DG
    

    I see there was a change to how atom2ele() determines the element in June 2017. Would it be possible to add deuterium (D) to the elements list?

  3. Lars Skjærven

    Since D is not a separate element of the periodic table but an isotope of H, I don't think it makes sense to add it to the list of elements (elements). We could then either make a special case for deuterium (as programs like PyMOL do), or leave it to the user to map these particular atom names.

    Let me know what you think.

    Example of mapping all D atoms:

    # load bio3d and a structure containing deuterium atoms
    library(bio3d)
    pdb = read.pdb("5pti")

    # D atom names
    unq = unique(pdb$atom$elety[grep("^D", pdb$atom$elety)])

    # data frame of mappings
    cust = data.frame(name = unq, symb = "D")

    # provide custom mapping to atom2ele
    ele = atom2ele(pdb, elety.custom = cust)
    

    Note the warning messages on two particular atoms:

        mapped element HG21 to Hg
        mapped element UNK to U
    

    These are obviously wrong mappings and must be fixed manually. I will add the atom name UNK to atom.index.
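    What "fixed manually" involves is not spelled out here. One possible approach, sketched in base R under that assumption, is to override the mapped elements for specific atom names after the mapping step. `fix_elements()` and the `fixes` table are hypothetical, the input vectors are illustrative values rather than real atom2ele() output, and setting UNK to NA (instead of choosing an element) is an assumption.

```r
# Elements as they might come back from a name-based mapping (illustrative values)
atom_names <- c("CA", "HG21", "UNK", "D1")
elements   <- c("C",  "Hg",   "U",   "D")

# Hypothetical manual corrections: atom name -> intended element.
# UNK is set to NA here (an assumption) rather than to uranium.
fixes <- c(HG21 = "H", UNK = NA)

# Hypothetical helper: overwrite the element for every atom name listed in `fixes`
fix_elements <- function(atom_names, elements, fixes) {
  hit <- atom_names %in% names(fixes)
  elements[hit] <- unname(fixes[atom_names[hit]])
  elements
}

fix_elements(atom_names, elements, fixes)
# returns "C" "H" NA "D"
```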

  4. Emilio Xavier Esposito

    I understand your position on adding D to the list of elements (elements), but deuterium is often used in crystallographic studies and is therefore found in RCSB structures. I like your idea of creating a special case, and thank you for the example code. Leaving it up to the user to map unusual yet frequent atom names might be problematic. Will you be including the mapping of all D atoms in atom2ele?
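    A special case of the kind discussed could sit in front of a general mapper. The sketch below is hypothetical (`map_with_deuterium()` and the toy `first_letter()` fallback are for illustration only and are not bio3d functions): names beginning with "D" are assigned the symbol D, and everything else is passed to the fallback. Because real element symbols such as Dy also begin with D, this is only a heuristic for polymer atom names.

```r
# Hypothetical special case: names beginning with "D" get `deuterium_symbol`;
# all other names are handed to a fallback mapper function.
map_with_deuterium <- function(atom_names, fallback, deuterium_symbol = "D") {
  is_d <- grepl("^D", atom_names)
  out <- rep(NA_character_, length(atom_names))
  out[is_d] <- deuterium_symbol
  if (any(!is_d)) out[!is_d] <- fallback(atom_names[!is_d])
  out
}

# Toy fallback for illustration only: take the first letter of the name
first_letter <- function(x) substr(x, 1, 1)

map_with_deuterium(c("N", "CA", "D1", "DH11", "O"), fallback = first_letter)
# returns "N" "C" "D" "D" "O"
```

    Passing deuterium_symbol = "H" would instead treat deuterium atoms as hydrogen. In practice the fallback might be a call to bio3d's atom2ele() on the remaining names, but that pairing is untested here.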

  5. Emilio Xavier Esposito

    When you say "These are obviously wrong mappings, and must be fixed manually," what exactly do you mean? Does the user have to fix the incorrect element symbol (elesy)? Would it be possible to include the common hydrogen atom names in atom.index? Including the HA, HB#, HD#, HE#, HG, and HZ atom names would make identifying hydrogen atoms easier.
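    The proposed hydrogen names can be captured by a single regular expression. This sketch assumes a name is a protein hydrogen if it is H followed by one of A, B, D, E, G or Z and optional digits; `h_pattern` and `is_protein_hydrogen()` are hypothetical, and the pattern deliberately covers only the prefixes listed, not names such as H or HH11.

```r
# Hypothetical pattern for HA, HB#, HD#, HE#, HG, HZ with optional trailing
# digits (e.g. HB2, HD21, HG21)
h_pattern <- "^H[ABDEGZ][0-9]*$"

is_protein_hydrogen <- function(atom_names) grepl(h_pattern, atom_names)

is_protein_hydrogen(c("HA", "HB2", "HD21", "HG21", "HZ3", "CA", "HH11"))
# returns TRUE TRUE TRUE TRUE TRUE FALSE FALSE
```

    Note that bare "HG" and "HE" also match, and those are the upper-case names mercury and helium atoms would carry, so a name pattern alone cannot settle every case; the record type (ATOM vs HETATM) is the extra context discussed below.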

    Apart from hydrogen, five elements have symbols starting with "H". Except for Hg, these are unlikely to appear in a protein structure. If they do, they will most likely be in the HETATM section, which is where Hg typically resides, either as a heavy-atom phasing aid or as part of a ligand.

    > bio3d::elements[grepl(pattern="H", x=bio3d::elements$symb), ]
        num symb areneg rcov  rbo rvdw maxbnd    mass elneg ionization elaffinity  red green blue     name
    2     1    H   2.20 0.31 0.31 1.10      1   1.008  2.20     13.598     0.7542 1.00  1.00 1.00 Hydrogen
    3     2   He   0.00 0.28 0.28 1.40      0   4.003  0.00     24.587     0.0000 0.85  1.00 1.00   Helium
    68   67   Ho   0.00 1.92 1.92 2.33      6 164.930  1.23      6.021     0.5000 0.00  1.00 0.61  Holmium
    73   72   Hf   1.23 1.75 1.75 2.25      6 178.490  1.30      6.825     0.0000 0.30  0.76 1.00  Hafnium
    81   80   Hg   1.44 1.32 1.32 2.05      6 200.590  2.00     10.438     0.0000 0.71  0.71 0.76  Mercury
    109 108   Hs   0.00 1.60 1.60 2.00      6 270.000  0.00      0.000     0.0000 0.90  0.00 0.18  Hassium
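    Following this reasoning, the record type can disambiguate H-initial names. The hypothetical `assign_h_element()` below treats H-initial names in ATOM records as hydrogen and assigns Hg only to an HG atom in a HETATM record whose residue is also named HG (assumed here to denote a mercury ion); anything else is returned as NA for a general mapper to resolve. In a bio3d pdb object, the three inputs could come from pdb$atom$elety, pdb$atom$type and pdb$atom$resid.

```r
# Hypothetical rule: use the record type (and residue name) to decide
# whether an H-initial atom name means hydrogen or mercury.
assign_h_element <- function(atom_names, record_type, resid) {
  n <- length(atom_names)
  stopifnot(length(record_type) == n, length(resid) == n)
  out <- rep(NA_character_, n)
  starts_h <- grepl("^H", atom_names)
  # ATOM records (polymer atoms): an H-initial name is taken to be hydrogen
  out[record_type == "ATOM" & starts_h] <- "H"
  # HETATM records: HG is mercury only when the residue is also named HG
  out[record_type == "HETATM" & atom_names == "HG" & resid == "HG"] <- "Hg"
  # everything else stays NA for a general-purpose mapper to resolve
  out
}

assign_h_element(
  atom_names  = c("HG",   "HG21", "CA",   "HG"),
  record_type = c("ATOM", "ATOM", "ATOM", "HETATM"),
  resid       = c("SER",  "THR",  "THR",  "HG")
)
# returns "H" "H" NA "Hg"
```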
    
  6. Barry Grant

    It is indeed very common, and it sounds like a good candidate for a special case to me. Would adding it cause problems elsewhere, Lars?

    Thanks for posting on this Emilio!
