read.pdb() behavior with het2atom=TRUE

Issue #52 resolved
Barry Grant created an issue

I propose that if read.pdb() has het2atom=TRUE then the $het output component should be NULL.

Note that currently, when you set het2atom=TRUE in read.pdb(), all $het data gets copied into $atom, but we also keep the same data in $het. We therefore store duplicate data.

This is not usually a problem, as we don't typically do much with $het, but the output from summary.pdb()/print.pdb() becomes confusing, e.g.:

> summary( read.pdb("4q21") )
##  Note: Accessing online PDB file
##  HEADER    ONCOGENE PROTEIN                        25-SEP-91   4Q21
##
## Call:  read.pdb(file = "4q21")
##
##  Atom Count: 1447
##
##   Total ATOMs#: 1340
##     Protein ATOMs#: 1340   ( Calpha ATOMs#: 168 )
##     Non-protein ATOMs#: 0   ( residues:  )
##
##   Total HETATOMs: 107
##     Residues HETATOMs#: 80   ( residues: MG GDP HOH )
##
##+ attr: atom, het, helix, sheet, seqres,
##        xyz, xyz.models, calpha, call

> summary( read.pdb("4q21", het2atom=T) )
##  Note: Accessing online PDB file
##  HEADER    ONCOGENE PROTEIN                        25-SEP-91   4Q21
##
## Call:  read.pdb(file = "4q21", het2atom = T)
##
##  Atom Count: 1554
##
##   Total ATOMs#: 1447
##     Protein ATOMs#: 1340   ( Calpha ATOMs#: 168 )
##     Non-protein ATOMs#: 107   ( residues: MG GDP HOH )
##
##   Total HETATOMs: 107
##     Residues HETATOMs#: 80   ( residues: MG GDP HOH )
##
##+ attr: atom, het, helix, sheet, seqres,
##        xyz, xyz.models, calpha, call

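Note that the 107 HETATOM records are reported twice in the second case above (once as non-protein ATOMs and again as HETATOMs), which inflates the Atom Count to 1554 (1447 + 107). I propose that if het2atom=TRUE we should blank out the $het records.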
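
For illustration, the proposed behavior could look roughly like the sketch below. This is not the bio3d implementation: make_toy_pdb() and merge_het() are hypothetical helpers, and the toy object only assumes that a parsed PDB is a list with $atom and $het tables, which may not match the real internal layout.

```r
# Toy stand-in for a parsed PDB object (assumption: the real bio3d object
# has more components and its internal layout may differ).
make_toy_pdb <- function() {
  list(
    atom = data.frame(resid = c("MET", "THR"), stringsAsFactors = FALSE),
    het  = data.frame(resid = c("MG", "GDP", "HOH"), stringsAsFactors = FALSE)
  )
}

# Hypothetical helper, not part of bio3d: append the $het rows to $atom and,
# as proposed in this issue, blank out $het so nothing is stored twice.
merge_het <- function(pdb, het2atom = FALSE) {
  if (het2atom) {
    pdb$atom <- rbind(pdb$atom, pdb$het)
    # Assigning list(NULL) keeps the "het" name but sets its value to NULL;
    # `pdb$het <- NULL` would instead remove the component entirely.
    pdb["het"] <- list(NULL)
  }
  pdb
}

merged <- merge_het(make_toy_pdb(), het2atom = TRUE)
nrow(merged$atom)        # protein rows plus former het rows
is.null(merged$het)      # TRUE: no duplicate copy is kept
```

Keeping the "het" name with a NULL value, rather than dropping the component, means code that looks up pdb$het still works and simply finds nothing there.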

Comments (1)

  1. Barry Grant reporter

    Made this change; see the more sensible output below:

    > library(bio3d)
    > read.pdb("4q21")
      Note: Accessing online PDB file
      HEADER    ONCOGENE PROTEIN                        25-SEP-91   4Q21
    
     Call:  read.pdb(file = "4q21")
    
      Atom Count: 1447
    
       Total ATOMs#: 1340
         Protein ATOMs#: 1340   ( Calpha ATOMs#: 168 )
         Non-protein ATOMs#: 0   ( residues:  )
         Chains#: 1   ( values: A )
    
       Total HETATOMs: 107
         Residues HETATOMs#: 80   ( residues: MG GDP HOH )
         Chains#: 1   ( values: A )
    
       Sequence:
          MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG
          QEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDL
          AARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKL
    
    + attr: atom, het, helix, sheet, seqres,
            xyz, xyz.models, calpha, call
    >
    >
    > read.pdb("4q21", het2atom=T)
      Note: Accessing online PDB file
      HEADER    ONCOGENE PROTEIN                        25-SEP-91   4Q21
    
     Call:  read.pdb(file = "4q21", het2atom = T)
    
      Atom Count: 1447
    
       Total ATOMs#: 1447
         Protein ATOMs#: 1340   ( Calpha ATOMs#: 168 )
         Non-protein ATOMs#: 107   ( residues: MG GDP HOH )
         Chains#: 1   ( values: A )
    
       Total HETATOMs: 0
         Residues HETATOMs#: 0   ( residues: none )
         Chains#: 1   ( values: 0 )
    
       Sequence:
          MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG
          QEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDL
          AARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKL
    
    + attr: atom, het, helix, sheet, seqres,
            xyz, xyz.models, calpha, call
    >
    