atom.select() and pdbsplit() with blank chains, e.g. chain=NA in atom.select

Issue #126 resolved
Barry Grant created an issue

Another user question on how to select blank chain atoms with atom.select()

e.g.

> pdb<-read.pdb("4q21")

# Create some blank chain records for testing
> pdb$atom$chain[1:10]=" "

> atom.select(pdb, chain=NA)

 Build selection from input components
 *  Selected a total of: 0 intersecting atoms  *

 Call:  atom.select(pdb = pdb, chain = NA)

   Atom Indices#: 0  ($atom)
   XYZ  Indices#: 0  ($xyz)

+ attr: atom, xyz, call
> atom.select(pdb, chain=" ")

 Build selection from input components
 *  Selected a total of: 0 intersecting atoms  *

 Call:  atom.select(pdb = pdb, chain = " ")

   Atom Indices#: 0  ($atom)
   XYZ  Indices#: 0  ($xyz)

+ attr: atom, xyz, call
> atom.select(pdb, chain="NA")

 Build selection from input components
 *  Selected a total of: 0 intersecting atoms  *

 Call:  atom.select(pdb = pdb, chain = "NA")

   Atom Indices#: 0  ($atom)
   XYZ  Indices#: 0  ($xyz)

+ attr: atom, xyz, call
>

I can fix this with a modification to atom.select() (specifically the sel.txt2type() internal function) but this might have unintended consequences elsewhere - good test functions would help out here....

Comments (9)

  1. Xinqiu Yao

    Shouldn't it be: If chain=NA or empty it means entire system? To me it is a bit weird that one file has mixed empty and specified chain ID...

  2. Barry Grant reporter

    Fixed...

    > pdb<-read.pdb("4q21")
    # Create some blank chain records for testing
    > pdb$atom$chain[1:10]=" "
    > write.pdb(pdb, file="tmp.pdb")
    
    > tmp<-read.pdb("tmp.pdb")
    > atom.select(tmp, chain=NA)
    
     Build selection from input components
     *  Selected a total of: 10 intersecting atoms  *
    
     Call:  atom.select(pdb = tmp, chain = NA)
    
       Atom Indices#: 10  ($atom)
       XYZ  Indices#: 30  ($xyz)
    
    + attr: atom, xyz, call
    
    > pdbsplit("tmp.pdb")
      |======================================================================| 100%
    [1] "split_chain/tmp_A.pdb"
    > pdbsplit("tmp.pdb")
      |======================================================================| 100%
    [1] "split_chain/tmp_NA.pdb" "split_chain/tmp_A.pdb"
    >
    
  3. Barry Grant reporter

    Fixed with commit 9c3cb30 for cases where missing chains are NA - this will always be the case from read.pdb(). However, if they are " " (i.e. a single space, as in the modified pdb object in the examples above) then we still can't select blank entries as spaces are removed from selection text within atom.select(). One fix would be to ensure internaly within atom.select() that input pdb object has a valid chain id (i.e. map " " to NA).

  4. Barry Grant reporter

    Currently it is if chain=NULL then the whole system is selected. Missing chain ids for only portions of a file are unusual but can happen with users edited and concanated files - which I think was the case here. Is this confusing? Not sure of. Good alternative...

  5. Lars Skjærven

    recent commit introduced a bug related to elety="NA". see e.g. PDB id 1L2X:

    > pdb=read.pdb("1L2X")
    > atom.select(pdb, elety="NA")
    
     Build selection from input components
     *  Selected a total of: 0 intersecting atoms  *
    
     Call:  atom.select(pdb = pdb, elety = "NA")
    
       Atom Indices#: 0  ($atom)
       XYZ  Indices#: 0  ($xyz)
    
    + attr: atom, xyz, call
    
    > which(pdb$atom$elety=="NA")
    [1] 621 622 623
    

    before recent commit:

    > atom.select(pdb, elety="NA")
    
     Build selection from input components
     *  Selected a total of: 3 intersecting atoms  *
    
     Call:  atom.select(pdb = pdb, elety = "NA")
    
       Atom Indices#: 3  ($atom)
       XYZ  Indices#: 9  ($xyz)
    
    + attr: atom, xyz, call
    
  6. Barry Grant reporter

    I suspected this might happen but didn't have a good test pdb on hand. Will dig in further. thanks!

  7. Barry Grant reporter

    Should be doing more sensible things now:

    > pdb <- read.pdb("1L2X")
      Note: Accessing on-line PDB file
      HEADER    RNA                                     25-FEB-02   1L2X
       PDB has ALT records, taking A only, rm.alt=TRUE
    
    # PDB summary
    > pdb
    
     Call:  read.pdb(file = "1L2X")
    
       Total Models#: 1
         Total Atoms#: 803,  XYZs#: 2409  Chains#: 1  (values: A)
    
         Protein Atoms#: 0  (residues/Calpha atoms#: 0)
         Nucleic acid Atoms#: 582  (residues/phosphate atoms#: 27)
    
         Non-protein/nucleic Atoms#: 221  (residues: 190)
         Non-protein/nucleic resid values: [GTP (1), HOH (179), K (1), MG (6), NA (3) ]
    
       Nucleic acid sequence:
          GCGCGGCACCGUCCGCGGAACAAACGG
    
    + attr: atom, helix, sheet, seqres, xyz,
            calpha, call
    
    # Selection tests
    > atom.select(pdb, elety="NA")
    
     Build selection from input components
     *  Selected a total of: 3 intersecting atoms  *
    
     Call:  atom.select(pdb = pdb, elety = "NA")
    
       Atom Indices#: 3  ($atom)
       XYZ  Indices#: 9  ($xyz)
    
    + attr: atom, xyz, call
    > atom.select(pdb, chain=NA)
    
     Build selection from input components
     *  Selected a total of: 0 intersecting atoms  *
    
     Call:  atom.select(pdb = pdb, chain = NA)
    
       Atom Indices#: 0  ($atom)
       XYZ  Indices#: 0  ($xyz)
    
    + attr: atom, xyz, call
    > atom.select(pdb, elety="NA", chain=NA)
    
     Build selection from input components
     *  Selected a total of: 0 intersecting atoms  *
    
     Call:  atom.select(pdb = pdb, chain = NA, elety = "NA")
    
       Atom Indices#: 0  ($atom)
       XYZ  Indices#: 0  ($xyz)
    
    + attr: atom, xyz, call
    
    ## Test with missing chain id
    > pdb$atom$chain <- NA
    > atom.select(pdb, elety="NA")
    
     Build selection from input components
     *  Selected a total of: 3 intersecting atoms  *
    
     Call:  atom.select(pdb = pdb, elety = "NA")
    
       Atom Indices#: 3  ($atom)
       XYZ  Indices#: 9  ($xyz)
    
    + attr: atom, xyz, call
    > atom.select(pdb, chain=NA)
    
     Build selection from input components
     *  Selected a total of: 803 intersecting atoms  *
    
     Call:  atom.select(pdb = pdb, chain = NA)
    
       Atom Indices#: 803  ($atom)
       XYZ  Indices#: 2409  ($xyz)
    
    + attr: atom, xyz, call
    > atom.select(pdb, elety="NA", chain=NA)
    
     Build selection from input components
     *  Selected a total of: 3 intersecting atoms  *
    
     Call:  atom.select(pdb = pdb, chain = NA, elety = "NA")
    
       Atom Indices#: 3  ($atom)
       XYZ  Indices#: 9  ($xyz)
    
    + attr: atom, xyz, call
    
    ## Test with subset of missing chain id
    > pdb$atom$chain = "A"
    > pdb$atom$chain[1:10] <- NA
    > atom.select(pdb, elety="NA")
    
     Build selection from input components
     *  Selected a total of: 3 intersecting atoms  *
    
     Call:  atom.select(pdb = pdb, elety = "NA")
    
       Atom Indices#: 3  ($atom)
       XYZ  Indices#: 9  ($xyz)
    
    + attr: atom, xyz, call
    > atom.select(pdb, chain=NA)
    
     Build selection from input components
     *  Selected a total of: 10 intersecting atoms  *
    
     Call:  atom.select(pdb = pdb, chain = NA)
    
       Atom Indices#: 10  ($atom)
       XYZ  Indices#: 30  ($xyz)
    
    + attr: atom, xyz, call
    > atom.select(pdb, elety="NA", chain=NA)
    
     Build selection from input components
     *  Selected a total of: 0 intersecting atoms  *
    
     Call:  atom.select(pdb = pdb, chain = NA, elety = "NA")
    
       Atom Indices#: 0  ($atom)
       XYZ  Indices#: 0  ($xyz)
    
    + attr: atom, xyz, call
    

    Also chain=" " now also works similar to chain=NA and chain="NA" but elety=" " does not work the same as elety="NA". This is sensible I think.

  8. Barry Grant reporter

    Marking this as resolved. Please re-open if any strange NA selection issues are found further down the line.

  9. Log in to comment