atom.select() and pdbsplit() with blank chains, e.g. chain=NA in atom.select

Issue #126 resolved

Barry Grant created an issue 2014-11-06

Another user question on how to select blank chain atoms with atom.select()

e.g.

> pdb<-read.pdb("4q21")

# Create some blank chain records for testing
> pdb$atom$chain[1:10]=" "

> atom.select(pdb, chain=NA)

 Build selection from input components
 *  Selected a total of: 0 intersecting atoms  *

 Call:  atom.select(pdb = pdb, chain = NA)

   Atom Indices#: 0  ($atom)
   XYZ  Indices#: 0  ($xyz)

+ attr: atom, xyz, call
> atom.select(pdb, chain=" ")

 Build selection from input components
 *  Selected a total of: 0 intersecting atoms  *

 Call:  atom.select(pdb = pdb, chain = " ")

   Atom Indices#: 0  ($atom)
   XYZ  Indices#: 0  ($xyz)

+ attr: atom, xyz, call
> atom.select(pdb, chain="NA")

 Build selection from input components
 *  Selected a total of: 0 intersecting atoms  *

 Call:  atom.select(pdb = pdb, chain = "NA")

   Atom Indices#: 0  ($atom)
   XYZ  Indices#: 0  ($xyz)

+ attr: atom, xyz, call
>

I can fix this with a modification to atom.select() (specifically the sel.txt2type() internal function) but this might have unintended consequences elsewhere - good test functions would help out here....

Comments (9)

Xinqiu Yao
Shouldn't it be: If chain=NA or empty it means entire system? To me it is a bit weird that one file has mixed empty and specified chain ID...
- 2014-11-06T21:45:04+00:00

Barry Grant reporter

Fixed...

> pdb<-read.pdb("4q21")
# Create some blank chain records for testing
> pdb$atom$chain[1:10]=" "
> write.pdb(pdb, file="tmp.pdb")

> tmp<-read.pdb("tmp.pdb")
> atom.select(tmp, chain=NA)

 Build selection from input components
 *  Selected a total of: 10 intersecting atoms  *

 Call:  atom.select(pdb = tmp, chain = NA)

   Atom Indices#: 10  ($atom)
   XYZ  Indices#: 30  ($xyz)

+ attr: atom, xyz, call

> pdbsplit("tmp.pdb")
  |======================================================================| 100%
[1] "split_chain/tmp_A.pdb"
> pdbsplit("tmp.pdb")
  |======================================================================| 100%
[1] "split_chain/tmp_NA.pdb" "split_chain/tmp_A.pdb"
>

2014-11-06T22:04:13+00:00

Barry Grant reporter
Fixed with commit 9c3cb30 for cases where missing chains are NA - this will always be the case from read.pdb(). However, if they are " " (i.e. a single space, as in the modified pdb object in the examples above) then we still can't select blank entries as spaces are removed from selection text within atom.select(). One fix would be to ensure internaly within atom.select() that input pdb object has a valid chain id (i.e. map " " to NA).
- 2014-11-06T22:10:40+00:00
Barry Grant reporter
Currently it is if chain=NULL then the whole system is selected. Missing chain ids for only portions of a file are unusual but can happen with users edited and concanated files - which I think was the case here. Is this confusing? Not sure of. Good alternative...
- 2014-11-07T02:10:28+00:00

Lars Skjærven

recent commit introduced a bug related to elety="NA". see e.g. PDB id 1L2X:

> pdb=read.pdb("1L2X")
> atom.select(pdb, elety="NA")

 Build selection from input components
 *  Selected a total of: 0 intersecting atoms  *

 Call:  atom.select(pdb = pdb, elety = "NA")

   Atom Indices#: 0  ($atom)
   XYZ  Indices#: 0  ($xyz)

+ attr: atom, xyz, call

> which(pdb$atom$elety=="NA")
[1] 621 622 623

before recent commit:

> atom.select(pdb, elety="NA")

 Build selection from input components
 *  Selected a total of: 3 intersecting atoms  *

 Call:  atom.select(pdb = pdb, elety = "NA")

   Atom Indices#: 3  ($atom)
   XYZ  Indices#: 9  ($xyz)

+ attr: atom, xyz, call

2014-11-07T10:56:40+00:00

Barry Grant reporter
I suspected this might happen but didn't have a good test pdb on hand. Will dig in further. thanks!
- 2014-11-07T15:08:23+00:00
Barry Grant reporter
BUGFIX: Update to allow both chain=NA and elety=NA

See issue ~~#126~~

→ <<cset da359d1ada73>>
- 2014-11-07T15:40:46+00:00

Barry Grant reporter

Should be doing more sensible things now:

> pdb <- read.pdb("1L2X")
  Note: Accessing on-line PDB file
  HEADER    RNA                                     25-FEB-02   1L2X
   PDB has ALT records, taking A only, rm.alt=TRUE

# PDB summary
> pdb

 Call:  read.pdb(file = "1L2X")

   Total Models#: 1
     Total Atoms#: 803,  XYZs#: 2409  Chains#: 1  (values: A)

     Protein Atoms#: 0  (residues/Calpha atoms#: 0)
     Nucleic acid Atoms#: 582  (residues/phosphate atoms#: 27)

     Non-protein/nucleic Atoms#: 221  (residues: 190)
     Non-protein/nucleic resid values: [GTP (1), HOH (179), K (1), MG (6), NA (3) ]

   Nucleic acid sequence:
      GCGCGGCACCGUCCGCGGAACAAACGG

+ attr: atom, helix, sheet, seqres, xyz,
        calpha, call

# Selection tests
> atom.select(pdb, elety="NA")

 Build selection from input components
 *  Selected a total of: 3 intersecting atoms  *

 Call:  atom.select(pdb = pdb, elety = "NA")

   Atom Indices#: 3  ($atom)
   XYZ  Indices#: 9  ($xyz)

+ attr: atom, xyz, call
> atom.select(pdb, chain=NA)

 Build selection from input components
 *  Selected a total of: 0 intersecting atoms  *

 Call:  atom.select(pdb = pdb, chain = NA)

   Atom Indices#: 0  ($atom)
   XYZ  Indices#: 0  ($xyz)

+ attr: atom, xyz, call
> atom.select(pdb, elety="NA", chain=NA)

 Build selection from input components
 *  Selected a total of: 0 intersecting atoms  *

 Call:  atom.select(pdb = pdb, chain = NA, elety = "NA")

   Atom Indices#: 0  ($atom)
   XYZ  Indices#: 0  ($xyz)

+ attr: atom, xyz, call

## Test with missing chain id
> pdb$atom$chain <- NA
> atom.select(pdb, elety="NA")

 Build selection from input components
 *  Selected a total of: 3 intersecting atoms  *

 Call:  atom.select(pdb = pdb, elety = "NA")

   Atom Indices#: 3  ($atom)
   XYZ  Indices#: 9  ($xyz)

+ attr: atom, xyz, call
> atom.select(pdb, chain=NA)

 Build selection from input components
 *  Selected a total of: 803 intersecting atoms  *

 Call:  atom.select(pdb = pdb, chain = NA)

   Atom Indices#: 803  ($atom)
   XYZ  Indices#: 2409  ($xyz)

+ attr: atom, xyz, call
> atom.select(pdb, elety="NA", chain=NA)

 Build selection from input components
 *  Selected a total of: 3 intersecting atoms  *

 Call:  atom.select(pdb = pdb, chain = NA, elety = "NA")

   Atom Indices#: 3  ($atom)
   XYZ  Indices#: 9  ($xyz)

+ attr: atom, xyz, call

## Test with subset of missing chain id
> pdb$atom$chain = "A"
> pdb$atom$chain[1:10] <- NA
> atom.select(pdb, elety="NA")

 Build selection from input components
 *  Selected a total of: 3 intersecting atoms  *

 Call:  atom.select(pdb = pdb, elety = "NA")

   Atom Indices#: 3  ($atom)
   XYZ  Indices#: 9  ($xyz)

+ attr: atom, xyz, call
> atom.select(pdb, chain=NA)

 Build selection from input components
 *  Selected a total of: 10 intersecting atoms  *

 Call:  atom.select(pdb = pdb, chain = NA)

   Atom Indices#: 10  ($atom)
   XYZ  Indices#: 30  ($xyz)

+ attr: atom, xyz, call
> atom.select(pdb, elety="NA", chain=NA)

 Build selection from input components
 *  Selected a total of: 0 intersecting atoms  *

 Call:  atom.select(pdb = pdb, chain = NA, elety = "NA")

   Atom Indices#: 0  ($atom)
   XYZ  Indices#: 0  ($xyz)

+ attr: atom, xyz, call

Also chain=" " now also works similar to chain=NA and chain="NA" but elety=" " does not work the same as elety="NA". This is sensible I think.

2014-11-07T15:48:36+00:00

Barry Grant reporter
- changed status to resolved
Marking this as resolved. Please re-open if any strange NA selection issues are found further down the line.
- 2014-11-18T14:52:45+00:00
Log in to comment

Assignee: Barry Grant

Type: enhancement

Priority: major

Status: resolved

Component: –

Version: –

Votes: 1

Watchers: 3