atom.select() and pdbsplit() with blank chains, e.g. chain=NA in atom.select
Another user question on how to select blank chain atoms with atom.select()
e.g.
> pdb<-read.pdb("4q21")
# Create some blank chain records for testing
> pdb$atom$chain[1:10]=" "
> atom.select(pdb, chain=NA)
Build selection from input components
* Selected a total of: 0 intersecting atoms *
Call: atom.select(pdb = pdb, chain = NA)
Atom Indices#: 0 ($atom)
XYZ Indices#: 0 ($xyz)
+ attr: atom, xyz, call
> atom.select(pdb, chain=" ")
Build selection from input components
* Selected a total of: 0 intersecting atoms *
Call: atom.select(pdb = pdb, chain = " ")
Atom Indices#: 0 ($atom)
XYZ Indices#: 0 ($xyz)
+ attr: atom, xyz, call
> atom.select(pdb, chain="NA")
Build selection from input components
* Selected a total of: 0 intersecting atoms *
Call: atom.select(pdb = pdb, chain = "NA")
Atom Indices#: 0 ($atom)
XYZ Indices#: 0 ($xyz)
+ attr: atom, xyz, call
>
I can fix this with a modification to atom.select() (specifically the sel.txt2type() internal function) but this might have unintended consequences elsewhere - good test functions would help out here....
Comments (9)
-
-
reporter Fixed...
> pdb<-read.pdb("4q21") # Create some blank chain records for testing > pdb$atom$chain[1:10]=" " > write.pdb(pdb, file="tmp.pdb") > tmp<-read.pdb("tmp.pdb") > atom.select(tmp, chain=NA) Build selection from input components * Selected a total of: 10 intersecting atoms * Call: atom.select(pdb = tmp, chain = NA) Atom Indices#: 10 ($atom) XYZ Indices#: 30 ($xyz) + attr: atom, xyz, call > pdbsplit("tmp.pdb") |======================================================================| 100% [1] "split_chain/tmp_A.pdb" > pdbsplit("tmp.pdb") |======================================================================| 100% [1] "split_chain/tmp_NA.pdb" "split_chain/tmp_A.pdb" >
-
reporter Fixed with commit 9c3cb30 for cases where missing chains are NA - this will always be the case from read.pdb(). However, if they are " " (i.e. a single space, as in the modified pdb object in the examples above) then we still can't select blank entries as spaces are removed from selection text within atom.select(). One fix would be to ensure internaly within atom.select() that input pdb object has a valid chain id (i.e. map " " to NA).
-
reporter Currently it is if chain=NULL then the whole system is selected. Missing chain ids for only portions of a file are unusual but can happen with users edited and concanated files - which I think was the case here. Is this confusing? Not sure of. Good alternative...
-
recent commit introduced a bug related to elety="NA". see e.g. PDB id 1L2X:
> pdb=read.pdb("1L2X") > atom.select(pdb, elety="NA") Build selection from input components * Selected a total of: 0 intersecting atoms * Call: atom.select(pdb = pdb, elety = "NA") Atom Indices#: 0 ($atom) XYZ Indices#: 0 ($xyz) + attr: atom, xyz, call > which(pdb$atom$elety=="NA") [1] 621 622 623
before recent commit:
> atom.select(pdb, elety="NA") Build selection from input components * Selected a total of: 3 intersecting atoms * Call: atom.select(pdb = pdb, elety = "NA") Atom Indices#: 3 ($atom) XYZ Indices#: 9 ($xyz) + attr: atom, xyz, call
-
reporter I suspected this might happen but didn't have a good test pdb on hand. Will dig in further. thanks!
-
reporter -
reporter Should be doing more sensible things now:
> pdb <- read.pdb("1L2X") Note: Accessing on-line PDB file HEADER RNA 25-FEB-02 1L2X PDB has ALT records, taking A only, rm.alt=TRUE # PDB summary > pdb Call: read.pdb(file = "1L2X") Total Models#: 1 Total Atoms#: 803, XYZs#: 2409 Chains#: 1 (values: A) Protein Atoms#: 0 (residues/Calpha atoms#: 0) Nucleic acid Atoms#: 582 (residues/phosphate atoms#: 27) Non-protein/nucleic Atoms#: 221 (residues: 190) Non-protein/nucleic resid values: [GTP (1), HOH (179), K (1), MG (6), NA (3) ] Nucleic acid sequence: GCGCGGCACCGUCCGCGGAACAAACGG + attr: atom, helix, sheet, seqres, xyz, calpha, call # Selection tests > atom.select(pdb, elety="NA") Build selection from input components * Selected a total of: 3 intersecting atoms * Call: atom.select(pdb = pdb, elety = "NA") Atom Indices#: 3 ($atom) XYZ Indices#: 9 ($xyz) + attr: atom, xyz, call > atom.select(pdb, chain=NA) Build selection from input components * Selected a total of: 0 intersecting atoms * Call: atom.select(pdb = pdb, chain = NA) Atom Indices#: 0 ($atom) XYZ Indices#: 0 ($xyz) + attr: atom, xyz, call > atom.select(pdb, elety="NA", chain=NA) Build selection from input components * Selected a total of: 0 intersecting atoms * Call: atom.select(pdb = pdb, chain = NA, elety = "NA") Atom Indices#: 0 ($atom) XYZ Indices#: 0 ($xyz) + attr: atom, xyz, call ## Test with missing chain id > pdb$atom$chain <- NA > atom.select(pdb, elety="NA") Build selection from input components * Selected a total of: 3 intersecting atoms * Call: atom.select(pdb = pdb, elety = "NA") Atom Indices#: 3 ($atom) XYZ Indices#: 9 ($xyz) + attr: atom, xyz, call > atom.select(pdb, chain=NA) Build selection from input components * Selected a total of: 803 intersecting atoms * Call: atom.select(pdb = pdb, chain = NA) Atom Indices#: 803 ($atom) XYZ Indices#: 2409 ($xyz) + attr: atom, xyz, call > atom.select(pdb, elety="NA", chain=NA) Build selection from input components * Selected a total of: 3 intersecting atoms * Call: atom.select(pdb = pdb, chain = NA, elety = "NA") Atom Indices#: 3 ($atom) XYZ Indices#: 9 ($xyz) + attr: atom, xyz, call ## Test with subset of missing chain id > pdb$atom$chain = "A" > pdb$atom$chain[1:10] <- NA > atom.select(pdb, elety="NA") Build selection from input components * Selected a total of: 3 intersecting atoms * Call: atom.select(pdb = pdb, elety = "NA") Atom Indices#: 3 ($atom) XYZ Indices#: 9 ($xyz) + attr: atom, xyz, call > atom.select(pdb, chain=NA) Build selection from input components * Selected a total of: 10 intersecting atoms * Call: atom.select(pdb = pdb, chain = NA) Atom Indices#: 10 ($atom) XYZ Indices#: 30 ($xyz) + attr: atom, xyz, call > atom.select(pdb, elety="NA", chain=NA) Build selection from input components * Selected a total of: 0 intersecting atoms * Call: atom.select(pdb = pdb, chain = NA, elety = "NA") Atom Indices#: 0 ($atom) XYZ Indices#: 0 ($xyz) + attr: atom, xyz, call
Also chain=" " now also works similar to chain=NA and chain="NA" but elety=" " does not work the same as elety="NA". This is sensible I think.
-
reporter - changed status to resolved
Marking this as resolved. Please re-open if any strange NA selection issues are found further down the line.
- Log in to comment
Shouldn't it be: If chain=NA or empty it means entire system? To me it is a bit weird that one file has mixed empty and specified chain ID...