BUG?: trim.pdb doesn't work well with pdb without chain identifier
Issue #88
resolved
The pdb file treated with AMBER usually doesn't have chain identifier. This will cause problems sometimes. For example, in trim.pdb():
> head(pdb$atom[, "chain"])
[1] NA NA NA NA NA NA
>
> sse <- dssp(pdb)
> pdb$helix <- sse$helix
> pdb$sheet <- sse$sheet
> pdb$helix
$start
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
16 33 70 91 104 122 129 141 212 241 266 299 113 175 197 227 253
$end
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
27 60 83 102 110 126 132 146 224 247 278 313 116 182 201 229 255
$length
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
12 28 14 12 7 5 4 6 13 7 13 15 4 8 5 3 3
$chain
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
" " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
$type
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
"H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "G" "G" "G" "G" "G"
> spdb <- trim.pdb(pdb, inds=atom.select(pdb, 'calpha'))
> spdb$helix
NULL
Instead of debugging every other function, one simpler way is to correct non-standard pdb by adding chains with chain.pdb() in the function read.pdb().
Comments (3)
-
-
reporter Thanks, Barry. It works now! I've also updated a little bit dssp() and stride() to catch for missing chain IDs, in a similar way.
head(pdb$atom[, "chain"]) #[1] NA NA NA NA NA NA pdb$helix #$start # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 # 16 33 70 91 104 122 129 141 212 241 266 299 113 175 197 227 253 # #$end # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 # 27 60 83 102 110 126 132 146 224 247 278 313 116 182 201 229 255 # #$length # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 #12 28 14 12 7 5 4 6 13 7 13 15 4 8 5 3 3 # #$chain # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 #" " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " # #$type # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 #"H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "G" "G" "G" "G" "G" spdb <- trim.pdb(pdb, inds=atom.select(pdb, 'calpha')) spdb$helix #$start # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 # 16 33 70 91 104 113 122 129 141 175 197 212 227 241 253 266 299 # #$end # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 # 27 60 83 102 110 116 126 132 146 182 201 224 229 247 255 278 313 # #$chain # 27_NA 60_NA 83_NA 102_NA 110_NA 116_NA 126_NA 132_NA 146_NA 182_NA 201_NA # NA NA NA NA NA NA NA NA NA NA NA #224_NA 229_NA 247_NA 255_NA 278_NA 313_NA # NA NA NA NA NA NA # #$type # 27_NA 60_NA 83_NA 102_NA 110_NA 116_NA 126_NA 132_NA 146_NA 182_NA 201_NA # "H" "H" "H" "H" "H" "G" "H" "H" "H" "G" "G" #224_NA 229_NA 247_NA 255_NA 278_NA 313_NA # "H" "G" "H" "G" "H" "H"
-
-
assigned issue to
- changed status to resolved
Great, I am marking this as resolved for now then.
-
assigned issue to
- Log in to comment
I am reluctant to do this chain.pdb() call as having no chain ID in the input file may be the intention of a user and forcing one within Bio3D may cause confusion. I would prefer to debug the 'sse' section of trim.pdb() or better still make a change to dssp() (see below).
Note that stripping out the chain ID from a regular PDB will not easily reproduce this error as write.pdb() does not output $helix and $sheet components. Similarly AMBER generated PDB files are unlikely to contain SSE information. I see you use DSSP above to assign $helix and $sheet. Then I guess the real issue here is to have dssp() return NA for missing chain identifiers rather than " ". The chain ID is used internally in trim.pdb() and the mismatch of " " from dssp() with NA from read.pdb() will cause problems.
Regardless, I will add a catch within trim.pdb() for this - but suggest we edit dssp() accordingly. Can you test this initial fix please on the FEATURE_VISUALIZE_DATAFRAME branch. Thanks!
See: commit 70fc84b