BUG?: trim.pdb doesn't work well with pdb without chain identifier

Issue #88 resolved
Xinqiu Yao created an issue

The pdb file treated with AMBER usually doesn't have chain identifier. This will cause problems sometimes. For example, in trim.pdb():

> head(pdb$atom[, "chain"])
[1] NA NA NA NA NA NA
> 
> sse <- dssp(pdb)
> pdb$helix <- sse$helix
> pdb$sheet <- sse$sheet
> pdb$helix
$start
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 
 16  33  70  91 104 122 129 141 212 241 266 299 113 175 197 227 253 

$end
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 
 27  60  83 102 110 126 132 146 224 247 278 313 116 182 201 229 255 

$length
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
12 28 14 12  7  5  4  6 13  7 13 15  4  8  5  3  3 

$chain
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 
" " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " 

$type
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 
"H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "G" "G" "G" "G" "G" 

> spdb <- trim.pdb(pdb, inds=atom.select(pdb, 'calpha'))
> spdb$helix
NULL

Instead of debugging every other function, one simpler way is to correct non-standard pdb by adding chains with chain.pdb() in the function read.pdb().

Comments (3)

  1. Barry Grant

    I am reluctant to do this chain.pdb() call as having no chain ID in the input file may be the intention of a user and forcing one within Bio3D may cause confusion. I would prefer to debug the 'sse' section of trim.pdb() or better still make a change to dssp() (see below).

    Note that stripping out the chain ID from a regular PDB will not easily reproduce this error as write.pdb() does not output $helix and $sheet components. Similarly AMBER generated PDB files are unlikely to contain SSE information. I see you use DSSP above to assign $helix and $sheet. Then I guess the real issue here is to have dssp() return NA for missing chain identifiers rather than " ". The chain ID is used internally in trim.pdb() and the mismatch of " " from dssp() with NA from read.pdb() will cause problems.

    Regardless, I will add a catch within trim.pdb() for this - but suggest we edit dssp() accordingly. Can you test this initial fix please on the FEATURE_VISUALIZE_DATAFRAME branch. Thanks!

    See: commit 70fc84b

  2. Xinqiu Yao reporter

    Thanks, Barry. It works now! I've also updated a little bit dssp() and stride() to catch for missing chain IDs, in a similar way.

    head(pdb$atom[, "chain"])
    #[1] NA NA NA NA NA NA
    
    pdb$helix
    #$start
    #  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    # 16  33  70  91 104 122 129 141 212 241 266 299 113 175 197 227 253
    #
    #$end
    #  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    # 27  60  83 102 110 126 132 146 224 247 278 313 116 182 201 229 255
    #
    #$length
    # 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
    #12 28 14 12  7  5  4  6 13  7 13 15  4  8  5  3  3
    #
    #$chain
    #  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    #" " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
    #
    #$type
    #  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    #"H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "G" "G" "G" "G" "G"
    
    spdb <- trim.pdb(pdb, inds=atom.select(pdb, 'calpha'))
    
    spdb$helix
    #$start
    #  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    # 16  33  70  91 104 113 122 129 141 175 197 212 227 241 253 266 299
    #
    #$end
    #  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    # 27  60  83 102 110 116 126 132 146 182 201 224 229 247 255 278 313
    #
    #$chain
    # 27_NA  60_NA  83_NA 102_NA 110_NA 116_NA 126_NA 132_NA 146_NA 182_NA 201_NA
    #    NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
    #224_NA 229_NA 247_NA 255_NA 278_NA 313_NA
    #    NA     NA     NA     NA     NA     NA
    #
    #$type
    # 27_NA  60_NA  83_NA 102_NA 110_NA 116_NA 126_NA 132_NA 146_NA 182_NA 201_NA
    #   "H"    "H"    "H"    "H"    "H"    "G"    "H"    "H"    "H"    "G"    "G"
    #224_NA 229_NA 247_NA 255_NA 278_NA 313_NA
    #   "H"    "G"    "H"    "G"    "H"    "H"
    
  3. Log in to comment