read.fasta.pdb ERROR

Issue #555 resolved
Former user created an issue

I am trying to renumber the residues of a pdb taking into account the alignment of its sequence with the uniprot sequence for this gene. I have used the following commands:

library(bio3d)
uni.seq <- get.seq("P51610")
file.name <- get.pdb("4GO6")
pdb <- read.pdb(file.name)
pdb.seq <- pdbseq(pdb, atom.select(pdb, chain="D", elety="CA"))
aln <- seqaln( seqbind( pdb.seq, uni.seq), id=c( file.name, unlist("P51610")) )
pdbs <- read.fasta.pdb(aln)
pdb_ali <- cbind( 1:ncol(pdbs$ali), t(pdbs$ali), pdbs$resno[1,])

I expect to have a table with 4 columns being:

  1. number of the residue respect the uniprot sequence;
  2. aminoacids of the pdb sequence;
  3. aminoacids of the uniprot sequence;
  4. number of the residue respect the pdb sequence --> they have to be the same as I get in the variable pdb.seq for every aminoacid.

The problem appears if I want to do it for a chain of the pdb that is not the first with the sequence used. In this case B and D are the same protein but have some differences in the aminoacids. If I do the same with the B chain, there aren't errors, but if I try to do it with chain D the comand --> read.fasta.pdb raise an error because the sequence of the alignment does not match the sequence in the pdb (it is checking chain B not D in the pdb)

ERROR:

ERROR Alignment mismatch. See alignment below for further details (row 1 of aln and sequence of './4GO6.pdb'). First mismatch residue in PDB is: ASP-1833 (B) occurring at alignment position: 1853

Comments (3)

  1. Xinqiu Yao

    First, this is not a bug.

    The error explicitly tells you what happened: your sequence is from chain D but the pdb file you were trying to read is four chains, and thereby they mismatch.

    To solve it, create pdb files based on single chains and read a chain-specific pdb file. For example,

    file.name <- get.pdb("4GO6", split=TRUE)
    file.name
    [1] "./split_chain/4GO6_A.pdb" "./split_chain/4GO6_B.pdb" "./split_chain/4GO6_C.pdb" "./split_chain/4GO6_D.pdb"
    
    pdb <- read.pdb(file.name[4])
    pdb.seq <- pdbseq(pdb)
    aln <- seqaln( seqbind( pdb.seq, uni.seq), id=c( file.name[4], unlist("P51610")) )
    pdbs <- read.fasta.pdb(aln)
    resno <- rep(NA, ncol(pdbs$ali))
    resno[!is.gap(pdbs$ali[1,])] <- trim(pdb, 'calpha')$atom$resno
    pdb_ali <- cbind( 1:ncol(pdbs$ali), t(pdbs$ali), resno)
    head(pdb_ali)
             ./split_chain/4GO6_D.pdb P51610 resno
    [1,] "1" "-"                      "M"    NA   
    [2,] "2" "-"                      "A"    NA   
    [3,] "3" "-"                      "S"    NA   
    [4,] "4" "-"                      "A"    NA   
    [5,] "5" "-"                      "V"    NA   
    [6,] "6" "-"                      "S"    NA   
    
  2. Log in to comment