Distance Matrix for antibody with insertion sites (H100A, H100B...)

Issue #343 resolved
Former user created an issue

Hi,

When I calculate a distance matrix as below, the residues at insertion sites in an antibody PDB file are left out. There are 227 residues in total, but the matrix has only 222 rows and columns. The 5 missing residues are the insertion sites:

  library(bio3d)
  pdbInterest <- read.pdb("pdb-files/AB0194.pdb")
  k <- dm(pdbInterest, mask.lower=FALSE)
  print(ncol(k))  # gives 222, but 227 expected

I have attached AB0194.pdb for you to look at. I essentially just want the C-alpha atoms, so the matrix should have 227 rows and columns.

Thanks, Daniel

Comments (9)

  1. Daniel James

    Do you recommend simply renumbering first, using:

    convert.pdb(pdb, type="pdb", renumber=TRUE)

    So:

    pdbInterest <- read.pdb("pdb-files/AB0194.pdb")
    k <- dm(convert.pdb(pdbInterest, type="pdb", renumber=TRUE), mask.lower=FALSE)
    print(ncol(k))
    

    Then, if I export the matrix, can I re-add the proper row and column names later?
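One way to keep the original labels is to build them from the chain, residue number and insertion code before renumbering, then attach them to the matrix afterwards. A minimal sketch in base R, assuming a data frame laid out like the C-alpha rows of bio3d's `pdb$atom` (columns `chain`, `resno`, `insert`). The toy data and the helper name `residue_labels` are hypothetical, and `k` is only a stand-in for a real distance matrix:

```r
# Toy stand-in for the C-alpha rows of pdb$atom (assumed column layout).
ca <- data.frame(
  chain  = c("H", "H", "H", "H"),
  resno  = c(99, 100, 100, 101),
  insert = c(NA, NA, "A", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build labels such as "H100A" from
# chain, residue number and insertion code (empty when absent).
residue_labels <- function(atoms) {
  ins <- ifelse(is.na(atoms$insert), "", atoms$insert)
  paste0(atoms$chain, atoms$resno, ins)
}

labels <- residue_labels(ca)

# Stand-in for a distance matrix computed on the renumbered structure.
k <- matrix(0, nrow = nrow(ca), ncol = nrow(ca))
dimnames(k) <- list(labels, labels)
```

This only works if the labels are taken from the structure before renumbering and the residue order is unchanged between the two steps.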

  2. Xinqiu Yao

    Yes, it is indeed a bug and we will fix it soon. Thanks for the report! Please keep an eye on the releases or the master branch for the update. Also refer to this page for how to download and install the development version.

    Renumbering all residues before the calculation is an alternative and should work well. I recommend using the clean.pdb() function, which can do the renumbering and also performs many other checks on your PDB. Let me know if you have any questions or problems.

  3. Daniel James

    Ok. Thanks! Another thing to think about is calculating a difference matrix (subtracting one distance matrix from another): should bio3d have a default behaviour in place when the matrices are of different sizes?

    Currently, I am calculating distance matrices for antibodies of different lengths. To get around the error that ncol and nrow are not equal, I add extra columns and rows so that the matrices have the same dimensions. Maybe this is something that bio3d could do, adding columns and rows filled with 'NA' to indicate that a calculation couldn't be made. This would be very helpful.

    Daniel
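The padding workaround described in this comment could look roughly like the following. This is a minimal sketch in base R, not a bio3d feature: the helper name `pad_dm` and the toy coordinates are hypothetical, and padding is added only at the end of the shorter matrix.

```r
# Hypothetical helper: embed a square matrix in the top-left corner
# of an n x n matrix filled with NA.
pad_dm <- function(m, n) {
  stopifnot(nrow(m) == ncol(m), n >= nrow(m))
  out <- matrix(NA_real_, nrow = n, ncol = n)
  out[seq_len(nrow(m)), seq_len(ncol(m))] <- m
  out
}

# Toy distance matrices for two structures with 3 and 4 residues.
a <- as.matrix(dist(rbind(c(0, 0, 0), c(3, 4, 0), c(6, 8, 0))))
b <- as.matrix(dist(rbind(c(0, 0, 0), c(3, 4, 0), c(6, 8, 0), c(9, 12, 0))))

# Pad both to the larger size, then subtract; padded cells stay NA.
n <- max(nrow(a), nrow(b))
diff_dm <- pad_dm(a, n) - pad_dm(b, n)
```

Trailing padding only lines up equivalent residue pairs when the extra residues sit at the end of the longer chain; for differences elsewhere in the sequence the rows and columns would need to be placed according to an alignment.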

  4. Barry Grant

    Thanks for catching this, Daniel. It looks like Xinqiu fixed this bug here (d05f82c) by including the insert record along with the residue number and chain entries in our grpby command.
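Why the insert record matters for grouping can be shown without bio3d. A minimal sketch in base R, assuming atom records with `chain`, `resno` and `insert` columns; the toy data and the key construction are illustrative only and are not the code from commit d05f82c:

```r
# Toy atom records: two atoms of H100, one each of H100A, H100B and H101.
atoms <- data.frame(
  chain  = c("H", "H", "H", "H", "H"),
  resno  = c(100, 100, 100, 100, 101),
  insert = c(NA, NA, "A", "B", NA),
  stringsAsFactors = FALSE
)

# Grouping key without the insertion code: H100, H100A and H100B collapse.
key_old <- paste(atoms$chain, atoms$resno, sep = "_")

# Grouping key with the insertion code: each inserted residue is distinct.
ins <- ifelse(is.na(atoms$insert), "", atoms$insert)
key_new <- paste(atoms$chain, atoms$resno, ins, sep = "_")

n_old <- length(unique(key_old))  # 2 residue groups
n_new <- length(unique(key_new))  # 4 residue groups
```

With the shorter key, every inserted residue is merged into its parent residue number, which is consistent with the 5 missing rows reported above.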

    I like the suggestion for not failing by default when different size distance matrices are to be compared. However, to me this would only make sense if one protein had a C-terminal extension relative to the other. In most other cases such a comparison would likely be a mistake as we would probably not be subtracting elements for equivalent residue pairs.

    If you are only interested in C-alpha distance matrices then perhaps using an aligned pdbs object as input for the dm calculation would be best. Should we have a new dm.pdbs() function to look after this and thus have the NAs in the correct gap positions etc.?

    Note that this dm.pdbs() might also help when it comes to assessing the correctness of our sequence-based structural alignment columns.

  5. Xinqiu Yao

    Yes, I agree. Having a new dm.pdbs() function is a good way to deal with such comparisons. I will add it to the ToDo list.
