NA values in contact map while mask.lower=FALSE

Issue #301 resolved
Former User created an issue

Hi,

I am using the bio3d package to create contact matrices for pdf files. This is the code for one pdb file:

pdb <- read.pdb("1a05.pdb")
inds <- atom.select(pdb, "calpha")
cm <- cmap( pdb$xyz[inds$xyz], dcut=5, mask.lower=FALSE)
diag(cm)<-1

Although I have made the mask.lower false and made the diagonal equal 1, still there are a lot of NA value sin the matrix ! As you can see in the screen shot attached.

Comments (29)

  1. Xinqiu Yao

    Hi,

    These NA values are for consecutive residue pairs, which are not included in contact calculation since the set of dcut option. You can simply remove these NAs by cm[is.na(cm)] <- 0. Then all NAs will be 0. Let me know if you have any problem!

  2. Former User reporter

    One more question, how could I know the order of the amino acids in the matrix? Is the order based on the sequence?

  3. Xinqiu Yao

    Yes, the order is the same as the sequence.

    A good practice is to label rows and columns of the cmap matrix with residue numbers. For example, rownames(cm) <- pdb$atom[inds$atom, 'resno']; colnames(cm) <- pdb$atom[inds$atom, 'resno']. Then, when you want to know if the residues e.g. 100 and 200 contact or not, just type cm['100', '200'].

    Hope it helps.

  4. Former User reporter

    Thanks a lot for the help. Actually, I need to build a matrix of labels for the amino acid (each column correspond to one label) whose order is the same as the order of the contact matrix. The labels that I want to use are weight, charge, and hydrophobicity.

  5. Former User reporter

    I just checked the length of the sequence of one of the enzymes (whose pdb id is 1A4S) and got 503 as the number of the amino acids in the sequence. However, the corresponding contact matrix is of the size 2012 x 2012. Could you please elaborate on this?

  6. Lars Skjærven
    > pdb <- read.pdb("1A4S")
      Note: Accessing on-line PDB file
    trying URL 'http://www.rcsb.org/pdb/files/1A4S.pdb'
    downloaded 1.3 MB
    
    > sele = atom.select(pdb, "calpha")
    > sele
    
     Call:  atom.select.pdb(pdb = pdb, string = "calpha")
    
       Atom Indices#: 2012  ($atom)
       XYZ  Indices#: 6036  ($xyz)
    
    + attr: atom, xyz, call
    
    > sum(pdb$calpha)
    [1] 2012
    

    Looks like the protein has 2012 calpha atoms. Lars

  7. Former User reporter

    But, then for the enzyme with the pdb id "1A5Z" the dimensions of the contact matrix is 312 x 312, while the sequence length is 319, which means 7 amino acids are missing in the contact matrix. This is the case also in many other contact matrices.

  8. Lars Skjærven

    Looks like there are 312 calpha atoms here. How did you get 319?

    > pdb <- read.pdb("1A5Z")
      Note: Accessing on-line PDB file
    trying URL 'http://www.rcsb.org/pdb/files/1A5Z.pdb'
    downloaded 283 KB
    
    > pdb
    
     Call:  read.pdb(file = "1A5Z")
    
       Total Models#: 1
         Total Atoms#: 2852,  XYZs#: 8556  Chains#: 1  (values: A)
    
         Protein Atoms#: 2415  (residues/Calpha atoms#: 312)
         Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)
    
         Non-protein/nucleic Atoms#: 437  (residues: 351)
         Non-protein/nucleic resid values: [ CD (3), FBP (2), HOH (344), NAD (1), OXM (1) ]
    
       Protein sequence:
          MKIGIVGLGRVGSSTAFALLMKGFAREMVLIDVDKKRAEGDALDLIHGTPFTRRANIYAG
          DYADLKGSDVVIVAAGVPQKPGETRLQLLGRNARVMKEIARNVSKYAPDSIVIVVTNPVD
          VLTYFFLKESGMDPRKVFGSGTVLDTARLRTLIAQHCGFSPRSVHVYVIGEHGDSEVPVW
          SGAMIGGIPLQNMCQVCQKCDSKILENFAEKTKRAAYEIIERKGA...<cut>...AEEN
    
    + attr: atom, xyz, seqres, helix, sheet,
            calpha, remark, call
    
    
    > sele = atom.select(pdb, "calpha")
    > sele
    
     Call:  atom.select.pdb(pdb = pdb, string = "calpha")
    
       Atom Indices#: 312  ($atom)
       XYZ  Indices#: 936  ($xyz)
    
    + attr: atom, xyz, call
    
    > pdb <- trim(pdb, "protein")
    > grpby <- paste(pdb$atom$resno, pdb$atom$insert, pdb$atom$chain)
    > length(unique(grpby))
    [1] 312
    
    > cm2 <- cmap(pdb$xyz, grpby=grpby)
    > dim(cm2)
    [1] 312 312
    

    Bug found:

    > cm <- cmap( pdb )
    > dim(cm)
    [1] 313 313
    

    Fix: add pdb$atom$insert in paste statement here.

  9. Former User reporter

    I just encountered another problem. If I set cm[is.na(cm)] <- 0, then the matrix loses its symmetry.

  10. Xinqiu Yao

    Hi,

    Thanks for spotting this. It is a bug and has been fixed with this commit. Please update your local bio3d with the latest development version from master or patched version from releases branch (See here for instructions).

    Let me know if you still have any problem.

  11. Former User reporter

    Hi,

    Thanks a lot for the help. I still have this problem of retrieving the codons of the sequence. Since I work with Matlab and not R, I don't understand which option returns the array of 3-letter (or 1-letter) calpha residues of the corresponding pdb id.

  12. Xinqiu Yao

    Try pdbseq() to get 1-letter sequence from a pdb object. aa123() and aa321() to convert between 1-letter and 3-letter codes.

  13. Log in to comment