NA values in contact map while mask.lower=FALSE

Issue #301 resolved

Former User created an issue 2015-11-09

Hi,

I am using the bio3d package to create contact matrices for pdf files. This is the code for one pdb file:

pdb <- read.pdb("1a05.pdb")
inds <- atom.select(pdb, "calpha")
cm <- cmap( pdb$xyz[inds$xyz], dcut=5, mask.lower=FALSE)
diag(cm)<-1

Although I have made the mask.lower false and made the diagonal equal 1, still there are a lot of NA value sin the matrix ! As you can see in the screen shot attached.

Screenshot from 2015-11-10 00:31:37.png

Comments (29)

Former User reporter
- edited description
- 2015-11-09T22:34:36+00:00
Former User reporter
- edited description
- attached Screenshot from 2015-11-10 00:31:37.png
- 2015-11-09T22:35:39+00:00
Former User reporter
- edited description
- 2015-11-09T22:36:10+00:00
Xinqiu Yao
Hi,

These NA values are for consecutive residue pairs, which are not included in contact calculation since the set of dcut option. You can simply remove these NAs by cm[is.na(cm)] <- 0. Then all NAs will be 0. Let me know if you have any problem!
- 2015-11-09T23:35:41+00:00
Former User reporter
Thanks a lot! The problem seems to be resolved!
- 2015-11-10T13:02:30+00:00
Former User reporter
- changed status to resolved
- 2015-11-10T13:04:01+00:00
Former User reporter
One more question, how could I know the order of the amino acids in the matrix? Is the order based on the sequence?
- 2015-11-11T18:40:32+00:00
Former User reporter
- changed status to new
- 2015-11-11T18:42:02+00:00
Xinqiu Yao
Yes, the order is the same as the sequence.

A good practice is to label rows and columns of the cmap matrix with residue numbers. For example, rownames(cm) <- pdb$atom[inds$atom, 'resno']; colnames(cm) <- pdb$atom[inds$atom, 'resno']. Then, when you want to know if the residues e.g. 100 and 200 contact or not, just type cm['100', '200'].

Hope it helps.
- 2015-11-11T19:21:29+00:00
Xinqiu Yao
- changed component to Q&A
- changed version to v2.1
- marked as task
- marked as minor
- 2015-11-12T19:57:00+00:00
Former User reporter
Thanks a lot for the help. Actually, I need to build a matrix of labels for the amino acid (each column correspond to one label) whose order is the same as the order of the contact matrix. The labels that I want to use are weight, charge, and hydrophobicity.
- 2015-11-13T09:49:47+00:00
Former User reporter
I just checked the length of the sequence of one of the enzymes (whose pdb id is 1A4S) and got 503 as the number of the amino acids in the sequence. However, the corresponding contact matrix is of the size 2012 x 2012. Could you please elaborate on this?
- 2015-11-13T10:57:32+00:00

Lars Skjærven

> pdb <- read.pdb("1A4S")
  Note: Accessing on-line PDB file
trying URL 'http://www.rcsb.org/pdb/files/1A4S.pdb'
downloaded 1.3 MB

> sele = atom.select(pdb, "calpha")
> sele

 Call:  atom.select.pdb(pdb = pdb, string = "calpha")

   Atom Indices#: 2012  ($atom)
   XYZ  Indices#: 6036  ($xyz)

+ attr: atom, xyz, call

> sum(pdb$calpha)
[1] 2012

Looks like the protein has 2012 calpha atoms. Lars

2015-11-13T11:41:47+00:00

Former User reporter
But, then for the enzyme with the pdb id "1A5Z" the dimensions of the contact matrix is 312 x 312, while the sequence length is 319, which means 7 amino acids are missing in the contact matrix. This is the case also in many other contact matrices.
- 2015-11-13T12:04:51+00:00

Lars Skjærven

Looks like there are 312 calpha atoms here. How did you get 319?

> pdb <- read.pdb("1A5Z")
  Note: Accessing on-line PDB file
trying URL 'http://www.rcsb.org/pdb/files/1A5Z.pdb'
downloaded 283 KB

> pdb

 Call:  read.pdb(file = "1A5Z")

   Total Models#: 1
     Total Atoms#: 2852,  XYZs#: 8556  Chains#: 1  (values: A)

     Protein Atoms#: 2415  (residues/Calpha atoms#: 312)
     Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)

     Non-protein/nucleic Atoms#: 437  (residues: 351)
     Non-protein/nucleic resid values: [ CD (3), FBP (2), HOH (344), NAD (1), OXM (1) ]

   Protein sequence:
      MKIGIVGLGRVGSSTAFALLMKGFAREMVLIDVDKKRAEGDALDLIHGTPFTRRANIYAG
      DYADLKGSDVVIVAAGVPQKPGETRLQLLGRNARVMKEIARNVSKYAPDSIVIVVTNPVD
      VLTYFFLKESGMDPRKVFGSGTVLDTARLRTLIAQHCGFSPRSVHVYVIGEHGDSEVPVW
      SGAMIGGIPLQNMCQVCQKCDSKILENFAEKTKRAAYEIIERKGA...<cut>...AEEN

+ attr: atom, xyz, seqres, helix, sheet,
        calpha, remark, call


> sele = atom.select(pdb, "calpha")
> sele

 Call:  atom.select.pdb(pdb = pdb, string = "calpha")

   Atom Indices#: 312  ($atom)
   XYZ  Indices#: 936  ($xyz)

+ attr: atom, xyz, call

> pdb <- trim(pdb, "protein")
> grpby <- paste(pdb$atom$resno, pdb$atom$insert, pdb$atom$chain)
> length(unique(grpby))
[1] 312

> cm2 <- cmap(pdb$xyz, grpby=grpby)
> dim(cm2)
[1] 312 312

Bug found:

> cm <- cmap( pdb )
> dim(cm)
[1] 313 313

Fix: add pdb$atom$insert in paste statement here.

2015-11-13T13:35:24+00:00

Lars Skjærven
- marked as bug
- changed component to ToDo
- assigned issue to
  
  Lars Skjærven
- changed version to v2.3 [future]
bug found in cmap.pdb()
- 2015-11-13T13:36:58+00:00
Lars Skjærven
bugfix with commit 55130ca
- 2015-11-13T13:48:28+00:00
Barry Grant
- changed status to resolved
- 2015-11-18T14:39:01+00:00
Former User reporter
I just encountered another problem. If I set cm[is.na(cm)] <- 0, then the matrix loses its symmetry.
- 2015-11-21T16:23:23+00:00
Xinqiu Yao
- changed status to open
User still has a question
- 2015-11-22T20:49:18+00:00
Xinqiu Yao
- marked as task
- edited description
- 2015-11-22T20:50:24+00:00
Xinqiu Yao
- marked as bug
- 2015-11-22T20:50:53+00:00
Former User reporter
Even if I set the NA values to 1, the matrix is still asymmetric.
- 2015-11-22T21:07:23+00:00
Xinqiu Yao
Hi,

Thanks for spotting this. It is a bug and has been fixed with this commit. Please update your local bio3d with the latest development version from master or patched version from releases branch (See here for instructions).

Let me know if you still have any problem.
- 2015-11-22T21:21:49+00:00
Former User reporter
Hi,

Thanks a lot for the help. I still have this problem of retrieving the codons of the sequence. Since I work with Matlab and not R, I don't understand which option returns the array of 3-letter (or 1-letter) calpha residues of the corresponding pdb id.
- 2015-11-22T21:29:21+00:00
Xinqiu Yao
Try pdbseq() to get 1-letter sequence from a pdb object. aa123() and aa321() to convert between 1-letter and 3-letter codes.
- 2015-11-22T22:39:23+00:00
Former User reporter
Many thanks! Now I got everything I was looking for :)
- 2015-11-23T13:02:15+00:00
Barry Grant
- changed status to resolved
- 2015-11-29T15:39:04+00:00
Xinqiu Yao
- changed version to v2.3 [devel]
- 2016-09-21T15:00:12+00:00
Log in to comment

Assignee: Lars Skjærven

Type: bug

Priority: minor

Status: resolved

Component: ToDo

Version: v2.3 [devel]

Votes: 0

Watchers: 3