- edited description
NA values in contact map while mask.lower=FALSE
Hi,
I am using the bio3d package to create contact matrices for pdf files. This is the code for one pdb file:
pdb <- read.pdb("1a05.pdb")
inds <- atom.select(pdb, "calpha")
cm <- cmap( pdb$xyz[inds$xyz], dcut=5, mask.lower=FALSE)
diag(cm)<-1
Although I have made the mask.lower false and made the diagonal equal 1, still there are a lot of NA value sin the matrix ! As you can see in the screen shot attached.
Comments (29)
-
reporter -
reporter - edited description
- attached Screenshot from 2015-11-10 00:31:37.png
-
reporter - edited description
-
Hi,
These NA values are for consecutive residue pairs, which are not included in contact calculation since the set of dcut option. You can simply remove these NAs by
cm[is.na(cm)] <- 0
. Then all NAs will be 0. Let me know if you have any problem! -
reporter Thanks a lot! The problem seems to be resolved!
-
reporter - changed status to resolved
-
reporter One more question, how could I know the order of the amino acids in the matrix? Is the order based on the sequence?
-
reporter - changed status to new
-
Yes, the order is the same as the sequence.
A good practice is to label rows and columns of the cmap matrix with residue numbers. For example,
rownames(cm) <- pdb$atom[inds$atom, 'resno']; colnames(cm) <- pdb$atom[inds$atom, 'resno']
. Then, when you want to know if the residues e.g. 100 and 200 contact or not, just typecm['100', '200']
.Hope it helps.
-
-
reporter Thanks a lot for the help. Actually, I need to build a matrix of labels for the amino acid (each column correspond to one label) whose order is the same as the order of the contact matrix. The labels that I want to use are weight, charge, and hydrophobicity.
-
reporter I just checked the length of the sequence of one of the enzymes (whose pdb id is 1A4S) and got 503 as the number of the amino acids in the sequence. However, the corresponding contact matrix is of the size 2012 x 2012. Could you please elaborate on this?
-
> pdb <- read.pdb("1A4S") Note: Accessing on-line PDB file trying URL 'http://www.rcsb.org/pdb/files/1A4S.pdb' downloaded 1.3 MB > sele = atom.select(pdb, "calpha") > sele Call: atom.select.pdb(pdb = pdb, string = "calpha") Atom Indices#: 2012 ($atom) XYZ Indices#: 6036 ($xyz) + attr: atom, xyz, call > sum(pdb$calpha) [1] 2012
Looks like the protein has 2012 calpha atoms. Lars
-
reporter But, then for the enzyme with the pdb id "1A5Z" the dimensions of the contact matrix is 312 x 312, while the sequence length is 319, which means 7 amino acids are missing in the contact matrix. This is the case also in many other contact matrices.
-
Looks like there are 312 calpha atoms here. How did you get 319?
> pdb <- read.pdb("1A5Z") Note: Accessing on-line PDB file trying URL 'http://www.rcsb.org/pdb/files/1A5Z.pdb' downloaded 283 KB > pdb Call: read.pdb(file = "1A5Z") Total Models#: 1 Total Atoms#: 2852, XYZs#: 8556 Chains#: 1 (values: A) Protein Atoms#: 2415 (residues/Calpha atoms#: 312) Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0) Non-protein/nucleic Atoms#: 437 (residues: 351) Non-protein/nucleic resid values: [ CD (3), FBP (2), HOH (344), NAD (1), OXM (1) ] Protein sequence: MKIGIVGLGRVGSSTAFALLMKGFAREMVLIDVDKKRAEGDALDLIHGTPFTRRANIYAG DYADLKGSDVVIVAAGVPQKPGETRLQLLGRNARVMKEIARNVSKYAPDSIVIVVTNPVD VLTYFFLKESGMDPRKVFGSGTVLDTARLRTLIAQHCGFSPRSVHVYVIGEHGDSEVPVW SGAMIGGIPLQNMCQVCQKCDSKILENFAEKTKRAAYEIIERKGA...<cut>...AEEN + attr: atom, xyz, seqres, helix, sheet, calpha, remark, call > sele = atom.select(pdb, "calpha") > sele Call: atom.select.pdb(pdb = pdb, string = "calpha") Atom Indices#: 312 ($atom) XYZ Indices#: 936 ($xyz) + attr: atom, xyz, call > pdb <- trim(pdb, "protein") > grpby <- paste(pdb$atom$resno, pdb$atom$insert, pdb$atom$chain) > length(unique(grpby)) [1] 312 > cm2 <- cmap(pdb$xyz, grpby=grpby) > dim(cm2) [1] 312 312
Bug found:
> cm <- cmap( pdb ) > dim(cm) [1] 313 313
Fix: add pdb$atom$insert in paste statement here.
-
- marked as bug
- changed component to ToDo
-
assigned issue to
- changed version to v2.3 [future]
bug found in cmap.pdb()
-
bugfix with commit 55130ca
-
- changed status to resolved
-
reporter I just encountered another problem. If I set cm[is.na(cm)] <- 0, then the matrix loses its symmetry.
-
- changed status to open
User still has a question
-
- marked as task
- edited description
-
- marked as bug
-
reporter Even if I set the NA values to 1, the matrix is still asymmetric.
-
-
reporter Hi,
Thanks a lot for the help. I still have this problem of retrieving the codons of the sequence. Since I work with Matlab and not R, I don't understand which option returns the array of 3-letter (or 1-letter) calpha residues of the corresponding pdb id.
-
Try pdbseq() to get 1-letter sequence from a pdb object. aa123() and aa321() to convert between 1-letter and 3-letter codes.
-
reporter Many thanks! Now I got everything I was looking for :)
-
- changed status to resolved
-
- changed version to v2.3 [devel]
- Log in to comment