How to use Bio3D to carry out a cluster and Principal Components analysis (PCA) of DNA structures
I am interesting to use Bio3D to carry out a cluster and Principal Components analysis (PCA)of some pdb files3D DNA structures. I have several problems when I use the program for DNA structures. My question is: Can I use Bio3D for the analysis of DNA structures or this program is only working with protein structures? I would like to run the PCA and cluster analysis of DNA structures which the PDB codes are : 2MO2, 2MO7, 2LIA, 2L7D, 2KAL, 2K0V, 2Z2H, 2Z2H, 2NPW, 2NQ0, 2NQ0, 2NQ1, 2NQ4, 2HKB. I want to select only the first structure of the NMR file and X-ray for the PCA and analysis and the atoms that I would like to analyze in all structures are: a) the carbons in the bases. b) the nitrogens in the bases c) phospates I am interesting to know the steps to use this program for DNA structures. Thank you very much for your help
Comments (7)
-
-
Thank you very much for your answer!! I was trying to use your code with this example where I have selected only the guanines of these DNAs file to carry out the PCA analysis. Every is working OK until I try to save the PCA result as a PDB file using the code
mktrj.pca(pc, pc=1, file="pc1.pdb", resno=pdb$atom$resno[sele$atom], elety=pdb$atom$elety[sele$atom], resid=pdb$atom$resid[sele$atom], )
This is the error that I got
Error in write.pdb(xyz = coor, file = file, ...) : write.pdb: the lengths of all input vectors != 'length(xyz)/3'.
But when I used the code
mktrj.pca(pc, pc=1, file="pc1.pdb")
I get a PDB file like this:
MODEL 1 ATOM 1 CA ALA 1 9.344 -15.030 14.086 1.00 0.00
ATOM 2 CA ALA 2 9.650 -15.839 13.823 1.00 0.00
ATOM 3 CA ALA 3 9.070 -14.059 13.564 1.00 0.00
………I do not know what the problem is in order to write a PDB file. How can I get a correct PDBfile?
In addition, I would like to know how I can write the labels (PDB ID) of the DNA structures in the plot of PC1 vs PC2.
######Input code # fetch pdbs pdbcodes = c("2MO2", "2MO7", "2LIA", "2L7D") files = get.pdb(pdbcodes) # matrix to store xyz coordinates xyz <- NULL for (n in 1:4) { pdb = read.pdb(files[n]) sele = atom.select(pdb, "noh", resid="DG") xyz <- rbind(xyz, pdb$xyz[1, sele$xyz]) } # superimpose xyz = fit.xyz(fixed = xyz[1,], mobile=xyz) # PCA pc = pca.xyz(xyz) #plot PCA results plot(pc) mktrj.pca(pc, pc=1, file="pc1.pdb", resno=pdb$atom$resno[sele$atom], elety=pdb$atom$elety[sele$atom], resid=pdb$atom$resid[sele$atom], )
****output of mktrj.pca(pc, pc=1, file="pc1.pdb", resno=pdb$atom$resno[sele$atom], elety=pdb$atom$elety[sele$atom], resid=pdb$atom$resid[sele$atom],) function is
Error in write.pdb(xyz = coor, file = file, ...) : write.pdb: the lengths of all input vectors != 'length(xyz)/3'.
-
Hi Rafa, Note the warnings appearing in your for-loop.
Warning messages: 1: In rbind(xyz, pdb$xyz[1, sele$xyz]) : number of columns of result is not a multiple of vector length (arg 2) 2: In rbind(xyz, pdb$xyz[1, sele$xyz]) : number of columns of result is not a multiple of vector length (arg 2) 3: In rbind(xyz, pdb$xyz[1, sele$xyz]) : number of columns of result is not a multiple of vector length (arg 2)
The problem is that you select 176, 132, 126, and 132 atoms for the four PDBs, and then attempt to merge the XYZ coordinates of these into one matrix. As I mentioned in the previous reply you need to make sure that you select the same number of atoms for each PDB. That was why I didn't use a for-loop in my example, but wrote the atom selection tailored for each PDB. atom.select(pdb, "noh", resid="DG") selects all atoms within residues with name DG - which would be find if the PDBs had equal number of DG residues. Lars
-
-
- changed component to ToDo
- changed version to v2.3 [future]
- marked as enhancement
- marked as minor
-
- changed status to resolved
Assumed to be resolved, please re-open if required. We have a long-term, low priority ToDo item for being more friendly to nucleic acid containing structures.
-
- changed version to v2.3 [devel]
- Log in to comment
Hi, This is fully doable with bio3d, but unfortunately it will require a bit more coding than what it would for proteins. The main reason is that the pdbaln() function is not designed for nucleic acids at the moment. Thus, you'll have to go over each pdb file and fetch the corresponding coordinates.
This is obviously only for the two first PDBs in your list and only the two first residues, but you get the picture. Use atom.select() to retrieve only the atoms you are interested in (i.e. all pdbs doesn't contain the same sequence).
Hope this helps! Lars