PCA with non-standard residue name
Dear Bio3d team,
I am having an issue running PCA on a structure with a non-standard residue name.
I had the same issue with NMA (please refer to issue 537). In that issue it was suggested that I either use a custom mass or add an entry to aa.table.
Neither of the suggested solutions works for me. Please take a look at the NMA issue, which I opened recently, and at the results listed below.
Previously I got around the problem by manually renaming CYR to CYS, since the residue is a modified cysteine with a label attached, just to get initial results and run some tests.
However, for both analyses, PCA (mass-weighted) and NMA, that workaround is not adequate, since the mass of cysteine differs from that of the labelled cysteine.
I would appreciate your help.
The data frame addition and the PCA call with its result are listed below:
aa.table = rbind(aa.table, CYR=c("CYR", "C", 287.443, NA, NA))
pc <- pca.xyz(xyz[,ca.inds$xyz], mass=pdb)
Error in pca.xyz(xyz[, ca.inds$xyz], mass = pdb) :
Input mass vector does not match xyz
The resulting table is listed below:
     aa3  aa1  mass     formula        name
ALA  ALA  A    71.078   C3 H5 N O1     Alanine
ARG  ARG  R    157.194  C6 H13 N4 O1   Arginine
ASN  ASN  N    114.103  C4 H6 N2 O2    Asparagine
ASP  ASP  D    114.079  C4 H4 N O3     Aspartic Acid
CYS  CYS  C    103.143  C3 H5 N O1 S   Cystein
GLN  GLN  Q    117.126  C4 H9 N2 O2    Glutamine
GLU  GLU  E    128.106  C5 H6 N O3     Glutamic Acid
GLY  GLY  G    57.051   C2 H3 N O1     Glycine
HIS  HIS  H    137.139  C6 H7 N3 O1    Histidine
ILE  ILE  I    113.158  C6 H11 N O1    Isoleucine
LEU  LEU  L    113.158  C6 H11 N O1    Leucine
LYS  LYS  K    129.18   C6 H13 N2 O1   Lysine
MET  MET  M    131.196  C5 H9 N O1 S   Methionine
PHE  PHE  F    147.174  C9 H9 N O1     Phenylalanine
PRO  PRO  P    97.115   C5 H7 N O1     Proline
SER  SER  S    87.077   C3 H5 N O2     Serine
THR  THR  T    101.104  C4 H7 N O2     Threonine
TRP  TRP  W    186.21   C11 H10 N2 O1  Tryptophan
TYR  TYR  Y    163.173  C9 H9 N O2     Tyrosine
VAL  VAL  V    99.131   C5 H9 N O1     Valine
ABA  ABA  X    85.104   C4 H7 N1 O1    alpha-aminobutyric acid
ASH  ASH  D    115.087  C4 H5 N O3     Aspartic acid Neutral
CIR  CIR  R    157.17   C6 H11 N3 O2   citrulline
CME  CME  C    179.26   C5 H9 N O2 S2  s,s-(2-hydroxyethyl)thiocysteine
CMT  CMT  C    115.154  C4 H5 N O1 S   o-methylcysteine
CSD  CSD  C    134.134  C3 H4 N O3 S   s-cysteinesulfinic acid
CSO  CSO  C    119.142  C3 H5 N O2 S   s-hydroxycysteine
CSW  CSW  C    135.142  C3 H5 N O3 S   cysteine-s-dioxide
CSX  CSX  C    119.142  C3 H5 N O2 S   s-oxy cysteine
CYM  CYM  C    102.135  C3 H4 N O1 S   Cystein Negative
CYX  CYX  C    102.135  C3 H4 N O1 S   Cystein SSbond
DDE  DDE  H    280.346  C13 H22 N5 O2  diphthamide
GLH  GLH  G    129.114  C5 H7 N O3     Glutatmic acid Neutral
HID  HID  H    137.139  C6 H7 N3 O1    Histidine
HIE  HIE  H    137.139  C6 H7 N3 O1    Histidine
HIP  HIP  H    138.147  C6 H8 N3 O1    Histidine Positive
HSD  HSD  H    137.139  C6 H7 N3 O1    Histidine
HSE  HSE  H    137.139  C6 H7 N3 O1    Histidine
HSP  HSP  H    138.147  C6 H8 N3 O1    Histidine Positive
IAS  IAS  D    115.087  C4 H5 N O3     beta-aspartyl
KCX  KCX  K    172.182  C7 H12 N2 O3   lysine nz-carboxylic acid
LYN  LYN  K    129.18   C6 H13 N2 O1   Lysine Neutral
MHO  MHO  M    147.195  C5 H9 N O2 S   s-oxymethionine
MLY  MLY  K    156.225  C8 H16 N2 O1   n-dimethyl-lysine
MSE  MSE  M    178.091  C5 H9 N O1 SE  selenomethionine
OCS  OCS  C    151.141  C3 H5 N O4 S   cysteinesulfonic acid
PFF  PFF  F    165.164  C9 H8 F N O1   4-fluoro-l-phenylalanine
PTR  PTR  Y    243.153  C9 H10 N O5 P  o-phosphotyrosine
SEP  SEP  S    167.057  C3 H6 N O5 P   phosphoserine
TPO  TPO  T    181.084  C4 H8 N O5 P   phosphothreonine
CYR  CYR  C    287.443  <NA>           <NA>
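A side note on the appended CYR row: in base R, passing a vector built with c() that mixes strings and numbers to rbind() turns every value into a string, so a numeric column such as mass can silently become character. The sketch below uses a two-row stand-in table (not the real aa.table) to show the effect and one way to avoid it. Whether this contributes to the error reported here is not established in this thread; it is only something that may be worth checking with class(aa.table$mass).

```r
# Toy stand-in for aa.table (hypothetical two-row copy, for illustration only)
tbl <- data.frame(
  aa3 = c("ALA", "CYS"),
  aa1 = c("A", "C"),
  mass = c(71.078, 103.143),
  formula = c("C3 H5 N O1", "C3 H5 N O1 S"),
  name = c("Alanine", "Cystein"),
  row.names = c("ALA", "CYS"),
  stringsAsFactors = FALSE
)

# Appending a c() vector: c() coerces 287.443 to "287.443",
# and the mass column of the result becomes character.
as_vector <- rbind(tbl, CYR = c("CYR", "C", 287.443, NA, NA))
class(as_vector$mass)

# Appending a one-row data frame keeps each column's type.
new_row <- data.frame(
  aa3 = "CYR", aa1 = "C", mass = 287.443,
  formula = NA_character_, name = NA_character_,
  row.names = "CYR", stringsAsFactors = FALSE
)
as_frame <- rbind(tbl, new_row)
class(as_frame$mass)
```

If the column has already been converted, as.numeric(aa.table$mass) would restore the type, assuming every entry parses as a number.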
Comments (14)
-
reporter Thank you for your response.
Please find attached PDB file.
Mamta
-
reporter Thank you for your response.
I had attached a PDB file to my response via email. I am not sure where to attach a file here.
Mamta
-
You can edit your first post, where there is an option to add an attachment. Alternatively, send the file to xinqiu.yao@gmail.com
-
reporter Please find attached PDB file
Mamta
-
Hi,
For your previous problem mentioned in issue #537, I found that the cause is the residue name "CYR1", which is "illegal" because PDB requires a 3-letter name. Changing it to "CYR" and providing a mass solves the problem. See:
pdb <- read.pdb('T4L72131_combined_P1_27ns_protein_CYR.pdb')
inds <- atom.select(pdb, 'calpha')
modes <- nma(pdb, inds=inds, mass=TRUE, mass.custom=list("CYR"=260))
Building Hessian... Done in 0.05 seconds.
Diagonalizing Hessian... Done in 0.17 seconds.
Warning message:
In nma.pdb(pdb, inds = inds, mass = TRUE, mass.custom = list(CYR = 260)) :
Possible multi-chain structure or missing in-structure residue(s) present
Fluctuations at neighboring positions may be affected.
modes
Call:
nma.pdb(pdb = pdb, inds = inds, mass = TRUE, mass.custom = list(CYR = 260))
Class:
VibrationalModes (nma)
Number of modes:
492 (6 trivial)
Frequencies:
Mode 7: 0.011
Mode 8: 0.012
Mode 9: 0.016
Mode 10: 0.021
Mode 11: 0.022
Mode 12: 0.026
+ attr: modes, frequencies, force.constants, fluctuations, U, L, xyz, mass, temp, triv.modes, natoms, call
For the PCA problem here, I couldn't test it because I don't have the 'xyz' data. Make sure you don't have the same residue name error in the input PDB. I can also have a look if you can provide a short example of the 'xyz' data.
Hope it helps.
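A quick way to look for the kind of over-long residue name described above is to list every residue name with more than three characters. The sketch below runs on a small hand-made vector; with a structure loaded by read.pdb(), the same test could be applied to pdb$atom$resid (the column used elsewhere in this thread). The example names are made up for illustration.

```r
# Stand-in for pdb$atom$resid (hypothetical values, for illustration only)
resid <- c("MET", "ASN", "CYR1", "ILE", "CYR1", "PHE")

# Residue names longer than three characters do not fit the PDB convention
too_long <- unique(resid[nchar(resid) > 3])
too_long

# One possible fix, assuming truncation to three letters gives the intended name
fixed <- ifelse(nchar(resid) > 3, substr(resid, 1, 3), resid)
unique(fixed)
```

Truncating is only safe when the shortened name does not collide with a different residue type, so it is worth inspecting the list before renaming.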
-
reporter Dear Xin,
Thank you very much. I will try today. Please wait on the PCA. I think it is the same issue. If it is not, I will write and give you a small xyz file as well. I have a feeling it might get resolved by the name change. The issue you found never even occurred to me.
Thank you again. Mamta
-
- changed status to resolved
User input issue. In the future we may wish to add warning messages for such things.
-
reporter - changed status to open
-
reporter - attached dcd_1000_frames.dcd
-
reporter Dear Xin,
I am sorry to reopen the issue.
I am attaching a 1000-frame DCD file by mail.
PCA still has the same issue.
Please find the error below:
pc <- pca.xyz(xyz[,ca.inds$xyz], mass=pdb)
Error in pca.xyz(xyz[, ca.inds$xyz], mass = pdb) :
Input mass vector does not match xyz
-
Yes, pca.xyz() does not recognize non-standard amino acids. You need to calculate masses beforehand and feed them to pca(). For example:
mass <- aa2mass(pdb$atom$resid[ca.inds$atom], mass.custom=list('CYR'=264))
pca.xyz(xyz[, ca.inds$xyz], mass=mass)
Call:
pca.xyz(xyz = xyz[, ca.inds$xyz], mass = mass)
Class:
pca
Number of eigenvalues:
492
      Eigenvalue  Variance  Cumulative
PC 1  1895.059    21.820    21.820
PC 2  1291.398    14.869    36.689
PC 3  866.588     9.978     46.667
PC 4  441.286     5.081     51.748
PC 5  339.433     3.908     55.656
PC 6  289.820     3.337     58.993
(Obtained from 1001 conformers with 492 xyz input values).
+ attr: L, U, z, au, sdev, mean, mass, call
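Before calling pca.xyz() with a precomputed mass vector, it can help to confirm that the vector has one entry per selected atom. The sketch below assumes that the "Input mass vector does not match xyz" error is raised when the number of masses times three differs from the number of xyz columns; that reading comes from the error text, not from the function's source. The helper mass_matches_xyz() is hypothetical and the data are placeholders sized like the run above (492 xyz values, so 164 atoms).

```r
# Placeholder data: 10 frames of 164 atoms, three coordinates per atom
n_atoms <- 164
xyz <- matrix(rnorm(10 * n_atoms * 3), nrow = 10)
mass <- rep(12, n_atoms)  # dummy masses, one per atom

# Hypothetical helper: TRUE when the mass vector is numeric, has no NA,
# and has exactly one entry per atom (three xyz columns per atom)
mass_matches_xyz <- function(xyz, mass) {
  is.numeric(mass) && !anyNA(mass) && length(mass) * 3L == ncol(xyz)
}

ok <- mass_matches_xyz(xyz, mass)         # lengths agree
short <- mass_matches_xyz(xyz, mass[-1])  # one mass missing
c(ok = ok, short = short)
```

The anyNA() test is included because a residue name with no known mass could plausibly yield a missing value rather than a shorter vector; that is an assumption about how the lookup fails.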
-
reporter Thank you.
-
- changed status to resolved
Hi,
Could you provide an example PDB file with which we can reproduce your errors? That would save us a lot of time in finding the cause. Thanks.