PCA with non-standard residue name

Issue #563 resolved
m.mohan@northeastern.edu created an issue

Dear Bio3d team,

I am having an issue running PCA on a structure with a non-standard residue name.

I had the same issue with NMA (please refer to issue #537). In that issue it was suggested that I either use a custom mass or add a row to aa.table.

Neither of the suggested solutions works: the calls still fail. Please take a look at the NMA issue, which I opened recently, and at the results listed below.

Previously I got around the problem by manually renaming CYR to CYS, since the residue is a modified cysteine with a label attached to it. That was only to get initial results and run some tests.

However, for both analyses, mass-weighted PCA and NMA, that workaround no longer works, since the mass of cysteine differs from the mass of cysteine with the label attached.

I would appreciate your help.

I am listing the data frame addition and the PCA call with its result below:

aa.table = rbind(aa.table, CYR=c("CYR", "C", 287.443, NA, NA))

pc <- pca.xyz(xyz[,ca.inds$xyz], mass=pdb)
Error in pca.xyz(xyz[, ca.inds$xyz], mass = pdb) :
  Input mass vector does not match xyz

I am listing the table below:

    aa3 aa1 mass     formula        name
ALA ALA A   71.078   C3 H5 N O1     Alanine
ARG ARG R   157.194  C6 H13 N4 O1   Arginine
ASN ASN N   114.103  C4 H6 N2 O2    Asparagine
ASP ASP D   114.079  C4 H4 N O3     Aspartic Acid
CYS CYS C   103.143  C3 H5 N O1 S   Cystein
GLN GLN Q   117.126  C4 H9 N2 O2    Glutamine
GLU GLU E   128.106  C5 H6 N O3     Glutamic Acid
GLY GLY G   57.051   C2 H3 N O1     Glycine
HIS HIS H   137.139  C6 H7 N3 O1    Histidine
ILE ILE I   113.158  C6 H11 N O1    Isoleucine
LEU LEU L   113.158  C6 H11 N O1    Leucine
LYS LYS K   129.18   C6 H13 N2 O1   Lysine
MET MET M   131.196  C5 H9 N O1 S   Methionine
PHE PHE F   147.174  C9 H9 N O1     Phenylalanine
PRO PRO P   97.115   C5 H7 N O1     Proline
SER SER S   87.077   C3 H5 N O2     Serine
THR THR T   101.104  C4 H7 N O2     Threonine
TRP TRP W   186.21   C11 H10 N2 O1  Tryptophan
TYR TYR Y   163.173  C9 H9 N O2     Tyrosine
VAL VAL V   99.131   C5 H9 N O1     Valine
ABA ABA X   85.104   C4 H7 N1 O1    alpha-aminobutyric acid
ASH ASH D   115.087  C4 H5 N O3     Aspartic acid Neutral
CIR CIR R   157.17   C6 H11 N3 O2   citrulline
CME CME C   179.26   C5 H9 N O2 S2  s,s-(2-hydroxyethyl)thiocysteine
CMT CMT C   115.154  C4 H5 N O1 S   o-methylcysteine
CSD CSD C   134.134  C3 H4 N O3 S   s-cysteinesulfinic acid
CSO CSO C   119.142  C3 H5 N O2 S   s-hydroxycysteine
CSW CSW C   135.142  C3 H5 N O3 S   cysteine-s-dioxide
CSX CSX C   119.142  C3 H5 N O2 S   s-oxy cysteine
CYM CYM C   102.135  C3 H4 N O1 S   Cystein Negative
CYX CYX C   102.135  C3 H4 N O1 S   Cystein SSbond
DDE DDE H   280.346  C13 H22 N5 O2  diphthamide
GLH GLH G   129.114  C5 H7 N O3     Glutatmic acid Neutral
HID HID H   137.139  C6 H7 N3 O1    Histidine
HIE HIE H   137.139  C6 H7 N3 O1    Histidine
HIP HIP H   138.147  C6 H8 N3 O1    Histidine Positive
HSD HSD H   137.139  C6 H7 N3 O1    Histidine
HSE HSE H   137.139  C6 H7 N3 O1    Histidine
HSP HSP H   138.147  C6 H8 N3 O1    Histidine Positive
IAS IAS D   115.087  C4 H5 N O3     beta-aspartyl
KCX KCX K   172.182  C7 H12 N2 O3   lysine nz-carboxylic acid
LYN LYN K   129.18   C6 H13 N2 O1   Lysine Neutral
MHO MHO M   147.195  C5 H9 N O2 S   s-oxymethionine
MLY MLY K   156.225  C8 H16 N2 O1   n-dimethyl-lysine
MSE MSE M   178.091  C5 H9 N O1 SE  selenomethionine
OCS OCS C   151.141  C3 H5 N O4 S   cysteinesulfonic acid
PFF PFF F   165.164  C9 H8 F N O1   4-fluoro-l-phenylalanine
PTR PTR Y   243.153  C9 H10 N O5 P  o-phosphotyrosine
SEP SEP S   167.057  C3 H6 N O5 P   phosphoserine
TPO TPO T   181.084  C4 H8 N O5 P   phosphothreonine
CYR CYR C   287.443  <NA>           <NA>
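
A side note on the rbind() call in the post above: in R, c("CYR", "C", 287.443, NA, NA) is a character vector, so appending it to a data frame turns the numeric mass column into character. Whether this plays any part in the reported error is not established in this thread. The sketch below uses a two-row toy table (aa_toy, a hypothetical stand-in for aa.table, with values copied from the listing above) and only base R to show the difference between appending a vector and appending a one-row data frame.

```r
# Toy stand-in for bio3d's aa.table (hypothetical; two rows only).
aa_toy <- data.frame(
  aa3 = c("ALA", "CYS"),
  aa1 = c("A", "C"),
  mass = c(71.078, 103.143),
  formula = c("C3 H5 N O1", "C3 H5 N O1 S"),
  name = c("Alanine", "Cystein"),
  row.names = c("ALA", "CYS"),
  stringsAsFactors = FALSE
)

# Appending a c() vector that mixes strings and numbers: c() coerces
# every element to character, so the mass column stops being numeric.
via_vector <- rbind(aa_toy, CYR = c("CYR", "C", 287.443, NA, NA))
class(via_vector$mass)

# Appending a one-row data frame keeps each column's type.
new_row <- data.frame(
  aa3 = "CYR", aa1 = "C", mass = 287.443,
  formula = NA_character_, name = NA_character_,
  row.names = "CYR", stringsAsFactors = FALSE
)
via_frame <- rbind(aa_toy, new_row)
class(via_frame$mass)
```

If the table is extended this way, checking is.numeric() on the mass column afterwards is a cheap safeguard.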

Comments (14)

  1. Xinqiu Yao

    Hi,

    Could you provide an example pdb with which we can reproduce your errors? That would save us a lot of time in finding the cause. Thanks.

  2. m.mohan@northeastern.edu reporter

    Thank you for your response.

    I have attached a pdb file to my response via email. I am not sure where to attach a file here.

    Mamta

  3. Xinqiu Yao

    You can edit your first post, where there is an option to add an attachment. Alternatively, send the file to xinqiu.yao@gmail.com

  4. Xinqiu Yao

    Hi,

    For the problem you mentioned in issue #537, I found that the cause is the residue name "CYR1", which is "illegal" because PDB requires a 3-letter name. Changing it to "CYR" and providing a mass solves the problem. See:

    pdb <- read.pdb('T4L72131_combined_P1_27ns_protein_CYR.pdb')
    inds <- atom.select(pdb, 'calpha')
    modes <- nma(pdb, inds=inds, mass=TRUE, mass.custom=list("CYR"=260))
     Building Hessian...        Done in 0.05 seconds.
     Diagonalizing Hessian...   Done in 0.17 seconds.
    Warning message:
    In nma.pdb(pdb, inds = inds, mass = TRUE, mass.custom = list(CYR = 260)) :
      Possible multi-chain structure or missing in-structure residue(s) present
      Fluctuations at neighboring positions may be affected.
    
    modes
    
    Call:
      nma.pdb(pdb = pdb, inds = inds, mass = TRUE, mass.custom = list(CYR = 260))
    
    Class:
      VibrationalModes (nma)
    
    Number of modes:
      492 (6 trivial)
    
    Frequencies:
      Mode 7:   0.011
      Mode 8:   0.012
      Mode 9:   0.016
      Mode 10:  0.021
      Mode 11:  0.022
      Mode 12:  0.026
    
    + attr: modes, frequencies, force.constants, fluctuations,
            U, L, xyz, mass, temp, triv.modes, natoms, call
    

    For the PCA problem here, I couldn't test it because I don't have the 'xyz'. Make sure the input pdb does not contain the same residue name error. I can also have a look if you provide a short example of the 'xyz' data.

    Hope it helps.
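
The residue-name check described in this comment can be sketched in base R as follows. The resid vector is hypothetical; with bio3d it would presumably come from pdb$atom$resid, but how read.pdb() parses a 4-character name is not shown in this thread, so treat this as a sketch under that assumption.

```r
# Hypothetical residue names; with bio3d these would presumably be
# taken from pdb$atom$resid.
resid <- c("MET", "ASN", "CYR1", "ILE", "CYR1")

# Report any names longer than three characters.
too_long <- unique(resid[nchar(resid) > 3])
too_long

# Rename the offending residue, as suggested in the comment above.
resid[resid == "CYR1"] <- "CYR"
all(nchar(resid) <= 3)
```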

  5. m.mohan@northeastern.edu reporter

    Dear Xin,

    Thank you very much. I will try it today. Please wait on the PCA; I think it is the same issue. If it is not, I will write back and send you a small xyz file as well. I have a feeling it might be resolved by the name change. The issue you found never even occurred to me.

    Thank you again. Mamta

  6. m.mohan@northeastern.edu reporter

    Dear Xin,

    I am sorry to reopen the issue.

    I am attaching a 1000-frame dcd by mail.

    PCA still has the same issue.

    Please find the error below:

    pc <- pca.xyz(xyz[,ca.inds$xyz], mass=pdb)
    Error in pca.xyz(xyz[, ca.inds$xyz], mass = pdb) :
      Input mass vector does not match xyz

  7. Xinqiu Yao

    Yes, pca.xyz() does not recognize non-standard amino acids. You need to calculate the masses beforehand and feed them to pca(). For example:

    mass <- aa2mass(pdb$atom$resid[ca.inds$atom], mass.custom=list('CYR'=264))
    pca.xyz(xyz[, ca.inds$xyz], mass=mass)
    Call:
      pca.xyz(xyz = xyz[, ca.inds$xyz], mass = mass)
    
    Class:
      pca
    
    Number of eigenvalues:
      492
    
            Eigenvalue Variance Cumulative
       PC 1   1895.059   21.820     21.820
       PC 2   1291.398   14.869     36.689
       PC 3    866.588    9.978     46.667
       PC 4    441.286    5.081     51.748
       PC 5    339.433    3.908     55.656
       PC 6    289.820    3.337     58.993
    
       (Obtained from 1001 conformers with 492 xyz input values).
    
    + attr: L, U, z, au, sdev, mean, mass, call
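
The error message "Input mass vector does not match xyz" suggests that the mass vector needs one entry per atom in the xyz matrix, that is, length(mass) equal to ncol(xyz) / 3 (in the output above, 492 xyz values correspond to 164 atoms). This reading of the check is an assumption, not taken from the bio3d source. A minimal base R sketch of such a sanity check, using a hypothetical named lookup (mass_table) as a stand-in for aa2mass() and toy data in place of the real trajectory:

```r
# Toy residue names for three C-alpha atoms (hypothetical).
resid <- c("ALA", "CYR", "CYS")

# Stand-in for aa2mass(): a named lookup with a custom entry.
# 264 is the CYR value used in the comment above.
mass_table <- c(ALA = 71.078, CYS = 103.143, CYR = 264)
mass <- unname(mass_table[resid])

# Toy trajectory: 5 frames, 3 atoms, hence 9 xyz columns.
set.seed(1)
xyz <- matrix(rnorm(5 * 9), nrow = 5)

# Every residue must have a mass, and there must be one mass per atom.
stopifnot(!anyNA(mass), length(mass) == ncol(xyz) / 3)
```

A residue missing from the lookup would show up as NA in mass, so the anyNA() check also catches unrecognized residue names before the PCA call.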
    