Create a data file for residue/value mappings

Issue #43 resolved
Barry Grant created an issue

This would be called by aa2mass() and possibly aa321() and aa123().

The file should be called resid.mat and contain "aa1", "aa3", and "aaMass" columns. It should be placed in the "bio3d/inst/matrices/" directory.

We then need to update the previously mentioned functions: aa2mass() and possibly aa321() and aa123().
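
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions: the table is a three-row subset with values taken from the aa.mass listing in this thread, a temporary file stands in for bio3d/inst/matrices/resid.mat, and `lookup.mass()` is a hypothetical helper, not the actual aa2mass() implementation.

```r
# Toy subset of the residue table (values copied from the aa.mass listing in this thread).
resid <- data.frame(
  aa3    = c("ALA", "ARG", "GLY"),
  aa1    = c("A", "R", "G"),
  aaMass = c(71.078, 157.194, 57.051),
  stringsAsFactors = FALSE
)

# Stand-in for bio3d/inst/matrices/resid.mat; a temporary file keeps the sketch self-contained.
resid.file <- file.path(tempdir(), "resid.mat")
write.table(resid, file = resid.file, quote = FALSE, row.names = FALSE)

# Inside the installed package the path would instead come from something like
# system.file("matrices", "resid.mat", package = "bio3d")  (assumed location).
resid.tab <- read.table(resid.file, header = TRUE, stringsAsFactors = FALSE)
rownames(resid.tab) <- resid.tab$aa3

# Hypothetical helper: look up masses by (case-insensitive) three-letter code.
lookup.mass <- function(aa3, tab = resid.tab) {
  tab[toupper(aa3), "aaMass"]
}

lookup.mass(c("ala", "GLY"))
```

Indexing by row name keeps the lookup to a single line; codes missing from the table come back as NA, which the calling function would need to handle.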

Comments (11)

  1. Lars Skjærven

    I tried with a version in inst/matrices. Are we sure we shouldn't put it in data/?

    Also, I previously put atom.index.R and sdENM.RData in data/. Is this OK?

  2. Barry Grant reporter

    It depends. If they are binary .rda (or, as I call them, .RData) files, then put them in data/. If they are scripts, tables, text, etc., put them in matrices/. Does this not sound sensible?
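
To make the distinction concrete, here is a minimal sketch of the two storage routes. The file names are hypothetical and temporary files are used so the example runs anywhere; it does not reflect how bio3d actually ships its data.

```r
tab <- data.frame(aa3 = c("ALA", "GLY"), aaMass = c(71.078, 57.051),
                  stringsAsFactors = FALSE)

# Binary route: the kind of file that would live in data/.
rda.file <- file.path(tempdir(), "aa.table.RData")  # hypothetical file name
save(tab, file = rda.file)

# Text route: the kind of file that would live in inst/matrices/.
txt.file <- file.path(tempdir(), "aa.table.mat")    # hypothetical file name
write.table(tab, file = txt.file, quote = FALSE, row.names = FALSE)

# load() restores the object under its saved name into the given environment.
env <- new.env()
load(rda.file, envir = env)
from.rda <- env$tab

# read.table() parses the text and returns a new data frame.
from.txt <- read.table(txt.file, header = TRUE, stringsAsFactors = FALSE)

identical(from.rda$aa3, from.txt$aa3)
```

The binary file round-trips the R object as it was saved, while the text file stays human-readable and diff-friendly but has to be parsed on each read.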

  3. Lars Skjærven

    aa.mass was included some time ago for this purpose. It is looked up by aa2mass(), but not by aa321() and aa123().

    > aa.mass
        aa3 aa1  aaMass       formula                             name
    ALA ALA   A  71.078    C3 H5 N O1                          Alanine
    ARG ARG   R 157.194  C6 H13 N4 O1                         Arginine
    ASN ASN   N 114.103   C4 H6 N2 O2                       Asparagine
    ASP ASP   D 114.079    C4 H4 N O3                    Aspartic Acid
    CYS CYS   C 103.143  C3 H5 N O1 S                          Cystein
    GLN GLN   Q 117.126   C4 H9 N2 O2                        Glutamine
    GLU GLU   E 128.106    C5 H6 N O3                    Glutamic Acid
    GLY GLY   G  57.051    C2 H3 N O1                          Glycine
    HIS HIS   H 137.139   C6 H7 N3 O1                        Histidine
    ILE ILE   I 113.158   C6 H11 N O1                       Isoleucine
    LEU LEU   L 113.158   C6 H11 N O1                          Leucine
    LYS LYS   K 129.180  C6 H13 N2 O1                           Lysine
    MET MET   M 131.196  C5 H9 N O1 S                       Methionine
    PHE PHE   F 147.174    C9 H9 N O1                    Phenylalanine
    PRO PRO   P  97.115    C5 H7 N O1                          Proline
    SER SER   S  87.077    C3 H5 N O2                           Serine
    THR THR   T 101.104    C4 H7 N O2                        Threonine
    TRP TRP   W 186.210 C11 H10 N2 O1                       Tryptophan
    TYR TYR   Y 163.173    C9 H9 N O2                         Tyrosine
    VAL VAL   V  99.131    C5 H9 N O1                           Valine
    ABA ABA   C  85.104   C4 H7 N1 O1          alpha-aminobutyric acid
    ASH ASH   X 115.087    C4 H5 N O3            Aspartic acid Neutral
    CME CME   C 179.260 C5 H9 N O2 S2 s,s-(2-hydroxyethyl)thiocysteine
    CMT CMT   C 117.169  C4 H7 N O1 S                 o-methylcysteine
    CSD CSD   C 133.126  C3 H3 N O3 S          s-cysteinesulfinic acid
    CSO CSO   C 119.142  C3 H5 N O2 S                s-hydroxycysteine
    CSW CSW   X 135.142  C3 H5 N O3 S               cysteine-s-dioxide
    CSX CSX   C 119.142  C3 H5 N O2 S                   s-oxy cysteine
    CYM CYM   C 102.135  C3 H4 N O1 S                 Cystein Negative
    CYX CYX   C 102.135  C3 H4 N O1 S                   Cystein SSbond
    GLH GLH   X 129.114    C5 H7 N O3           Glutatmic acid Neutral
    HID HID   H 137.139   C6 H7 N3 O1                        Histidine
    HIE HIE   H 137.139   C6 H7 N3 O1                        Histidine
    HIP HIP   H 138.147   C6 H8 N3 O1               Histidine Positive
    HSD HSD   H 137.139   C6 H7 N3 O1                        Histidine
    HSE HSE   H 137.139   C6 H7 N3 O1                        Histidine
    HSP HSP   H 138.147   C6 H8 N3 O1               Histidine Positive
    IAS IAS   D 115.087    C4 H5 N O3                    beta-aspartyl
    KCX KCX   X 172.182  C7 H12 N2 O3        lysine nz-carboxylic acid
    LYN LYN   X 128.172  C6 H12 N2 O1                   Lysine Neutral
    MHO MHO   M 147.195  C5 H9 N O2 S                  s-oxymethionine
    MLY MLY   K 156.225  C8 H16 N2 O1                n-dimethyl-lysine
    MSE MSE   M 131.196 C5 H9 N O1 SE                 selenomethionine
    OCS OCS   X 169.156  C3 H7 N O5 S            cysteinesulfonic acid
    PFF PFF   Y 165.164  C9 H8 F N O1         4-fluoro-l-phenylalanine
    PTR PTR   X 243.153 C9 H10 N O5 P                o-phosphotyrosine
    SEP SEP   S 167.057  C3 H6 N O5 P                    phosphoserine
    TPO TPO   T 181.084  C4 H8 N O5 P                 phosphothreonine
    
  4. Barry Grant reporter

    I guess they should all point to the same file/table. However, aa321() etc. are not currently broken, so I suggest we leave this migration on the back burner until we next need to update these functions for other reasons. One reason to hold off is that, for aa123(), the table pasted above contains multiple "X" entries mapping to different residues, where I think it should return UNK.

  5. Lars Skjærven

    Right, but can we map KCX to K, LYN to K, etc. in this table (as we apparently do here for TPO to T)?

  6. Barry Grant reporter

    Yes, if we want to use this for aa321() we should have KCX to K and all the other common non-standard modified residue mappings we have in aa321() currently.

    Note that we can do the 3-to-1 mapping like this, but not uniquely the other way (1-to-3), unless we demand that only the first instance of a single-letter amino acid code maps to the standard three-letter version (i.e. H to HIS before H to HSD etc., as it is in the table pasted in the earlier message).
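
A minimal sketch of the "first instance wins" idea, assuming a toy table in which standard residues are listed before their variants. The function names `three2one()` and `one2three()` are hypothetical and are not the current aa321()/aa123() code; the KCX to K row reflects the mapping proposed in this thread rather than the pasted table.

```r
# Toy table; row order matters, standard residues come before their variants.
aa.tab <- data.frame(
  aa3 = c("HIS", "LYS", "THR", "HSD", "KCX", "TPO"),
  aa1 = c("H",   "K",   "T",   "H",   "K",   "T"),
  stringsAsFactors = FALSE
)

# 3-to-1: each three-letter code has exactly one row; unknown codes become "X".
three2one <- function(aa3, tab = aa.tab) {
  out <- tab$aa1[match(toupper(aa3), tab$aa3)]
  out[is.na(out)] <- "X"
  out
}

# 1-to-3: match() returns only the first hit, so "H" gives "HIS", not "HSD";
# unknown codes (including "X") become "UNK".
one2three <- function(aa1, tab = aa.tab) {
  out <- tab$aa3[match(toupper(aa1), tab$aa1)]
  out[is.na(out)] <- "UNK"
  out
}

three2one(c("HSD", "KCX", "TPO", "FOO"))  # "H" "K" "T" "X"
one2three(c("H", "K", "X"))               # "HIS" "LYS" "UNK"
```

Because `match()` only ever returns the first matching position, the uniqueness requirement reduces to keeping the table in the right order.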

    I see you have linked this issue to issue #82. For that, are you proposing that atom.select() also use this list of protein resids in place of the hardcoded 'prot.aa' variable on line 71 of that function?

  7. Lars Skjærven

    The list of residue names should perhaps be used primarily for sequence stuff (e.g. 3-to-1 mapping), with the VMD approach used for atom.select(). Either way, we should aim to have only one list of residue names, which I was hoping would solve some of these issues and at the same time be easier to maintain. In the current version there are three slightly deviating lists (atom.select, aa321/aa123, and the aa.mass table).
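
One way to see where such lists deviate is a membership table, sketched below. The three vectors are hypothetical stand-ins; the real contents of the lists in atom.select(), aa321()/aa123() and aa.mass differ.

```r
# Hypothetical stand-ins for the three residue-name lists.
lists <- list(
  atom.select = c("ALA", "GLY", "HIS", "HSD"),
  aa321       = c("ALA", "GLY", "HIS", "HSD", "TPO"),
  aa.mass     = c("ALA", "GLY", "HIS", "TPO", "KCX")
)

all.names <- sort(unique(unlist(lists, use.names = FALSE)))

# Logical membership matrix: one row per residue name, one column per list.
membership <- sapply(lists, function(x) all.names %in% x)
rownames(membership) <- all.names

# Names missing from at least one list are the ones that need reconciling.
inconsistent <- all.names[rowSums(membership) < length(lists)]
inconsistent
```

Printing `membership[inconsistent, ]` then shows, per residue, which list lacks it.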

  8. Barry Grant reporter

    Let's start by using this list for atom.select() and 321 conversion. Migrating atom.select() to the VMD approach, which is based on atom names within a residue, will likely cause other issues. Having a single consistent list, as you say, will be an improvement.
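
As a rough illustration of a name-based selection driven by one shared list, consider the sketch below. The `prot.resid` vector and the `atoms` data frame are toy assumptions and do not mirror the internals of atom.select().

```r
# Single shared list of protein residue names (toy subset).
prot.resid <- c("ALA", "GLY", "HIS", "HSD", "TPO")

# Toy atom records; "HOH" stands in for a non-protein residue.
atoms <- data.frame(
  resid = c("ALA", "HOH", "HSD", "TPO"),
  elety = c("CA", "O", "CA", "CA"),
  stringsAsFactors = FALSE
)

# Selection by residue name, the kind of test a hardcoded list would perform.
is.protein <- atoms$resid %in% prot.resid
which(is.protein)
```

If the 3-to-1 conversion reads its residue names from the same vector, adding a modified residue in one place updates both the selection and the sequence mapping.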
