Specifying modifications

Issue #2 new
Anonymous created an issue

Hi,

I am trying to use IdentiPy to process some TMT 10-plex labeled samples. However, I am having trouble finding details on the notation I should use. I have tried adding the TMT modification to the configuration file by specifying its mass, something like:

Tmt = 229.162932 fixed = TmtK, Tmt-

But it doesn't work. It gives an error that it can't calculate the mass.

INFO: [10:21:04] 40844 spectra pass quality criteria.
Traceback (most recent call last):
  File "/home/chughes/MyPython/bin/identipy", line 11, in <module>
    sys.exit(run())
  File "/home/chughes/MyPython/lib/python2.7/site-packages/identipy/cli.py", line 187, in run
    utils.write_output(inputfile, settings, main.process_file(inputfile, settings))
  File "/home/chughes/MyPython/lib/python2.7/site-packages/identipy/utils.py", line 1208, in write_output
    return writer(inputfile, settings, results)
  File "/home/chughes/MyPython/lib/python2.7/site-packages/identipy/utils.py", line 921, in write_pepxml
    results = [x for x in results if x['candidates'].size]
  File "/home/chughes/MyPython/lib/python2.7/site-packages/identipy/peptide_centric.py", line 239, in process_peptides
    kwargs = prepare_peptide_processor(fname, settings)
  File "/home/chughes/MyPython/lib/python2.7/site-packages/identipy/peptide_centric.py", line 92, in prepare_peptide_processor
    aa_mass = utils.get_aa_mass(settings)
  File "/home/chughes/MyPython/lib/python2.7/site-packages/identipy/utils.py", line 681, in get_aa_mass
    m, aa = parser._split_label(mod)
ValueError: need more than 1 value to unpack

If I try doing it directly from the command line as 229.162932@K, 229.162932@-, it also does not work.

identipy ch_16Feb2018_SP3low-SP3high-SP3Mag-TMT10_hph_6_MGF.mgf -db /projects/ptx_analysis/chughes/databases/apr2018/uniprot_human-crap-apr2018_FWD_concatenated_target_decoy.fasta -punit ppm -ptol 20 -funit Da -ftol 0.05 -cmin 2 -cmax 4 -mc 2 -fmods 57.021464@C,229.162932@K,229.162932@- -vmods 15.994915@M,42.01056@-

INFO: [10:19:21] 40844 spectra pass quality criteria.
Traceback (most recent call last):
  File "/home/chughes/MyPython/bin/identipy", line 11, in <module>
    sys.exit(run())
  File "/home/chughes/MyPython/lib/python2.7/site-packages/identipy/cli.py", line 187, in run
    utils.write_output(inputfile, settings, main.process_file(inputfile, settings))
  File "/home/chughes/MyPython/lib/python2.7/site-packages/identipy/utils.py", line 1208, in write_output
    return writer(inputfile, settings, results)
  File "/home/chughes/MyPython/lib/python2.7/site-packages/identipy/utils.py", line 921, in write_pepxml
    results = [x for x in results if x['candidates'].size]
  File "/home/chughes/MyPython/lib/python2.7/site-packages/identipy/peptide_centric.py", line 239, in process_peptides
    kwargs = prepare_peptide_processor(fname, settings)
  File "/home/chughes/MyPython/lib/python2.7/site-packages/identipy/peptide_centric.py", line 92, in prepare_peptide_processor
    aa_mass = utils.get_aa_mass(settings)
  File "/home/chughes/MyPython/lib/python2.7/site-packages/identipy/utils.py", line 681, in get_aa_mass
    m, aa = parser._split_label(mod)
  File "/home/chughes/MyPython/lib/python2.7/site-packages/pyteomics/parser.py", line 200, in _split_label
    raise PyteomicsError('Cannot split a non-modX label: %s' % label)
pyteomics.auxiliary.PyteomicsError: Pyteomics error, message: 'Cannot split a non-modX label: aaa-'

Any input on how to correctly input custom modifications like this?

Thanks, Chris

Comments (11)

  1. Mark Ivanov

    Hi Chris!

    Thank you for using our software!

    Firstly, please do not use uppercase for modification labels (use tmt instead of Tmt). We will add a warning and auto-conversion to lowercase in the near future.

    Secondly, you have found a bug: for some reason, N- and C-terminal modifications cannot be used as fixed, only as variable ones. We will fix it as soon as possible (I suppose it will take 1-2 days).

    Also, terminal modifications on the command line should be in X!Tandem-like style, so XXX@[ means an N-terminal modification and XXX@] means a C-terminal modification. We will also update the command-line help soon.
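
To make the X!Tandem-like notation concrete, here is a minimal Python sketch that splits a comma-separated list of mass@site entries and maps [ and ] to the N- and C-terminus. It is only an illustration of the notation described in this comment: the helper name parse_mod_spec is hypothetical, and this is not IdentiPy's own parsing code.

```python
def parse_mod_spec(spec):
    """Split comma-separated 'mass@site' entries into (mass, site) pairs.

    Hypothetical helper for illustration only; not part of IdentiPy.
    '[' is read as the N-terminus and ']' as the C-terminus.
    """
    terminal_names = {"[": "N-term", "]": "C-term"}
    mods = []
    for entry in spec.split(","):
        mass_text, sep, site = entry.strip().partition("@")
        if not sep or not site:
            raise ValueError("expected mass@site, got %r" % entry)
        mods.append((float(mass_text), terminal_names.get(site, site)))
    return mods


# Fixed modifications from this issue, with the N-terminal TMT tag
# written as @[ instead of @-.
fixed = parse_mod_spec("57.021464@C,229.162932@K,229.162932@[")
print(fixed)
# [(57.021464, 'C'), (229.162932, 'K'), (229.162932, 'N-term')]
```

With this notation, the fixed-modification argument from the original command would presumably read 57.021464@C,229.162932@K,229.162932@[ once the fixed terminal modification bug is resolved.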

  2. chrishuges

    Ah ok!

    I originally had it as all lowercase, but I was using the pyteomics parser to check whether the labels satisfied is_modX, and tmt- returned False.

    I will wait to hear about the bug fix, but will just use them as variable modifications for now as this seems to work ok.

    Thanks! Chris
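
The label checks discussed in the two comments above can be reproduced with a small standalone sketch. The regular expression below is an approximation of the modX convention (a prefix with no uppercase letters or hyphens, followed by a single uppercase residue letter); the actual pyteomics implementation may differ in detail, and looks_like_modx is a hypothetical name used only here.

```python
import re

# Approximation of the modX label convention: a prefix containing no
# uppercase letters or hyphens, then exactly one uppercase residue letter.
# This is an assumption for illustration, not the pyteomics source.
MODX_LABEL = re.compile(r"^([^A-Z-]*)([A-Z])$")


def looks_like_modx(label):
    """Return True if the label fits the approximate modX pattern."""
    return MODX_LABEL.match(label) is not None


for label in ("tmtK", "camC", "K", "TmtK", "tmt-"):
    print(label, looks_like_modx(label))
# tmtK True
# camC True
# K True
# TmtK False
# tmt- False
```

Under this reading, TmtK fails because the uppercase T is taken for a residue letter, and tmt- fails because a hyphen is not a residue, which is consistent with both the lowercase advice and the False result for tmt-.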

  3. chrishuges

    Hi Mark,

    So the search does indeed run now with modifications specified as fixed (see attached screenshot of my config file to see how I have it set up), however, one of two things is happening:

    1. In the TSV report output, in the 'Modified Sequence' column, does it only show variable modifications? Because I only see the variable mods there, and nothing for my fixed mods (cam or tmt).

    2. The search isn't actually using the fixed mods. This one seems less likely to me because it wouldn't identify much/anything in this case.

    *edit - if I put the TMT mods as variable, they show up in modified sequence, so I assume scenario '1' from above is what is happening.

    Cheers, Chris

    Screen Shot 2018-05-08 at 1.28.32 PM.png

  4. Mark Ivanov

    If there are no secrets in your data, it would be great if you could share a spectra file, the cfg parameters and the fasta file with me. You can contact me directly by email: markmipt@gmail.com. I want to test it and compare with the results of the X!Tandem or MS-GF+ search engines.

    Mark

  5. chrishuges

    We are actually just testing this using data we have published that is deposited: https://www.ebi.ac.uk/pride/archive/projects/PXD008698.

    The data file we are using is: ch_16Feb2018_SP3low-SP3high-SP3Mag-TMT10_hph_6.raw.

    The config file is the standard one, with the only changes being the modifications as they are in the above screenshot, missed cleavages as 2, and the mass tolerance for MS2 (the data are Orbitrap MS1 scan, Orbitrap MS2 scan...so something like 20ppm for parent, 0.05Da for fragment is suitable).

    The data are human, and we just use the standard uniprot human database.

    Cheers, Chris

  6. Mark Ivanov

    Yes, you were right: situation 1 is what is happening here. The pepXML output is not affected by this issue, but the csv output is not fixed yet. I'll update the info here when it is fixed.

    P.S. Your data looks quite accurate in terms of parent (3 ppm systematic error and 0.55 ppm standard deviation) and fragment masses (4 ppm (~0.002-0.003 Da)). So, you can easily increase efficiency (~+5% protein groups at 1% protein FDR) of your search by turning on Identipy auto optimization (add "-at yes" in command-line interface or put first stage: identipy.extras.optimization in .cfg file).

    P.S.2 You can install cython and pyteomics.cythonize modules to significantly reduce processing time of Identipy
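
The equivalence between 4 ppm and roughly 0.002-0.003 Da quoted above follows from the definition of ppm as a relative error. A short sketch of the arithmetic, where the m/z values 500 and 750 are assumed purely for illustration and are not taken from the data:

```python
def ppm_to_da(ppm, mz):
    """Convert a relative tolerance in ppm to an absolute one in Da at a given m/z."""
    return mz * ppm * 1e-6


# Illustrative fragment m/z values (assumptions, not measured values).
for mz in (500.0, 750.0):
    print(mz, round(ppm_to_da(4.0, mz), 6))
# 500.0 0.002
# 750.0 0.003
```

By the same formula, the 0.05 Da fragment tolerance used in the search corresponds to 100 ppm at m/z 500, which is far wider than the observed fragment error and suggests why auto optimization can tighten the search.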

  7. chrishuges

    Yes thank you for the suggestion! This change to auto-tune did indeed improve the result, especially by peptide numbers.

    After testing I am very impressed by the performance of IdentiPy in combination with MPScore! In standalone comparisons it does very very well compared to other engines (e.g. XTandem, Comet, MS-GF+, MS Amanda, MyriMatch, Tide, Novor, DirecTag, and OMSSA). However, it still lags a bit behind once I combine 2 or more of these engines using tools like SearchCLI and PeptideShakerCLI (https://github.com/compomics/peptide-shaker). Would be fantastic if you guys could support inclusion of IdentiPy into these tools at some point.
