Fast way to retrieve all spectra

Issue #31 closed
Anonymous created an issue


I was wondering if there is a faster way to get all identifications back without applying FDR filtering. I am currently using the following line of code:

from pyteomics import tandem
psms = tandem.filter('results.t.xml', fdr=1)

It returns about 10000 PSMs with FDR = 1 (so I expected this to be all MS/MS spectra). However, my MGF file contains about 15000 spectra. How come the numbers differ even with FDR = 1? And how could this be improved?

Kind regards, Tim

Comments (4)

  1. Lev Levitsky repo owner

    Hi Tim,

    if you don't want to do any filtering, you can use instead of tandem.filter.

    The main reason your code reports fewer identifications is probably that tandem.filter removes all decoys by default (you can change this with remove_decoy=False). However, some low-scoring identifications can still be filtered out even with fdr=1.
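    To illustrate both points, here is a minimal pure-Python sketch of target-decoy filtering. It is not pyteomics' implementation: it assumes a simple FDR estimate of decoys/targets at each score threshold (pyteomics can apply a decoy ratio and a correction term), and the PSM tuples and the functions q_values and filter_psms are hypothetical. It shows that decoys are dropped by default and that a low-scoring target can be dropped even at fdr=1, because below a run of decoys the estimated FDR can exceed 1.

```python
from typing import List, Tuple

# Hypothetical PSM record: (spectrum_id, score, is_decoy); higher score is better.
PSM = Tuple[str, float, bool]


def q_values(psms: List[PSM]) -> Tuple[List[PSM], List[float]]:
    """Rank PSMs by score and compute q-values, assuming FDR = decoys / targets."""
    ranked = sorted(psms, key=lambda p: p[1], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, _, decoy in ranked:
        if decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else float("inf"))
    # A q-value is the lowest FDR at this threshold or any less stringent (lower) one.
    qs = fdrs[:]
    for i in range(len(qs) - 2, -1, -1):
        qs[i] = min(qs[i], qs[i + 1])
    return ranked, qs


def filter_psms(psms: List[PSM], fdr: float, remove_decoy: bool = True) -> List[PSM]:
    """Keep PSMs with q-value <= fdr, then optionally drop decoys."""
    ranked, qs = q_values(psms)
    kept = [p for p, q in zip(ranked, qs) if q <= fdr]
    if remove_decoy:
        kept = [p for p in kept if not p[2]]
    return kept


psms = [
    ("spec1", 90.0, False),
    ("spec2", 50.0, True),
    ("spec3", 45.0, True),
    ("spec4", 40.0, True),
    ("spec5", 30.0, False),
]
print(len(filter_psms(psms, fdr=1)))                      # 1
print(len(filter_psms(psms, fdr=1, remove_decoy=False)))  # 2
```

    Here spec5 is a target PSM, yet it is dropped at fdr=1 because its estimated q-value is 1.5 in this toy model.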

    Also, the total number of IDs will not necessarily equal the number of spectra; that depends on the data and the X!Tandem settings. But will give you everything there is in the file.
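    To see which spectra have no identification at all, one option is to compare the spectrum titles in the MGF file with the spectra that appear in the results. The sketch below parses MGF text with plain string handling; the sample MGF content and the set of identified titles are made up, and in practice the identified set would have to be collected from the parsed X!Tandem output (not shown here). The functions mgf_titles and unidentified are hypothetical helpers.

```python
from typing import List, Set


def mgf_titles(text: str) -> List[str]:
    """Collect the TITLE= value of each BEGIN IONS ... END IONS block."""
    titles = []
    in_block = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            in_block = True
        elif line == "END IONS":
            in_block = False
        elif in_block and line.startswith("TITLE="):
            titles.append(line[len("TITLE="):])
    return titles


def unidentified(titles: List[str], identified: Set[str]) -> List[str]:
    """Spectra present in the MGF file but absent from the search results."""
    return [t for t in titles if t not in identified]


# Hypothetical MGF content with three spectra.
sample_mgf = """BEGIN IONS
TITLE=scan=1
PEPMASS=500.25
CHARGE=2+
100.1 10.0
END IONS
BEGIN IONS
TITLE=scan=2
PEPMASS=600.30
CHARGE=2+
110.1 5.0
END IONS
BEGIN IONS
TITLE=scan=3
PEPMASS=700.35
CHARGE=3+
120.1 8.0
END IONS
"""

titles = mgf_titles(sample_mgf)
identified = {"scan=1", "scan=3"}  # hypothetical: titles found in the results file
print(len(titles), unidentified(titles, identified))  # 3 ['scan=2']
```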

  2. Tim Van Den Bossche


    When I use tandem.filter with a decoy_prefix, I still get decoy proteins back in my results. Is this expected, or is it a bug? The decoy_prefix is correct: if I use a bogus decoy_prefix, I just get the initial number of IDs back.

    Edit: Thanks for the earlier (and quick!) feedback :)

    Best, T.

  3. Lev Levitsky repo owner

    Hi Tim,

    Decoy proteins may be in the output for a couple of reasons:

    1. if you call tandem.filter with remove_decoy=False;
    2. if the peptide is shared between decoy and target proteins. In this case it is not considered a decoy (you should see some targets in the list of alternative proteins);
    3. something else, including a bug. If this is the case, please let me know and include the relevant code (a sample file would be great).
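    Point 2 can be sketched as follows, assuming (as described above) that a PSM counts as a decoy only when every protein it maps to is a decoy. The function name is_decoy, the "DECOY_" prefix and the protein labels are illustrative and not taken from the pyteomics source.

```python
from typing import List


def is_decoy(protein_labels: List[str], decoy_prefix: str = "DECOY_") -> bool:
    """Treat a PSM as a decoy only if all of its proteins carry the decoy prefix."""
    return all(label.startswith(decoy_prefix) for label in protein_labels)


# Hypothetical protein lists for two PSMs.
decoy_only = ["DECOY_sp|P12345", "DECOY_sp|Q67890"]
shared = ["DECOY_sp|P12345", "sp|Q67890"]  # peptide shared by a decoy and a target

print(is_decoy(decoy_only))  # True: removed when remove_decoy=True
print(is_decoy(shared))      # False: kept, so DECOY_sp|P12345 still shows up in the output
```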

    Best regards,

