I have a question regarding the number of decoys we need to generate for each extraction. It seems the default is 2000, but I'm not sure what the logic behind this is. In my case I have ~140000 peptides in the background file and ~50 peptides of interest. I am wondering if the appropriate number of decoys in my case would be something around 140000. Your help would be highly appreciated.

Best, Amir

    Thanks for your interest in PECAN. The 2000 decoys here are used for background subtraction, not for target-decoy FDR estimation.

    PECAN uses two types of decoys. The first type is the one people are most familiar with: it is used for target-decoy FDR estimation, at a 1:1 ratio of targets to decoys. In your case, there will be 50 decoys. The second type is unique to PECAN and is used only to estimate how high a background score is achieved on average, per charge state and per isolation window, over retention time. The estimate is the average over a set of decoys (default 2,000 per charge state, per isolation window) drawn and shuffled from the background proteome. If your DIA run has 25 isolation windows and you are querying both +2 and +3 charge states, PECAN would actually generate a total of 100,000 decoys (25 x 2 x 2,000) to estimate these 25 x 2 sets of background scores over retention time. You can think of this as a population mean estimation: with a big enough sample size, the sample mean is close enough to the true population mean. Detailed information on how this size (2,000) was determined is in the supplementary information of the paper.
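    To make the arithmetic above concrete, here is a minimal sketch. The function name and the simulated score distribution are made up for illustration and are not part of PECAN; only the numbers (25 windows, +2/+3 charge states, 2,000 decoys per set) come from the explanation above. The second half only illustrates the general sample-mean argument, not PECAN's actual scoring.

```python
import random
import statistics

def total_background_decoys(n_windows, charge_states, decoys_per_set=2000):
    """Return (number of background sets, total background decoys).

    One set of decoys is drawn per (isolation window, charge state) pair.
    Hypothetical helper for illustration; not a PECAN function.
    """
    n_sets = n_windows * len(charge_states)
    return n_sets, n_sets * decoys_per_set

n_sets, total = total_background_decoys(25, (2, 3))
print(n_sets, total)  # 50 sets, 100000 decoys

# Sample-mean illustration with made-up scores (assumption: any
# distribution with finite variance would show the same trend).
random.seed(0)
population = [random.gauss(10.0, 3.0) for _ in range(140_000)]
true_mean = statistics.fmean(population)
for n in (50, 2000):
    sample = random.sample(population, n)
    error = abs(statistics.fmean(sample) - true_mean)
    print(f"n={n}: |sample mean - population mean| = {error:.3f}")
```

    The standard error of a sample mean shrinks roughly as 1/sqrt(n), so going from 2,000 to 140,000 decoys per set would cost far more computation than it buys in accuracy.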

    PS. 50 peptides of interest is a pretty low number. I'm afraid that with so few, Percolator is likely to break (with an error about the separation being too good). PECAN relies on Percolator to estimate FDR, and unfortunately Percolator probably can't learn much from 50 targets and 50 decoys. To avoid this issue, I'd recommend adding at least several hundred peptides to the query, some that you know are present in your sample and some that you know are absent. For example, you could add all tryptic peptides from an in silico digestion of keratin, trypsin, albumin, or some cell-type-specific proteins.
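    If you don't have a digestion tool at hand, a rough in silico tryptic digest can be sketched as below. This is a simplified illustration, not what PECAN does internally: it assumes the common rule of cleaving after K or R unless the next residue is P, allows no missed cleavages, and uses arbitrary length limits. The function name and the toy sequence are made up; substitute the real protein sequences from your FASTA file.

```python
import re

def tryptic_peptides(sequence, min_len=6, max_len=30):
    """Fully tryptic peptides of a protein sequence, no missed cleavages.

    Simplified rule (assumption): cleave C-terminal to K or R,
    except when the following residue is P.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if p and min_len <= len(p) <= max_len]

# Toy sequence for illustration only (not a real protein).
toy_protein = "AAKPGGRCCKDDDDDDDR"
print(tryptic_peptides(toy_protein, min_len=1))
```

    Pooling the peptides from a few such proteins with your 50 peptides of interest gives Percolator a larger and more mixed set of targets to learn from.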

