Search crashes immediately when output format is csv or tsv

Issue #8 resolved
Joshua Klein
created an issue

When running a search where the output format is csv or tsv, the search process crashes almost immediately with the following traceback:

(workspace-2017) [jaklein@scc1 proteomics]$ identipy -db ../../../uniprot/human_uniprot.fa ./2017-09-28-MEM-AGP-0004.centroided.mzML -cfg ./identipy.cfg
INFO: [10:05:13] Reading defaults from /projectnb/workspace/app/virtualenvs/workspace-2017/identipy/identipy/default.cfg
INFO: [10:05:13] Reading config from ./identipy.cfg
Traceback (most recent call last):
  File "/usr2/postdoc/jaklein/.virtualenvs/workspace-2017/bin/identipy", line 11, in <module>
    load_entry_point('identipy', 'console_scripts', 'identipy')()
  File "/projectnb/workspace/app/virtualenvs/workspace-2017/identipy/identipy/cli.py", line 209, in run
    utils.write_output(inputfile, settings, main.process_file(inputfile, settings))
  File "/projectnb/workspace/app/virtualenvs/workspace-2017/identipy/identipy/utils.py", line 1410, in write_output
    return writer(inputfile, settings, results)
  File "/projectnb/workspace/app/virtualenvs/workspace-2017/identipy/identipy/utils.py", line 1285, in write_csv
    df = dataframe(inputfile, settings, results)
  File "/projectnb/workspace/app/virtualenvs/workspace-2017/identipy/identipy/utils.py", line 1292, in dataframe
    logger.info('Accumulated results: %s', len(results))
TypeError: object of type 'generator' has no len()

This appears to happen because utils.dataframe calls len on the generator returned by peptide_centric.process_peptides. utils.dataframe should probably call list on results first, so that all of the results are accumulated before it begins writing.

If that's all that's needed, I can put in a PR.
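
For illustration, here is a minimal sketch of the failure and the proposed fix. It does not use identipy's code: process_peptides_stub and count_results are hypothetical stand-ins, and the only assumption carried over from the traceback is that results arrives as a generator and is passed to len inside a logging call.

```python
import logging

logger = logging.getLogger(__name__)


def process_peptides_stub():
    """Hypothetical stand-in for a function that yields results lazily."""
    for i in range(3):
        yield {'spectrum': 'scan={}'.format(i), 'score': float(i)}


def count_results(results):
    """Materialize the iterable so that len() is valid, then log the size."""
    results = list(results)  # a generator has no len(); a list does
    logger.info('Accumulated results: %s', len(results))
    return results


gen = process_peptides_stub()

# Calling len() directly on the generator raises the TypeError from the traceback.
try:
    len(gen)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The failed len() call did not consume the generator, so it can still be converted.
accumulated = count_results(gen)
```

The trade-off is that converting to a list holds every result in memory at once, which seems acceptable here because the writer needs the full result set anyway.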

Comments (6)

  1. Lev Levitsky repo owner

    Thanks for reporting. results used to be converted into a list together with get_output filtering.
    Now that get_output is no longer used, a simple list conversion is needed. I added it in the latest commit.

  2. Joshua Klein reporter

    The update fixes the crash, but the csv output format produces a tab-separated file instead of a comma-separated file. This appears to be because both format names map to write_csv, which explicitly uses \t.

  3. Lev Levitsky repo owner

    Yes, csv is an alias for tsv. Adding a configurable separator shouldn't be a problem, though. The default can be deduced from the format value. I'm sure it was planned but was not a priority. Thanks for reminding.

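
The separator idea from the last comment could look roughly like the sketch below. This is not identipy's implementation: DEFAULT_SEPARATORS, resolve_separator, write_table, and the configured parameter are hypothetical names, and the standard csv module is used here only to keep the example self-contained.

```python
import csv
import io

# Hypothetical mapping from output format name to its default delimiter.
DEFAULT_SEPARATORS = {'csv': ',', 'tsv': '\t'}


def resolve_separator(output_format, configured=None):
    """Return the configured separator if one is set, else the format's default."""
    if configured:
        return configured
    return DEFAULT_SEPARATORS[output_format]


def write_table(rows, header, output_format, configured=None):
    """Serialize rows to text using the separator chosen for the format."""
    sep = resolve_separator(output_format, configured)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator='\n')
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


header = ['peptide', 'score']
rows = [['PEPTIDEK', 12.5], ['SAMPLER', 7.0]]

csv_text = write_table(rows, header, 'csv')            # comma by default
tsv_text = write_table(rows, header, 'tsv')            # tab by default
custom_text = write_table(rows, header, 'csv', ';')    # explicit override wins
```

With this arrangement both format names can keep sharing a single writer function, and an explicit separator setting only overrides the default when one is provided.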