Recurrent crashes in intogen

Issue #18 resolved
Andrew Beggs created an issue

Hi, I'm running the Intogen pipeline on a series of VCF files (from Strelka 2). I have run the pipeline previously on WGS data with no issues (and great results!), but running it on these files causes recurrent crashes. The error I see is:

  • ERROR WHILE RUNNING:
    intogen-recurrences -i /home/beggsa/ffpeexome/finalvcfs/output/project/POOL_all_projects/sample_variant+transcript.impact -o /home/beggsa/ffpeexome/finalvcfs/output/project/POOL_all_projects/gene.recurrences --group_by GENE
    OUTPUT:
    07:13:29 INFO: CONF no configuration neeeded for this task
    sys:1: DtypeWarning: Columns (1,10) have mixed types. Specify dtype option on import or set low_memory=False.
    07:13:44 INFO: Group by key is: ['GENE']
    /home/beggsa/anaconda3/lib/python3.6/site-packages/numpy/lib/arraysetops.py:472: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
      mask |= (ar1 == a)
    07:13:44 INFO: Group by key is: ['GENE']
    07:13:44 INFO: Index(['MUTS_PAM', 'MUTS_PAM_SAMPLES'], dtype='object')
    Traceback (most recent call last):
      File "/home/beggsa/anaconda3/bin/intogen-recurrences", line 11, in <module>
        sys.exit(cmdline())
      File "/home/beggsa/anaconda3/lib/python3.6/site-packages/intogen/tasks/recurrences.py", line 108, in cmdline
        group_by=eval(options.group_by)
      File "/home/beggsa/anaconda3/lib/python3.6/site-packages/intogen/tasks/recurrences.py", line 85, in run
        all.sort(columns=[MUTS_CS], ascending=False, inplace=True)
      File "/home/beggsa/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 4372, in __getattr__
        return object.__getattribute__(self, name)
    AttributeError: 'DataFrame' object has no attribute 'sort'

    )' raised in ...
    Task = def recurrences_genes(...):
    Job = [.../POOL_all_projects/sample_variant+transcript.impact -> .../POOL_all_projects/gene.recurrences, POOL_all_projects]

    Traceback (most recent call last):
      File "/home/beggsa/anaconda3/lib/python3.6/site-packages/ruffus/task.py", line 751, in run_pooled_job_without_exceptions
        register_cleanup, touch_files_only)
      File "/home/beggsa/anaconda3/lib/python3.6/site-packages/ruffus/task.py", line 567, in job_wrapper_io_files
        ret_val = user_defined_work_func(*params)
      File "/home/beggsa/anaconda3/bin/intogen", line 238, in recurrences_genes
        scheduler=scheduler('recurrences_genes', project_key)
      File "/home/beggsa/anaconda3/lib/python3.6/site-packages/intogen/executor/drmaa.py", line 89, in submit
        self._submit_local(args, kwargs, scheduler, job_name, debugging=DEBUGGING)
      File "/home/beggsa/anaconda3/lib/python3.6/site-packages/intogen/executor/drmaa.py", line 122, in _submit_local
        raise Exception("\n\nERROR WHILE RUNNING:\n {0}\nOUTPUT:\n {1}\n".format(cmd, error))
    Exception:

    ERROR WHILE RUNNING:
    intogen-recurrences -i /home/beggsa/ffpeexome/finalvcfs/output/project/POOL_all_projects/sample_variant+transcript.impact -o /home/beggsa/ffpeexome/finalvcfs/output/project/POOL_all_projects/gene.recurrences --group_by GENE
    OUTPUT:
    07:13:29 INFO: CONF no configuration neeeded for this task
    sys:1: DtypeWarning: Columns (1,10) have mixed types. Specify dtype option on import or set low_memory=False.
    07:13:44 INFO: Group by key is: ['GENE']
    /home/beggsa/anaconda3/lib/python3.6/site-packages/numpy/lib/arraysetops.py:472: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
      mask |= (ar1 == a)
    07:13:44 INFO: Group by key is: ['GENE']
    07:13:44 INFO: Index(['MUTS_PAM', 'MUTS_PAM_SAMPLES'], dtype='object')
    Traceback (most recent call last):
      File "/home/beggsa/anaconda3/bin/intogen-recurrences", line 11, in <module>
        sys.exit(cmdline())
      File "/home/beggsa/anaconda3/lib/python3.6/site-packages/intogen/tasks/recurrences.py", line 108, in cmdline
        group_by=eval(options.group_by)
      File "/home/beggsa/anaconda3/lib/python3.6/site-packages/intogen/tasks/recurrences.py", line 85, in run
        all.sort(columns=[MUTS_CS], ascending=False, inplace=True)
      File "/home/beggsa/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 4372, in __getattr__
        return object.__getattribute__(self, name)
    AttributeError: 'DataFrame' object has no attribute 'sort'

Comments (3)

  1. Loris Mularoni

    @abeggs,

    the issue seems to be that the pandas 'sort()' method was deprecated in favor of sort_values and sort_index, and newer pandas releases no longer provide it at all, which is why you get an AttributeError. A workaround is to install an older version of pandas that still has 'sort'. You can try:

    • source activate YOUR_CONDA_ENVIRONMENT
    • conda install pandas==0.19.2 (or pip install pandas==0.19.2 if you are not using conda)

    After this, instead of an error you should just get a warning (FutureWarning: sort(....) is deprecated, use sort_index(.....)).
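
    For anyone who would rather adapt the call than pin pandas, here is a minimal sketch of how the failing sort could be written with sort_values. It is not the intogen code: the helper name sort_descending, the example data, and the value of MUTS_CS (only the constant's name appears in the traceback) are assumptions made for illustration.

```python
import pandas as pd

# Assumption: the real value of intogen's MUTS_CS constant is not shown in the
# traceback, so a column label of the same name is used here for illustration.
MUTS_CS = "MUTS_CS"


def sort_descending(frame, column):
    """Return a copy of `frame` sorted by `column`, highest value first."""
    if hasattr(frame, "sort_values"):
        # Current pandas API; takes `by=` where the old method took `columns=`.
        return frame.sort_values(by=[column], ascending=False)
    # Fallback for old pandas releases that only provide DataFrame.sort.
    return frame.sort(columns=[column], ascending=False)


# Made-up example data, not taken from the pipeline output.
genes = pd.DataFrame({"GENE": ["TP53", "KRAS", "APC"], MUTS_CS: [3, 7, 5]})
ranked = sort_descending(genes, MUTS_CS)
print(ranked)
```

    The call in the traceback passes inplace=True; sort_values accepts that argument as well, so a one-line change from sort(columns=...) to sort_values(by=...) should behave the same way, though I have not tested that against the intogen code itself.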

  2. Andrew Beggs reporter

    Aha... that is very helpful thank you - I think that I had this problem before and seem to have invented a conda environment already to work around it!! Sorry for wasting your time!
