Re-running with --blast-fp fails with "Bad file descriptor" error

Issue #41 new
yu ta created an issue

Dear developer,
Thank you for your amazing work.
My issue is that when I run vConTACT2 with 200,000 contigs, the earlier steps seem to finish successfully using this command:

nohup time vcontact2 -t 100 --raw-proteins /mnt/data/viral.faa --rel-mode 'Diamond' --proteins-fp viral_genomes_g2g.csv --blast-fp merged.self-diamond.tab --db 'ProkaryoticViralRefSeq201-Merged'  --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /mnt/data/share/software/vContact2/bin/cluster_one-1.0.jar --output-dir ./ &> vcontact2.log &

The outputs were:

merged.dmnd                             3.1G
merged.faa                              3.0G
merged.self-diamond.tab                 15G
merged.self-diamond.tab.abc             8.4G
merged.self-diamond.tab.mci             4.9G
merged.self-diamond.tab_mcl20.clusters  162M
merged.self-diamond.tab_mcxload.tab     230M
PMHVD_viral_genomes_g2g.csv             425M
rerunning.log                           2.2K
vcontact2.log                           34K

and the logs were:

27  ...................   0.24  6.70 1.00/0.76/1.00 1.00 1.00 0.09   0
28  ...................   0.24  6.62 1.00/0.76/1.00 1.00 1.00 0.09   0
29  ...................   0.11  6.63 1.00/0.89/1.00 1.00 1.00 0.09   0
30  ...................   0.01  6.56 1.00/0.99/1.00 1.00 1.00 0.09   0
31  ...................   0.00  6.46 1.00/1.00/1.00 1.00 1.00 0.09   0
[mcl] jury pruning marks: <97,94,95>, out of 100
[mcl] jury pruning synopsis: <96.0 or sensational> (cf -scheme, -do log)
[mcl] output is in ./merged.self-diamond.tab_mcl20.clusters
[mcl] 452431 clusters found
[mcl] output is in ./merged.self-diamond.tab_mcl20.clusters

Please cite:
Stijn van Dongen, Graph Clustering by Flow Simulation.  PhD thesis,
University of Utrecht, May 2000.
(  <http://www.library.uu.nl/digiarchief/dip/diss/1895620/full.pdf>
or  <http://micans.org/mcl/lit/svdthesis.pdf.gz>)
OR
Stijn van Dongen, A cluster algorithm for graphs. Technical
Report INS-R0010, National Research Institute for Mathematics
and Computer Science in the Netherlands, Amsterdam, May 2000.
(  <http://www.cwi.nl/ftp/CWIreports/INS/INS-R0010.ps.Z>
or  <http://micans.org/mcl/lit/INS-R0010.ps.Z>)

INFO:vcontact2: Building the cluster and profiles (this may take some time...)
If it fails, try re-running using --blast-fp flag and specifiying merged.self-diamond.tab (or merged.self-blastp.tab)

However, the run stalled at this step, so I re-ran it with `--blast-fp merged.self-diamond.tab` using this command:

```
nohup time vcontact2 -t 100 --raw-proteins /mnt/data/viral.faa --rel-mode 'Diamond' --proteins-fp viral_genomes_g2g.csv --blast-fp merged.self-diamond.tab --db 'ProkaryoticViralRefSeq201-Merged'  --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /mnt/data/share/software/vContact2/bin/cluster_one-1.0.jar --blast-fp merged.self-diamond.tab --output-dir ./ &> rerunning.log &
```

This returned `OSError: [Errno 9] Bad file descriptor`. The logs were:

nohup: ignoring input
INFO:vcontact2: Found Diamond: /home/miniconda3/bin/diamond
INFO:vcontact2: Found MCL: /home/miniconda3/envs/vContact2_new/bin/mcxload
INFO:vcontact2: Identified 100 CPUs
INFO:vcontact2: Using reference database: ProkaryoticViralRefSeq201-Merged
INFO:vcontact2: Using existing directory ./.
INFO:vcontact2: Identified existing 'merged.faa' in output path: re-using...
INFO:vcontact2: Re-using existing Diamond file...
INFO:vcontact2: Loading proteins...
INFO:vcontact2: Merging ProkaryoticViralRefSeq201-Merged to user gene-to-genome mapping...
INFO:vcontact2: Building the cluster and profiles (this may take some time...)
If it fails, try re-running using --blast-fp flag and specifiying merged.self-diamond.tab (or merged.self-blastp.tab)

============================This is vConTACT2 0.9.19============================

----------------------------------Pre-Analysis----------------------------------

------------------------------Reference databases-------------------------------

-------------------------------Protein clustering-------------------------------
Traceback (most recent call last):
  File "/home/miniconda3/bin/vcontact2", line 757, in <module>
    main(options)
  File "/home/miniconda3/bin/vcontact2", line 470, in main
    pcs_fp, gene2genome_df, pcs_mode)
  File "/home/miniconda3/lib/python3.7/site-packages/vcontact2/protein_clusters.py", line 187, in build_clusters
    clusters_df, name, c = load_mcl_clusters(fp)
  File "/home/miniconda3/lib/python3.7/site-packages/vcontact2/protein_clusters.py", line 245, in load_mcl_clusters
    c = [line.rstrip("\n").split("\t") for line in f]
  File "/home/miniconda3/lib/python3.7/site-packages/vcontact2/protein_clusters.py", line 245, in <listcomp>
    c = [line.rstrip("\n").split("\t") for line in f]
OSError: [Errno 9] Bad file descriptor
Command exited with non-zero status 1

Looking forward to your reply.
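
The traceback fails while iterating over the MCL clusters file line by line. One way to narrow this down might be to read the same file outside vConTACT2. The sketch below is only a diagnostic under assumptions: `count_mcl_clusters` is a hypothetical helper, not part of vConTACT2, and it assumes one cluster per line with tab-separated members, as the tab split in the traceback suggests.

```python
import os
import tempfile


def count_mcl_clusters(path):
    """Return (number of clusters, total members) for a cluster file.

    Assumption: one cluster per line, members separated by tabs.
    """
    n_clusters = 0
    n_members = 0
    with open(path) as handle:
        # Same kind of line-by-line iteration that raised the OSError.
        for line in handle:
            members = line.rstrip("\n").split("\t")
            n_clusters += 1
            n_members += len(members)
    return n_clusters, n_members


# Small self-contained demonstration on a temporary file.
fd, demo_path = tempfile.mkstemp(suffix=".clusters")
with os.fdopen(fd, "w") as handle:
    handle.write("geneA\tgeneB\tgeneC\ngeneD\tgeneE\n")
print(count_mcl_clusters(demo_path))  # (2, 5)
os.remove(demo_path)
```

On the real data, this would be called on `merged.self-diamond.tab_mcl20.clusters`, and the first number could be compared with the 452431 clusters reported by mcl. If the same `OSError` appears here, that may point to the file or the storage it sits on rather than to vConTACT2 itself.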

Comments (1)

  1. Ben Bolduc

    Could you try re-running vConTACT2 with the blast file, writing to a fresh output directory? I think vConTACT2 is getting confused trying to decide between re-running automatically and being given a checkpoint file (i.e. the blast file).

    -Ben
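
A re-run along the lines suggested above could be assembled as in the following sketch. It reuses only the flags and paths from the command in the report, passes `--blast-fp` once (the reported re-run command lists it twice), and changes `--output-dir`. The helper `build_rerun_command` and the directory name `vcontact2_rerun` are hypothetical, chosen for illustration; the sketch prints the command for review rather than running it.

```python
import shlex


def build_rerun_command(blast_fp, output_dir):
    """Assemble the reported command with a different --output-dir."""
    return [
        "vcontact2",
        "-t", "100",
        "--raw-proteins", "/mnt/data/viral.faa",
        "--rel-mode", "Diamond",
        "--proteins-fp", "viral_genomes_g2g.csv",
        "--blast-fp", str(blast_fp),  # given once, as the checkpoint file
        "--db", "ProkaryoticViralRefSeq201-Merged",
        "--pcs-mode", "MCL",
        "--vcs-mode", "ClusterONE",
        "--c1-bin", "/mnt/data/share/software/vContact2/bin/cluster_one-1.0.jar",
        "--output-dir", str(output_dir),
    ]


# "vcontact2_rerun" is a hypothetical name for a directory that does not exist yet.
cmd = build_rerun_command("merged.self-diamond.tab", "vcontact2_rerun")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed command is meant to be run from the directory that holds `merged.self-diamond.tab`, so the relative paths still resolve. Whether a fresh output directory resolves the error is not confirmed in this thread.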
