Re-running with --blast-fp fails with Bad file descriptor error
Issue #41
Dear developer,
Thanks for such amazing work.
My issue is this: when I ran vConTACT2 with 200,000 contigs, the earlier steps seemed to finish successfully with this command:
```
nohup time vcontact2 -t 100 --raw-proteins /mnt/data/viral.faa --rel-mode 'Diamond' --proteins-fp viral_genomes_g2g.csv --blast-fp merged.self-diamond.tab --db 'ProkaryoticViralRefSeq201-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /mnt/data/share/software/vContact2/bin/cluster_one-1.0.jar --output-dir ./ &> vcontact2.log &
```
The outputs were:
merged.dmnd 3.1G
merged.faa 3.0G
merged.self-diamond.tab 15G
merged.self-diamond.tab.abc 8.4G
merged.self-diamond.tab.mci 4.9G
merged.self-diamond.tab_mcl20.clusters 162M
merged.self-diamond.tab_mcxload.tab 230M
PMHVD_viral_genomes_g2g.csv 425M
rerunning.log 2.2K
vcontact2.log 34K
and the logs were:
```
27 ................... 0.24 6.70 1.00/0.76/1.00 1.00 1.00 0.09 0
28 ................... 0.24 6.62 1.00/0.76/1.00 1.00 1.00 0.09 0
29 ................... 0.11 6.63 1.00/0.89/1.00 1.00 1.00 0.09 0
30 ................... 0.01 6.56 1.00/0.99/1.00 1.00 1.00 0.09 0
31 ................... 0.00 6.46 1.00/1.00/1.00 1.00 1.00 0.09 0
[mcl] jury pruning marks: <97,94,95>, out of 100
[mcl] jury pruning synopsis: <96.0 or sensational> (cf -scheme, -do log)
[mcl] output is in ./merged.self-diamond.tab_mcl20.clusters
[mcl] 452431 clusters found
[mcl] output is in ./merged.self-diamond.tab_mcl20.clusters
Please cite:
Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis,
University of Utrecht, May 2000.
( http://www.library.uu.nl/digiarchief/dip/diss/1895620/full.pdf
or http://micans.org/mcl/lit/svdthesis.pdf.gz)
OR
Stijn van Dongen, A cluster algorithm for graphs. Technical
Report INS-R0010, National Research Institute for Mathematics
and Computer Science in the Netherlands, Amsterdam, May 2000.
( http://www.cwi.nl/ftp/CWIreports/INS/INS-R0010.ps.Z
or http://micans.org/mcl/lit/INS-R0010.ps.Z)
INFO:vcontact2: Building the cluster and profiles (this may take some time...)
If it fails, try re-running using --blast-fp flag and specifiying merged.self-diamond.tab (or merged.self-blastp.tab)
```
But it got stuck at the step above, so I re-ran with ```--blast-fp merged.self-diamond.tab``` using this command:
```
nohup time vcontact2 -t 100 --raw-proteins /mnt/data/viral.faa --rel-mode 'Diamond' --proteins-fp viral_genomes_g2g.csv --blast-fp merged.self-diamond.tab --db 'ProkaryoticViralRefSeq201-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /mnt/data/share/software/vContact2/bin/cluster_one-1.0.jar --blast-fp merged.self-diamond.tab --output-dir ./ &> rerunning.log &
```
and it returned ```OSError: [Errno 9] Bad file descriptor```.
The logs were:
```
nohup: ignoring input
INFO:vcontact2: Found Diamond: /home/miniconda3/bin/diamond
INFO:vcontact2: Found MCL: /home/miniconda3/envs/vContact2_new/bin/mcxload
INFO:vcontact2: Identified 100 CPUs
INFO:vcontact2: Using reference database: ProkaryoticViralRefSeq201-Merged
INFO:vcontact2: Using existing directory ./.
INFO:vcontact2: Identified existing 'merged.faa' in output path: re-using...
INFO:vcontact2: Re-using existing Diamond file...
INFO:vcontact2: Loading proteins...
INFO:vcontact2: Merging ProkaryoticViralRefSeq201-Merged to user gene-to-genome mapping...
INFO:vcontact2: Building the cluster and profiles (this may take some time...)
If it fails, try re-running using --blast-fp flag and specifiying merged.self-diamond.tab (or merged.self-blastp.tab)
============================This is vConTACT2 0.9.19============================
----------------------------------Pre-Analysis----------------------------------
------------------------------Reference databases-------------------------------
-------------------------------Protein clustering-------------------------------
Traceback (most recent call last):
File "/home/miniconda3/bin/vcontact2", line 757, in <module>
main(options)
File "/home/miniconda3/bin/vcontact2", line 470, in main
pcs_fp, gene2genome_df, pcs_mode)
File "/home/miniconda3/lib/python3.7/site-packages/vcontact2/protein_clusters.py", line 187, in build_clusters
clusters_df, name, c = load_mcl_clusters(fp)
File "/home/miniconda3/lib/python3.7/site-packages/vcontact2/protein_clusters.py", line 245, in load_mcl_clusters
c = [line.rstrip("\n").split("\t") for line in f]
File "/home/miniconda3/lib/python3.7/site-packages/vcontact2/protein_clusters.py", line 245, in <listcomp>
c = [line.rstrip("\n").split("\t") for line in f]
OSError: [Errno 9] Bad file descriptor
Command exited with non-zero status 1
```
Looking forward to your reply.
Could you try re-running vConTACT2 using the blast file to a fresh output directory? I think vConTACT2 is getting confused trying to decide between re-running automatically and being given a checkpoint file (i.e. the blast file).
-Ben
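
The suggestion above could be tried roughly as follows. This is only a sketch: every `vcontact2` flag is copied unchanged from the original command in this issue, while the directory name `vcontact2_fresh` and the log file name `fresh_run.log` are hypothetical choices, not anything vConTACT2 requires. The idea is to point `--blast-fp` at the existing Diamond result while writing to an empty directory, so that, presumably, none of the intermediate files from the first run are re-used. If `vcontact2` is not on the PATH, the script only prints the command it would have run.

```shell
#!/bin/sh
# Sketch only: re-run from the existing Diamond result into a new, empty directory.
# OUT_DIR and the log file name are hypothetical; all vcontact2 flags are the
# ones already used in the original command.
OUT_DIR="./vcontact2_fresh"
BLAST_FP="merged.self-diamond.tab"

mkdir -p "$OUT_DIR"

# Build the argument list once so it can be either executed or printed.
set -- vcontact2 -t 100 \
    --raw-proteins /mnt/data/viral.faa \
    --rel-mode 'Diamond' \
    --proteins-fp viral_genomes_g2g.csv \
    --blast-fp "$BLAST_FP" \
    --db 'ProkaryoticViralRefSeq201-Merged' \
    --pcs-mode MCL --vcs-mode ClusterONE \
    --c1-bin /mnt/data/share/software/vContact2/bin/cluster_one-1.0.jar \
    --output-dir "$OUT_DIR"

if command -v vcontact2 >/dev/null 2>&1; then
    # Keep the log outside OUT_DIR so the output directory starts out empty.
    "$@" > fresh_run.log 2>&1
else
    echo "vcontact2 not found; would run: $*"
fi
```

The `nohup time ... &` wrapper from the original command can be put back around the script for long runs.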