ERROR:vcontact2: Error in contig clustering ERROR:vcontact2: No columns to parse from file
Hello, I have the same problem as issue #21, but none of the proposed solutions worked. I do give the full path to c1-bin.
ERROR:vcontact2: Error in contig clustering
ERROR:vcontact2: No columns to parse from file
Traceback (most recent call last):
File "/project/6007483/software/env_vcontact/bin/vcontact2", line 615, in main
gc = vcontact2.contig_clusters.ContigCluster(pcp, output_dir, cluster_one_fp, cluster_one_args,
File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 91, in __init__
self.clusters, self.cluster_results = self.one_cluster(os.path.join(self.folder, self.name),
File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 227, in one_cluster
return self.load_one_clusters(fi_clusters)
File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 318, in load_one_clusters
clusters_df = pd.read_csv(one_fn, header=0)
File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/pandas/io/parsers.py", line 685, in parser_f
return _read(filepath_or_buffer, kwds)
File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/pandas/io/parsers.py", line 457, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/pandas/io/parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/pandas/io/parsers.py", line 1917, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
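The last frames of the traceback show `pd.read_csv(one_fn, header=0)` raising `EmptyDataError`, which pandas raises when the file it is given has no content to parse. A quick first check is therefore whether the ClusterONE output file exists and contains anything. The following is a minimal sketch, assuming you can locate that file in your output directory; the helper name `describe_cluster_file` is hypothetical and not part of vContact2.

```python
import os


def describe_cluster_file(path):
    """Classify a ClusterONE output file (hypothetical helper, not vContact2 API).

    Returns "missing", "empty" (zero bytes), "blank" (whitespace only), or "ok".
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        # A zero-byte file is what makes pandas raise EmptyDataError.
        return "empty"
    with open(path) as handle:
        has_text = any(line.strip() for line in handle)
    return "ok" if has_text else "blank"
```

If the result is "missing" or "empty", the problem most likely lies in the ClusterONE step itself rather than in the parsing that follows it.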
Comments (12)
-
-
Hi Bridget,
Trying to narrow this down further… Have you tried “ProkaryoticViralRefSeq97-Merged” or “ProkaryoticViralRefSeq201-Merged”? And have you placed cluster_one-1.0.jar somewhere on your system $PATH? If you installed via Miniconda, that would be wherever your environment’s path points.
And what version of vContact2 are you using?
-Ben
-
reporter Hi,
I tried “ProkaryoticViralRefSeq97-Merged”, “ProkaryoticViralRefSeq201-Merged”, and “ProkaryoticViralRefSeq94-Merged”, and I get the same error each time. I did place cluster_one-1.0.jar on my system $PATH.
Thanks!
-
Hi Fadwa,
Thanks for sticking around and working through this. Could you attach the full run log? I’m still trying to narrow down why this error pops up for a group of people but not for others.
-Ben
-
reporter Hi,
Here is a screenshot of the full run.
-
Hi Fadwa,
Thanks for sending this - it’s immensely helpful!
Did you install vContact2 through Bioconda or manually from Bitbucket? If you installed manually (git clone → pip install), can you update to the latest version, 0.9.22? I will be pushing an update to Bioconda sometime this week, whenever I can make time.
Can you also try running this in a fresh directory? vContact2 is smart enough to see that Diamond has already been run, so it continues from that step. It’s possible that a previous error is being re-incorporated into each new analysis.
For your gene2genome file, do you have the headers “contig_id” “protein_id” and “keywords”? It looks like some proteins are unable to be matched against their contig.
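A quick way to answer the header question is to read only the first row of the gene2genome file. This is a rough sketch, assuming the file is comma-delimited (as the `.csv` extension of the example file suggests); the function name `missing_g2g_headers` is hypothetical.

```python
import csv

# Header names as listed in the comment above.
EXPECTED_HEADERS = ["contig_id", "protein_id", "keywords"]


def missing_g2g_headers(path):
    """Return the expected headers that are absent from the file's first row."""
    with open(path, newline="") as handle:
        # next(..., []) yields an empty header for an empty file.
        header = next(csv.reader(handle), [])
    found = {name.strip() for name in header}
    return [name for name in EXPECTED_HEADERS if name not in found]
```

An empty list means all three headers are present; anything else names the headers to fix.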
Also, do you have java installed on your machine? I never ask users to install it, and it’s not a dependency - though it’s in the singularity definitions. You can check this with:
java -version
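The same check can be scripted, which is handy on an HPC where the login node and the compute node may have different environments. A minimal sketch using only the Python standard library; the function name `java_version_text` is hypothetical.

```python
import shutil
import subprocess


def java_version_text(executable="java"):
    """Return the output of `java -version`, or None if java is not on PATH."""
    java_path = shutil.which(executable)
    if java_path is None:
        return None
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True
    )
    # `java -version` typically writes to stderr, so check that first.
    return (result.stderr or result.stdout).strip()
```

A return value of `None` from inside the job that runs vContact2 would suggest ClusterONE cannot be launched there.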
Sorry this is taking so long to figure out.
-Ben
-
Hi Ben, Thanks for all the work to try to figure this out! One of my labmates managed to get the singularity container to work for me, so I’m all set. I haven’t had a chance to talk to him more about what the issue was, but it seems like it was something related to the HPC we work with and a nightmare to figure out.
-
reporter Hi Ben, thanks for your help. It works now.
-
Thanks for letting me know! Glad it’s working for you now!
-
- changed status to resolved
Issue solved.
-
Hi, I have the same problem. Can you give me some advice for solving it? Thank you!
-
Hi, I am also receiving the same error. Does anyone have any suggestions to fix this problem? ClusterONE is already in my $PATH, and I added the absolute path using --c1-bin; however, I am still getting this error.
Thanks!
Thank you for everyone’s help and answers to my previous comment on issue 30. I made the changes suggested in that issue and am now getting this same error as @fadwa mehdaoui on the example files provided. I installed vcontact into a miniconda environment using mamba (otherwise, I had conflicts) specifying the packages as indicated in the updated singularity file (changing versions of numpy and pandas). I also give the full path to clusterone and it appears to work when run independently. Any additional suggestions would be greatly appreciated and I’m happy to provide any additional information that would be useful. Thanks!
For reference, here is the command I’m running:
vcontact2 --raw-proteins test_data/VIRSorter_genomes.faa --rel-mode Diamond --proteins-fp test_data/VIRSorter_genomes_g2g.csv --db 'ProkaryoticViralRefSeq94-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin <path to clusterone>/cluster_one-1.0.jar --output-dir <output directory path>