ERROR:vcontact2: Error in contig clustering ERROR:vcontact2: No columns to parse from file

Issue #36 resolved
fadwa mehdaoui created an issue

Hello, I have the same problem as issue #21, but none of the proposed solutions worked. I do give the full path to c1-bin.

ERROR:vcontact2: Error in contig clustering
ERROR:vcontact2: No columns to parse from file
Traceback (most recent call last):
  File "/project/6007483/software/env_vcontact/bin/vcontact2", line 615, in main
    gc = vcontact2.contig_clusters.ContigCluster(pcp, output_dir, cluster_one_fp, cluster_one_args,
  File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 91, in __init__
    self.clusters, self.cluster_results = self.one_cluster(os.path.join(self.folder, self.name),
  File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 227, in one_cluster
    return self.load_one_clusters(fi_clusters)
  File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 318, in load_one_clusters
    clusters_df = pd.read_csv(one_fn, header=0)
  File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/project/6007483/software/env_vcontact/lib/python3.8/site-packages/pandas/io/parsers.py", line 1917, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
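
The last frame is pandas raising EmptyDataError from pd.read_csv, which happens when the file it is given has no content to parse. In this traceback the file is the one passed to load_one_clusters, so a likely reading is that the clustering step wrote an empty output file or none at all. The following is a minimal sketch for checking a candidate output file by hand; the helper name describe_output_file is hypothetical and not part of vContact2, and the path to inspect has to be taken from your own output directory.

```python
import os


def describe_output_file(path):
    """Classify a file as 'missing', 'empty', or 'ok' (hypothetical helper).

    'empty' covers both zero-byte files and files that contain only
    blank lines, since neither gives a CSV parser a header row to read.
    """
    if not os.path.isfile(path):
        return "missing"
    with open(path, "r") as handle:
        for line in handle:
            if line.strip():
                return "ok"
    return "empty"
```

A result of "missing" or "empty" would suggest that the problem lies in the step that produces the file rather than in pandas itself.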

Comments (12)

  1. Bridget Hegarty

    Thank you for everyone’s help and answers to my previous comment on issue 30. I made the changes suggested in that issue and am now getting this same error as @fadwa mehdaoui on the example files provided. I installed vcontact into a miniconda environment using mamba (otherwise, I had conflicts) specifying the packages as indicated in the updated singularity file (changing versions of numpy and pandas). I also give the full path to clusterone and it appears to work when run independently. Any additional suggestions would be greatly appreciated and I’m happy to provide any additional information that would be useful. Thanks!

    For reference, here is the command I’m running:

    vcontact2 --raw-proteins test_data/VIRSorter_genomes.faa --rel-mode Diamond --proteins-fp test_data/VIRSorter_genomes_g2g.csv --db 'ProkaryoticViralRefSeq94-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin <path to clusterone>/cluster_one-1.0.jar --output-dir <output directory path>

  2. Ben Bolduc

    Hi Bridget,

    Trying to narrow this down further… Have you tried “ProkaryoticViralRefSeq97-Merged” or “ProkaryoticViralRefSeq201-Merged”? And have you tried placing cluster_one-1.0.jar within your system $PATH? If you’ve installed via miniconda, that would be somewhere on your environment’s path.

    And what version of vContact2 are you using?

    -Ben

  3. fadwa mehdaoui reporter

    Hi,

    I tried “ProkaryoticViralRefSeq97-Merged”, “ProkaryoticViralRefSeq201-Merged” and “ProkaryoticViralRefSeq94-Merged”, and I get the same error each time. I did place cluster_one-1.0.jar within my system $PATH.

    Thanks!

  4. Ben Bolduc

    Hi Fadwa,

    Thanks for sticking around and working through this. Could you attach the full run log? I’m still trying to narrow down why this error pops up for a group of people but not for others.

    -Ben

  5. Ben Bolduc

    Hi Fadwa,

    Thanks for sending this - it’s immensely helpful!

    Did you install vContact2 through Bioconda or manually with Bitbucket? If you installed manually (git clone → pip install), can you update to the latest version, 0.9.22? I will be pushing an update to Bioconda sometime this week whenever I can make time.

    Can you also try running this in a fresh directory? vContact2 is smart enough to see that Diamond has already been run, so it continues from that step. It’s possible that a previous error is being re-incorporated into each new analysis.

    For your gene2genome file, do you have the headers “contig_id” “protein_id” and “keywords”? It looks like some proteins are unable to be matched against their contig.
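
    The header check can be scripted. This is a minimal sketch that assumes the gene2genome file is comma-separated with the column names on the first line; the helper name missing_g2g_headers is hypothetical and not part of vContact2.

```python
import csv

# Column names quoted in the comment above.
EXPECTED_HEADERS = {"contig_id", "protein_id", "keywords"}


def missing_g2g_headers(path):
    """Return the expected column names absent from the first line of a CSV.

    Hypothetical helper: an empty set means all three headers are present,
    in any order.
    """
    with open(path, newline="") as handle:
        header = next(csv.reader(handle), [])
    return EXPECTED_HEADERS - {name.strip() for name in header}
```

    Calling it on the file passed to --proteins-fp and getting back an empty set means the headers are in place; anything else names the columns that are missing.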

    Also, do you have java installed on your machine? I never ask users to install it, and it’s not a dependency - though it’s in the singularity definitions. You can check this with:

    java -version
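
    ClusterONE is distributed as a .jar file, so it needs a Java runtime to be visible from the environment that launches vcontact2. The sketch below runs the same check from Python, which can be handy inside a conda environment or a job script; the function name java_version_text is hypothetical.

```python
import shutil
import subprocess


def java_version_text():
    """Return the text printed by `java -version`, or None if java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # `java -version` normally writes to stderr; fall back to stdout just in case.
    return (result.stderr or result.stdout).strip()
```

    A return value of None means no java executable was found on the PATH of the current environment.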
    

    Sorry this is taking so long to figure out.

    -Ben

  6. Bridget Hegarty

    Hi Ben, Thanks for all the work to try to figure this out! One of my labmates managed to get the singularity container to work for me, so I’m all set. I haven’t had a chance to talk to him more about what the issue was, but it seems like it was something related to the HPC we work with and a nightmare to figure out.

  7. 智睿 曹

    Hi, I have the same problems. Can you give me some advice for solving this horrible problem? Thank you!

  8. Carmen Chen

    Hi, I am also receiving the same error. Does anyone have any suggestions to fix this problem? ClusterONE is already in my $PATH and I added the absolute path using --c1-bin; however, I am still getting this error.

    Thanks!
