Missing genome_by_genome_overview.csv in output directory
Hi,
I’m trying to run 355 phage sequences through vContact2. Although the run appears to complete without any errors, I do not get the genome_by_genome_overview.csv. Below is the command I’ve been using:
vcontact -t 30 --raw-proteins all_prophage_proteins.faa --rel-mode "Diamond" --proteins-fp gene_to_genome.csv --db "ProkaryoticViralRefSeq97-Merged" --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /home/Documents/Programs/MAVERICLab-vcontact2-6d6fe8cf260a/bin/cluster_one-1.0.jar --output-dir vContact_output_97 &> vContact_97.log
and this is the output I get after running the vContact command above:
============================This is vConTACT2 0.9.13============================
----------------------------------Pre-Analysis----------------------------------
------------------------------Reference databases-------------------------------
-------------------------------Protein clustering-------------------------------
----------------------------------Loading data----------------------------------
--------------------------------Adding Taxonomy---------------------------------
------------------------Calculating Similarity Networks-------------------------
Loaded graph with 2901 nodes and 127532 edges
[====================] 100% Growing clusters from seeds...
[====================] 100% Finding highly overlapping clusters...
[====================] 100% Merging highly overlapping clusters...
Detected 351 complexes
.................................................. 1M
.................................................. 2M
..................
[mcl] new tab created
[mcl] pid 13812
ite ------------------- chaos time hom(avg,lo,hi) m-ie m-ex i-ex fmv
1 ................... 126.06 1.99 0.96/0.01/12.04 5.63 1.85 1.85 71
2 ................... 82.16 4.03 0.72/0.01/4.91 10.17 0.11 0.21 90
3 ................... 8.43 0.30 0.90/0.07/10.78 1.73 0.22 0.05 14
4 ................... 1.79 0.03 0.98/0.37/14.44 1.02 0.58 0.03 1
5 ................... 1.04 0.02 0.99/0.50/8.41 1.00 0.88 0.02 0
6 ................... 0.31 0.02 1.00/0.58/1.30 1.00 0.96 0.02 0
7 ................... 0.23 0.02 1.00/0.77/1.00 1.00 0.99 0.02 0
8 ................... 0.23 0.02 1.00/0.78/1.00 1.00 1.00 0.02 0
9 ................... 0.00 0.02 1.00/1.00/1.00 1.00 1.00 0.02 0
10 ................... 0.00 0.02 1.00/1.00/1.00 1.00 1.00 0.02 0
[mcl] cut <1> instances of overlap
[mcl] jury pruning marks: <97,99,99>, out of 100
[mcl] jury pruning synopsis: <97.8 or superb> (cf -scheme, -do log)
[mcl] output is in vContact_output_97/modules_mcl_5.0.clusters
[mcl] 765 clusters found
[mcl] output is in vContact_output_97/modules_mcl_5.0.clusters
Please cite:
Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis,
University of Utrecht, May 2000.
( http://www.library.uu.nl/digiarchief/dip/diss/1895620/full.pdf
or http://micans.org/mcl/lit/svdthesis.pdf.gz)
OR
Stijn van Dongen, A cluster algorithm for graphs. Technical
Report INS-R0010, National Research Institute for Mathematics
and Computer Science in the Netherlands, Amsterdam, May 2000.
( http://www.cwi.nl/ftp/CWIreports/INS/INS-R0010.ps.Z
or http://micans.org/mcl/lit/INS-R0010.ps.Z)
'Pseudomonas~virus~D3'
------------------------Contig Clustering & Affiliation-------------------------
--------------------------------Protein modules---------------------------------
---------------------------Link modules and clusters----------------------------
----------------------------Exporting results files-----------------------------
There were 564 genomes (including refs) that were singleton, outlier or overlaps.
I think I get all the other files produced in the output directory except that final file. These are the files I get:
$ ls vContact_output_97/
c1.clusters merged.self-diamond.tab modules_mcl_5.0.clusters sig1.0_mcl2.0_contigs.csv vConTACT_profiles.csv
c1.ntw merged.self-diamond.tab.abc modules_mcl_5.0_modules.pandas sig1.0_mcl2.0_modsig1.0_modmcl5.0_minshared3_link_mod_cluster.csv vConTACT_proteins.csv
merged_df.csv merged.self-diamond.tab.mci modules_mcl_5.0_pcs.pandas sig1.0_mcl5.0_minshared3_modules.csv viral_cluster_overview.csv
merged.dmnd merged.self-diamond.tab_mcl20.clusters modules.ntwk vConTACT_contigs.csv
merged.faa merged.self-diamond.tab_mcxload.tab sig1.0_mcl2.0_clusters.csv vConTACT_pcs.csv
My phage are present in these files (e.g. the c1.clusters, c1.ntw, and viral_cluster_overview.csv) and have been assigned clusters, so all looks fine as far as I can tell. There just isn’t a genome_by_genome_overview.csv.
Thanks in advance for any help you can give and also thanks for making this tool… minus this little issue I love it!
Comments (21)
-
Hi Chris and Julian,
Could you attach the viral_cluster_overview file, either here or email (bolduc.10 at osu edu)? I haven’t been able to reproduce this issue, yet it’s still a lingering issue for several people. The weird aspect of this is that the genome_by_genome file is a re-formed version of viral_cluster_overview.
Thanks for giving vConTACT2 a shot with your research!
-
Hi Ben, I just ran into this problem too. Didn’t happen with a smaller sample size (~200 contigs), but I just ran it with >1000 contigs and there’s no genome_by_genome_overview.csv.
There’s no error in the log either.
I will send you the viral_cluster_overview file.
Cheers
Alan
-
Has a solution been found for this issue?
-
I’m also having the same issue. Has there been any update?
-
“ERROR:vcontact2: Error in exporting the final summary data: first argument must be string or compiled pattern”
This is the error I get at the end of the run
-
@Ben Bolduc Has there been any update on this issue? I ran vConTACT2 as well, and don’t have the genome_by_genome_overview.csv file. It also doesn’t show whether the run was complete or incomplete. I just have the following:
Thu Jul 2 00:38:55 CEST 2020
============================This is vConTACT2 0.9.17============================
----------------------------------Pre-Analysis----------------------------------
------------------------------Reference databases-------------------------------
-------------------------------Protein clustering-------------------------------
----------------------------------Loading data----------------------------------
--------------------------------Adding Taxonomy---------------------------------
------------------------Calculating Similarity Networks-------------------------
------------------------Contig Clustering & Affiliation-------------------------
--------------------------------Protein modules---------------------------------
---------------------------Link modules and clusters----------------------------
----------------------------Exporting results files-----------------------------
There were 812 genomes (including refs) that were singleton, outlier or overlaps.
Thanks for your help with this!
-
Hi All,
Thank you for reporting these issues. This issue has taken more effort to identify than I anticipated. This error seems to occur most often associated with a “random” genome being printed to stdout, followed by no genome-by-genome file. Susheel’s version indicates it’s still occurring in the most recent version (0.9.17) and has occurred since at least 0.9.13 - and I’m assuming all runs have been using the v97 prokaryotes. It’s also been mentioned that this hasn’t happened with small numbers (200), but when it gets larger (350+?) there’s an issue (or rather, lack of an output file).
Has anyone tried to run with a lower database version (i.e. "ProkaryoticViralRefSeq94-Merged")?
Likewise, increasing the verbosity? vcontact2 <command> -vv
Also, for anyone with a failed run (well, not generating a genome_by_genome file), have you tried to restart the run using the intermediate files?
vcontact2 --contigs vConTACT_contigs.csv --pcs vConTACT_pcs.csv --pc-profiles vConTACT_profiles.csv --output-dir output --db "ProkaryoticViralRefSeq97-Merged"
And has anyone tried using the vConTACT2 app on CyVerse?
The annoying part here is that I can’t reproduce the error - but clearly it’s occurring to multiple people. I’ll need to test with a much larger dataset, outside of those that have worked successfully for me in the past (i.e. the ~15K contigs from the GOV dataset).
If anyone whose runs consistently fail would like to share their gene-to-genome and proteins files, please send them to my bolduc.10 at osu.edu address. The data will only be used to identify the issue, and I’ll remove it once it’s solved. At this point, I’m not sure if it’s an issue stemming from genome names or some complex interaction with certain datasets' network connectivity.
Apologies for this taking so long to resolve. There isn’t really funding for v2 (grants have end dates and they’re not too kind for infinite-length support of tools), and the recent climate has adjusted my priorities as many researchers find themselves doing computational work instead of lab work, so I haven’t had enough spare time to focus on this. Though I’ll continue to try and solve this!
-
Hey @Ben Bolduc ,
Thank you for the support despite the lack of funding. I’m sure everybody here (me included) appreciates your efforts to help our science. I tried to restart the run with the intermediate files using the following:
vcontact2 --contigs vConTACT_contigs.csv --pcs vConTACT_pcs.csv --c1-bin /home/users/sbusi/apps/miniconda3/bin/cluster_one-1.0.jar \
    --pc-profiles vConTACT_profiles.csv --output-dir test_output --db "ProkaryoticViralRefSeq97-Merged"
The output I got was the following, but still no genome_by_genome_overview.csv file:
INFO:vcontact2.modules: Loading the clustering results
---------------------------Link modules and clusters----------------------------
INFO:vcontact2.modules: 3327 contigs-modules owning association, 46543 filtered (a contig must have 50% of the PCs to own a module).
INFO:vcontact2.modules: Linking 652 modules with 371 contigs clusters...
INFO:vcontact2.modules: Network done 371 clusters, 652 modules and 314 edges.
----------------------------Exporting results files-----------------------------
INFO:vcontact2.exports.summaries: There were 729 sequences (including references) that were singleton, outlier or overlaps.
There were 729 genomes (including refs) that were singleton, outlier or overlaps.
INFO:vcontact2.exports.summaries: Reading edges for 2862 contigs
INFO:vcontact2.exports.summaries: Building PC array
INFO:vcontact2.exports.summaries: Calculating comparisons for back-calculations
ERROR:vcontact2.exports.summaries: 'contig_11'
Not sure if it has anything to do with the contig_11 error though.
Then I tried the following with the v94 database.
vcontact2 --contigs vConTACT_contigs.csv --pcs vConTACT_pcs.csv --c1-bin /home/users/sbusi/apps/miniconda3/bin/cluster_one-1.0.jar \
    --pc-profiles vConTACT_profiles.csv --output-dir test_output --db "ProkaryoticViralRefSeq94-Merged"
and here’s the output from that:
---------------------------Link modules and clusters----------------------------
INFO:vcontact2.modules: 3327 contigs-modules owning association, 46543 filtered (a contig must have 50% of the PCs to own a module).
INFO:vcontact2.modules: Linking 652 modules with 371 contigs clusters...
INFO:vcontact2.modules: Network done 371 clusters, 652 modules and 314 edges.
----------------------------Exporting results files-----------------------------
INFO:vcontact2.exports.summaries: There were 729 sequences (including references) that were singleton, outlier or overlaps.
There were 729 genomes (including refs) that were singleton, outlier or overlaps.
INFO:vcontact2.exports.summaries: Reading edges for 2862 contigs
INFO:vcontact2.exports.summaries: Building PC array
INFO:vcontact2.exports.summaries: Calculating comparisons for back-calculations
ERROR:vcontact2.exports.summaries: 'contig_11'
-
Issue #19 was marked as a duplicate of this issue.
-
- changed status to on hold
Thank you to all who sent data. Unfortunately, I am still unable to reproduce the error (on Mac, Linux), so it's probably a package versioning issue. I do, however, think I've identified the block of code that is likely responsible. I am unable to finish it this week, but should have time available the week after.
-
Thanks a lot @Ben Bolduc ! Looking forward to it.
-
- changed status to open
I've identified the cause and am identifying why it wasn't caught earlier in the code (I specifically have code that checks for this). The cause is due to viral genome naming when one virus' name is a "subset" of another, i.e. "phage G1" and "phage G12". So when one of these viruses is encountered, only one gets saved to the genome summary. However, since all genomes are iterated through for the final genome summary file, that virus that wasn't written gets read, but can't be found... which is why the virus genome name gets printed to screen.
Re-opening while I squash this bug.
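The name-subset problem described above can be illustrated with a minimal sketch. This is not the vConTACT2 code; it assumes, purely for illustration, that cluster members are stored as a comma-separated string, and the helper names (`find_substring_names`, `members_containing`, `members_exact`) are hypothetical. It shows why a substring lookup for "phage G1" also matches a cluster that only holds "phage G12", and how a dataset could be screened for such names before a run.

```python
def find_substring_names(names):
    """Return (short, long) pairs where one genome name is contained in another."""
    # Sort by length (then alphabetically) so only longer names are checked against shorter ones.
    unique = sorted(set(names), key=lambda n: (len(n), n))
    pairs = []
    for i, short in enumerate(unique):
        for longer in unique[i + 1:]:
            if short in longer:
                pairs.append((short, longer))
    return pairs


def members_containing(members_column, genome):
    """Substring lookup: every row whose member string contains the name anywhere."""
    return [row for row in members_column if genome in row]


def members_exact(members_column, genome):
    """Exact lookup: split the member string (assumed comma-separated) and compare whole names."""
    return [row for row in members_column if genome in row.split(",")]


# Hypothetical genome names and cluster membership strings.
names = ["phage G1", "phage G12", "phage T4"]
clusters = ["phage G12,phage T4", "phage G1"]

collisions = find_substring_names(names)
loose = members_containing(clusters, "phage G1")
strict = members_exact(clusters, "phage G1")

print(collisions)  # names that could be confused by a substring match
print(loose)       # substring match returns both clusters
print(strict)      # exact match returns only the cluster that holds "phage G1"
```

Running `find_substring_names` over the genome IDs in a gene-to-genome file before starting a run is one way to spot names that might trigger this behaviour on versions older than 0.9.18.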
-
I have updated vConTACT2 to 0.9.18, which includes handling of this issue. If anyone who has encountered this issue would like to try this new version, please do so. I’ve also tightened the dependencies, so a fresh vConTACT2 install (from bitbucket) or update should work.
-
Thanks @Ben Bolduc! I tested it out today, and am running into the error below.
INFO:vcontact2: Saving intermediate files...
INFO:vcontact2: Read 229672 entries (dropped 2609 singletons) from /scratch/users/sbusi/cosmic_review/vibrant/VIBRANT/vcontact2_output/V6/C120/vConTACT_profiles.csv
INFO:vcontact2.contig_clusters: Exporting for ClusterONE
INFO:vcontact2.contig_clusters: Clustering the PC Similarity-Network using ClusterONE
INFO:vcontact2.contig_clusters: 372 clusters loaded (singletons and non-connected nodes are dropped).
ERROR:vcontact2: Error in contig clustering
ERROR:vcontact2: 'Acidianus~bottle-shaped~virus~2'
Traceback (most recent call last):
File "/mnt/lscratch/users/sbusi/cosmic_review/vibrant/VIBRANT/.snakemake/conda/1cc7c9fa/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Acidianus~bottle-shaped~virus~2'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/lscratch/users/sbusi/cosmic_review/vibrant/VIBRANT/.snakemake/conda/1cc7c9fa/bin/vcontact2", line 607, in main
gc = vcontact2.contig_clusters.ContigCluster(pcp, output_dir, cluster_one_fp, cluster_one_args,
File "/mnt/lscratch/users/sbusi/cosmic_review/vibrant/VIBRANT/.snakemake/conda/1cc7c9fa/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 91, in __init__
self.clusters, self.cluster_results = self.one_cluster(os.path.join(self.folder, self.name),
File "/mnt/lscratch/users/sbusi/cosmic_review/vibrant/VIBRANT/.snakemake/conda/1cc7c9fa/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 227, in one_cluster
return self.load_one_clusters(fi_clusters)
File "/mnt/lscratch/users/sbusi/cosmic_review/vibrant/VIBRANT/.snakemake/conda/1cc7c9fa/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 340, in load_one_clusters
if pd.isnull(self.contigs.loc[n, "pos_cluster"]): # If never seen before
File "/mnt/lscratch/users/sbusi/cosmic_review/vibrant/VIBRANT/.snakemake/conda/1cc7c9fa/lib/python3.8/site-packages/pandas/core/indexing.py", line 1418, in __getitem__
return self._getitem_tuple(key)
File "/mnt/lscratch/users/sbusi/cosmic_review/vibrant/VIBRANT/.snakemake/conda/1cc7c9fa/lib/python3.8/site-packages/pandas/core/indexing.py", line 805, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "/mnt/lscratch/users/sbusi/cosmic_review/vibrant/VIBRANT/.snakemake/conda/1cc7c9fa/lib/python3.8/site-packages/pandas/core/indexing.py", line 929, in _getitem_lowerdim
section = self._getitem_axis(key, axis=i)
File "/mnt/lscratch/users/sbusi/cosmic_review/vibrant/VIBRANT/.snakemake/conda/1cc7c9fa/lib/python3.8/site-packages/pandas/core/indexing.py", line 1850, in _getitem_axis
return self._get_label(key, axis=axis)
File "/mnt/lscratch/users/sbusi/cosmic_review/vibrant/VIBRANT/.snakemake/conda/1cc7c9fa/lib/python3.8/site-packages/pandas/core/indexing.py", line 160, in _get_label
return self.obj._xs(label, axis=axis)
File "/mnt/lscratch/users/sbusi/cosmic_review/vibrant/VIBRANT/.snakemake/conda/1cc7c9fa/lib/python3.8/site-packages/pandas/core/generic.py", line 3737, in xs
loc = self.index.get_loc(key)
File "/mnt/lscratch/users/sbusi/cosmic_review/vibrant/VIBRANT/.snakemake/conda/1cc7c9fa/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Acidianus~bottle-shaped~virus~2'
I checked the pandas version in the conda environment and have the following. It is 0.25.3
pandas 0.25.3 py38hb3f55d8_0 conda-forge
I used the “ProkaryoticViralRefSeq97-Merged” database. Should I be using ProkaryoticViralRefSeq94-Merged instead?
Thank you!
-
Hmm. The contig clustering errors are usually due to issues with ClusterONE (in this instance, like not having it installed or java issues, etc). That said, it doesn’t appear that the archaeal viruses are in v97 (!). They’re in 94 and 201. I’ll need to re-update 97 asap.
Please do try another database while I update v97 and let me know if that works.
-
Funnily enough, I didn’t have issues with ClusterONE previously. And the test run worked fine as well. I updated it nonetheless with a clean new installation. Trying with 94 so will let you know.
-
@Ben Bolduc I can confirm that with the ProkaryoticViralRefSeq94-Merged database everything works as expected and I also get the genome_by_genome_overview.csv file.
Thanks a lot for your help with fixing the issues.
Sat Jul 25 00:59:15 CEST 2020
============================This is vConTACT2 0.9.18============================
----------------------------------Pre-Analysis----------------------------------
------------------------------Reference databases-------------------------------
-------------------------------Protein clustering-------------------------------
----------------------------------Loading data----------------------------------
--------------------------------Adding Taxonomy---------------------------------
------------------------Calculating Similarity Networks-------------------------
------------------------Contig Clustering & Affiliation-------------------------
--------------------------------Protein modules---------------------------------
---------------------------Link modules and clusters----------------------------
----------------------------Exporting results files-----------------------------
There were 564 genomes (including refs) that were singleton, outlier or overlaps.
Sat Jul 25 01:34:22 CEST 2020
-
Hi,
I had the same problem. I’m trying to run 267,783 viral metagenome contigs through vContact2 (v 0.9.19), and couldn’t get the genome_by_genome_overview.csv. Below is the command I’ve been using:
vcontact2 --raw-proteins oyster/ALL.contigs.cd-hit.phages_combined.simple.faa --rel-mode 'Diamond' --proteins-fp oyster/VIBRANT_genbank_table_ALL.contigs.cd-hit.tsv --db 'ProkaryoticViralRefSeq94-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /home/ubuntu/miniconda3/bin/cluster_one-1.0.jar --output-dir output-oyster -t 8
The contig IDs in the file ALL.contigs.cd-hit.phages_combined.simple.faa look like this:
all-k141_3960179 flag=1 multi=30.9838 len=3478_1
all-k141_3960179 flag=1 multi=30.9838 len=3478_2
all-k141_3960179 flag=1 multi=30.9838 len=3478_3
all-k141_3960179 flag=1 multi=30.9838 len=3478_4
all-k141_3960179 flag=1 multi=30.9838 len=3478_5
all-k141_3960179 flag=1 multi=30.9838 len=3478_6
KZY2-k141_66375 flag=1 multi=14.2750 len=1170_1
KZY2-k141_66375 flag=1 multi=14.2750 len=1170_2
KZY2-k141_66375 flag=1 multi=14.2750 len=1170_3
KZY2-k141_66375 flag=1 multi=14.2750 len=1170_4
ZH1-k141_68976 flag=0 multi=4.8691 len=3854_1
ZH1-k141_68976 flag=0 multi=4.8691 len=3854_2
ZH1-k141_68976 flag=0 multi=4.8691 len=3854_3
ZH1-k141_68976 flag=0 multi=4.8691 len=3854_4
ZH1-k141_68976 flag=0 multi=4.8691 len=3854_5
T4S1-k141_394333 flag=1 multi=4.0000 len=2367_1
T4S1-k141_394333 flag=1 multi=4.0000 len=2367_2
T4S1-k141_394333 flag=1 multi=4.0000 len=2367_3
I have also tried the other format, like this:
all-k141_3960179-flag=1-multi=30.9838-len=3478_1
all-k141_3960179-flag=1-multi=30.9838-len=3478_2
all-k141_3960179-flag=1-multi=30.9838-len=3478_3
all-k141_3960179-flag=1-multi=30.9838-len=3478_4
all-k141_3960179-flag=1-multi=30.9838-len=3478_5
all-k141_3960179-flag=1-multi=30.9838-len=3478_6
KZY2-k141_66375-flag=1-multi=14.2750-len=1170_1
KZY2-k141_66375-flag=1-multi=14.2750-len=1170_2
KZY2-k141_66375-flag=1-multi=14.2750-len=1170_3
KZY2-k141_66375-flag=1-multi=14.2750-len=1170_4
ZH1-k141_68976-flag=0-multi=4.8691-len=3854_1
ZH1-k141_68976-flag=0-multi=4.8691-len=3854_2
ZH1-k141_68976-flag=0-multi=4.8691-len=3854_3
ZH1-k141_68976-flag=0-multi=4.8691-len=3854_4
ZH1-k141_68976-flag=0-multi=4.8691-len=3854_5
T4S1-k141_394333-flag=1-multi=4.0000-len=2367_1
T4S1-k141_394333-flag=1-multi=4.0000-len=2367_2
T4S1-k141_394333-flag=1-multi=4.0000-len=2367_3
No matter which format I use, I get the same error message, shown below:
------------------------Contig Clustering & Affiliation-------------------------
--------------------------------Protein modules---------------------------------
---------------------------Link modules and clusters----------------------------
----------------------------Exporting results files-----------------------------
There were 517 genomes (including refs) that were singleton, outlier or overlaps.
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/vContact2/bin/vcontact2", line 757, in <module>
main(options)
File "/home/ubuntu/miniconda3/envs/vContact2/bin/vcontact2", line 749, in main
profiles_fp, vc, excluded)
File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/site-packages/vcontact2/exports/summaries.py", line 269, in final_summary
genome_df = summary_df.loc[summary_df['Members'].str.contains(genome, regex=False)]
File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/site-packages/pandas/core/indexing.py", line 1424, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/site-packages/pandas/core/indexing.py", line 1839, in _getitem_axis
return self._getitem_iterable(key, axis=axis)
File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/site-packages/pandas/core/indexing.py", line 1133, in _getitem_iterable
keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/site-packages/pandas/core/indexing.py", line 1092, in _get_listlike_indexer
keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/site-packages/pandas/core/indexing.py", line 1177, in _validate_read_indexer
key=key, axis=self.obj._get_axis_name(axis)
KeyError: "None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n ...\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],\n dtype='float64', length=438)] are in the [index]"
Would you please check it and see what went wrong? Thank you very much!
-
Hi 敬哲 姜,
I think the format of the gene-to-genome file might have different headers than the faa file. Could you copy-and-paste the first few lines of both files?
For example, if your FAA file headers are like this:
all-k141_3960179 flag=1 multi=30.9838 len=3478_1
all-k141_3960179 flag=1 multi=30.9838 len=3478_2
all-k141_3960179 flag=1 multi=30.9838 len=3478_3
all-k141_3960179 flag=1 multi=30.9838 len=3478_4
all-k141_3960179 flag=1 multi=30.9838 len=3478_5
all-k141_3960179 flag=1 multi=30.9838 len=3478_6
Then you’ll need to replace the spaces (“ “) with an underscore (“_”), to:
all-k141_3960179_flag=1_multi=30.9838_len=3478_1
all-k141_3960179_flag=1_multi=30.9838_len=3478_2
all-k141_3960179_flag=1_multi=30.9838_len=3478_3
all-k141_3960179_flag=1_multi=30.9838_len=3478_4
all-k141_3960179_flag=1_multi=30.9838_len=3478_5
all-k141_3960179_flag=1_multi=30.9838_len=3478_6
and have the gene-to-genome file like this:
genome_id,gene_id,keywords
all-k141_3960179,all-k141_3960179_flag=1_multi=30.9838_len=3478_1,none
all-k141_3960179,all-k141_3960179_flag=1_multi=30.9838_len=3478_2,none
all-k141_3960179,all-k141_3960179_flag=1_multi=30.9838_len=3478_3,none
all-k141_3960179,all-k141_3960179_flag=1_multi=30.9838_len=3478_4,none
all-k141_3960179,all-k141_3960179_flag=1_multi=30.9838_len=3478_5,none
all-k141_3960179,all-k141_3960179_flag=1_multi=30.9838_len=3478_6,none
(Note the comma (“,”) between the genome_id, gene_id and keywords)
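The header rewrite and the matching gene-to-genome rows can be generated together, so the two files cannot drift apart. The following is a rough sketch under some assumptions: the genome ID is taken to be everything before the first space in the FASTA header (as in the example above), the column names simply follow the example above, and the function names and the two short protein sequences are made up for illustration.

```python
import csv
import io


def normalise_headers(faa_lines):
    """Replace spaces in FASTA headers with underscores and collect gene-to-genome rows."""
    new_lines, rows = [], []
    for line in faa_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            header = line[1:].strip()
            genome_id = header.split(" ")[0]    # assumption: genome ID precedes the first space
            gene_id = header.replace(" ", "_")  # same replacement as described above
            rows.append((genome_id, gene_id, "none"))
            new_lines.append(">" + gene_id)
        else:
            new_lines.append(line)  # sequence lines are left untouched
    return new_lines, rows


def rows_to_csv(rows):
    """Serialise the rows as comma-separated text with the header row from the example."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["genome_id", "gene_id", "keywords"])
    writer.writerows(rows)
    return buf.getvalue()


# Two headers from the thread; the sequences are placeholders, not real data.
example = [
    ">all-k141_3960179 flag=1 multi=30.9838 len=3478_1",
    "MKT",
    ">all-k141_3960179 flag=1 multi=30.9838 len=3478_2",
    "MSL",
]

fixed_faa, g2g_rows = normalise_headers(example)
g2g_csv = rows_to_csv(g2g_rows)
print("\n".join(fixed_faa))
print(g2g_csv)
```

Because both outputs come from the same pass over the FASTA file, every gene ID in the CSV is guaranteed to match a header in the rewritten protein file.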
Also, 250K genomes is quite a lot. Usually, the error associated with too many genomes is a subprocess error, so I don’t think it’s due to that.
Cheers,
Ben
-
- changed status to resolved
Closing due to inactivity. Please re-open if this persists.
-
Hi, has there been any progress in resolving this issue? I too have failed to get the genome_by_genome_overview.csv included in the output (similar run parameters as those described above). No errors reported in the log file.