[UNEXPECTED ERROR] Error at step 7 while loading all parsed data

Issue #2 resolved
rémi Denise created an issue

Hi,

I tried to run iPHoP, but unfortunately I get an error that I don’t understand, and that iPHoP doesn’t seem to understand either:

### Welcome to iPHoP ###
Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Skip] Skipping computation of blastn against microbial genomes...
[1/3/Skip] Skipping blast parsing...
[2/1/Skip] Skipping computation of blastn against CRISPR...
[2/2/Skip] Skipping crispr parsing...
[3/1/Skip] Skipping computation of WIsH scores...
[3/2/Skip] Skipping WIsH parsing...
[4/1/Skip] Skipping computation of VHM s2 similarities...
[4/2/Skip] Skipping VHM parsing...
[5/1/Skip] Skipping computation of PHP scores...
[5/2/Skip] Skipping PHP parsing...
[6/1/Skip] Skipping RaFAH...
[6/2/Skip] Skipping RaFAH parsing...
[6.5/1/Skip] Skipping diamond search against RaFAH refs...
[6.5/2/Skip] Skipping calculation of AAI to RaFAH refs...
[7] Aggregating all results and formatting for TensorFlow...
[7/1] Loading all parsed data...
We have some cases with multiple values for what should be a unique key (Virus / Host / Host contig), this is unexpected, we stop
[[False False False False False False]
 [False False False False False False]
 [False False False False False False]
 ...
 [False False False False False False]
 [False False False False False False]
 [False False False False False False]]

I am using iPHoP v1.1.0 with the latest database version, which I downloaded on Friday using the download task.

I can send the iPHoP output folder if you need it to help me understand why iPHoP is having trouble (but the tar.gz is too big to attach to the issue: it is 163.3 MB).

Best

Remi

Comments (8)

  1. rémi Denise reporter

    Also, in case it matters, this was the command line:

    iphop predict --fa_file ZSM005_contigs.selected.fasta --out_dir iphop --db_dir iPHoP/Sept_2021_pub/ --num_threads 15
    

  2. rémi Denise reporter

    And maybe I should also say that I installed iphop using conda as explained in the README:

    $ conda create -c conda-forge -n iphop_env python=3.8 mamba
    $ conda activate iphop_env
    $ mamba install -c conda-forge -c bioconda iphop
    

  3. Simon Roux repo owner

    Thanks for reporting, and sorry about that, this is an annoying one. Having the full iphop output directory in a tarball would be very helpful, feel free to share it by email (sroux - at - lbl . gov). This type of error is often due to a single sequence, so in the meantime it may also be worth trying to run the tool on subset(s) of the input file, to see if we can narrow down which contig is problematic.
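One way to follow the suggestion above is to split the input FASTA into smaller files and run each one separately. The following is a minimal sketch using only the Python standard library. The function names, the `subset_NNNN.fasta` naming, and the batch size are assumptions made for illustration and are not part of iPHoP.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header_line, sequence_lines) for each record in a FASTA file."""
    header, lines = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, lines
                header, lines = line, []
            elif header is not None and line:
                lines.append(line)
    if header is not None:
        yield header, lines


def _write_batch(batch, out_dir, index):
    """Write one batch of records to out_dir/subset_<index>.fasta."""
    target = out_dir / f"subset_{index:04d}.fasta"
    with open(target, "w") as out:
        for header, lines in batch:
            out.write(header + "\n")
            for seq_line in lines:
                out.write(seq_line + "\n")
    return target


def split_fasta(path, out_dir, records_per_file=100):
    """Split a FASTA file into consecutive subsets; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written, batch = [], []
    for record in read_fasta(path):
        batch.append(record)
        if len(batch) == records_per_file:
            written.append(_write_batch(batch, out_dir, len(written)))
            batch = []
    if batch:  # leftover records that did not fill a whole batch
        written.append(_write_batch(batch, out_dir, len(written)))
    return written
```

Each resulting file can then be passed to `iphop predict` through `--fa_file`, with its own `--out_dir`, to see which subset reproduces the error.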

  4. rémi Denise reporter

    Thank you. In the meantime, I’ll try running one sequence at a time to narrow down the problem. I have sent you the output folder by email.

  5. Simon Roux repo owner

    Hi,

    I think I found what triggered this error: the input file apparently includes two identical contigs with an identical identifier (ZSM005-contigs-k119-194842). This causes some unexpected duplications down the road. Removing one of these two copies should work. Let me know how it goes when you have a chance to give it a try! (Plus: I should probably add this check to the pre-processing; I will put that on the to-do list.)

    Best,

    Simon
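For anyone hitting the same error, a quick check for repeated identifiers in the input file can be done before running the tool. This is a minimal sketch, not iPHoP code: the function name is hypothetical, and it assumes the identifier is the first whitespace-delimited token after `>` on each header line.

```python
from collections import Counter


def duplicate_fasta_ids(path):
    """Return {identifier: count} for identifiers that appear more than once."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Assumption: the identifier is the first token after ">".
                fields = line[1:].split()
                counts[fields[0] if fields else ""] += 1
    return {seq_id: n for seq_id, n in counts.items() if n > 1}
```

An empty dictionary means every header identifier is unique. Any entry returned is a candidate for the kind of duplication described above.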

  6. rémi Denise reporter

    Thank you. I hadn’t checked that every sequence in my data had a unique identifier either. I’ll try to rerun it tomorrow without the duplicated name (and also figure out why the name is duplicated in the first place).

    Best

    Remi

  7. Simon Roux repo owner

    Identical contig names should now be checked automatically in v1.2, and a suffix added if needed (with a message in the log file).
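The idea of adding a suffix to repeated names can be sketched as follows. This is only an illustration of the general approach, not the actual v1.2 implementation: the function name and the `_dup` suffix format are assumptions, and the real tool may rename sequences differently.

```python
def make_ids_unique(ids, separator="_dup"):
    """Return (unique_ids, renamed) where repeated ids get a numeric suffix.

    `renamed` lists (original_id, new_id) pairs, e.g. for writing to a log.
    """
    ids = list(ids)
    original = set(ids)   # names present in the input, never reused as new names
    emitted = set()       # names already assigned in the output
    next_suffix = {}      # next suffix number to try for each repeated id
    unique, renamed = [], []
    for seq_id in ids:
        if seq_id not in emitted:
            new_id = seq_id
        else:
            n = next_suffix.get(seq_id, 1)
            new_id = f"{seq_id}{separator}{n}"
            # Skip suffixes that would collide with an existing name.
            while new_id in emitted or new_id in original:
                n += 1
                new_id = f"{seq_id}{separator}{n}"
            next_suffix[seq_id] = n + 1
            renamed.append((seq_id, new_id))
        emitted.add(new_id)
        unique.append(new_id)
    return unique, renamed
```

The first occurrence of a name is kept unchanged, so only the later copies are renamed and reported.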
