Error in MakeDb.py

Issue #169 resolved
veronica CS created an issue

Hi, I intend to analyse my immune repertoire data with Changeo, but I am having some trouble.

When I use MakeDb.py I get the error below regarding the deprecated Bio.Alphabet package. Is this a known issue? How can I circumvent it?

I appreciate any help you could give me as this analysis is very important for the paper I am writing. Best regards.

File "/usr/local/bin/MakeDb.py", line 22, in <module>
    from presto.IO import countSeqFile, printLog, printMessage, printProgress, printError, printWarning, readSeqFile
  File "/usr/local/lib/python3.8/site-packages/presto/IO.py", line 15, in <module>
    from Bio.Alphabet import IUPAC
  File "/usr/local/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
    raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly.

Comments (9)

  1. Jason Vander Heiden

    Hi @veronica CS , thanks for the bug report!

    This issue should be fixed in the latest version of the code, but that code hasn’t been pushed to PyPI or the release docker container yet. If you’re using a local install of changeo, then you can install the current development code from Bitbucket via:

    pip3 install git+https://bitbucket.org/kleinstein/changeo@master

    Note, you will have to do the same for presto first, because it has the same issue:

    pip3 install git+https://bitbucket.org/kleinstein/presto@master

    If you’re using the Docker container, then it shouldn’t have this problem, but we can force a build of the development version that you can use instead of v4.1.0, if you prefer.

    We should be able to make official releases of presto/changeo with the fix over the weekend (probably).
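After reinstalling, a quick check along the following lines can confirm which versions are active and whether the import that failed in the traceback now succeeds. This is a minimal sketch using only the Python standard library (Python 3.8+); the distribution names `biopython`, `presto` and `changeo` are assumed to match how the packages were installed, and the helper names are hypothetical.

```python
import importlib
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def can_import(module_name):
    """Return True if the module imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def environment_report(
    distributions=("biopython", "presto", "changeo"),  # assumed names
    modules=("presto.IO",),  # the module that failed in the traceback
):
    """Collect installed versions and import results into one dict."""
    report = {name: installed_version(name) for name in distributions}
    report.update({"import " + name: can_import(name) for name in modules})
    return report
```

Calling `environment_report()` and printing the result shows at a glance whether `presto.IO` still fails to import in the environment that runs MakeDb.py.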

  2. veronica CS reporter

    Dear Jason,

    Thank you very much for your answer. I was able to install it and it is working now.

    However, after:

    - Processing the IMGT output with MakeDb.py

    - Removing non-productive sequences with ParseDb.py

    - Assigning clones with DefineClones.py

    - Reconstructing germline sequences with CreateGermlines.py

    I tried to build trees with IgPhyML, but I get the following error if I run CreateGermlines.py with -g full:

    error> b'COMMAND: igphyml --repfile 101_IgPhy_lineages.tsv -m GY --run_id gy --outrep 101_IgPhy_lineages_gy.tsv --threads 1 --outname 101_IgPhy_lineages.tsv_igphyml_stats_gy.txt \nTrees in repfile: 1853\nSequence of taxon 1128_GERM contains stop codon(s). Impossible to continue. Remove stop codon sites across all sequences and try again.\n' <ERROR> GY94 tree building in IgPhyML failed

    Or this one when I run CreateGermlines.py with -g vonly:

    error> b'' <ERROR> GY94 tree building in IgPhyML failed.

    For the stop codon error, I checked and found that the >GERM sequence of the affected clones contains a stop codon, but I don't know how to resolve this.

    Would you be able to help me with this as well?

    Thank you very much.
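To find which sequences are affected, the taxon names can be pulled out of the stop-codon error quoted above. This is a minimal sketch that relies only on the wording of that one message ("Sequence of taxon ... contains stop codon(s)"); the function name is hypothetical, and reading `1128_GERM` as the germline of clone 1128 is an assumption.

```python
import re

# Matches the IgPhyML message quoted in the error output above.
TAXON_PATTERN = re.compile(r"Sequence of taxon (\S+) contains stop codon")


def taxa_with_stop_codons(error_text):
    """Return the taxon names reported as containing stop codons."""
    if isinstance(error_text, bytes):
        # The captured output in the log above is a bytes object.
        error_text = error_text.decode("utf-8", errors="replace")
    return TAXON_PATTERN.findall(error_text)
```

Given the message above, this returns a list containing `1128_GERM`, which can then be looked up in the lineages file.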

  3. Kenneth Hoehn

    Unfortunately, IgPhyML can’t function if any of the sequences have stop codons. Could you try running CreateGermlines with -g dmask? That’s how we normally do it. Alternatively, you could filter out sequences that have stop codons in the germline or input. Or you could use the lineage tree building features of Alakazam, which can work with stop codons. That said, I would be suspicious of a clonal lineage with a stop codon in its germline sequence.
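The filtering option mentioned above could be approached by scanning each sequence codon by codon. The following is a minimal sketch, not part of Change-O or IgPhyML. It assumes sequences are read in frame from the first position (as in an IMGT-gapped alignment), and the column names `sequence_alignment`, `germline_alignment_d_mask` and `clone_id` are assumptions that may differ in your files.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_positions(seq):
    """Return 0-based start positions of in-frame stop codons.

    Codons containing gaps ('.', '-') or N are never counted as stops.
    """
    seq = seq.upper()
    return [i for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def clones_with_stops(
    rows,
    seq_field="sequence_alignment",          # assumed column name
    germ_field="germline_alignment_d_mask",  # assumed column name
    clone_field="clone_id",                  # assumed column name
):
    """Return the set of clone IDs whose input or germline has a stop codon."""
    bad = set()
    for row in rows:
        observed = row.get(seq_field) or ""
        germline = row.get(germ_field) or ""
        if stop_codon_positions(observed) or stop_codon_positions(germline):
            bad.add(row[clone_field])
    return bad


def drop_clones(rows, bad_clones, clone_field="clone_id"):
    """Keep only rows whose clone ID is not in bad_clones."""
    return [row for row in rows if row[clone_field] not in bad_clones]
```

The rows can come from `csv.DictReader(handle, delimiter="\t")`. Dropping whole clones rather than single rows keeps each remaining lineage complete, which is a design choice rather than a requirement.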

  4. veronica CS reporter

    Dear Kenneth,

    Thank you very much for your reply.

    Do you think it is expected that, even after filtering non-productive sequences, I still have clonal lineages with stop codons?

    I have now run it with -g dmask, but when I tried to build the trees I got this other error:

    igphyml --repfile 101_IgPhy_lineages.tsv -m GY --run_id gy --outrep 101_IgPhy_lineages_gy.tsv --threads 1 --outname 101_IgPhy_lineages.tsv_igphyml_stats_gy.txt
    error> b'' <ERROR> GY94 tree building in IgPhyML failed

    I will try using Alakazam.

    Again, thanks for your help.

  5. Kenneth Hoehn

    If the stop codon is in the germline, then filtering nonproductive sequences may not help. Is the stop codon in the junction region of the germline? Could you send the full terminal output from running BuildTrees?

  6. veronica CS reporter

    Dear Jason,

    Thank you for the information. I ended up using SHazaM for what I needed, but this is going to be useful for future analyses.

    Best regards.

    Veronica
