MakeDb.py igblast arguments --regions and --scores produces errors

Issue #174 resolved
Fubaoqian Huang created an issue

Hi, I’m testing the scripts igblast-example.sh downloaded from here.

When I ran the following codes in igblast-example.sh using MakeDb.py igblast with arguments --regions and --scores, I got errors.

MakeDb.py igblast -i ./data/output/reads_collapse-unique_atleast-2.fmt7 -s ./data/output/reads_collapse-unique_atleast-2.fasta -r ./data/imgt/human/vdj/ --regions --scores --outname data --outdir ./data/output/

usage: MakeDb.py igblast [--version] [-h] [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR] [--outname OUT_NAME] [--log LOG_FILE] [--failed]
[--format {airr,changeo}] -i ALIGNER_FILES [ALIGNER_FILES ...] -r REPO [REPO ...] -s SEQ_FILES [SEQ_FILES ...]
[--10x CELLRANGER_FILE [CELLRANGER_FILE ...]] [--asis-id] [--asis-calls] [--partial] [--extended]
[--regions {default,rhesus-igl}]
MakeDb.py igblast: error: argument --regions: expected one argument

MakeDb.py igblast -i ./data/output/reads_collapse-unique_atleast-2.fmt7 -s ./data/output/reads_collapse-unique_atleast-2.fasta -r /hwfssz5/ST_PRECISION/PM/5.Group5_CellMap/0.Others/huangbaoqian/18.change_o/04.db/data/imgt/human/vdj/ --scores --outname data --outdir ./data/output/

usage: MakeDb.py [--version] [-h] ...
MakeDb.py: error: unrecognized arguments: --scores


I found the errors seems to be caused by the arguments --regions and --scores, so I remove them and run the folllowing codes.

MakeDb.py igblast -i ./data/output/reads_collapse-unique_atleast-2.fmt7 -s ./data/output/reads_collapse-unique_atleast-2.fasta -r ./data/imgt/human/vdj/ --outname data --outdir ./data/output/

START> MakeDB

COMMAND> igblast

ALIGNER_FILE> reads_collapse-unique_atleast-2.fmt7
SEQ_FILE> reads_collapse-unique_atleast-2.fasta
ASIS_ID> False
ASIS_CALLS> False
PARTIAL> False
EXTENDED> False

PROGRESS> 19:17:33 |Done | 0.0 min

PROGRESS> 19:19:04 |####################| 100% (25,470) 1.5 min

OUTPUT> data_db-pass.tsv
PASS> 23809
FAIL> 1661
END> MakeDb


And I found that setting regions argument as rhesus-igl also worked, but maybe there is something difference.

MakeDb.py igblast -i ./data/output/reads_collapse-unique_atleast-2.fmt7 -s ./data/output/reads_collapse-unique_atleast-2.fasta -r ./data/imgt/human/vdj/ --regions rhesus-igl --outname data --outdir ./data/output/

START> MakeDB

COMMAND> igblast

ALIGNER_FILE> reads_collapse-unique_atleast-2.fmt7
SEQ_FILE> reads_collapse-unique_atleast-2.fasta
ASIS_ID> False
ASIS_CALLS> False
PARTIAL> False
EXTENDED> False

PROGRESS> 18:11:58 |Done | 0.0 min

PROGRESS> 18:13:19 |####################| 100% (25,470) 1.4 min

OUTPUT> data_db-pass.tsv
PASS> 213
FAIL> 25257
END> MakeDb

When I went on to use ParseDb.py, I got an empty output(igh_parse-select.tsv), using the results produced by MakeDb.py without arguments --regions and --scores above. (I used the ParseDb.py downloaded from here according to the issue.)

ParseDb.py select -d ./data/output/data_db-pass.tsv -f V_CALL -u IGHV
--regex --outname igh --outdir ./data/output/

$ head igh_parse-select.tsv
sequence_id sequence rev_comp productive v_call d_call j_call sequence_alignment germline_alignment junctionjunction_aa v_cigar d_cigar j_cigar stop_codon vj_in_frame locus junction_length np1_length np2_length v_sequence_startv_sequence_end v_germline_start v_germline_end d_sequence_start d_sequence_end d_germline_start d_germline_end j_sequence_start j_sequence_end j_germline_start j_germline_end consensus_count duplicate_count prcons

So I think igh_parse-select.tsv is empty probably due to the 2 arguments.

$ MakeDb.py --version
MakeDb.py: 1.0.1.999 2020.11.18

$ ParseDb.py --version
ParseDb.py: 1.0.1.999 2020.11.18

$ igblastn -version
igblastn: 1.16.0
Package: igblast 1.17.0, build Oct 6 2020 11:26:37

Any ideas or help would be greatly appreciated!

Some useful attachments were uploaded.

Comments (11)

  1. Jason Vander Heiden

    Hi @Fubaoqian Huang , it looks like those webinar materials are outdated - they are using commands from older versions of the tools. The --scores and --region arguments were changed a while ago: https://changeo.readthedocs.io/en/stable/news.html#version-0-4-6-july-19-2019

    The comparable argument to --scores --regions is --extended in the current version. Also, the default format used by changeo is now the AIRR Standard, so unless you specify --format changeo the column names will use AIRR format, which is why the ParseDb command isn’t working (different column names).

    My advice would be to use the Jupyter notebook materials here instead of the materials in that share:

    https://immcantation.readthedocs.io/en/latest/tutorials/tutorials.html#introductory-webinar-and-jupyter-notebook

    It looks like we need to fix the tutorials section of the Immcantation docs. @Edel Aron , could you look at this, please? Can we just drop the old gDrive materials link in favor of the Jupyter notebook? Or are there new materials?

  2. ssnn

    I agree with Jason, use the Jupyter notebook. This notebook was created using the materials from the webinar, and we added a little more information and context, for each step. I run the notebook with each new release of the framework, to be sure the commands work.

  3. Fubaoqian Huang reporter

    Thanks for your reply. But I need to start with 2 fastq sequencing files(not produced by 10X) like this:

    @CL100036998L2C001R001_2/1
    CCCTAGTAGCTGGGAATACNGGTACGCACCACTGTGCCCAGCTAATTTTTGTATTTTCTGCAGAGACAGGGTTTCACCATGTTGCCCAAGCTGGTCTTGA
    +
    F@FFFFFFFFFF6FFFFFF!FFFFFFFDFFEFFFFGFFGFF<FFEFFFEFGFFFFFFFFGFFF?FFFFFGFGCFFFGFGFEFFGFGGF<FFFEFFDF3FC
    @CL100036998L2C001R001_5/1
    TTCCAGTAGTTCATTCTAANGCCTAGATTCTTTTGTGGTTGTTGCTGGCCCAATGAGTCCCTAGTCACATCCCCTGCCAGAGGGAGTTCTTCCTGTCTCT
    +
    FFFFFFFFEDFFFEF:FFE!FE9DEFBFD;FEECFBFF9>FCEFEFFFFFEFFFFFFFFFFFFFFFFFFFFF7FFFFFDE1F8F(EDAFCFFFFFFF>FF
    @CL100036998L2C001R001_20/1
    ATGCAGAAGGTATAGGGGTNAGTCCTTGCTATATTATGCTTGGTTATAATTTTTCATCTTTCCCTTGCGGTACTATATCTCTGTCTCTTATACACATCTC
    +
    FFFFEFF=FFGFGFEFFFF!FFFFFFFFFFFFFFDEFFFFGFFGGFGFFFFFFFFFFFFGFFGFEEEFFFBGFF?FFFFEFGGEFFFEEGGFGGFB>EEF

    I’m sorry that I don’t know how to process it into a fasta file (the input file in the notebook?) like this:

    seq1|ISOTYPE=IGHM
    nnnnnnnnnnnnnnnnnnnnnnnagtgtcaggtgcagctggtggagctggggagcgtggt
    ccagcctgggaggccctgagtctctcctgtgcagcctctggattcaccttcagtctctat
    gctatgcactgggtccgccaggctccaggcaaggggctggagtgggtggcagttatatca
    tatcatgtaagcagtaaatactacgcagactccgtgaagggccgattcaccatctccaga
    gacaattccaagaacacgctgtatctgcaaatgaacagcctgagagctgaggacacggct
    gtgtattactgtgcgagagggccctatagtactggttattactacgagcttgactactgg
    ggccagggaacgctcggtcacccgtctcctcacgggtagtgcatccgccccaaccc
    seq2|ISOTYPE=IGHM
    nnnnnnnnnnnnnnnnnntgtcccaggtgcagctgcaggagtcgggcccaggactggtga
    agccttcacagaccctgtccctcacctgcactgtctctggtggctccatcagcagtggtg
    gttactactggagctggatccgccagcacccagggaagggcctggagtggattgggtaca
    tctattacagtgggagcacctactacaacccgtccctcaagagtcgagttaccatatcag
    tagacacgtctaagaaccagttctccctgaagctgagctctgtgactgccgcggacacgg
    ccgtgtattactgtgcgagagggagaggggatattgtagtagtaccagctgctcccggtc
    cctatgatgcttttgatatctggggccaagggacaatcggtcacccgtctcttcacgggt
    agtgcatccgccccaaccc

    Any ideas?

  4. ssnn

    If you are starting from raw fastq files, you typically process them first with pRESTO (https://presto.readthedocs.io/). The first thing you need to know, is the layout of your reads (the read configuration, where primers are…). You can see some examples under the section “Example Workflows”. Where does your data come from, what protocol? I am asking because the reads seem short, and I get no hits at IMGT, but I do get hits with NCBI’s blast to genes not related to BCR or TCR.

  5. Fubaoqian Huang reporter

    Thanks a lot. I think I’ve found a software named BraCeR, which could deal with raw fastq files to produce fasta files like the input.fasta.

  6. Fubaoqian Huang reporter

    Hi, when I tried to click a link in the notebook part Example data used in the tutorial, I found the tutorial to obtain the input.fasta could not be opened. Could you provide another new one?

    Example data used in the tutorial

    ../data/input.fasta : Processed B cell receptor reads from one healthy donor (PGP1) 3 weeks after flu vaccination (Laserson et al. (2014)). As part of the processing, each sequence has been annotated with the isotype. This step, is not part of the tutorial, but you can learn how to do it here.

    ( Actually, our fastq files were not sequenced by 454 or Illumina but BGI, so the “Example Workflows” of pRESTO might not be suitable for our files. And I got few sequences using the  BraCeR.)

  7. ssnn

    Yes, the link is broken, after we did some reorganization of the menus in the website. I will update the notebook for the next release. You can use this link: https://presto.readthedocs.io/en/latest/examples/primers.html#assigning-isotype-annotations-from-the-constant-region-sequence

    I am not familiar with BGI, and their IR-SEQ, so I can’t give specific comments. pRESTO is modular and that gives you a lot of flexibility to create a workflow that works for your particular data. You can use the examples as inspiration.

  8. ssnn

    This thread is not active, so I will close it. I have created a new issue in Immcantation's repo (#75), to create an updated pRESTO tutorial

  9. Log in to comment