MakeDb.py igblast arguments --regions and --scores produces errors

Issue #174 resolved

Fubaoqian Huang created an issue 2021-01-19

Hi, I’m testing the scripts igblast-example.sh downloaded from here.

When I ran the following codes in igblast-example.sh using MakeDb.py igblast with arguments --regions and --scores, I got errors.

MakeDb.py igblast -i ./data/output/reads_collapse-unique_atleast-2.fmt7 -s ./data/output/reads_collapse-unique_atleast-2.fasta -r ./data/imgt/human/vdj/ --regions --scores --outname data --outdir ./data/output/

usage: MakeDb.py igblast [--version] [-h] [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR] [--outname OUT_NAME] [--log LOG_FILE] [--failed]
[--format {airr,changeo}] -i ALIGNER_FILES [ALIGNER_FILES ...] -r REPO [REPO ...] -s SEQ_FILES [SEQ_FILES ...]
[--10x CELLRANGER_FILE [CELLRANGER_FILE ...]] [--asis-id] [--asis-calls] [--partial] [--extended]
[--regions {default,rhesus-igl}]
MakeDb.py igblast: error: argument --regions: expected one argument

MakeDb.py igblast -i ./data/output/reads_collapse-unique_atleast-2.fmt7 -s ./data/output/reads_collapse-unique_atleast-2.fasta -r /hwfssz5/ST_PRECISION/PM/5.Group5_CellMap/0.Others/huangbaoqian/18.change_o/04.db/data/imgt/human/vdj/ --scores --outname data --outdir ./data/output/

usage: MakeDb.py [--version] [-h] ...
MakeDb.py: error: unrecognized arguments: --scores

‌

I found the errors seems to be caused by the arguments --regions and --scores, so I remove them and run the folllowing codes.

MakeDb.py igblast -i ./data/output/reads_collapse-unique_atleast-2.fmt7 -s ./data/output/reads_collapse-unique_atleast-2.fasta -r ./data/imgt/human/vdj/ --outname data --outdir ./data/output/

START> MakeDB

COMMAND> igblast

ALIGNER_FILE> reads_collapse-unique_atleast-2.fmt7
SEQ_FILE> reads_collapse-unique_atleast-2.fasta
ASIS_ID> False
ASIS_CALLS> False
PARTIAL> False
EXTENDED> False

PROGRESS> 19:17:33 |Done | 0.0 min

PROGRESS> 19:19:04 |####################| 100% (25,470) 1.5 min

OUTPUT> data_db-pass.tsv
PASS> 23809
FAIL> 1661
END> MakeDb

‌

And I found that setting regions argument as rhesus-igl also worked, but maybe there is something difference.

START> MakeDB

COMMAND> igblast

ALIGNER_FILE> reads_collapse-unique_atleast-2.fmt7
SEQ_FILE> reads_collapse-unique_atleast-2.fasta
ASIS_ID> False
ASIS_CALLS> False
PARTIAL> False
EXTENDED> False

PROGRESS> 18:11:58 |Done | 0.0 min

PROGRESS> 18:13:19 |####################| 100% (25,470) 1.4 min

OUTPUT> data_db-pass.tsv
PASS> 213
FAIL> 25257
END> MakeDb

When I went on to use ParseDb.py, I got an empty output(igh_parse-select.tsv), using the results produced by MakeDb.py without arguments --regions and --scores above. (I used the ParseDb.py downloaded from here according to the issue.)

ParseDb.py select -d ./data/output/data_db-pass.tsv -f V_CALL -u IGHV
--regex --outname igh --outdir ./data/output/

$ head igh_parse-select.tsv
sequence_id sequence rev_comp productive v_call d_call j_call sequence_alignment germline_alignment junctionjunction_aa v_cigar d_cigar j_cigar stop_codon vj_in_frame locus junction_length np1_length np2_length v_sequence_startv_sequence_end v_germline_start v_germline_end d_sequence_start d_sequence_end d_germline_start d_germline_end j_sequence_start j_sequence_end j_germline_start j_germline_end consensus_count duplicate_count prcons

So I think igh_parse-select.tsv is empty probably due to the 2 arguments.

$ MakeDb.py --version
MakeDb.py: 1.0.1.999 2020.11.18

$ ParseDb.py --version
ParseDb.py: 1.0.1.999 2020.11.18

$ igblastn -version
igblastn: 1.16.0
Package: igblast 1.17.0, build Oct 6 2020 11:26:37

Any ideas or help would be greatly appreciated!

Some useful attachments were uploaded.

Comments (11)

Fubaoqian Huang reporter
- edited description
- attached fmt7.zip
- 2021-01-19T11:46:47+00:00
Fubaoqian Huang reporter
- edited description
- attached vdj.zip
- 2021-01-19T11:52:57+00:00
Jason Vander Heiden
- assigned issue to
  
  Edel Aron
- 2021-01-19T19:23:52+00:00
Jason Vander Heiden
Hi @Fubaoqian Huang , it looks like those webinar materials are outdated - they are using commands from older versions of the tools. The --scores and --region arguments were changed a while ago: https://changeo.readthedocs.io/en/stable/news.html#version-0-4-6-july-19-2019

The comparable argument to --scores --regions is --extended in the current version. Also, the default format used by changeo is now the AIRR Standard, so unless you specify --format changeo the column names will use AIRR format, which is why the ParseDb command isn’t working (different column names).

My advice would be to use the Jupyter notebook materials here instead of the materials in that share:

https://immcantation.readthedocs.io/en/latest/tutorials/tutorials.html#introductory-webinar-and-jupyter-notebook

It looks like we need to fix the tutorials section of the Immcantation docs. @Edel Aron , could you look at this, please? Can we just drop the old gDrive materials link in favor of the Jupyter notebook? Or are there new materials?
- 2021-01-19T19:25:11+00:00
ssnn
I agree with Jason, use the Jupyter notebook. This notebook was created using the materials from the webinar, and we added a little more information and context, for each step. I run the notebook with each new release of the framework, to be sure the commands work.
- 2021-01-22T15:47:55+00:00
Fubaoqian Huang reporter
Thanks for your reply. But I need to start with 2 fastq sequencing files(not produced by 10X) like this:

@CL100036998L2C001R001_2/1
CCCTAGTAGCTGGGAATACNGGTACGCACCACTGTGCCCAGCTAATTTTTGTATTTTCTGCAGAGACAGGGTTTCACCATGTTGCCCAAGCTGGTCTTGA
+
F@FFFFFFFFFF6FFFFFF!FFFFFFFDFFEFFFFGFFGFF<FFEFFFEFGFFFFFFFFGFFF?FFFFFGFGCFFFGFGFEFFGFGGF<FFFEFFDF3FC
@CL100036998L2C001R001_5/1
TTCCAGTAGTTCATTCTAANGCCTAGATTCTTTTGTGGTTGTTGCTGGCCCAATGAGTCCCTAGTCACATCCCCTGCCAGAGGGAGTTCTTCCTGTCTCT
+
FFFFFFFFEDFFFEF:FFE!FE9DEFBFD;FEECFBFF9>FCEFEFFFFFEFFFFFFFFFFFFFFFFFFFFF7FFFFFDE1F8F(EDAFCFFFFFFF>FF
@CL100036998L2C001R001_20/1
ATGCAGAAGGTATAGGGGTNAGTCCTTGCTATATTATGCTTGGTTATAATTTTTCATCTTTCCCTTGCGGTACTATATCTCTGTCTCTTATACACATCTC
+
FFFFEFF=FFGFGFEFFFF!FFFFFFFFFFFFFFDEFFFFGFFGGFGFFFFFFFFFFFFGFFGFEEEFFFBGFF?FFFFEFGGEFFFEEGGFGGFB>EEF

I’m sorry that I don’t know how to process it into a fasta file (the input file in the notebook?) like this:

seq1|ISOTYPE=IGHM
nnnnnnnnnnnnnnnnnnnnnnnagtgtcaggtgcagctggtggagctggggagcgtggt
ccagcctgggaggccctgagtctctcctgtgcagcctctggattcaccttcagtctctat
gctatgcactgggtccgccaggctccaggcaaggggctggagtgggtggcagttatatca
tatcatgtaagcagtaaatactacgcagactccgtgaagggccgattcaccatctccaga
gacaattccaagaacacgctgtatctgcaaatgaacagcctgagagctgaggacacggct
gtgtattactgtgcgagagggccctatagtactggttattactacgagcttgactactgg
ggccagggaacgctcggtcacccgtctcctcacgggtagtgcatccgccccaaccc
seq2|ISOTYPE=IGHM
nnnnnnnnnnnnnnnnnntgtcccaggtgcagctgcaggagtcgggcccaggactggtga
agccttcacagaccctgtccctcacctgcactgtctctggtggctccatcagcagtggtg
gttactactggagctggatccgccagcacccagggaagggcctggagtggattgggtaca
tctattacagtgggagcacctactacaacccgtccctcaagagtcgagttaccatatcag
tagacacgtctaagaaccagttctccctgaagctgagctctgtgactgccgcggacacgg
ccgtgtattactgtgcgagagggagaggggatattgtagtagtaccagctgctcccggtc
cctatgatgcttttgatatctggggccaagggacaatcggtcacccgtctcttcacgggt
agtgcatccgccccaaccc

Any ideas?
- 2021-01-25T01:27:54+00:00
ssnn
If you are starting from raw fastq files, you typically process them first with pRESTO (https://presto.readthedocs.io/). The first thing you need to know, is the layout of your reads (the read configuration, where primers are…). You can see some examples under the section “Example Workflows”. Where does your data come from, what protocol? I am asking because the reads seem short, and I get no hits at IMGT, but I do get hits with NCBI’s blast to genes not related to BCR or TCR.
- 2021-01-25T11:02:33+00:00
Fubaoqian Huang reporter
Thanks a lot. I think I’ve found a software named BraCeR, which could deal with raw fastq files to produce fasta files like the input.fasta.
- 2021-01-27T09:12:56+00:00
Fubaoqian Huang reporter
Hi, when I tried to click a link in the notebook part Example data used in the tutorial, I found the tutorial to obtain the input.fasta could not be opened. Could you provide another new one?

‌

Example data used in the tutorial

../data/input.fasta : Processed B cell receptor reads from one healthy donor (PGP1) 3 weeks after flu vaccination (Laserson et al. (2014)). As part of the processing, each sequence has been annotated with the isotype. This step, is not part of the tutorial, but you can learn how to do it here.

( Actually, our fastq files were not sequenced by 454 or Illumina but BGI, so the “Example Workflows” of pRESTO might not be suitable for our files. And I got few sequences using the BraCeR.)
- 2021-01-28T01:21:16+00:00
ssnn
Yes, the link is broken, after we did some reorganization of the menus in the website. I will update the notebook for the next release. You can use this link: https://presto.readthedocs.io/en/latest/examples/primers.html#assigning-isotype-annotations-from-the-constant-region-sequence

I am not familiar with BGI, and their IR-SEQ, so I can’t give specific comments. pRESTO is modular and that gives you a lot of flexibility to create a workflow that works for your particular data. You can use the examples as inspiration.

‌
- 2021-01-28T10:20:39+00:00
ssnn
- changed status to resolved
This thread is not active, so I will close it. I have created a new issue in Immcantation's repo (#75), to create an updated pRESTO tutorial
- 2021-03-04T09:19:44+00:00
Log in to comment

Assignee: Edel Aron

Type: bug

Priority: major

Status: resolved

Milestone: –

Votes: 0

Watchers: 1

Comments (11)

Example data used in the tutorial