- edited description
- attached fmt7.zip
MakeDb.py igblast arguments --regions and --scores produces errors
Hi, I’m testing the scripts igblast-example.sh downloaded from here.
When I ran the following codes in igblast-example.sh using MakeDb.py igblast with arguments --regions and --scores, I got errors.
MakeDb.py igblast -i ./data/output/reads_collapse-unique_atleast-2.fmt7 -s ./data/output/reads_collapse-unique_atleast-2.fasta -r ./data/imgt/human/vdj/ --regions --scores --outname data --outdir ./data/output/
usage: MakeDb.py igblast [--version] [-h] [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR] [--outname OUT_NAME] [--log LOG_FILE] [--failed]
[--format {airr,changeo}] -i ALIGNER_FILES [ALIGNER_FILES ...] -r REPO [REPO ...] -s SEQ_FILES [SEQ_FILES ...]
[--10x CELLRANGER_FILE [CELLRANGER_FILE ...]] [--asis-id] [--asis-calls] [--partial] [--extended]
[--regions {default,rhesus-igl}]
MakeDb.py igblast: error: argument --regions: expected one argument
MakeDb.py igblast -i ./data/output/reads_collapse-unique_atleast-2.fmt7 -s ./data/output/reads_collapse-unique_atleast-2.fasta -r /hwfssz5/ST_PRECISION/PM/5.Group5_CellMap/0.Others/huangbaoqian/18.change_o/04.db/data/imgt/human/vdj/ --scores --outname data --outdir ./data/output/
usage: MakeDb.py [--version] [-h] ...
MakeDb.py: error: unrecognized arguments: --scores
I found the errors seems to be caused by the arguments --regions and --scores, so I remove them and run the folllowing codes.
MakeDb.py igblast -i ./data/output/reads_collapse-unique_atleast-2.fmt7 -s ./data/output/reads_collapse-unique_atleast-2.fasta -r ./data/imgt/human/vdj/ --outname data --outdir ./data/output/
START> MakeDB
COMMAND> igblast
ALIGNER_FILE> reads_collapse-unique_atleast-2.fmt7
SEQ_FILE> reads_collapse-unique_atleast-2.fasta
ASIS_ID> False
ASIS_CALLS> False
PARTIAL> False
EXTENDED> False
PROGRESS> 19:17:33 |Done | 0.0 min
PROGRESS> 19:19:04 |####################| 100% (25,470) 1.5 min
OUTPUT> data_db-pass.tsv
PASS> 23809
FAIL> 1661
END> MakeDb
And I found that setting regions
argument as rhesus-igl
also worked, but maybe there is something difference.
MakeDb.py igblast -i ./data/output/reads_collapse-unique_atleast-2.fmt7 -s ./data/output/reads_collapse-unique_atleast-2.fasta -r ./data/imgt/human/vdj/ --regions rhesus-igl --outname data --outdir ./data/output/
START> MakeDB
COMMAND> igblast
ALIGNER_FILE> reads_collapse-unique_atleast-2.fmt7
SEQ_FILE> reads_collapse-unique_atleast-2.fasta
ASIS_ID> False
ASIS_CALLS> False
PARTIAL> False
EXTENDED> False
PROGRESS> 18:11:58 |Done | 0.0 min
PROGRESS> 18:13:19 |####################| 100% (25,470) 1.4 min
OUTPUT> data_db-pass.tsv
PASS> 213
FAIL> 25257
END> MakeDb
When I went on to use ParseDb.py, I got an empty output(igh_parse-select.tsv), using the results produced by MakeDb.py without arguments --regions and --scores above. (I used the ParseDb.py downloaded from here according to the issue.)
ParseDb.py select -d ./data/output/data_db-pass.tsv -f V_CALL -u IGHV
--regex --outname igh --outdir ./data/output/
$ head igh_parse-select.tsv
sequence_id sequence rev_comp productive v_call d_call j_call sequence_alignment germline_alignment junctionjunction_aa v_cigar d_cigar j_cigar stop_codon vj_in_frame locus junction_length np1_length np2_length v_sequence_startv_sequence_end v_germline_start v_germline_end d_sequence_start d_sequence_end d_germline_start d_germline_end j_sequence_start j_sequence_end j_germline_start j_germline_end consensus_count duplicate_count prcons
So I think igh_parse-select.tsv is empty probably due to the 2 arguments.
$ MakeDb.py --version
MakeDb.py: 1.0.1.999 2020.11.18
$ ParseDb.py --version
ParseDb.py: 1.0.1.999 2020.11.18
$ igblastn -version
igblastn: 1.16.0
Package: igblast 1.17.0, build Oct 6 2020 11:26:37
Any ideas or help would be greatly appreciated!
Some useful attachments were uploaded.
Comments (11)
-
reporter -
reporter - edited description
- attached vdj.zip
-
-
assigned issue to
-
assigned issue to
-
Hi @Fubaoqian Huang , it looks like those webinar materials are outdated - they are using commands from older versions of the tools. The
--scores
and--region
arguments were changed a while ago: https://changeo.readthedocs.io/en/stable/news.html#version-0-4-6-july-19-2019The comparable argument to
--scores --regions
is--extended
in the current version. Also, the default format used by changeo is now the AIRR Standard, so unless you specify--format changeo
the column names will use AIRR format, which is why the ParseDb command isn’t working (different column names).My advice would be to use the Jupyter notebook materials here instead of the materials in that share:
It looks like we need to fix the tutorials section of the Immcantation docs. @Edel Aron , could you look at this, please? Can we just drop the old gDrive materials link in favor of the Jupyter notebook? Or are there new materials?
-
I agree with Jason, use the Jupyter notebook. This notebook was created using the materials from the webinar, and we added a little more information and context, for each step. I run the notebook with each new release of the framework, to be sure the commands work.
-
reporter Thanks for your reply. But I need to start with 2 fastq sequencing files(not produced by 10X) like this:
@CL100036998L2C001R001_2/1
CCCTAGTAGCTGGGAATACNGGTACGCACCACTGTGCCCAGCTAATTTTTGTATTTTCTGCAGAGACAGGGTTTCACCATGTTGCCCAAGCTGGTCTTGA
+
F@FFFFFFFFFF6FFFFFF!FFFFFFFDFFEFFFFGFFGFF<FFEFFFEFGFFFFFFFFGFFF?FFFFFGFGCFFFGFGFEFFGFGGF<FFFEFFDF3FC
@CL100036998L2C001R001_5/1
TTCCAGTAGTTCATTCTAANGCCTAGATTCTTTTGTGGTTGTTGCTGGCCCAATGAGTCCCTAGTCACATCCCCTGCCAGAGGGAGTTCTTCCTGTCTCT
+
FFFFFFFFEDFFFEF:FFE!FE9DEFBFD;FEECFBFF9>FCEFEFFFFFEFFFFFFFFFFFFFFFFFFFFF7FFFFFDE1F8F(EDAFCFFFFFFF>FF
@CL100036998L2C001R001_20/1
ATGCAGAAGGTATAGGGGTNAGTCCTTGCTATATTATGCTTGGTTATAATTTTTCATCTTTCCCTTGCGGTACTATATCTCTGTCTCTTATACACATCTC
+
FFFFEFF=FFGFGFEFFFF!FFFFFFFFFFFFFFDEFFFFGFFGGFGFFFFFFFFFFFFGFFGFEEEFFFBGFF?FFFFEFGGEFFFEEGGFGGFB>EEFI’m sorry that I don’t know how to process it into a fasta file (the input file in the notebook?) like this:
seq1|ISOTYPE=IGHM
nnnnnnnnnnnnnnnnnnnnnnnagtgtcaggtgcagctggtggagctggggagcgtggt
ccagcctgggaggccctgagtctctcctgtgcagcctctggattcaccttcagtctctat
gctatgcactgggtccgccaggctccaggcaaggggctggagtgggtggcagttatatca
tatcatgtaagcagtaaatactacgcagactccgtgaagggccgattcaccatctccaga
gacaattccaagaacacgctgtatctgcaaatgaacagcctgagagctgaggacacggct
gtgtattactgtgcgagagggccctatagtactggttattactacgagcttgactactgg
ggccagggaacgctcggtcacccgtctcctcacgggtagtgcatccgccccaaccc
seq2|ISOTYPE=IGHM
nnnnnnnnnnnnnnnnnntgtcccaggtgcagctgcaggagtcgggcccaggactggtga
agccttcacagaccctgtccctcacctgcactgtctctggtggctccatcagcagtggtg
gttactactggagctggatccgccagcacccagggaagggcctggagtggattgggtaca
tctattacagtgggagcacctactacaacccgtccctcaagagtcgagttaccatatcag
tagacacgtctaagaaccagttctccctgaagctgagctctgtgactgccgcggacacgg
ccgtgtattactgtgcgagagggagaggggatattgtagtagtaccagctgctcccggtc
cctatgatgcttttgatatctggggccaagggacaatcggtcacccgtctcttcacgggt
agtgcatccgccccaacccAny ideas?
-
If you are starting from raw fastq files, you typically process them first with pRESTO (https://presto.readthedocs.io/). The first thing you need to know, is the layout of your reads (the read configuration, where primers are…). You can see some examples under the section “Example Workflows”. Where does your data come from, what protocol? I am asking because the reads seem short, and I get no hits at IMGT, but I do get hits with NCBI’s blast to genes not related to BCR or TCR.
-
reporter Thanks a lot. I think I’ve found a software named BraCeR, which could deal with raw fastq files to produce fasta files like the input.fasta.
-
reporter Hi, when I tried to click a link in the notebook part Example data used in the tutorial, I found the tutorial to obtain the input.fasta could not be opened. Could you provide another new one?
Example data used in the tutorial
../data/input.fasta
: Processed B cell receptor reads from one healthy donor (PGP1) 3 weeks after flu vaccination (Laserson et al. (2014)). As part of the processing, each sequence has been annotated with the isotype. This step, is not part of the tutorial, but you can learn how to do it here.( Actually, our fastq files were not sequenced by 454 or Illumina but BGI, so the “Example Workflows” of pRESTO might not be suitable for our files. And I got few sequences using the BraCeR.)
-
Yes, the link is broken, after we did some reorganization of the menus in the website. I will update the notebook for the next release. You can use this link: https://presto.readthedocs.io/en/latest/examples/primers.html#assigning-isotype-annotations-from-the-constant-region-sequence
I am not familiar with BGI, and their IR-SEQ, so I can’t give specific comments. pRESTO is modular and that gives you a lot of flexibility to create a workflow that works for your particular data. You can use the examples as inspiration.
-
- changed status to resolved
This thread is not active, so I will close it. I have created a new issue in Immcantation's repo (#75), to create an updated pRESTO tutorial
- Log in to comment