Infinite warning messages
<<input>>
CHROMOSOME POSITION REF ALT SAMPLE
chr1 43145251 G A A_001
chr2 200981649 - G A_001
chr3 3056129 C A A_001
......
chr5 96675555 C A A_150
<<GRCh38>>
CHROMOSOME START END STRAND ELEMENT SEGMENT SYMBOL
9 14807 14940 - ENSG00000181404 ENSG00000181404 WASHC1
9 15081 15149 - ENSG00000181404 ENSG00000181404 WASHC1
<<log>>
2019-12-24 09:20:46 oncodrivefml INFO -- Using HG38 as reference genome
2019-12-24 09:20:46 oncodrivefml INFO -- Running analysis
Parsing elements 'GRCh38.txt': 100%|█████████▉| 258562/258563 [00:01<00:00, 159986.35it/s]
Parsing elements 'GRCh38.txt': 100%|█████████▉| 258562/258563 [00:01<00:00, 158373.49it/s]
2019-12-24 09:20:50 oncodrivefml INFO -- Building regions tree
2019-12-24 09:20:50 oncodrivefml INFO -- [1 of 19694]
2019-12-24 09:20:55 oncodrivefml INFO -- [7333 of 19694]
2019-12-24 09:21:00 oncodrivefml INFO -- [14665 of 19694]
2019-12-24 09:21:04 oncodrivefml INFO -- [19694 of 19694]
2019-12-24 09:21:22 oncodrivefml INFO -- Mapping mutations
* [0 muts]
2019-12-24 09:21:23 oncodrivefml INFO -- Computing signatures
2019-12-24 09:21:27 oncodrivefml INFO -- Computing OncodriveFML
2019-12-24 09:21:28 oncodrivefml WARNING -- Background mismatch at position 100969623 at 'ENSG00000205277'
2019-12-24 09:21:28 oncodrivefml WARNING -- Background mismatch at position 100969623 at 'ENSG00000205277'
.....
The same warning repeats indefinitely.
<<results>>
GENE_ID MUTS MUTS_RECURRENCE SAMPLES P_VALUE Q_VALUE P_VALUE_NEG Q_VALUE_NEG SNP MNP INDELS SYMBOL
ENSG00000140995 6 2 5 1e-06 0.0001608148148148148 1.0 1.0 4 0 2 DEF8
ENSG00000205189 3 1 3 1e-06 0.0001608148148148148 0.999999 1.0 0 0 3 ZBTB10
Is it OK to have so many warnings? Can I trust the results?
Thanks.
Comments (8)
-
I get the same error when running the package on data that uses hg38 as the reference. The scores are version 1.3, but as far as I can tell CADD 1.3 doesn't support hg38. How can I get CADD scores for hg38? Or what should I do if I want to use this package with hg38?
-
- changed status to resolved
CADD versions 1.4 and 1.5 support hg38. You can find the files at https://cadd.gs.washington.edu/download
-
The CADD 1.0 score files downloaded by bg-data are binary files. Do the CADD files for hg38 need to be converted before they can be used with the package? Or is there any documentation on using this package with hg38?
Thanks.
-
- changed status to open
-
The CADD 1.0 files are binary files that we created to reduce the size for a first-test use. The CADD files you can download from https://cadd.gs.washington.edu/download are tabix files. You need to change the configuration file appropriately, but tabix is supported. Details are explained in the docs: https://oncodrivefml.readthedocs.io/en/latest/configuration.html#score
In short, you need something like this in your configuration file:
[score]
file = "cadd.tsv.gz"
format = 'tabix'
chr = 0
chr_prefix = ""
pos = 1
ref = 2
alt = 3
score = 5
Check the column numbers, as they might be different in your file. For the score, we typically use the ``PHRED`` column.
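One way to check the column numbers is to read the header of the downloaded score file and print each column name with its 0-based index. The sketch below is only an illustration: `header_columns` is a hypothetical helper, not part of OncodriveFML, and it assumes the file is a gzip-compressed, tab-separated file whose last `#` line names the columns (for example `#Chrom  Pos  Ref  Alt  RawScore  PHRED`). Verify that against your own file.

```python
import gzip
import sys


def header_columns(path):
    """Map column names to 0-based indices.

    Assumption: the column names are on the last line that starts
    with '#' before the first data row of a gzipped TSV file.
    """
    header = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # first data row reached; stop reading
            header = line  # keep the most recent '#' line
    if header is None:
        raise ValueError("no '#' header line found in " + path)
    names = header.lstrip("#").rstrip("\n").split("\t")
    return {name.strip(): index for index, name in enumerate(names)}


if __name__ == "__main__":
    # Usage: python check_columns.py cadd.tsv.gz
    if len(sys.argv) == 2:
        for name, index in header_columns(sys.argv[1]).items():
            print(f"{index}\t{name}")
```

The printed indices can then be compared with the `chr`, `pos`, `ref`, `alt` and `score` values in the `[score]` section.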
-
- changed status to resolved
I am not sure to what extent the results are reliable if you have lots of warnings. Which scores are you using?