Infinite warning messages

Issue #24 resolved
Former user created an issue
<<input>>
CHROMOSOME      POSITION        REF     ALT     SAMPLE
chr1    43145251        G       A       A_001
chr2    200981649       -       G       A_001
chr3    3056129 C       A       A_001
......
chr5    96675555        C       A       A_150
<<GRCh38>>
CHROMOSOME      START   END     STRAND  ELEMENT SEGMENT SYMBOL
9       14807   14940   -       ENSG00000181404 ENSG00000181404 WASHC1
9       15081   15149   -       ENSG00000181404 ENSG00000181404 WASHC1
<<log>>
(base) root@VBR:woSilent# head -n 100 run.sh_log
2019-12-24 09:20:46 oncodrivefml INFO -- Using HG38 as reference genome
2019-12-24 09:20:46 oncodrivefml INFO -- Running analysis
           Parsing elements 'GRCh38.txt': 100%|█████████▉| 258562/258563 [00:01<00:00, 159986.35it/s]
           Parsing elements 'GRCh38.txt': 100%|█████████▉| 258562/258563 [00:01<00:00, 158373.49it/s]
2019-12-24 09:20:50 oncodrivefml INFO -- Building regions tree
2019-12-24 09:20:50 oncodrivefml INFO -- [1 of 19694]
2019-12-24 09:20:55 oncodrivefml INFO -- [7333 of 19694]
2019-12-24 09:21:00 oncodrivefml INFO -- [14665 of 19694]
2019-12-24 09:21:04 oncodrivefml INFO -- [19694 of 19694]
2019-12-24 09:21:22 oncodrivefml INFO -- Mapping mutations
* [0 muts]
2019-12-24 09:21:23 oncodrivefml INFO -- Computing signatures

2019-12-24 09:21:27 oncodrivefml INFO -- Computing OncodriveFML
2019-12-24 09:21:28 oncodrivefml WARNING -- Background mismatch at position 100969623 at 'ENSG00000205277'
2019-12-24 09:21:28 oncodrivefml WARNING -- Background mismatch at position 100969623 at 'ENSG00000205277'
.....

The warnings repeat indefinitely.

<<results>>
GENE_ID MUTS    MUTS_RECURRENCE SAMPLES P_VALUE Q_VALUE P_VALUE_NEG     Q_VALUE_NEG     SNP     MNP     INDELS  SYMBOL
ENSG00000140995 6       2       5       1e-06   0.0001608148148148148   1.0     1.0     4       0       2       DEF8
ENSG00000205189 3       1       3       1e-06   0.0001608148148148148   0.999999        1.0     0       0       3       ZBTB10

Is it OK to get so many warnings? Can I trust the results?

Thanks.

Comments (8)

  1. Iker Reyes

    I am not sure to what extent the results are reliable if you get lots of warnings. Which scores are you using?

  2. Wang Lin

    I get the same warning when running the package on data that uses hg38 as the reference. I am using version 1.3 of the scores, but CADD 1.3 does not seem to support hg38. How can I get CADD scores for hg38? Or what should I do if I want to use this package with hg38?

  3. Wang Lin

    The CADD 1.0 score files downloaded by bg-data are binary files. Do the CADD files for hg38 need to be converted before they can be used with the package? Or is there any documentation on using this package with hg38?

    Thanks.

  4. Iker Reyes

    The CADD 1.0 files are binary files that we created to reduce the download size for a first test run. The CADD files you can download from https://cadd.gs.washington.edu/download are tabix files. Tabix is supported, but you need to change the configuration file accordingly. Details are explained in the docs: https://oncodrivefml.readthedocs.io/en/latest/configuration.html#score

    In short, you need something like this in your configuration file:

    [score]
    file = "cadd.tsv.gz"
    format = 'tabix'
    chr = 0
    chr_prefix = ""
    pos = 1
    ref = 2
    alt = 3
    score = 5
    

    Check the column numbers, as they might be different in your file. For the score, we typically use the ``PHRED`` column.
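
One way to check the column numbers is to read the header of the score file and print each column's zero-based index. The sketch below is only an illustration and is not part of OncodriveFML: the helper names `column_indexes` and `column_indexes_from_file` are hypothetical, and the sample header assumes a layout of `#Chrom Pos Ref Alt RawScore PHRED`, which may not match your download.

```python
import gzip
import io


def column_indexes(lines):
    """Map column names to zero-based indexes using the last leading '#' line.

    Hypothetical helper: assumes the file starts with one or more '#' header
    lines and that the last of them holds tab-separated column names.
    """
    header = None
    for line in lines:
        if not line.startswith("#"):
            break  # first data row reached, stop scanning
        header = line
    if header is None:
        raise ValueError("no header line found")
    names = header.lstrip("#").rstrip("\n").split("\t")
    return {name: index for index, name in enumerate(names)}


def column_indexes_from_file(path):
    """Same as column_indexes, but reads a gzip/bgzip-compressed TSV file."""
    with gzip.open(path, "rt") as handle:
        return column_indexes(handle)


# Illustrative in-memory sample (assumed layout, not real CADD data).
sample = (
    "## example header comment\n"
    "#Chrom\tPos\tRef\tAlt\tRawScore\tPHRED\n"
    "1\t10001\tT\tA\t0.1\t4.0\n"
)
cols = column_indexes(io.StringIO(sample))
for name, index in cols.items():
    print(index, name)
```

For a real file, calling `column_indexes_from_file("cadd.tsv.gz")` and comparing the printed indexes with the `chr`, `pos`, `ref`, `alt` and `score` values in the `[score]` section should show whether the configuration matches the file.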
