Background mismatch and documentation issues

Issue #27 new
Mariela Cortés-López created an issue

Hi,
I am trying to use oncodrivefml on a set of mutations within ALU elements. I input my mutations and the coordinates of the elements and it runs well, but I get a Background mismatch warning. What does this mean exactly?
Also, I am using hg38 as the reference. Do the CADD scores (v1.0) provided in bgdata work with this version too?
Can I download the most recent CADD scores (v1.5) from bgdata and, if so, how? I have tried:

bgdata get genomicscores/caddpack/1.5

2020-04-02 13:26:45 bgdata.utils ERROR -- TagNotFound: Tag master for package genomicscores/caddpack/1.5 not found (Missing tag file in remote)

but it does not seem to find it. Also, does the chromosome prefix in the configuration file refer to the file that we input? In the examples, all the files appear to use Ensembl formatting (without “chr”), but I am not sure whether I also need to change this.

Best,

Comments (5)

  1. Iker

    Background mismatch means that the reference nucleotide(s) given in your input do not match the reference genome at those positions.

    CADD scores 1.0 are not meant to be used with hg38. You can download CADD v1.5 scores from the original source: https://cadd.gs.washington.edu/download

    You need to adapt the configuration as indicated in the documentation so that it uses the tabix file you download. Remember to also download the tabix index file. The caddpack version was something we built only for v1.0.

    The chromosome prefix in the configuration refers to the prefix in the scores file, not in your input file.
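To make the Background mismatch explanation above concrete, here is a minimal Python sketch of that kind of check. It is not oncodrivefml's own code: the function name `find_ref_mismatches`, the tuple layout of the mutations, and the in-memory `TOY_GENOME` dictionary (a stand-in for the real reference sequence) are all assumptions made for illustration.

```python
# Toy stand-in for a reference genome: chromosome name -> sequence.
# Assumption: in practice the bases would come from the reference FASTA.
TOY_GENOME = {"1": "ACGTACGTAC"}


def find_ref_mismatches(mutations, genome):
    """Return the mutations whose REF allele differs from the genome.

    Each mutation is (chrom, pos, ref, alt) with a 1-based position.
    The observed reference bases are appended to each reported mutation
    (None when the chromosome name is not present in the genome).
    """
    mismatches = []
    for chrom, pos, ref, alt in mutations:
        seq = genome.get(chrom)
        # Slice the reference bases covered by the REF allele.
        observed = seq[pos - 1:pos - 1 + len(ref)] if seq else None
        if observed is None or observed.upper() != ref.upper():
            mismatches.append((chrom, pos, ref, alt, observed))
    return mismatches


mutations = [
    ("1", 1, "A", "G"),  # matches: position 1 of the toy sequence is A
    ("1", 2, "G", "T"),  # mismatch: position 2 of the toy sequence is C
]
print(find_ref_mismatches(mutations, TOY_GENOME))
```

Running a check like this on a handful of input mutations can help show whether the mismatches are isolated or affect most of the file, which would point to a systematic cause such as coordinates from a different assembly.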

  2. Amarinder Thind

    I downloaded CADD v1.6 (hg38 option) and set the folder path in the config file as below:

    [score]
    # Path to score file
    file = "/path/to/scores/file"
    format = 'tabix'
    

    This folder contains two files:

    1. whole_genome_SNVs.tsv.gz
    2. whole_genome_SNVs.tsv.gz.tbi

    Do I need to rename the downloaded files as well, since their names differ from the original CADD v1.0 ones (whole_genome_SNVs.fml and whole_genome_SNVs.fml.idx)?

  3. Iker

    You need to set file to the full path of the file you have downloaded, not to the folder, e.g. /your_path/whole_genome_SNVs.tsv.gz
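Before pointing the configuration at the downloaded scores, it may help to confirm that the file and its index sit side by side and to see how chromosomes are labelled in it. The following is a small Python sketch using only the standard library; the helper names `has_tabix_index` and `first_chromosome` are hypothetical, and it assumes header lines in the score file start with `#` and that columns are tab-separated.

```python
import gzip
import os


def has_tabix_index(score_path):
    """True if both the compressed score file and its .tbi index exist."""
    return os.path.isfile(score_path) and os.path.isfile(score_path + ".tbi")


def first_chromosome(score_path):
    """Return the chromosome label on the first non-header line, or None."""
    with gzip.open(score_path, "rt") as handle:
        for line in handle:
            # Assumption: header/comment lines begin with '#'.
            if line.strip() and not line.startswith("#"):
                return line.split("\t", 1)[0]
    return None


# Example path taken from the reply above; replace with your own location.
score_file = "/your_path/whole_genome_SNVs.tsv.gz"
if has_tabix_index(score_file):
    print("first chromosome label:", first_chromosome(score_file))
else:
    print("score file or .tbi index not found:", score_file)
```

If the printed label starts with "chr", the scores file uses a prefix; if it is a bare name such as "1", it does not. Either way, this is the prefix the configuration setting refers to, not the one in your input file.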
