Tutorial for using HG38 (or at least more detailed documentation)?

I want to use OncodriveFML on data that uses HG38 as the reference but the documentation explaining how to do this is a little bit scattered and vague. It would be extremely helpful to have a tutorial walking users through the process of setting everything up to run an analysis using HG38 as the reference genome. Alternatively, simply expanding the documentation sections listed below to make them more detailed would also go a long way towards making OncodriveFML more user-friendly.

Reading the “Configuration” page of the documentation makes clear that the reference genome must be downloaded from UCSC’s website, but it doesn’t specify which files to download or where to store them so that OncodriveFML can access them. Further down the same page is a warning that the scores file must be compatible with the reference genome being used and that one should “update all related parameters”, but it’s not clear how to make sure the scores file is compatible with the reference genome or what those “related parameters” are.
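One generic way to test whether a scores file matches a reference genome (not an OncodriveFML feature, just a common sanity check) is to compare the reference allele recorded in each score row with the base found at that position in the genome. The sketch below assumes rows of the form (chromosome, 1-based position, ref, alt, score) and a genome held as a {chromosome: sequence} dictionary; both are tiny in-memory stand-ins for the real files, and the function name `ref_concordance` is hypothetical. A concordance well below 1.0 would suggest a different build or a chromosome-naming mismatch such as "chr1" versus "1".

```python
"""Sketch: estimate whether a scores table agrees with a reference genome build.

ASSUMPTIONS (hypothetical layout, not OncodriveFML's own format):
- each score row is (chromosome, 1-based position, ref, alt, score)
- the reference genome is available as {chromosome: sequence}
"""
from typing import Dict, Iterable, Tuple

Row = Tuple[str, int, str, str, float]


def ref_concordance(rows: Iterable[Row], genome: Dict[str, str]) -> float:
    """Return the fraction of rows whose ref base equals the genome base."""
    checked = 0
    matched = 0
    for chrom, pos, ref, _alt, _score in rows:
        checked += 1
        seq = genome.get(chrom)
        # An unknown chromosome name or an out-of-range position counts as a
        # mismatch; both usually point to a naming or build problem.
        if seq is None or not 1 <= pos <= len(seq):
            continue
        if seq[pos - 1].upper() == ref.upper():
            matched += 1
    return matched / checked if checked else 0.0


if __name__ == "__main__":
    toy_genome = {"chr1": "ACGTACGTAC"}
    toy_rows = [
        ("chr1", 1, "A", "G", 0.5),
        ("chr1", 2, "C", "T", 0.1),
        ("chr1", 3, "T", "A", 0.9),  # genome has G here: mismatch
        ("1", 4, "T", "C", 0.2),     # "1" vs "chr1" naming: mismatch
    ]
    print(f"{ref_concordance(toy_rows, toy_genome):.2f}")
```

With the toy data above, two of the four rows match, so the script prints 0.50. On real data one would sample rows from the scores file and read the bases from the downloaded reference instead.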

On the “Behind the Scenes” page of the documentation, there are bgdata commands that return error messages when run from the command line:

$ bgdata datasets genomereference hg19
Usage: bgdata [OPTIONS] COMMAND [ARGS]...

Error: No such command "datasets".

Is this an indication that the bgdata package is not working the way it is supposed to? Or is the OncodriveFML documentation providing bgdata commands that don’t work? Are these commands meant to be called inside a Python shell? They produce error messages there too…
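If the “Behind the Scenes” page was written for an older bgdata release, the space-separated form may simply be outdated syntax. As an unconfirmed assumption, bgdata 2.x may instead expect a `get` subcommand followed by a single slash-separated path. The hypothetical helper `to_bgdata2_command` below only rewrites the command string under that assumption; `bgdata --help` is the place to check which subcommands the installed version actually offers.

```python
"""Sketch: rewrite the documented space-separated bgdata call as a slash path.

ASSUMPTION (not confirmed): bgdata 2.x is invoked as
    bgdata get <project>/<dataset>/<version>
Verify with `bgdata --help` before relying on this.
"""
import shlex
from typing import List


def to_bgdata2_command(old_command: str) -> List[str]:
    """Turn 'bgdata datasets genomereference hg19' into an assumed 2.x argv."""
    parts = shlex.split(old_command)
    if len(parts) != 4 or parts[0] != "bgdata":
        raise ValueError("unexpected command: {!r}".format(old_command))
    project, dataset, version = parts[1:]
    return ["bgdata", "get", "{}/{}/{}".format(project, dataset, version)]


if __name__ == "__main__":
    argv = to_bgdata2_command("bgdata datasets genomereference hg38")
    # " ".join is used instead of shlex.join, which needs Python 3.8+.
    print(" ".join(argv))
```

For example, `to_bgdata2_command("bgdata datasets genomereference hg38")` returns `["bgdata", "get", "datasets/genomereference/hg38"]`.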

Also on the “Behind the Scenes” page of the documentation, there are instructions to modify the code in the "oncodrivefml.signature" module without a clear explanation of what modifications are needed. Looking at the code for the signature.py module, one can infer that simply modifying the configuration file to specify build = 'hg38' might be all that’s necessary, as long as the hg38 reference genome has been downloaded.
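The inferred change amounts to a one-line edit in the configuration file. The sketch below assumes the file has a `[genome]` section with a `build` key holding a quoted value such as 'hg19' (inferred from signature.py, not confirmed by the documentation) and uses Python's configparser purely as a stand-in for OncodriveFML's own config reader; `set_build` and `read_build` are hypothetical names. configparser drops comments when it rewrites a file and may reject syntax the real loader accepts, so editing the line by hand remains the safer route for an actual run.

```python
"""Sketch: switch the genome build in an OncodriveFML-style config text.

ASSUMPTIONS: a [genome] section with a `build` key whose value is quoted
(e.g. 'hg19'); configparser stands in for the tool's real config loader.
"""
import configparser
import io


def _parser() -> configparser.ConfigParser:
    # interpolation=None leaves values containing "%(...)" untouched.
    return configparser.ConfigParser(interpolation=None)


def set_build(config_text: str, build: str) -> str:
    """Return config_text rewritten with [genome] build set to `build`."""
    parser = _parser()
    parser.read_string(config_text)
    if not parser.has_section("genome"):
        parser.add_section("genome")
    parser.set("genome", "build", "'{}'".format(build))
    out = io.StringIO()
    parser.write(out)  # note: comments in config_text are not preserved
    return out.getvalue()


def read_build(config_text: str) -> str:
    """Return the [genome] build value without surrounding quotes."""
    parser = _parser()
    parser.read_string(config_text)
    return parser.get("genome", "build").strip("'\"")


if __name__ == "__main__":
    example = "[genome]\nbuild = 'hg19'\n"
    print(read_build(set_build(example, "hg38")))
```

Running the script prints hg38; the same check (`read_build`) can confirm which build a given config text declares before launching an analysis.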

On a related note: the version of bgdata installed alongside OncodriveFML does not seem to work the way the bgdata documentation says it should. Specifically, the "search" command seems to allow searching in directories but not in subdirectories: the command "bgdata search datasets" returns a list of apparent subdirectories, while the command "bgdata search datasets/genomereference" returns nothing.
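As a stopgap for browsing nested resources, the downloaded data could be listed directly from disk. This sketch assumes resources are stored as <root>/<project>/<dataset>/<version>/ under a root directory passed in explicitly; that layout and the root location are assumptions about bgdata's local storage, not confirmed behaviour, and the hypothetical `list_resources` does not reproduce what `bgdata search` does internally.

```python
"""Sketch: list locally stored resources as 'project/dataset/version' paths.

ASSUMPTION: a local layout <root>/<project>/<dataset>/<version>/; the real
bgdata storage location and layout may differ.
"""
from pathlib import Path
from typing import List


def _matches(rel: str, prefix: str) -> bool:
    """True if rel equals prefix or lies below it, compared by path component."""
    prefix = prefix.strip("/")
    return not prefix or rel == prefix or rel.startswith(prefix + "/")


def list_resources(root: Path, prefix: str = "") -> List[str]:
    """Return sorted three-level resource paths under root matching prefix."""
    found = []
    for version_dir in root.glob("*/*/*"):
        if version_dir.is_dir():
            rel = version_dir.relative_to(root).as_posix()
            if _matches(rel, prefix):
                found.append(rel)
    return sorted(found)
```

Called with a prefix such as "datasets/genomereference", it returns only the version directories below that path, which is the kind of nested lookup the question expected `bgdata search` to perform.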

In case it matters: I am running Ubuntu 18.04 and Python 3.6.8. I installed both packages with pip3, and the versions I have are oncodrivefml 2.2.0 and bgdata 2.0.2.