
Changed Docker images and added QC and filtering


Merged pull request

Merged in develop (pull request #33)

  • 9ab109f
  • 2018-04-13

Description

Various PRs to develop that together probably justify a new release. I have:

 

  • Changed a few rules to be more explicit about which files they generate and to clean up better afterwards. Also fixed some rules that could conflict when run in parallel.

  • Changed to use gtf2bed from ea-utils to generate bed12 files. The original solution took a very long time, pulled in a lot of dependencies, and produced ranges that were offset by 1 bp.

  • Rewrote the Docker images used for testing and running the workflow to use Ubuntu as the base image rather than RStudio. This was partly to have better control and partly to have only one R installation (using Conda instead). The old images had also degraded due to various bugs in Conda packages and Conda itself. The drawback is that LaTeX is no longer included, so rendering to PDF isn't possible. It could be added, but would significantly increase the size of the images.

  • Added the option to run MultiQC per batch by setting, for example, `multiqc: split_by_batch: "PU"`. This generates one MultiQC report per plate, if that is what the column "PU" in the sample file is used for. The main reason is that MultiQC can't display very many samples well. Even if the parameter is non-empty, you can still get a report for all samples by targeting `multiqc/multiqc_report.all.html`, e.g. to use as input to the QC scripts.

  • Changed the count matrix to wide format. The previous implementation used a long format that took forever and generated a very large file when run for many cells.

  • Added the QC and filtering scripts developed by Åsa. Target the rule `qc_and_filtering` to generate the report. It could be added to `all` if that makes sense.

  • Added all required R packages to conda-forge or bioconda so that they could be included as dependencies.

  • Added the option to also generate a MultiQC report for only the cells that passed filtering.
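
To illustrate the count matrix change mentioned above, here is a minimal Python sketch of pivoting long-format counts (one row per gene and cell) into wide format (one row per gene, one column per cell). It is not the workflow's actual implementation; the gene names, cell names, and column headers are hypothetical.

```python
import csv
import io

# Hypothetical long-format counts: one row per (gene, cell) pair.
long_rows = [
    ("GeneA", "cell1", 5),
    ("GeneA", "cell2", 0),
    ("GeneB", "cell1", 2),
    ("GeneB", "cell2", 7),
]


def long_to_wide(rows):
    """Pivot (gene, cell, count) rows into a header plus one row per gene."""
    genes = sorted({gene for gene, _, _ in rows})
    cells = sorted({cell for _, cell, _ in rows})
    counts = {(gene, cell): count for gene, cell, count in rows}
    header = ["gene"] + cells
    # Missing (gene, cell) pairs are treated as zero counts.
    body = [
        [gene] + [counts.get((gene, cell), 0) for cell in cells]
        for gene in genes
    ]
    return header, body


def to_tsv(header, body):
    """Serialize a header and rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body)
    return buf.getvalue()


header, body = long_to_wide(long_rows)
wide_tsv = to_tsv(header, body)
long_tsv = to_tsv(["gene", "cell", "count"], long_rows)
print(wide_tsv)
```

In the long format every count repeats both the gene and the cell identifier, so the file grows much faster with the number of cells than the wide format, where each identifier appears only once.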

 

Some input needed

@percyfal Does this look ok? Do you have any pointers when it comes to making a new release?

@asbj Are you fine with your scripts being included like this? I haven't changed anything in them. I was thinking it might be good to rewrite the QC report a little so that it's more of a report about the whole workflow, not just the QC and filtering, for example by including mapping settings and so on.

I'm spamming you a little here. Is anyone but me using the workflow at the moment?
