Intercepting blocks generated for plots

Issue #5 new
Marc Robert de Massy created an issue


Sequenza is part of our pipeline, and we are very happy with it! I am in charge of implementing a phylogenetic reconstruction tool that would use Sequenza output. The segments.txt file generated by Sequenza looks much noisier than the red signal plotted in the chromosome plots. I really need to reduce this .txt file to a smaller, smoother one by merging blocks together. It looks like you already do this in order to generate the plots; would it be possible to intercept the data that is actually used in the plots?

Thank you very much in advance, Marc

Comments (5)

  1. Francesco Favero

    Hi Marc,

    Thank you for opening this issue; this is one of the crucial flaws in the segmentation algorithm/process. I'm actually working on several aspects of handling the segmentation. I had some decent (not perfect) results by clustering the segments (non-parametric clustering) and merging the adjacent segments that are in the same cluster. But it needs a lot more work to be reliable, and I don't expect to include it in the next release (I'm also considering changing the segmentation algorithm). However, you could post-process the segmentation results (after sequenza.extract); I can help you with an example if you want. Do you use multi-sample sequencing?
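
    The merging step described above (join adjacent segments that fall in the same cluster) can be sketched roughly as follows. This is only an illustration in Python, not Sequenza code: the field names (`chromosome`, `start`, `end`, `depth_ratio`, `cluster`) are hypothetical, the cluster labels are assumed to come from a separate clustering step, and the length-weighted mean is one possible way to summarise a merged block.

```python
from itertools import groupby


def merge_adjacent(segments):
    """Merge consecutive segments that share a chromosome and a cluster label.

    Assumes `segments` is sorted by chromosome and start, and that end > start.
    """
    merged = []
    key = lambda s: (s["chromosome"], s["cluster"])
    for (chrom, cluster), run in groupby(segments, key=key):
        run = list(run)
        total = sum(s["end"] - s["start"] for s in run)
        # Length-weighted mean, so long segments dominate short noisy ones.
        ratio = sum(s["depth_ratio"] * (s["end"] - s["start"]) for s in run) / total
        merged.append({
            "chromosome": chrom,
            "start": run[0]["start"],
            "end": run[-1]["end"],
            "depth_ratio": ratio,
            "cluster": cluster,
        })
    return merged


# Made-up example data; cluster labels are assumed to be precomputed.
segments = [
    {"chromosome": "chr1", "start": 0, "end": 100, "depth_ratio": 1.0, "cluster": 0},
    {"chromosome": "chr1", "start": 100, "end": 300, "depth_ratio": 1.1, "cluster": 0},
    {"chromosome": "chr1", "start": 300, "end": 400, "depth_ratio": 0.5, "cluster": 1},
    {"chromosome": "chr1", "start": 400, "end": 500, "depth_ratio": 1.0, "cluster": 0},
    {"chromosome": "chr2", "start": 0, "end": 100, "depth_ratio": 1.0, "cluster": 0},
]
merged = merge_adjacent(segments)
```

    Only runs of neighbouring segments are merged, so two blocks with the same cluster label that are separated by a different block stay separate, and nothing is merged across chromosome boundaries.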

  2. Marc Robert de Massy reporter

    Hi Francesco,

    Thank you so much for your very quick response. As you can see in the attached files, the signal is very noisy even though distinct blocks appear quite naturally in the plot. For instance, on chromosome 1 it seems like we could easily merge the 539 segments called by Sequenza into 8 separate events or even fewer. How would you proceed with this data (cellularity = 40%)? Yes, we have access to multiple tumour samples for each patient, but from different tumours (primary vs metastasis); I guess that is not what you had in mind.

    Thank you very much for your help, Marc

  3. Francesco Favero

    Hi Marc,

    I've played around with your segments file, and here is what I would do (attached files cluster_segs.*). I know it's not perfect, it reduces the number of segments only by 3/400, but it's a start.

    There are also other valid methods, so this is just an example. If you want to try it, I suggest using the segments from the sequenza.extract step (they are separated by chromosome, so you need to use rbind or similar to obtain a data frame with all the segments).
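
    As a rough illustration of the "rbind or similar" step, the sketch below stacks per-chromosome segment tables into one flat table. It is written in Python with made-up data; the `per_chromosome` layout and the `bind_rows` helper are hypothetical and are not the actual structure returned by sequenza.extract. In R, the usual idiom for stacking a list of data frames is `do.call(rbind, ...)`.

```python
def bind_rows(per_chromosome):
    """Concatenate per-chromosome row lists into one list, tagging each row.

    `per_chromosome` maps a chromosome name to a list of segment dicts.
    Dict insertion order is preserved, so chromosomes stay in input order.
    """
    return [
        dict(row, chromosome=chrom)
        for chrom, rows in per_chromosome.items()
        for row in rows
    ]


# Made-up example: one table (list of rows) per chromosome.
per_chromosome = {
    "chr1": [{"start": 0, "end": 100}],
    "chr2": [{"start": 0, "end": 50}, {"start": 50, "end": 200}],
}
all_segments = bind_rows(per_chromosome)
```

    Each row carries its chromosome name after stacking, so later steps can still avoid merging across chromosome boundaries.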


