Simulated breakpoint positions in genome

Issue #3 on hold
Former user created an issue

Hi, I have some questions about the simulated breakpoint positions of SV calls. I simulated 1200 SV calls; my command is at the bottom.

  1. I found 250 of them (about 21%) overlapping gap regions downloaded from the UCSC Table Browser. Gap regions make up only around 5% of the human genome, so why do about 20% of my SV calls overlap them? Have you seen a similar case before? Did I miss any parameters for the simulation?
  2. I checked all the SV calls I simulated and found that none were placed beyond position 2,000,000, even with different seeds. Shouldn't they be spread evenly along each chromosome? Are they placed starting from the beginning of each chromosome instead of randomly?
  3. I simulated another 1200 SV calls with -s hg38_gaps.bed. However, 527 of them (about 44%) overlap the seg-dup regions downloaded from the UCSC Table Browser. Again, seg-dup regions make up only about 5-6% of the human genome, yet far more of my simulated SV calls overlap them. Have you seen this before?
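The overlap and position checks in points 1 to 3 can be reproduced with a short script. The following is a minimal Python sketch, not part of SVEngine, and it makes some assumptions: the simulated SVs have already been extracted into (chrom, start, end) tuples, and the UCSC downloads are plain BED files with chrom, start and end in the first three columns (0-based, half-open). It counts the SVs that overlap any region, reports how many bases the regions cover, and records the largest SV start per chromosome, which makes clustering at the head of chromosomes (point 2) easy to spot. The helper names `parse_bed`, `overlaps`, `covered_bp` and `summarize` are hypothetical.

```python
import bisect
from collections import defaultdict


def parse_bed(lines):
    """Parse BED lines into {chrom: sorted, merged [(start, end), ...]} (0-based, half-open)."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, ivs in raw.items():
        out = []
        for s, e in sorted(ivs):
            if out and s <= out[-1][1]:
                out[-1] = (out[-1][0], max(out[-1][1], e))  # merge overlapping/adjacent regions
            else:
                out.append((s, e))
        merged[chrom] = out
    return merged


def overlaps(regions, chrom, start, end):
    """True if the half-open SV interval [start, end) shares a base with any region on chrom."""
    ivs = regions.get(chrom, [])
    i = bisect.bisect_left(ivs, (end,)) - 1  # last region that starts before `end`
    return i >= 0 and ivs[i][1] > start


def covered_bp(regions):
    """Total number of bases covered by the (merged) regions."""
    return sum(e - s for ivs in regions.values() for s, e in ivs)


def summarize(svs, regions):
    """Return (number of SVs overlapping regions, largest SV start per chromosome)."""
    n_hit = sum(overlaps(regions, c, s, e) for c, s, e in svs)
    max_start = {}
    for c, s, _ in svs:
        max_start[c] = max(s, max_start.get(c, s))
    return n_hit, max_start


# Toy data; in practice: with open("hg38_gaps.bed") as fh: gaps = parse_bed(fh)
gaps = parse_bed(["chr1\t0\t10000\n", "chr1\t5000\t20000\n", "chr2\t100\t200\n"])
svs = [("chr1", 15000, 15500), ("chr1", 30000, 31000), ("chr2", 150, 400)]
n_hit, max_start = summarize(svs, gaps)
print(f"{n_hit}/{len(svs)} SVs overlap; regions cover {covered_bp(gaps)} bp; max starts: {max_start}")
```

When comparing n_hit / len(svs) with covered_bp divided by the genome length, expect the overlap fraction to sit somewhat above the covered fraction even under uniform placement, because an SV counts as overlapping if any one of its bases does.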

PYTHONPATH=$SVENGINEPATH python -m mf.mutforge -n 16 -f 10000 -e 100000000 -m mySV2.meta -d ./tmp -c 10 --layout --debug \
    /share/ScratchGeneral/tingon/Reference_genome/Homo_sapiens_assembly38_chromosome.fasta \
    mySeq.par /share/ScratchGeneral/tingon/Reference_genome/Homo_sapiens_assembly38_chromosome.fasta

Comments (3)

  1. Charlie Xia (repo owner)

    Hi, thanks for using the software. Currently, the randomization implemented in SVEngine is not truly random. It randomly selects a section in which the remaining empty stretch is long enough for the SV to be fitted. In that sense, it is more likely to select the head of a chromosome than the tail. The behavior might also be caused by a bug -_-. I will check, but it will take some time. For now, one thing you can do is randomly generate the locations yourself and specify them exactly in a var file.

    Also, can you post the meta file you used? And try the default -f and -e values to see if that helps.
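The workaround suggested above (generate the locations yourself, then specify them in a var file) could start from something like the following minimal Python sketch of uniform random placement with rejection: a chromosome is drawn with probability proportional to its length, a start position is drawn uniformly, and any draw that overlaps a gap or an SV placed earlier is redrawn. The chromosome sizes (for example from a FASTA .fai index) and the gap dictionary are assumed inputs, and `random_sv_positions` is a hypothetical helper, not part of SVEngine. It only returns (chrom, start, end) tuples; the var file's column layout is not shown in this thread, so converting the tuples into that format is left to the SVEngine documentation.

```python
import bisect
import random


def _merge(ivs):
    """Sort intervals and merge overlapping/adjacent ones."""
    out = []
    for s, e in sorted(ivs):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def random_sv_positions(chrom_sizes, gaps, sv_lengths, seed=None, max_tries=1000):
    """Place one SV per entry of sv_lengths at an approximately uniform random position.

    chrom_sizes: {chrom: length}; gaps: {chrom: [(start, end), ...]}, 0-based half-open.
    Returns [(chrom, start, end), ...] avoiding gaps and earlier SVs.
    """
    rng = random.Random(seed)
    chroms = list(chrom_sizes)
    weights = [chrom_sizes[c] for c in chroms]  # longer chromosomes are drawn more often
    blocked = {c: _merge(gaps.get(c, [])) for c in chroms}  # fresh lists; inputs untouched
    placed = []
    for length in sv_lengths:
        for _ in range(max_tries):
            chrom = rng.choices(chroms, weights=weights)[0]
            size = chrom_sizes[chrom]
            if size < length:
                continue
            start = rng.randrange(size - length + 1)  # uniform over every start that fits
            end = start + length
            ivs = blocked[chrom]
            i = bisect.bisect_left(ivs, (end,))  # first blocked interval starting at/after `end`
            if i > 0 and ivs[i - 1][1] > start:
                continue  # overlaps a gap or an earlier SV: reject and redraw
            ivs.insert(i, (start, end))  # list stays sorted and non-overlapping
            placed.append((chrom, start, end))
            break
        else:
            raise RuntimeError(f"could not place an SV of length {length}")
    return placed


sizes = {"chr1": 100_000, "chr2": 50_000}  # toy sizes; real ones could come from a .fai index
gaps = {"chr1": [(0, 10_000), (5_000, 20_000)], "chr2": [(40_000, 50_000)]}
svs = random_sv_positions(sizes, gaps, [100, 500, 1000, 2000, 5000, 10000] * 2, seed=1)
for chrom, start, end in svs:
    print(chrom, start, end)
```

Because placement is rejection-based rather than "first section that fits", positions near the end of a chromosome are as likely as those near the start. To keep SVs out of segmental duplications as well, the seg-dup intervals can simply be added to the same gaps dictionary before the call.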

  2. Tingting Gong

    Hi Charlie,

    Thank you for the reply. I will try to randomly generate some SV locations. Here is my meta file.

    DEL 200 fix_100_500_1000_2000_5000_10000 fix_1 fix_3 fix_3 fix_1.0 fix_0.5 #DEL=1DEL, deletion
    DUP 200 fix_100_500_1000_2000_5000_10000 fix_2 fix_3 fix_3 fix_1.0 fix_1.0 #DUP=COPYINS, duplication
    INV 200 fix_100_500_1000_2000_5000_10000 fix_1 fix_3 fix_3 fix_1.0 fix_1.0 #INV=1DEL + COPYINS, inversion
    DINS 200 fix_100_500_1000_2000_5000_10000 fix_1 fix_3 fix_3 fix_0.5 fix_1.0 #DINS=COPYINS, domestic insertion
    TRA 200 fix_100_500_1000_2000_5000_10000 fix_1 fix_3 fix_3 fix_1.0 fix_1.0 #TRA=1DEL + COPY*INS, transposition

    Thanks, Tingting
