
Chromothripsis H/T Alternating Fraction

This code repository provides an implementation of the H/T Alternating Fraction measure of chromothripsis.

Support

For support using this code please contact Dr. Layla Oesper at loesper@carleton.edu.

Dependencies

This code repository is written in Python. It has the following dependencies (the version in parentheses is the latest tested version):

References

This approach is described in the following paper (please cite if you use this code):

Layla Oesper, Simone Dantas and Benjamin J. Raphael. Inferring Simultaneous Rearrangements in Cancer Genomes. (In Submission). [Pre-print]

Usage

Data Preprocessing

Before computing the H/T Alternating Fraction you need to pre-process your dataset(s) into a specified format. In particular, for each sample/set of adjacencies, create a text file containing a line for each measured adjacency. Each measured adjacency is defined by the following 6 tab-delimited attributes:

  1. Chrm1 (int): the chromosome of the first breakpoint.
  2. Pos1 (int): the position (in bp) of the first breakpoint.
  3. Strand1 (0 or 1): the orientation of the first breakpoint.
    • 0 indicates that it connects to the + strand.
    • 1 indicates that it connects to the - strand.
  4. Chrm2 (int): the chromosome of the second breakpoint.
  5. Pos2 (int): the position (in bp) of the second breakpoint.
  6. Strand2 (0 or 1): the orientation of the second breakpoint.
    • 0 indicates that it connects to the + strand.
    • 1 indicates that it connects to the - strand.

The two breakpoints should be listed in order according to their position in the genome. Below is an example of an input file where the last adjacency listed is a deletion on chromosome 20.

#Chrm1  Pos1  Strand1 Chrm2 Pos2  Strand2
1       10947080        0       20      40856274        1       11
1       10947195        1       1       227527795       0       5
1       10950450        0       20      40856539        0       10
20      40841779        0       20      40856545        1       5
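As an illustration of the format above, the following is a minimal sketch of a reader for one adjacency file. It is not the parser used by compute_AF.py; the names `Adjacency` and `parse_adjacencies` are hypothetical. It assumes that lines starting with `#` are headers, and it reads only the first six columns (the example rows carry a seventh value that is not described in the attribute list, so the sketch ignores it).

```python
from typing import List, NamedTuple


class Adjacency(NamedTuple):
    """One measured adjacency (hypothetical record type for illustration)."""
    chrm1: int
    pos1: int
    strand1: int  # 0 = connects to the + strand, 1 = connects to the - strand
    chrm2: int
    pos2: int
    strand2: int  # same encoding as strand1


def parse_adjacencies(text: str) -> List[Adjacency]:
    """Parse the contents of an adjacency file into a list of records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#'-prefixed header lines carry no data.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 fields, got: {line!r}")
        # Only the six documented attributes are interpreted.
        adj = Adjacency(*(int(f) for f in fields[:6]))
        if adj.strand1 not in (0, 1) or adj.strand2 not in (0, 1):
            raise ValueError(f"strand must be 0 or 1: {line!r}")
        records.append(adj)
    return records
```

Running a reader like this over each file before the main computation is one way to catch malformed lines early.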

Once these files have been created for every sample under consideration, create one more file that lists the name of each such file on its own line.
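The list file can be written by hand, but for many samples it may be easier to generate it. The sketch below is one possible way to do that; `write_data_list` and its parameters are hypothetical and not part of this repository. It assumes all data files sit directly in one directory and match a glob pattern, and it writes bare file names (not full paths), one per line.

```python
from pathlib import Path
from typing import List


def write_data_list(data_dir: str, list_path: str, pattern: str = "*.txt") -> List[str]:
    """Write the name of every matching data file in data_dir to list_path."""
    directory = Path(data_dir)
    target = Path(list_path)
    names = sorted(
        p.name
        for p in directory.glob(pattern)
        # Skip the list file itself in case it lives in the data directory.
        if p.is_file() and p.resolve() != target.resolve()
    )
    # One file name per line, as the list file format requires.
    target.write_text("".join(name + "\n" for name in names))
    return names
```

For example, `write_data_list("my_data", "my_data/AllSamples.txt")` would list every `.txt` file in the hypothetical directory `my_data` except the list file itself.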

Computing the H/T Alternating Fraction

The provided script compute_AF.py allows a user to compute the H/T alternating fraction for a set of samples whose data is in the previously defined format. This program takes the following arguments as input:

python3 compute_AF.py [-d|--DATA_DIR <directory>] [-l|--DATA_LIST <file>] [-o|--OUT_FILE <file>]
    [-s|--SUFFIX <suffix>]

where

| Argument | Required (Default) | Description |
|---|---|---|
| --DATA_DIR/-d | True | Path to directory containing all data files to analyze. |
| --DATA_LIST/-l | True | Path to file containing the name of each data file (just the name, not the full path) to analyze. |
| --OUT_FILE/-o | False (./AllResults.txt) | File used to save the computed H/T Alternating Fraction for all samples. |
| --SUFFIX/-s | False ("") | Optional suffix (e.g. ".txt") appended to all data file names before trying to open them. Only used when the appropriate suffix is not included in the DATA_LIST file. |
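To make the interaction between DATA_DIR, DATA_LIST and SUFFIX concrete, here is a rough sketch of how options like these could be declared with `argparse` and turned into file paths. This is not the source of compute_AF.py; `build_parser` and `resolve_data_files` are hypothetical names, and the path-joining rule (directory + name + suffix) is an assumption based on the descriptions above.

```python
import argparse
import os
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Declare options mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(prog="compute_AF.py")
    parser.add_argument("-d", "--DATA_DIR", required=True,
                        help="directory containing all data files")
    parser.add_argument("-l", "--DATA_LIST", required=True,
                        help="file listing the name of each data file")
    parser.add_argument("-o", "--OUT_FILE", default="./AllResults.txt",
                        help="file used to save the results")
    parser.add_argument("-s", "--SUFFIX", default="",
                        help="suffix appended to each listed file name")
    return parser


def resolve_data_files(data_dir: str, names: List[str], suffix: str = "") -> List[str]:
    """Assumed rule: each data file is found at DATA_DIR/<name><suffix>."""
    return [os.path.join(data_dir, name + suffix) for name in names]
```

Under this assumed rule, a list entry `chain1` with `-s .txt` and `-d data/` would be opened as `data/chain1.txt`, while a list that already contains `chain1.txt` needs no suffix.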

Example

This repository includes the processed input data from 154 sets of adjacencies previously classified as either one-off (chromothripsis) or step-wise by Malhotra et al., Genome Research, 2012. To compute the H/T Alternating Fraction across all of these samples, run the following command:

python3 compute_AF.py -d Malhotra_data/ -l Malhotra_data/AllChains.txt

This creates the file AllResults.txt in your current directory, containing the computed AF(C) value for all 154 chains of rearrangements. To see the type of rearrangement originally called by Malhotra et al., see the provided file Malhotra_data/AllChainsAndTypes.txt.