# Pancancer DNA methylation trackhub

The trackhub displays the DNA methylation profiles generated by the TCGA Research Network
with the Illumina Infinium 450k array.

For each cohort (e.g. colon primary tumors), the samples' methylation distribution is
segmented into five bins (tracks) according to their beta values: beta values below 0.2
indicate the lowest DNA methylation, whereas values over 0.8 indicate almost full DNA methylation.
Color intensity in each track represents the proportion of samples within that methylation
range (the darker, the higher). Stacked together, the five tracks can be read as a density plot
of the cohort's methylation status.
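A minimal sketch of how the five per-probe scores could be derived. It assumes equal-width bins with breaks at 0.2, 0.4, 0.6 and 0.8 (only the 0.2 and 0.8 thresholds and the 0.8 to 1.0 top bin are stated in this README) and assumes each proportion is scaled to the 0 to 1000 score range used by the track definitions. The function names are hypothetical.

```python
import math
from bisect import bisect_right

# Assumed bin breaks: [0, 0.2), [0.2, 0.4), [0.4, 0.6), [0.6, 0.8), [0.8, 1.0]
BREAKS = (0.2, 0.4, 0.6, 0.8)


def bin_index(beta):
    """Return the bin (0 = lowest, 4 = almost full methylation) for a beta value."""
    return bisect_right(BREAKS, beta)


def bin_scores(betas, score_max=1000):
    """Proportion of samples in each bin for one probe, scaled to 0..score_max.

    Missing values (None or NaN) are ignored; a probe with no observed
    values gets a score of 0 in every bin.
    """
    observed = [b for b in betas if b is not None and not math.isnan(b)]
    counts = [0] * (len(BREAKS) + 1)
    for beta in observed:
        counts[bin_index(beta)] += 1
    if not observed:
        return counts
    return [round(score_max * c / len(observed)) for c in counts]
```

For instance, `bin_scores([0.05, 0.1, 0.5, 0.85, 0.95, None])` yields `[400, 0, 200, 0, 400]`: two of the five observed samples fall in the lowest bin, one in the middle bin and two in the top bin.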

A track depicting differential methylation is available for cohorts with both primary tumor
and adjacent normal data. A probe is deemed significant when the absolute difference in beta
value between tumor and normal samples exceeds 0.2 and the Wilcoxon test p-value is below 0.001.
Gray denotes no detectable difference, red tumor hypermethylation and blue tumor hypomethylation.
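The significance rule can be sketched as follows. This is an illustration under assumptions, not the code in src: the effect size is taken as the difference of mean beta values (the text does not say whether means or medians were compared), the test is the unpaired Wilcoxon rank-sum test, and its two-sided p-value uses the normal approximation with tie correction rather than an exact test (`scipy.stats.mannwhitneyu` would be a drop-in alternative). The `COLORS` mapping is a hypothetical label set.

```python
import math
from statistics import mean

# Hypothetical color labels for the differential methylation track
COLORS = {"hyper": "red", "hypo": "blue", "none": "gray"}


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = sorted(list(x) + list(y))
    ranks, tie_term, i = {}, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1
        ties = j - i + 1
        ranks[values[i]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        tie_term += ties ** 3 - ties
        i = j + 1
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 1.0  # all values identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def classify_probe(tumor, normal, min_delta=0.2, alpha=0.001):
    """Return 'hyper', 'hypo' or 'none' for one probe (tumor vs adjacent normal)."""
    delta = mean(tumor) - mean(normal)
    if abs(delta) > min_delta and rank_sum_pvalue(tumor, normal) < alpha:
        return "hyper" if delta > 0 else "hypo"
    return "none"
```

Both conditions must hold: a large difference from a handful of samples (e.g. `[0.9, 0.1]` vs `[0.1, 0.1]`) stays gray because its p-value is far above 0.001.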

The trackhub can be accessed at .

Developed by Alberto Sierco and Izaskun Mallona.

## Technical details

TCGA's level 3 DNA methylation data (beta values already called) for over 450,000 genomic
locations, as measured by Illumina's Infinium array, were downloaded using TCGA-Assembler v1.0.3
and stored in a PostgreSQL relational database.
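For local experimentation, the storage layout can be approximated with Python's built-in `sqlite3` module standing in for PostgreSQL. The table and column names below are hypothetical, not the schema used by the scripts in src; the sketch only shows how a long-format table keyed by (cohort, probe, sample) lets one cohort's beta values be streamed probe by probe, with the composite primary key doubling as the lookup index.

```python
import sqlite3
from itertools import groupby


def open_store(path=":memory:"):
    """Create a hypothetical long-format table: one row per (cohort, probe, sample)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS beta (
               cohort TEXT NOT NULL,
               probe  TEXT NOT NULL,
               sample TEXT NOT NULL,
               value  REAL,  -- NULL when the beta value is missing
               PRIMARY KEY (cohort, probe, sample))"""
    )
    return conn


def betas_by_probe(conn, cohort):
    """Yield (probe, [beta values]) for every probe of one cohort, ordered by probe."""
    rows = conn.execute(
        "SELECT probe, value FROM beta WHERE cohort = ? ORDER BY probe, sample",
        (cohort,),
    )
    for probe, group in groupby(rows, key=lambda row: row[0]):
        yield probe, [value for _, value in group]
```

Rows are loaded with `conn.executemany("INSERT INTO beta VALUES (?, ?, ?, ?)", rows)`; each yielded list of beta values is then ready to be binned per probe.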

Given the relatively large amount of data (approximately 450,000 CpGs per track, five tracks per
dataset and 50 datatypes, i.e. roughly 112 million records, plus the statistical test tracks),
we converted the plain-text tracks into indexed bigBed files, which speeds up data retrieval
and reduces the final file size. Because bigBed files are indexed, the UCSC Genome Browser
fetches only the genomic region being displayed instead of transferring the full file, which
considerably increases visualization speed. Coordinate-based data (BED and bigBed files)
were handled with bedtools v2.25.0 and Kent's UCSC utilities v287.
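A sketch of the conversion step, assuming each track is first written as a BED5 file (chrom, 0-based start, end, name, score) and then compressed with Kent's `bedToBigBed`, whose basic usage is `bedToBigBed in.bed chrom.sizes out.bb` and which rejects input not sorted by chromosome and start. The helper names are hypothetical; the command is only built, not run, so it can be inspected first.

```python
def bed5_text(records):
    """Format (chrom, start, end, name, score) records as BED5, sorted by chrom then start.

    Python's ordering of ASCII chromosome names matches `sort -k1,1 -k2,2n`
    under the C locale, which is the order bedToBigBed expects.
    """
    rows = sorted(records, key=lambda r: (r[0], r[1]))
    return "".join(
        f"{chrom}\t{start}\t{end}\t{name}\t{score}\n"
        for chrom, start, end, name, score in rows
    )


def bedtobigbed_command(bed_path, chrom_sizes, bigbed_path):
    """Build the bedToBigBed invocation (Kent's UCSC utilities must be on PATH)."""
    return ["bedToBigBed", str(bed_path), str(chrom_sizes), str(bigbed_path)]
```

The text would be written to disk (e.g. with `pathlib.Path("acc_tumor5.bed").write_text(...)`) and the command passed to `subprocess.run(..., check=True)`; the file names here are illustrative only.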

To serve the resulting data hub, we uploaded the files to an Apache Web server
and added an explicit Hub Track Database Definition (trackDb.txt) that groups
the data layers hierarchically. To do so, we described each data component as a
stanza pointing to its bigBed and declaring its attributes. For instance, the bigBed
storing the abundance of almost full methylation in adrenocortical carcinoma was described
as follows:

```
track acc_tumor5
  bigDataUrl http://<server_path>/
  shortLabel a acc_tumor 1.0
  longLabel acc_tumor tissue patient methylation between 0.8 and 1.0
  parent acc_tumor on
  visibility dense
  spectrum on
  scoreMin 0
  scoreMax 1000
  type bigBed 5 +
```

The stanza indicates the URL where the file is hosted (the placeholder `<server_path>`, chevrons
included, must be replaced by the address or IP of the Web server), its annotation (short and long
labels), its position in the hierarchy as a child of acc_tumor (itself a child of the acc
supertrack), its rendering details (visibility, score-based shading and score range) and its data type.
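Since every cohort needs five near-identical stanzas, generating them is less error-prone than writing them by hand. A minimal sketch with hypothetical helper names: `render_stanza` reproduces the layout of the example stanza above from an ordered dict of attributes, and `bin_long_label` rebuilds its long label under the assumption of equal-width 0.2 bins. The bigDataUrl file name is left to the caller, as it is elided in the example.

```python
def render_stanza(track, attributes):
    """Render one trackDb.txt stanza: a 'track' line followed by indented 'key value' lines."""
    lines = [f"track {track}"]
    lines += [f"  {key} {value}" for key, value in attributes.items()]
    return "\n".join(lines) + "\n"


def bin_long_label(cohort, index, width=0.2):
    """Long label for bin `index` (1..5), assuming equal-width bins of 0.2."""
    low, high = (index - 1) * width, index * width
    return f"{cohort} tissue patient methylation between {low:.1f} and {high:.1f}"
```

Dicts preserve insertion order, so the attributes are emitted in the order given; stanzas in trackDb.txt are separated by a blank line.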

Finally, we configured the Web server hosting the trackhub so that requests to its URL are
redirected to the UCSC Genome Browser together with the location of the hub configuration file.
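A sketch of the redirect target, assuming the UCSC Genome Browser's `hubUrl` query parameter on `hgTracks` is used to attach the hub. The hg19 default is an assumption (450k probe coordinates are usually reported on hg19), and the hub.txt location in the usage note is a hypothetical example. The resulting string could serve as the target of an Apache `Redirect` directive.

```python
from urllib.parse import urlencode

UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_hub_url(hub_txt_url, assembly="hg19"):
    """URL that opens the UCSC Genome Browser with the given hub attached.

    `assembly` defaults to hg19 as an assumption; `hub_txt_url` must point
    at the hub's hub.txt file and is percent-encoded by urlencode.
    """
    return UCSC_HGTRACKS + "?" + urlencode({"db": assembly, "hubUrl": hub_txt_url})
```

For example, `ucsc_hub_url("http://example.org/hub.txt")` returns `https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=http%3A%2F%2Fexample.org%2Fhub.txt`.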

## Folders and tags

- src folder contains the scripts to build the trackhub
  (requires a populated postgres db with the methylation data)
- wanderer archive tag includes side projects such as the integration
  with a gene-free version of TCGA Wanderer
- trackhub folder contains the source (hub.txt, trackDb.txt and so on) as well as the bigBeds
  needed to deploy the trackhub

## Acknowledgements

The trackhub published here is based upon data generated by the TCGA Research Network:

## Contact