Commits

Anonymous committed c60deff

tutorial

Comments (0)

Files changed (2)

 {{{
 #!bash
 
-mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size from hg19.chromInfo" > hg19.genome
+mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size from hg19.chromInfo" | tr -d "|" > hg19.genome
 
 }}}
 
 you will get out_dir/f_1.bed, out_dir/f_2.bed, out_dir/f_3.bed. This is much faster than running the bnMapper three times as the alignment file is loaded just once. 
 
 === Examples ===
+The files for the following examples are on the //bx-python/test_data/epo_tests// directory. For example, in my system I have
 
+{{{
+#!bash
+
+odenas@bxlab05:$ pwd
+/Users/odenas/src/all_bx/bx-python
+odenas@bxlab05:$ ls
+MANIFEST.in           distribute_setup.py   lib/                  setup.cfg             test_data/
+build/                distribute_setup.pyc  script_tests/         setup.py
+dist/                 doc/                  scripts/              src/
+odenas@bxlab05:$ ls test_data/epo_tests/
+epo_547_hs_mm_12way_mammals_65.chain  hpeaks.bed                            hpeaks.mapped.nopeak2.bed4
+epo_547_hs_mm_12way_mammals_65.out    hpeaks.mapped.bed12                   mm9.chrom.sizes
+hg19.chrom.sizes                      hpeaks.mapped.bed4
+odenas@bxlab05:$ 
+
+}}}
+
+We will start with the conversion of the EPO alignments (epo_547_hs_mm_12way_mammals_65.out) to the chain format.
+
+{{{
+#!bash
+ 
+out_to_chain.py --species homo_sapiens mus_musculus --chrsizes test_data/epo_tests/hg19.chrom.sizes test_data/epo_tests/mm9.chrom.sizes --output epo.HM.chain test_data/epo_tests/epo_547_hs_mm_12way_mammals_65.out
+
+}}}
+
+This command is saying, extract the alignments of //homo_sapiens// and //mus_musculus// whose chromosome sizes are //test_data/epo_tests/hg19.chrom.sizes// and //test_data/epo_tests/mm9.chrom.sizes// respectively. Build the chain file with //homo_sapiens// as the target species and write the output on //epo.HM.chain//. To test the output check that //epo.HM.chain// is identical to //test_data/epo_tests/epo_547_hs_mm_12way_mammals_65.chain// 
+
+{{{
+#!bash
+
+odenas@bxlab05:$ diff -q test_data/epo_tests/epo_547_hs_mm_12way_mammals_65.chain epo.HM.chain 
+odenas@bxlab05:$ 
+
+}}}
+
+Now we can use //epo.HM.chain// to map features from //homo_sapiens// to //mus_musculus//. If we wanted to map on the other direction we would have to produce another file, say //epo.MH.chain// like so
+
+{{{
+#!bash
+ 
+out_to_chain.py --species mus_musculus homo_sapiens --chrsizes test_data/epo_tests/mm9.chrom.sizes test_data/epo_tests/hg19.chrom.sizes --output epo.MH.chain test_data/epo_tests/epo_547_hs_mm_12way_mammals_65.out
+
+}}}
+
+To map the features on //test_data/epo_tests/hpeaks.bed// we simply type (in BED4 and BED12 respectively)
+
+{{{
+#!bash
+
+odenas@bxlab05:$ bnMapper.py test_data/epo_tests/hpeaks.bed epo.HM.chain 
+INFO:bx.align._epo:parsed 5 elements from epo.HM.chain
+INFO:bx.align.epo:pckling to epo.HM.chain.pkl
+INFO:root:indexing 5 chains ...
+INFO:root:loading from test_data/epo_tests/hpeaks.bed ...
+INFO:root:transforming (5) elements ...
+chr16	85453122	85453194	peak1
+chr16	85454203	85454219	peak2
+chr16	85454227	85454258	peak2
+chr16	85454261	85454286	peak2
+chr16	88576704	88576752	peak4
+INFO:root:DONE!
+odenas@bxlab05:$ bnMapper.py -fBED12 test_data/epo_tests/hpeaks.bed epo.HM.chain 
+WARNING:bx.align.epo:loading pickled file epo.HM.chain.pkl ...
+INFO:root:indexing 5 chains ...
+INFO:root:loading from test_data/epo_tests/hpeaks.bed ...
+INFO:root:transforming (5) elements ...
+chr16	85453122	85453194	peak1	1000	+	85453122	85453194	0,0,0	1	72	0
+chr16	85454203	85454286	peak2	1000	+	85454203	85454286	0,0,0	3	16,31,25	0,24,58
+chr16	88576704	88576752	peak4	1000	+	88576704	88576752	0,0,0	1	48	0
+INFO:root:DONE!
+
+
+}}}
+
+You can supress the logging information by using the -vsilent option
+
+{{{
+#!bash
+
+odenas@bxlab05:$ bnMapper.py -vsilent test_data/epo_tests/hpeaks.bed epo.HM.chain 
+chr16	85453122	85453194	peak1
+chr16	85454203	85454219	peak2
+chr16	85454227	85454258	peak2
+chr16	85454261	85454286	peak2
+chr16	88576704	88576752	peak4
+odenas@bxlab05:$ bnMapper.py -vsilent -fBED12 test_data/epo_tests/hpeaks.bed epo.HM.chain 
+chr16	85453122	85453194	peak1	1000	+	85453122	85453194	0,0,0	1	72	0
+chr16	85454203	85454286	peak2	1000	+	85454203	85454286	0,0,0	3	16,31,25	0,24,58
+chr16	88576704	88576752	peak4	1000	+	88576704	88576752	0,0,0	1	48	0
+
+}}}
+
+or you can increase it with the -vdebug option. In the picture below you can notice the original peaks on the human genome and the mapped ones on the mouse genome. The alignment coverage track indicates matchin blocks and insertions in that species. Since peak3 falls in a human insertion, it is not mapped on mouse. On the other hand, peak2 is mapped in three blocks on mouse. The two gaps correspond to two insertions in mouse, whether the middle block contains two joined parts that spanned an insertion in human.
+
+{{browser_view.png|Browser view of human and mouse peaks}}
+
+To limit the mapping in peaks that have taken only a limited amount of indels, one can use the //--gap// and //--threshold// options like so
+
+{{{
+#!bash
+
+odenas@bxlab05:$ bnMapper.py -g9 -t0.7 ./test_data/epo_tests/hpeaks.bed epo.HM.chain
+WARNING:bx.align.epo:loading pickled file epo.HM.chain.pkl ...
+INFO:root:indexing 5 chains ...
+INFO:root:loading from ./test_data/epo_tests/hpeaks.bed ...
+INFO:root:transforming (5) elements ...
+chr16	85453122	85453194	peak1
+chr16	88576704	88576752	peak4
+INFO:root:DONE!
+
+}}}
+
+This is filtering out all peaks that incurr a gap (insertion in mouse) of more than 9 bases as well as peaks of which less than 70% of bases are mapped.
+

browser_view.png

Added
New image