olgert denas committed 13da58a Draft

+== Documentation ==
+
+This page describes how to use bnMapper for mapping genomic features.
+
+=== Setup ===
+If you haven't done so, install numpy and bx-python (instructions on ??). After a successful installation you should find, among other scripts, bnMapper.py and out_to_chain.py in the scripts directory. Run either script without arguments, or with the -h option, to print a usage message.
+
+=== Out to Chain conversion ===
+The out_to_chain.py script is a simple format converter: it takes an EPO alignment file as input and converts it to the UCSC chain format (http://genome.ucsc.edu/goldenPath/help/chain.html).
+
+If you run the script with the -h option, you should get:
+{{{
+#!bash
+
+odenas@bxlab05:$ out_to_chain.py -h
+usage: out_to_chain.py [-h] [--species SPECIES SPECIES] --chrsizes CHRSIZES
+                       CHRSIZES [-o FILE]
+                       input
+
+EPO alignments (.out) to .chain converter.
+
+positional arguments:
+  input                 File to process
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --species SPECIES SPECIES
+                        Names of target and query species (respectively) in
+                        the alignment (default: ['homo_sapiens',
+                        'mus_musculus'])
+  --chrsizes CHRSIZES CHRSIZES
+                        Chromosome sizes for the given species. (default:
+                        None)
+  -o FILE, --output FILE
+                        Output file (default: stdout)
+
+
+}}}
+
+The --species option takes two arguments, which must match the species names used in the .out alignment file. The first is the target species and the second the query species; this order is reflected in the .chain file.
+
+The --chrsizes option is required and also takes two arguments: the paths to the chromosome sizes files for the first and the second --species argument, respectively. A chromosome sizes file has one line per chromosome/contig, giving its name and its size in bp separated by a space. You can obtain these files from the UCSC database with, for example, the fetchChromSizes utility.
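+
+The following is a minimal sketch, not part of bx-python, for sanity-checking a chromosome sizes file before passing it to --chrsizes. The file name toy.chrom.sizes, its contents, and the check_chrom_sizes function are made up for illustration. The check only enforces the layout described above (two fields per line, a positive integer size); awk's default field splitting accepts tabs as well as spaces.
+
+{{{
+#!bash
+
+# Hypothetical toy file: one line per chromosome/contig, name and size in bp.
+printf 'chr1 1000\nchr2 2500\n' > toy.chrom.sizes
+
+# check_chrom_sizes FILE (hypothetical helper): print malformed lines to
+# stderr and return exit status 1 if any are found.
+check_chrom_sizes() {
+    awk 'BEGIN { bad = 0 }
+         NF != 2 || $2 !~ /^[0-9]+$/ || $2 + 0 == 0 {
+             print FILENAME ": bad line " NR ": " $0 > "/dev/stderr"
+             bad = 1
+         }
+         END { exit bad }' "$1"
+}
+
+check_chrom_sizes toy.chrom.sizes && echo "toy.chrom.sizes looks well-formed"
+}}}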
+
+By default the output is written to standard output; use the -o option to write it to a file instead.
+
+=== Feature mapping ===
+
+The bnMapper.py script uses an alignment file in chain format to map genomic features from the target species to the query species of that chain file. The script is tuned for mapping relatively short features (typically ChIP-seq peaks), so features that span multiple chains, or that map to different chromosomes, are silently dropped. Options to control this behavior are planned for a future version.
+
+If you run the script with the -h option, you should get:
+
+{{{
+#!bash
+
+odenas@bxlab05:$ bnMapper.py -h
+usage: bnMapper.py [-h] [-f {BED4,BED12}] [-o FILE] [-t FLOAT] [-s] [-g GAP]
+                   [-v {info,debug,silent}]
+                   input [input ...] alignment
+
+Map features from the target species to the query species of a chain alignment
+file. This is intended for mapping relatively short features such as Chip-Seq
+peaks on TF binding events. Features that get mapped on different chromosomes
+or that span multiple chains are silently filtered out.
+
+positional arguments:
+  input                 Input to process. If more than a file is specified,
+                        all files will be mapped and placed on --output, which
+                        should be a directory.
+  alignment             Alignment file (.chain or .pkl)
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -f {BED4,BED12}, --format {BED4,BED12}
+                        Output format. (default: BED4)
+  -o FILE, --output FILE
+                        Output file. Mandatory if more than one file in input.
+                        (default: stdout)
+  -t FLOAT, --threshold FLOAT
+                        Mapping threshold i.e., |elem| * threshold <=
+                        |mapped_elem| (default: 0.0)
+  -s, --screen          Only report elements in the alignment (without
+                        mapping). -t has not effect here (TODO) (default:
+                        False)
+  -g GAP, --gap GAP     Ignore elements with an insertion/deletion of this or
+                        bigger size. (default: -1)
+  -v {info,debug,silent}, --verbose {info,debug,silent}
+                        Verbosity level (default: info)
+
+}}}
+
+The script loads the features from the input files and uses the alignment (in .chain format) to map them //from the target to the query of the alignment file//. Because the alignment file is typically large, loading it takes a while. To speed this up, the script creates a cached version of the alignment that loads faster on subsequent runs. So, after the first run
+
+{{{
+#!bash
+
+odenas@bxlab05:$ bnMapper.py from_features.bed alignment.chain -o to_features.bed
+
+}}}
+
+you will find a file named //alignment.chain.pkl//. The script warns you every time it loads a cached alignment, as a reminder that the cache may be out of date if you regenerated the alignment file without deleting it.
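+
+If you regenerate the chain file, a small guard like the one below can remove an out-of-date cache before the next run. drop_stale_cache is a hypothetical helper, not part of bx-python. It assumes the cache sits in the same directory as the chain file and is named after it with a .pkl suffix, as in the example above, and it relies on bash's -nt test, which compares file modification times.
+
+{{{
+#!bash
+
+# drop_stale_cache CHAIN (hypothetical helper): delete CHAIN.pkl if CHAIN was
+# modified after it, so the next bnMapper.py run rebuilds the cache.
+drop_stale_cache() {
+    local chain="$1"
+    local cache="${chain}.pkl"
+    # [ A -nt B ] is true when A's modification time is newer than B's
+    if [ -e "$cache" ] && [ "$chain" -nt "$cache" ]; then
+        echo "removing stale cache $cache" >&2
+        rm -f -- "$cache"
+    fi
+}
+
+# Call it right before running bnMapper.py on the same chain file.
+drop_stale_cache alignment.chain
+}}}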
+
+Use the -f option to choose between BED4 (the default) and BED12 output. BED12 output has a single, possibly gapped, //to-element// for each //from-element// that could be mapped. BED4 output preserves names, but may contain more than one //to-element// for each //from-element//.
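+
+Because BED4 output can contain several //to-elements// per //from-element//, it can be useful to count how many pieces each name was split into. The sketch below does this with awk on a made-up BED4 file; the coordinates, names, and output file pieces_per_name.tsv are hypothetical, and it assumes the preserved name is in column 4, as in the standard BED4 layout.
+
+{{{
+#!bash
+
+# Hypothetical BED4 output: peak_1 was mapped as two to-elements, peak_2 as one.
+printf 'chr1\t100\t150\tpeak_1\nchr1\t170\t220\tpeak_1\nchr2\t500\t600\tpeak_2\n' > to_features.bed
+
+# Count the to-elements per from-element name (column 4), sorted by name.
+awk -F'\t' '{ n[$4]++ } END { for (name in n) print name "\t" n[name] }' to_features.bed \
+    | sort > pieces_per_name.tsv
+cat pieces_per_name.tsv
+}}}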
+
+The --screen option only reports the features that could potentially be mapped, without actually mapping them; the -t threshold currently has no effect in this mode.
+
+The --threshold and --gap options are filters. The first suppresses features of which only a small fraction can be mapped: a feature is kept only if |elem| * threshold <= |mapped_elem|. The second suppresses features that, once mapped, contain large gaps (as a consequence of insertions in the query genome), that is, an insertion/deletion of size GAP or larger.
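+
+The two filters can be illustrated with small shell functions. passes_threshold restates the rule printed by -h, |elem| * threshold <= |mapped_elem|, assuming both lengths are in bp. max_block_gap reports the largest gap between consecutive blocks of each BED12 element, which is one way to inspect the kind of gap --gap is meant to catch. Both are hypothetical helpers written for this page, not bnMapper's own code, and the BED12 record is made up.
+
+{{{
+#!bash
+
+# passes_threshold ELEM_LEN MAPPED_LEN THRESHOLD (hypothetical helper)
+# Exit status 0 when ELEM_LEN * THRESHOLD <= MAPPED_LEN (feature is kept).
+passes_threshold() {
+    awk -v e="$1" -v m="$2" -v t="$3" 'BEGIN { exit !(e * t <= m) }'
+}
+
+# A 200 bp feature with -t 0.5 needs at least 100 bp mapped.
+passes_threshold 200 120 0.5 && echo "200 bp, 120 bp mapped: kept"
+passes_threshold 200 80 0.5  || echo "200 bp, 80 bp mapped: filtered"
+
+# max_block_gap FILE (hypothetical helper): print the name and the largest gap
+# between consecutive blocks of each BED12 record
+# (column 10 = blockCount, 11 = blockSizes, 12 = blockStarts).
+max_block_gap() {
+    awk -F'\t' '{
+        split($11, size, ","); split($12, start, ",")
+        maxgap = 0
+        for (i = 1; i < $10; i++) {
+            gap = start[i + 1] - (start[i] + size[i])
+            if (gap > maxgap) maxgap = gap
+        }
+        print $4 "\t" maxgap
+    }' "$1"
+}
+
+# Hypothetical BED12 record: blocks of 50 bp and 40 bp separated by 30 bp.
+printf 'chr1\t1000\t1120\tpeak_1\t0\t+\t1000\t1120\t0\t2\t50,40,\t0,80,\n' > to_features.bed12
+max_block_gap to_features.bed12
+}}}
+
+With the default -t 0.0 the left-hand side of the rule is 0, so the threshold keeps every mapped feature.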
+
+The --output option specifies the output location; it is mandatory when more than one input file is given. If it is a directory, the mapper creates in it one output file per input file, with the same name as the input. For example, if you have files f_1.bed, f_2.bed, and f_3.bed to map from human to mouse and run
+
+{{{
+#!bash
+
+odenas@bxlab05:$ bnMapper.py f_1.bed f_2.bed f_3.bed human_mouse_alignment.chain -o out_dir
+
+}}}
+
+you will get f_1.bed, f_2.bed, and f_3.bed in the out_dir directory. This is much faster than running bnMapper.py three times, because the alignment file is loaded only once.
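+
+After a multi-file run, a quick loop can confirm that the output directory holds one file per input. check_outputs is a hypothetical check written for this page; it assumes bare input file names as in the example above, and it only tests that each output exists, since an empty output may simply mean that nothing could be mapped.
+
+{{{
+#!bash
+
+# check_outputs DIR INPUT... (hypothetical helper): report inputs that have
+# no same-named file in DIR; exit status 1 if any output is missing.
+check_outputs() {
+    local dir="$1" missing=0 f
+    shift
+    for f in "$@"; do
+        if [ ! -e "$dir/$f" ]; then
+            echo "missing: $dir/$f" >&2
+            missing=1
+        fi
+    done
+    return "$missing"
+}
+
+check_outputs out_dir f_1.bed f_2.bed f_3.bed && echo "all outputs present"
+}}}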
+
+
+=== Examples ===