+If you haven't done so, install numpy and bx-python (instructions on ??). Upon successful installation you should find, among other scripts, bnMapper.py and out_to_chain.py in the scripts directory. You can run both scripts without arguments or with the -h option to get a usage message.
+The out_to_chain.py script is a simple format converter that takes an EPO alignment file as input and converts it to the UCSC chain format (http://genome.ucsc.edu/goldenPath/help/chain.html).
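To make the target format concrete, here is a minimal sketch of parsing one chain header line as described on the UCSC page linked above. The names `ChainHeader` and `parse_chain_header` are hypothetical and are not part of bx-python or out_to_chain.py.

```python
from typing import NamedTuple


class ChainHeader(NamedTuple):
    """Fields of a UCSC chain header line (t = target, q = query)."""
    score: int
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: str


def parse_chain_header(line: str) -> ChainHeader:
    """Parse a line of the form 'chain score tName tSize ... qEnd id'."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "chain":
        raise ValueError(f"not a chain header line: {line!r}")
    (score, t_name, t_size, t_strand, t_start, t_end,
     q_name, q_size, q_strand, q_start, q_end, chain_id) = fields[1:]
    return ChainHeader(
        int(score), t_name, int(t_size), t_strand, int(t_start), int(t_end),
        q_name, int(q_size), q_strand, int(q_start), int(q_end), chain_id,
    )
```

The target fields come first and the query fields second, which is why the order of the species names matters when the chain file is written.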
+The --species option expects two arguments that must match the species names used in the .out alignment file. The order of the species names is reflected in the .chain file.
+The --chrsizes option expects two arguments: the paths to the chromosome sizes files for the first and the second argument of --species, respectively. Each chromosome sizes file has one line per chromosome/contig, giving the name of the chromosome/contig and its size in bp, separated by a space. For example, you can use the fetchChromSizes utility to get them from the UCSC database.
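As an illustration of the chromosome sizes layout described above, the following sketch reads such a file into a dictionary. The function name `read_chrom_sizes` is hypothetical; splitting on any whitespace is an assumption made so that both space- and tab-separated files are accepted.

```python
def read_chrom_sizes(lines):
    """Map chromosome/contig name -> size in bp.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    sizes = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        name, size = line.split()[:2]
        sizes[name] = int(size)
    return sizes
```

A typical call would be `read_chrom_sizes(open(path))`, where `path` points to one of the two files passed to --chrsizes.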
+The bnMapper.py script uses an alignment file in the chain format to map genomic features from the target species of the chain file to the query species of the chain file. The script is tuned for mapping relatively short features (typically ChIP-seq peaks), so features that span multiple chains are silently dropped. In the future there will be options to control this behavior.
+The script loads the features from the input and uses the alignment (in .chain format) to map them //from the target to the query of the alignment file//. The alignment file is typically quite large, so loading it takes a while. To speed this up, the script creates a cached version of the alignment that is faster to load on subsequent runs. So, after the first run
+you will find a file named //alignment.chain.pkl//. Each time it loads a cached alignment file, the script warns you, as a reminder that the cache may be out of date if you regenerated the alignment file without deleting it.
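The staleness risk described above can be checked before a run. The sketch below is not part of bnMapper.py; `cache_is_stale` is a hypothetical helper that assumes the cache sits next to the alignment file and is named by appending `.pkl`, as in the example file name above.

```python
import os


def cache_is_stale(chain_path, cache_suffix=".pkl"):
    """Return True if a cache exists and is older than the chain file."""
    cache_path = chain_path + cache_suffix
    if not os.path.exists(cache_path):
        # No cache yet, so nothing can be out of date.
        return False
    # Compare modification times: an older cache predates the last
    # regeneration of the alignment file.
    return os.path.getmtime(cache_path) < os.path.getmtime(chain_path)
```

If the helper returns True, deleting the `.pkl` file before the next run forces the cache to be rebuilt from the current alignment.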
+You can choose between a BED12 and BED4 output format. The former will have a single (possibly gapped) //to-element// for each //from-element// that could be mapped. The latter will preserve names, but might have more than one //to-element// for each //from-element//.
+The --screen option will simply output the features that can be potentially mapped, without actually mapping them.
+The --threshold and --gap options are filters. The first suppresses all features of which only a small fraction can be mapped. The second suppresses all features that contain large gaps when mapped (as a consequence of insertions in the query genome).
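The idea behind the two filters can be sketched as follows. This is an illustration only, not the bnMapper.py implementation: the function names are hypothetical, and it assumes that a mapped feature is given as a list of ungapped `(start, end)` blocks in query coordinates, that the threshold is a fraction between 0 and 1, and that the gap limit is in bp.

```python
def mapped_fraction(feature_len, blocks):
    """Fraction of the original feature covered by mapped blocks."""
    mapped = sum(end - start for start, end in blocks)
    return mapped / feature_len


def largest_gap(blocks):
    """Largest distance between consecutive mapped blocks (0 if fewer than two)."""
    ordered = sorted(blocks)
    return max(
        (nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])),
        default=0,
    )


def passes_filters(feature_len, blocks, threshold, max_gap):
    """Keep a feature only if enough of it maps and no gap is too large."""
    if not blocks:
        return False
    return (mapped_fraction(feature_len, blocks) >= threshold
            and largest_gap(blocks) <= max_gap)
```

For instance, a 100 bp feature mapped to the blocks `(0, 40)` and `(50, 100)` covers 90% of its length with one 10 bp gap, so it passes with a threshold of 0.5 and a gap limit of 20.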
+The --output option specifies the output location. If this is a directory, the mapper creates an output file there with the same name as each corresponding input file. If you have files f_1.bed, f_2.bed, f_3.bed to map from human to mouse and run
+you will get f_1.bed, f_2.bed, f_3.bed in the out_dir directory. This is much faster than running bnMapper.py three times, as the alignment file is loaded just once.