== RepeatExplorer ==

RepeatExplorer is a web-based computational pipeline for discovery and characterization
of repetitive sequences in eukaryotic genomes. The pipeline uses shotgun high-throughput
genome sequencing data and does not require an assembled genome. RepeatExplorer is
implemented within the Galaxy environment. To see RepeatExplorer in action, visit our Galaxy server.
The RepeatExplorer manual includes the installation instructions.

=== Licence ===

Copyright (c) 2012 Petr Novak, Jiri Macas and Pavel Neumann,
Laboratory of Molecular Cytogenetics,
Institute of Plant Molecular Biology, Biology Centre AS CR, Ceske Budejovice, Czech Republic

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

=== How to install ===

To install on a Galaxy server, consult the help pages.
To use the command line version of clustering and reclustering, all dependencies must be installed and the configuration file must correctly specify the paths to the directories with executables.
To run clustering, use the clustering script; use -h for help.
To test the installation, you can run the scripts in the tests/ directory.

=== Dependencies ===

These dependencies are assumed to have executables in path:

R (v >= 2.14), including the packages:
foreach, igraph, getopt, R2HTML, lattice, doMC, multicore, ape and Biostrings (available from Bioconductor)
Perl and BioPerl (core)
Python v. >= 2.6
NCBI Basic Local Alignment Search Tool version 2.2.xx
Muscle (not necessary for clustering)
fasty36 (not necessary for clustering)
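
A quick way to confirm that the executables listed above are on the path is a small shell check. This is only a sketch: the helper name check_deps is made up for illustration, and the command names passed to it (in particular blastall for legacy BLAST 2.2.xx) are assumptions about how the tools are installed on your system.

```shell
#!/bin/sh
# check_deps: hypothetical helper, not part of RepeatExplorer.
# Prints one line per command and returns non-zero if any is missing.
check_deps() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "MISSING: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names; adjust them to your installation.
# muscle and fasty36 are not needed for clustering itself.
check_deps R perl python blastall muscle fasty36 \
    || echo "some dependencies were not found in PATH"
```

The check only tests that a command can be found; it does not verify versions or the presence of the R packages.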

Included dependencies (no configuration needed for the command line version):

GNU parallel (included)
Louvain clustering - now provided with RepeatExplorer; must be compiled from source, see the 'louvain' directory
TGICL - a copy of tgicl is included (newer versions do not work with RepeatExplorer!)

Paths to the dependencies below have to be specified in the configuration file:
RepeatMasker - if RepeatMasker is not in the path, the RepeatMasker directory must be specified explicitly
Conserved domain database (only necessary when rpsblast search is included)

=== how to use RepeatExplorer on ===
Currently RepeatExplorer is available as a module. To use it, type:
  module add repeatexplorer -h

If you wish to use your own installation:
- get a copy of RepeatExplorer from the Bitbucket repository and unpack it;
in repeatexplorer/louvain type 'make' to compile the clustering executables

- download legacy BLAST (file blast-2.2.26-x64-linux.tar.gz) and unpack it into the repeatexplorer directory

- for configuration, in the repeatexplorer directory type:

- before clustering run:
	module add R-2.14.0 python-2.6.2 bioperl-1.6.1 repeatmasker
  Note: to be able to use RepeatMasker, you have to confirm the RepeatMasker licence agreement. The repeatmasker executable should be in your path.

- to test your configuration, run the scripts in the tests/ directory:
	./ runs clustering and reclustering without RepeatMasker search
	./ includes the Viridiplantae RepeatMasker database
	./  uses only one processor
	./  these tests take a couple of hours and include comparative analysis
  outputs from the test scripts are located in test_data/test_dir/runx; check also the log files in the same directory

- if the tests finish without errors, you can run clustering using the clustering script;
  for usage type:
    ./ -h
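
Checking the test outputs and log files mentioned above can be partly automated. The sketch below lists every file under a test output directory that contains the word "error". The helper name scan_logs is hypothetical, and treating the word "error" as the sign of a failed run is an assumption, since the actual log format is not described here. It relies on grep -r, which GNU and BSD grep provide.

```shell
#!/bin/sh
# scan_logs: hypothetical helper, not part of RepeatExplorer.
# Returns 0 if no file mentions "error", 1 if some do, 2 if the directory is missing.
scan_logs() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 2
    fi
    # -r recurse, -i ignore case, -l print only the names of matching files
    if grep -ril "error" "$dir"; then
        return 1
    fi
    echo "no 'error' lines found under $dir"
    return 0
}

# Example call on the output location named in the text; the directory
# exists only after the test scripts have been run.
scan_logs test_data/test_dir || true
```

A clean result does not prove the run succeeded, so the logs are still worth reading by hand.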

Resource requirements:
Reserve at least 8 CPUs with 16 GB of RAM and select the 'long' queue, as the job needs several days to finish (qsub -l:nodes=1:ppn=8:mem=16gb -q long). The actual RAM requirement will probably be higher, depending on the genome, so it can be a good idea to reserve 32 GB but specify only 16 GB in the configuration.

If you want to use the Conserved Domain Database search, download the database and set the location of the database files in the configuration file.