Tools for getting user data (i.e. metagenome bins) onto Genome Constellation.
This repository contains the tools required to put the user genomes/bins onto the Genome Constellation browser for viewing.
MASH distances are calculated 1) among the user's genomes and 2) between the user genomes and the reference genomes (i.e. RefSeq genomes).
You need to have:
Download the Genome Constellation repository
git clone https://email@example.com/berkeleylab/jgi-genomeconstellation.git
Optional (to recompile jgi_gc):
- gcc >=4.8
- boost development libraries with program-options
- libz development libraries
sudo apt-get install build-essential libboost-dev libboost-program-options-dev libz-dev
Build jgi_gc executable
cd src
make && make install
Install conda & python via miniconda or anaconda
# Download miniconda2 (only for Linux x86_64). Refer to the miniconda2 installation documentation for Mac OS.
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
# Run the installation.
/bin/bash ./Miniconda2-latest-Linux-x86_64.sh
# Follow the directions to complete setup.
# Make sure conda is now in your PATH.
# The installation script will ask whether you want to prepend the miniconda directory to PATH or export PATH manually:
# 1) if you choose to add the miniconda bin to your PATH in the .bashrc file, remember to source the .bashrc file (i.e. "source <path_to>/.bashrc").
# 2) otherwise just type "export PATH=<full_path_to>/miniconda2/bin:$PATH"
Note: Make sure you have ownership (i.e. write permissions) of your conda installation, because the next step will install mash into conda's parent directory. Typing
"which conda" will tell you which conda command you are using, e.g.
<path to your installation>/miniconda2/bin/conda and installs like Mash will be installed under
Add channels (places to search for "mash" besides the default)
conda config --add channels r
conda config --add channels bioconda
conda install --yes mash
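If the install fails with a permission error, the write-permission note above can be checked directly. The following is a minimal sketch and not part of this repository: the helper names conda_prefix_of and check_writable are hypothetical, and it assumes conda lives at <prefix>/bin/conda, as in the miniconda2 layout described above.

```shell
# Hypothetical helpers (not shipped with this repository).

# conda_prefix_of CONDA_PATH
# Print the installation prefix for a conda executable path,
# assuming the layout <prefix>/bin/conda.
conda_prefix_of() {
    dirname "$(dirname "$1")"
}

# check_writable DIR
# Succeed if DIR exists and is writable by the current user.
check_writable() {
    if [ -w "$1" ]; then
        echo "writable: $1"
    else
        echo "NOT writable: $1" >&2
        return 1
    fi
}

# Usage (run manually once conda is on your PATH):
#   check_writable "$(conda_prefix_of "$(command -v conda)")"
#   command -v mash    # after the install, shows which mash executable is found first
```

If the check reports that the directory is not writable, fix the ownership of the conda installation before retrying the install.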
Create mash sketches
This step calculates the distances between genomes and writes the resulting distance matrix to a json file called *.csv.json.
Before running anything, make sure:
1) your fasta files are in one or more directories. The names of the directories will be used
as the label in Constellation, so if you have one called arctic and another called antarctic, then in the Constellation browser,
you will be able to select different colors for the two sets, i.e. the two data sets will be distinguishable.
2) the suffix of your fasta files is the same for all files, even if you have more than one directory. You can specify the suffix on the command line (e.g. fasta, fa, fna, etc.). ".fa" is the default.
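The two conditions above can be checked before sketching. This is a rough sketch under stated assumptions: count_suffix and check_fasta_dirs are hypothetical helper names (not part of the repository), the suffix is passed without a leading "*", and the fasta files are assumed to sit directly inside each directory rather than in subdirectories.

```shell
# Hypothetical pre-flight helpers (not shipped with this repository).

# count_suffix DIR SUFFIX
# Print the number of regular files directly inside DIR whose names end in SUFFIX.
count_suffix() {
    find "$1" -maxdepth 1 -type f -name "*$2" | wc -l | tr -d ' '
}

# check_fasta_dirs SUFFIX DIR [DIR...]
# Report how many files with SUFFIX each directory holds.
# Fails if any directory holds none, which suggests a mismatched suffix.
check_fasta_dirs() {
    suffix=$1
    shift
    rc=0
    for dir in "$@"; do
        n=$(count_suffix "$dir" "$suffix")
        # The directory name is what will appear as the label in Constellation.
        echo "$(basename "$dir"): $n file(s) ending in $suffix"
        [ "$n" -gt 0 ] || rc=1
    done
    return $rc
}

# Usage (run manually):
#   check_fasta_dirs .fa arcticData antarcticData
```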
<path_to_repository>/tools/sketching/generate_mash_sketches.sh [options] <fasta_dir> [<fasta_dir>]
Options (defaults are shown in square brackets):
<-s suffix regex for your fasta files [*.fa]>
<-r path to reference pre-computed sketches [<repository>/tools/sketching/REFERENCE_10K.msh]>
<-p threads >
Note that the default path to the reference sketch is constellation_mash/tools/sketching/REFERENCE_10K.msh. In theory, you can make your own reference sketch and pass it with the "-r" flag; however, we do not currently include a script to generate mash sketches for user references. Please contact Zhong Wang firstname.lastname@example.org if you would like such an option.
The default is set to using minFrac, where minFrac = 1024 and numBits = 131072. This is the 16 KB condensed fingerprint. If you want to use the full fingerprint, set minFrac = 1 and numBits = 1073741824 (or specify desired numBits).
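The fingerprint sizes follow from the numBits values by simple arithmetic (8 bits per byte, 1024 bytes per KB). The helper name bits_to_kib below is hypothetical and only illustrates the conversion; it is not part of the tools.

```shell
# bits_to_kib NUMBITS
# Print the size in KB (1024-byte units) of a fingerprint with NUMBITS bits.
bits_to_kib() {
    echo $(( $1 / 8 / 1024 ))
}

bits_to_kib 131072        # condensed fingerprint: prints 16
bits_to_kib 1073741824    # full fingerprint: prints 131072 (i.e. 128 MB)
```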
To run with the test bins, run the following, assuming you are in the constellation_mash dir:
tools/sketching/generate_mash_sketches.sh -s "*.fa" misc/test_data/fastas
Running with non-default options. Note that we do not currently supply code for generating your own reference mash fingerprints, so you need to use the default reference. In other words, you do not need to include the "-r" flag, since it is already set to a default path.
constellation_mash/tools/sketching/generate_mash_sketches.sh \
  -s "*.fasta" \
  -p 32 \
  arcticData antarcticData
A file named "sketch.csv.json" will be generated. Upload this file to the Genome Constellation browser.
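Before uploading, it can be worth confirming that the output exists and parses as JSON. The following is a minimal sketch: check_json is a hypothetical helper name, and it assumes a python interpreter (python3 or python) is on your PATH. It only checks that the file is well-formed JSON, not that its contents are what the browser expects.

```shell
# check_json FILE
# Succeed if FILE exists and is well-formed JSON, using python's
# standard json.tool module.
check_json() {
    py=$(command -v python3 || command -v python) || {
        echo "no python interpreter found on PATH" >&2
        return 2
    }
    if "$py" -m json.tool "$1" > /dev/null 2>&1; then
        echo "valid JSON: $1"
    else
        echo "missing or invalid JSON: $1" >&2
        return 1
    fi
}

# Usage (run manually in the directory where the file was written):
#   check_json sketch.csv.json
```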
Upload / use your data in Constellation
Run locally in a browser
Now that your data has been calculated and installed under www/data as the files:
You can simply point your browser to this file path and run Genome Constellation: file:///FULL_PATH/jgi-genomeconstellations/www/index.html
Alternatively, you can start a python web server locally:
You now need to upload your data to a browser that is running Genome Constellation.
Start a python server on a local port.
1) in your genome-constellation repository, cd into
constellation_mash/www (where index.html is located)
2) if using version 2 of python, run:
python -m SimpleHTTPServer 8989
and for python3:
python -m http.server 8989
3) open a browser on the same machine where you are running the python server and type in "localhost:8989"
4) The browser should now be showing the reference genomes in Genome Constellation. Wait until all the links have loaded (see the progress bar at the top). Now you can load your user data by going to the "import" tab in the lower right and choosing where you saved your *.csv.json file.
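Steps 1 to 3 above can be wrapped in a small helper that picks the right server module for whichever python is on your PATH. This is only a sketch: http_server_module and serve_constellation are hypothetical names, not scripts shipped with the repository, and the default port 8989 is taken from the steps above.

```shell
# http_server_module PYTHON_MAJOR_VERSION
# Print the name of the built-in web server module for that python version.
http_server_module() {
    case "$1" in
        2) echo "SimpleHTTPServer" ;;
        3) echo "http.server" ;;
        *) echo "unsupported python major version: $1" >&2; return 1 ;;
    esac
}

# serve_constellation WWW_DIR [PORT]
# Serve WWW_DIR (the directory containing index.html) on PORT (default 8989).
# Runs in a subshell so the caller's working directory is unchanged.
serve_constellation() {
    port=${2:-8989}
    major=$(python -c 'import sys; print(sys.version_info[0])') || return 1
    module=$(http_server_module "$major") || return 1
    (cd "$1" && python -m "$module" "$port")
}

# Usage (run manually, then open "localhost:8989" in a browser):
#   serve_constellation constellation_mash/www
```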