
The Genome Fisher Wiki


Genome Fisher: a utility for molecular marker discovery (MMD) from genomics datasets

A quick presentation on MMD using Genome Fisher

Please note: this Wiki is a work in progress. Tutorials and training materials will continue to be added here. Thanks for your patience, and do enjoy the genome fishing!

Questions? Please e-mail: eduardo.taboada@phac-aspc.gc.ca

What is Genome Fisher?

Genome Fisher (GF) is a utility for identifying molecular markers whose distribution differs significantly between groups of samples. The program uses Fisher's Exact Test (FET), hence its clever name.
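To make the test concrete: for a single marker, the presence/absence calls in two groups of strains form a 2x2 table, and Fisher's Exact Test asks how likely a split at least this lopsided would be if the marker were spread at random across the groups. The Python sketch below is illustrative only and is not Genome Fisher's own code (GF is a .NET application). The function names are hypothetical, and the two-sided p-value uses the common "sum every table no more probable than the observed one" definition, which may differ in detail from GF's implementation.

```python
from math import comb


def marker_table(group1_calls, group2_calls):
    """Build the 2x2 table (a, b, c, d) for one marker from 0/1 calls.

    Rows are groups, columns are marker present / absent:
        group 1: a present, b absent
        group 2: c present, d absent
    """
    a = sum(group1_calls)
    c = sum(group2_calls)
    return a, len(group1_calls) - a, c, len(group2_calls) - c


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact Test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c  # total number of strains carrying the marker
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x of the marker-positive strains
        # fall in group 1, given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one; the
    # small relative tolerance keeps floating-point ties from being dropped.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


# Marker present in all 3 strains of group 1 and absent from all 3 of group 2.
table = marker_table([1, 1, 1], [0, 0, 0])
print(table, fisher_exact_two_sided(*table))
```

Even this perfect split gives p = 0.1, so with three strains per group no marker can reach p < 0.05, however cleanly it separates the groups.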

Although originally developed for analyzing gene presence/absence biases from Microarray Comparative Genomic Hybridization (MCGH) data, Genome Fisher can be used with other types of data provided that the underlying data can be binarized.

What kinds of data can I analyze with Genome Fisher?

Thankfully, most molecular data is easy to binarize through thresholding or splining, making Genome Fisher a versatile data mining tool.

Figure: data binarization
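As a minimal sketch of the thresholding approach, assuming a table of continuous values such as BLAST percent identities (one value per marker per strain), each value is simply compared against a cut-off. The function name, gene names, values and the 80% cut-off are all hypothetical; a suitable threshold depends on the data type and is the analyst's call. This covers plain thresholding only, not splining.

```python
def binarize_by_threshold(values_by_marker, threshold):
    """Convert continuous measurements into 0/1 presence/absence calls.

    values_by_marker: dict marker_id -> list of measurements, one per strain.
    A value at or above `threshold` becomes 1 (present); anything below it 0.
    """
    return {
        marker: [1 if value >= threshold else 0 for value in values]
        for marker, values in values_by_marker.items()
    }


# Hypothetical BLAST percent-identity values for two genes across three strains.
identity = {
    "geneA": [98.5, 42.0, 91.2],
    "geneB": [12.0, 99.9, 79.9],
}
calls = binarize_by_threshold(identity, threshold=80.0)
print(calls)
```

Note that 79.9 falls just below the cut-off and is called absent; values sitting close to the threshold are where the choice of cut-off matters most.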

Some examples of binarized data types that we have analyzed using Genome Fisher are:

Microarray data (obviously!)
Tabular sequence identity data from BLAST searches
PCR band presence/absence patterns
Phenotypic microarray data (i.e. Biolog data)
qPCR data
Mass spectrometry data
Minimum inhibitory concentration (MIC) data from antimicrobial resistance (AMR) testing

What about SNPs?

Although an aligned DNA sequence position can take five character states (A, G, C, T, and the gap character -), it can still be binarized by deconstructing each position into five sub-components, one per character state. Each sub-component is a binary character recording whether a sample has that state at that position.

Figure: SNP binarization
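A minimal sketch of this deconstruction, assuming the alignment is held as a dictionary of equal-length sequences. The function name and the (position, state) labelling of the resulting characters are illustrative choices, not Genome Fisher's input convention. Ambiguity codes such as N are rejected rather than guessed at.

```python
STATES = "AGCT-"  # the five character states of an aligned DNA position


def binarize_alignment(aligned):
    """Split each alignment column into five binary characters.

    aligned: dict strain_id -> aligned sequence (all the same length).
    Returns dict (position, state) -> list of 0/1 calls, one per strain in
    input order, where 1 means the strain has that state at that
    (1-based) position.
    """
    strains = list(aligned)
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("expected a non-empty set of equal-length sequences")
    (length,) = lengths

    characters = {}
    for pos in range(length):
        column = [aligned[s][pos].upper() for s in strains]
        for ch in column:
            if ch not in STATES:
                raise ValueError(f"unexpected character {ch!r} at position {pos + 1}")
        # One binary character per state: 1 if the strain has that state here.
        for state in STATES:
            characters[(pos + 1, state)] = [1 if ch == state else 0 for ch in column]
    return characters


alignment = {"strain1": "AC", "strain2": "AT", "strain3": "G-"}
chars = binarize_alignment(alignment)
print(chars[(1, "A")], chars[(2, "-")])
```

Characters that are constant across all strains (for example, a state that never occurs at a position) cannot separate any groups, so they could be filtered out before export; that step is left out here.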

Getting Started

To run Genome Fisher, you will need:

A Windows box (Windows XP or higher supported) running the .NET Framework 4.0 or above
- Download the latest .NET Framework
- Follow installation procedure
The Genome Fisher executable (32-bit or 64-bit)
- Download the Genome Fisher executable (64-bit) (commit: 75fbe90)
- Download the Genome Fisher executable (32-bit) (commit: 75fbe90)
- Extract into desired directory
- Double-click GenomeFisher.NET.exe (note: the 64-bit version is preferred if you want to analyze large datasets)
Some data!!!
- Download an example dataset: Campylobacter MLST dataset
- Extract into Genome Fisher directory

Running an analysis

Create:
- Binary.txt file, a tab-delimited file containing the binary data for the dataset
- Info.txt file, a tab-delimited file containing the metadata for the dataset (note: see the example dataset above for file formats)
Load the Binary.txt and Info.txt files into Genome Fisher by dragging and dropping them onto the main Genome Fisher window
The data will take a few seconds to load: your patience will be rewarded.
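If your binary calls come out of a script, writing them as tab-delimited text is straightforward. The sketch below assumes one row per marker, one column per strain, and a header row whose first cell reads "Marker"; these layout details are assumptions, so check them against Binary.txt in the Campylobacter example dataset before loading your own file. The function name is hypothetical.

```python
import csv
import io


def format_binary_table(calls_by_marker, strain_ids):
    """Render 0/1 calls as tab-delimited text.

    ASSUMED layout (verify against the example Binary.txt): a header row of
    "Marker" followed by strain IDs, then one row per marker.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Marker", *strain_ids])
    for marker, calls in calls_by_marker.items():
        # Catch the two most common mistakes before Genome Fisher sees the file.
        if len(calls) != len(strain_ids):
            raise ValueError(f"{marker}: expected {len(strain_ids)} calls, got {len(calls)}")
        if any(call not in (0, 1) for call in calls):
            raise ValueError(f"{marker}: calls must be 0 or 1")
        writer.writerow([marker, *calls])
    return buffer.getvalue()


text = format_binary_table({"geneA": [1, 0], "geneB": [0, 1]}, ["strain1", "strain2"])
print(text)
```

To save it, write the returned string to Binary.txt with `open("Binary.txt", "w", newline="")`; Info.txt can be produced the same way from the strain metadata.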

Here is a brief guide to the Genome Fisher interface:

Figure: the Genome Fisher interface

(1) Metadata display window: each category and subcategory of data present in the metadata is displayed, along with sample counts. Note: selecting multiple subcategories within a single category combines them with a boolean OR; selecting subcategories across multiple categories combines them with a boolean AND.
(2) Group item selection window: once subcategories have been selected in (1), this window displays the selected fields and the number of strains represented by the selection, computed using the boolean rules described above. E.g. country:"Canada" (n=393) AND year:"2000" (n=702) --> n=80
(3) Group collection window: collections of strain groupings that you create are listed here. E.g. under the "country" collection there are 16 strain groups.
(4) Groups window: selecting a group collection in (3) displays the individual groups in that collection here. This is also the window from which statistical comparisons between groups are launched. E.g. the 16 strain groups under the "country" collection are shown here.
(5) FET comparisons window: any FET statistical test launched from (4) is listed here, with brief summary statistics to help identify comparisons that yielded statistically significant results.
(6) FET results window: selecting a particular test in (5) displays the FET statistics for each marker in this panel. Markers with a p-value below a user-defined threshold are highlighted in green; non-significant results are shown in a lovely shade of dusty pink.
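The selection rules described for the metadata window (OR within a category, AND across categories) can be sketched as follows, assuming metadata is held as one dictionary of category values per strain. This illustrates the logic only; the function name, strain IDs and values are hypothetical, and it is not how Genome Fisher stores metadata internally.

```python
def select_strains(metadata, selection):
    """Return the strains matching a metadata selection.

    metadata:  dict strain_id -> dict category -> subcategory value
    selection: dict category -> set of selected subcategory values

    Values selected within one category are combined with OR (the strain's
    value must be any one of them); categories are combined with AND (every
    selected category must match).
    """
    return sorted(
        strain
        for strain, fields in metadata.items()
        if all(fields.get(category) in values for category, values in selection.items())
    )


metadata = {
    "s1": {"country": "Canada", "year": "2000"},
    "s2": {"country": "Canada", "year": "2001"},
    "s3": {"country": "UK", "year": "2000"},
    "s4": {"country": "Canada", "year": "2000"},
}
print(select_strains(metadata, {"country": {"Canada"}, "year": {"2000"}}))
print(select_strains(metadata, {"country": {"Canada", "UK"}, "year": {"2001"}}))
```

With no categories selected, `all()` over an empty selection is true, so every strain is returned; that mirrors a group with no restrictions applied.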