The Genome Fisher Wiki

Genome Fisher: a utility for molecular marker discovery (MMD) from genomics datasets

Please note: this wiki is a work in progress, and tutorials and training materials will be added over time. Thanks for your patience, and do enjoy the genome fishing!

Questions? Please e-mail: eduardo.taboada@phac-aspc.gc.ca

What is Genome Fisher?

Genome Fisher (GF) is a utility for identifying molecular markers whose distribution differs significantly between groups of samples. The program uses Fisher's Exact Test (FET), hence its clever name.
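
For a single marker, the test boils down to a 2x2 table: how many samples in group 1 carry the marker (1) versus lack it (0), and the same counts for group 2. The Python sketch below computes a two-sided Fisher's Exact Test p-value from such a table using the hypergeometric distribution. It is an illustration only, not Genome Fisher's own code: the function name is hypothetical, and the two-sided alternative is an assumption, since this page does not say which alternative Genome Fisher reports.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's Exact Test p-value for the 2x2 table [[a, b], [c, d]].

    Illustrative sketch (not Genome Fisher's implementation):
      a = group 1 samples with the marker,  b = group 1 samples without it
      c = group 2 samples with the marker,  d = group 2 samples without it
    """
    row1, row2 = a + b, c + d      # group sizes
    col1 = a + c                   # total samples carrying the marker
    n = row1 + row2
    total = comb(n, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x marker-positive samples in group 1,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table with the same margins that is no more likely than
    # the observed one (a small relative tolerance guards against float noise).
    p = sum(
        prob for prob in (table_prob(x) for x in range(lo, hi + 1))
        if prob <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p)


# Hypothetical marker: present in 8 of 10 group-1 samples and 1 of 6 group-2 samples.
print(round(fisher_exact_two_sided(8, 2, 1, 5), 4))  # 0.035
```

If SciPy is available, `scipy.stats.fisher_exact` uses the same two-sided definition and makes a handy cross-check.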

Although originally developed for analyzing gene presence/absence biases from Microarray Comparative Genomic Hybridization (MCGH) data, Genome Fisher can be used with other types of data provided that the underlying data can be binarized.

What kinds of data can I analyze with Genome Fisher?

Thankfully, most molecular data is easy to binarize through thresholding or splining, making Genome Fisher a versatile data mining tool.

[Figure: data binarization]
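
As a minimal example of thresholding, the Python sketch below turns continuous measurements into presence/absence calls using a single cutoff. The function name, the data layout (marker mapped to one value per sample) and the cutoffs in the example are hypothetical; suitable cutoffs depend on the assay, and splining is not shown.

```python
from typing import Dict, List

def binarize_matrix(
    data: Dict[str, List[float]],
    threshold: float,
    present_if_above: bool = True,
) -> Dict[str, List[int]]:
    """Convert marker -> per-sample measurements into 1 (present) / 0 (absent).

    Hypothetical helper: set present_if_above=False for measurements where a
    *lower* value means presence (e.g. qPCR Ct values).
    """
    binary = {}
    for marker, values in data.items():
        if present_if_above:
            binary[marker] = [1 if v >= threshold else 0 for v in values]
        else:
            binary[marker] = [1 if v <= threshold else 0 for v in values]
    return binary


# Illustrative cutoff only: BLAST percent identity >= 80 counts as present.
blast = {"geneA": [99.1, 42.0, 85.3], "geneB": [10.5, 97.8, 79.9]}
print(binarize_matrix(blast, 80.0))  # {'geneA': [1, 0, 1], 'geneB': [0, 1, 0]}

# Illustrative cutoff only: a qPCR Ct value <= 35 counts as present.
qpcr = {"target1": [22.4, 38.0, 35.0]}
print(binarize_matrix(qpcr, 35.0, present_if_above=False))  # {'target1': [1, 0, 1]}
```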

Some examples of binarized data types that we have analyzed using Genome Fisher are:

  • Microarray data (obviously!)
  • Tabular sequence identity data from BLAST searches
  • PCR band presence/absence patterns
  • Phenotypic microarray data (i.e. Biolog data)
  • qPCR data
  • Mass spectrometry data
  • Minimum inhibitory concentration (MIC) data from antimicrobial resistance (AMR) testing

What about SNPs?

Each position in a DNA sequence alignment can take one of five character states (A, G, C, T, or - for a gap), so it is not binary as is. It can still be binarized by deconstructing each position into five sub-components, one per character state; each sub-component then acts as a binary character (1 if a sample has that state at that position, 0 otherwise).

[Figure: SNP binarization]
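
The per-position deconstruction can be sketched in Python as follows: each alignment column becomes five binary characters, one per state. This is an illustration under stated assumptions (aligned, equal-length sequences and a hypothetical "pos<N>_<state>" marker naming scheme), not the encoding Genome Fisher itself uses. Any other symbol, such as an ambiguous N, scores 0 for all five states.

```python
from typing import Dict, List

STATES = ("A", "G", "C", "T", "-")  # the five character states, including gaps

def binarize_alignment(seqs: Dict[str, str]) -> Dict[str, List[int]]:
    """Split each alignment position into five binary characters.

    seqs maps sample name -> aligned sequence (all the same length).
    Returns marker name -> one 0/1 value per sample, in the order of `seqs`.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    (length,) = lengths

    markers: Dict[str, List[int]] = {}
    for pos in range(length):
        for state in STATES:
            # Hypothetical naming scheme, e.g. "pos12_A" for an A at position 12.
            name = f"pos{pos + 1}_{state}"
            markers[name] = [1 if s[pos].upper() == state else 0 for s in seqs.values()]
    return markers


aln = {"strain1": "ACG", "strain2": "AT-", "strain3": "GCG"}
binary = binarize_alignment(aln)
print(binary["pos1_A"])  # [1, 1, 0]
print(binary["pos3_-"])  # [0, 1, 0]
```

Characters that are 0 in every sample (states never observed at a position) carry no information, so they could be dropped before loading.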

Getting Started

To run Genome Fisher, you will need:

Running an analysis

  • Create:

    • Binary.txt, a tab-delimited file containing the binary data for the dataset
    • Info.txt, a tab-delimited file containing the metadata for the dataset (note: see the example dataset above for the file formats)
  • Load the Binary.txt and Info.txt files into Genome Fisher by dragging and dropping them into the main Genome Fisher window.

  • The data will take a few seconds to load: your patience will be rewarded.

Here is a brief guide to the Genome Fisher interface:

[Figure: the Genome Fisher interface]

  1. Metadata display window: displays each category and subcategory present in the metadata, along with sample counts. Note: selecting multiple subcategories within a single category combines them with a boolean OR; selecting subcategories across multiple categories combines them with a boolean AND.
  2. Group item selection window: once subcategories have been selected in (1), this window displays the selected fields and the number of strains represented by the selection, following the boolean logic described above, e.g. country:"Canada" (n=393) AND year:"2000" (n=702) gives n=80.
  3. Group collection window: collections of strain groups that you create are listed here, e.g. the "country" collection contains 16 strain groups.

  4. Groups window: selecting a group collection in (3) displays the individual groups in that collection here, e.g. the 16 strain groups in the "country" collection. Statistical comparisons between groups are also launched from this window.

  5. FET comparisons window: every FET launched from (4) is listed here, with brief summary statistics to help identify comparisons that yield statistically significant results.

  6. FET results window: selecting a test in (5) displays the FET statistics for each marker in this panel. Markers with a p-value below a user-defined threshold are highlighted in green; non-significant results are shown in a lovely shade of dusty pink.
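
The selection rules in (1) and (2), OR within a category and AND across categories, can be sketched in Python as below, together with the per-marker 2x2 counts that a comparison between two groups would feed into Fisher's Exact Test. The data layout (sample mapped to {category: subcategory}), the function names and the example of comparing one group against the remaining samples are assumptions for illustration, not Genome Fisher's internals.

```python
from typing import Dict, Iterable, List, Set, Tuple

def select_samples(
    metadata: Dict[str, Dict[str, str]],
    selection: Dict[str, Set[str]],
) -> List[str]:
    """Samples matching ANY chosen subcategory within EACH selected category."""
    return [
        sample
        for sample, fields in metadata.items()
        # AND across categories (all), OR within a category (membership in the set)
        if all(fields.get(category) in chosen for category, chosen in selection.items())
    ]

def marker_counts(
    marker: Dict[str, int], group1: Iterable[str], group2: Iterable[str]
) -> Tuple[int, int, int, int]:
    """2x2 counts (a, b, c, d) for one binary marker: with/without in group 1, then group 2."""
    g1, g2 = list(group1), list(group2)
    a = sum(marker[s] for s in g1)
    c = sum(marker[s] for s in g2)
    return a, len(g1) - a, c, len(g2) - c


# Hypothetical metadata for four strains.
metadata = {
    "s1": {"country": "Canada", "year": "2000"},
    "s2": {"country": "Canada", "year": "2001"},
    "s3": {"country": "USA", "year": "2000"},
    "s4": {"country": "Mexico", "year": "2000"},
}
print(select_samples(metadata, {"country": {"Canada"}, "year": {"2000"}}))         # ['s1']
print(select_samples(metadata, {"country": {"Canada", "USA"}, "year": {"2000"}}))  # ['s1', 's3']

canada = select_samples(metadata, {"country": {"Canada"}})  # ['s1', 's2']
others = [s for s in metadata if s not in canada]            # ['s3', 's4']
geneX = {"s1": 1, "s2": 1, "s3": 0, "s4": 1}                 # hypothetical marker
print(marker_counts(geneX, canada, others))                  # (2, 0, 1, 1)
```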
