SubpopSpecificMarkers
SubpopSpecificMarkers identifies a subset of markers that aims to maximize differences among groups of samples and minimize differences within groups. It is useful for selecting markers that discriminate subpopulations.
Requirements
Requires Java 13 or newer. Check whether Java is already installed; if not, install it from here.
Installation
Through git clone
git clone https://jcignacio@bitbucket.org/jcignacio/purity.git
Then go to the directory
purity/builds/SubpopSpecificMarkers_201201
OR
Download SubpopSpecificMarkers_201201.zip, then extract it.
Usage
java -jar ssm.jar <No. of target markers> <Population size> <Distance bet. dupe samples> <Input HapMap file> <Output file> <Input grouping file>
Example: java -jar ssm.jar 15 10000 0.0 rice_sp.hmp.txt out.txt groups.txt
Parameters | Description |
---|---|
Number of target markers, N (integer) | number of markers to select |
Population size (integer) | number of solutions to consider at a time (higher values give better results but use more RAM) |
Distance bet. dupe samples (decimal, 0 to 1) | genetic distance threshold below which samples are considered duplicates; set to 0 for exact match. Set higher to get more polymorphic markers between samples. |
Input HapMap file (file) | HapMap file from which markers are picked, e.g. rice_sp.hmp.txt |
Output file (file) | output file, e.g. out.txt |
Input grouping file (file) | input file with the grouping of samples, formatted as a tab-delimited TASSEL trait file. Groups have to be numerical; strings are not supported yet. |
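The positional arguments above are easy to mix up when the tool is called from a script. The following is a minimal Python sketch, not part of SubpopSpecificMarkers, that checks the documented ranges and assembles the command line. The function name `build_ssm_command` and its parameter names are hypothetical; only the argument order and the `ssm.jar` file name come from the Usage section.

```python
def build_ssm_command(n_markers, population_size, dup_distance,
                      hapmap_file, output_file, grouping_file,
                      jar="ssm.jar"):
    """Return the argument list for one ssm.jar run (hypothetical helper)."""
    # Both counts are documented as integers; assume they must be positive.
    if n_markers < 1 or population_size < 1:
        raise ValueError("marker count and population size must be positive")
    # The duplicate-sample distance is documented as a decimal from 0 to 1.
    if not 0.0 <= dup_distance <= 1.0:
        raise ValueError("distance threshold must be between 0 and 1")
    # Argument order follows the Usage line.
    return ["java", "-jar", jar,
            str(n_markers), str(population_size), str(dup_distance),
            hapmap_file, output_file, grouping_file]


cmd = build_ssm_command(15, 10000, 0.0,
                        "rice_sp.hmp.txt", "out.txt", "groups.txt")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `ssm.jar`.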
Example of a grouping file with 6 samples and 3 groups (columns are tab-separated):
<Phenotype>
taxa	factor
taxa	group
sample1	1
sample2	1
sample3	2
sample4	3
sample5	1
sample6	3
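To check a grouping file before a run, a small parser can help. This is a sketch in Python under the assumption that the file always has the three header lines shown in the example (`<Phenotype>`, the column types, the column names) followed by one tab-separated `sample group` pair per line. The function name `parse_grouping_file` is hypothetical and is not part of the tool.

```python
def parse_grouping_file(text):
    """Map each sample name to its integer group (hypothetical helper)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3 or lines[0].strip() != "<Phenotype>":
        raise ValueError("expected a TASSEL trait file starting with <Phenotype>")
    groups = {}
    # Skip the three header lines assumed from the example above.
    for line in lines[3:]:
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"expected two tab-separated columns: {line!r}")
        sample, group = fields
        # int() raises ValueError on string labels, which are not supported.
        groups[sample.strip()] = int(group)
    return groups


example = (
    "<Phenotype>\n"
    "taxa\tfactor\n"
    "taxa\tgroup\n"
    "sample1\t1\nsample2\t1\nsample3\t2\n"
    "sample4\t3\nsample5\t1\nsample6\t3\n"
)
groups = parse_grouping_file(example)
print(len(groups), "samples in", len(set(groups.values())), "groups")
```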
Results and outputs | Description |
---|---|
Score ([m1,m2,..mN --> x]) | x = s1 + 2 * s2; a lower score means better discrimination of groups |
s1 | "Same genotype with different group": the sum, over the n unique genotypes i of duplicated samples, of (# of samples with genotype i) * [(# of groups that the samples with genotype i belong to) - 1] |
s2 | "Same group with different genotype": the sum, over the n groups i, of [(# of unique genotypes in group i) - 1] |
Text file | contains some information on the selected markers |
Distance matrix (csv) | comma-separated distance matrix generated from the selected markers |
Hapmap file | subset of the input hapmap file containing the selected markers only |
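The score definition in the table can be made concrete with a short Python sketch. It is an illustration only, not the tool's implementation. It assumes exact-match duplicates (distance threshold 0) and treats a sample's "genotype" as its tuple of calls at the selected markers. The function name `discrimination_score` and the two-marker genotype calls are made up for the example; the group assignments are the ones from the grouping file example.

```python
from collections import defaultdict


def discrimination_score(profiles, groups):
    """Return (s1, s2, x) with x = s1 + 2 * s2, assuming exact-match duplicates."""
    samples_by_genotype = defaultdict(list)
    genotypes_by_group = defaultdict(set)
    for sample, profile in profiles.items():
        key = tuple(profile)
        samples_by_genotype[key].append(sample)
        genotypes_by_group[groups[sample]].add(key)
    # s1: same genotype found in different groups.
    s1 = sum(len(members) * (len({groups[s] for s in members}) - 1)
             for members in samples_by_genotype.values())
    # s2: same group containing different genotypes.
    s2 = sum(len(genos) - 1 for genos in genotypes_by_group.values())
    return s1, s2, s1 + 2 * s2


# Group assignments from the grouping file example.
groups = {"sample1": 1, "sample2": 1, "sample3": 2,
          "sample4": 3, "sample5": 1, "sample6": 3}
# Hypothetical calls at two selected markers.
profiles = {"sample1": ("A", "C"), "sample2": ("A", "C"),
            "sample3": ("G", "C"), "sample4": ("G", "T"),
            "sample5": ("A", "T"), "sample6": ("G", "C")}

print(discrimination_score(profiles, groups))  # (2, 2, 6)
```

Here sample3 and sample6 share a genotype across groups 2 and 3, giving s1 = 2 * (2 - 1) = 2. Groups 1 and 3 each contain two genotypes, giving s2 = 1 + 0 + 1 = 2, so x = 2 + 2 * 2 = 6. A marker set that gives every group its own single genotype would score 0.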