* weighting, etc. Afterward, it performs a weighted integration of coexpressions
* using the computed dataset weights.
* The search algorithms employed by Seek are designed to be quick and efficient, and
- * they support the on-the-fly weight calculations for thousands of microarray
+ * they support the weight calculations for thousands of microarray
* \section sec_usage Usage
- * \subsubsection ssec_usage_basic Basic Usage
+ * \subsection ssec_usage_basic Basic Usage
* SeekMiner -x <dset_platform_map> -i <gene_map> -q <query> -P <platform_dir> -p <prep_dir> -n <num_db>
* Users can select between Pearson correlations (\c -z \c pearson) or z-scores of Pearson (\c -z \c z_score).
* Z-scores are the recommended choice because they normalize the correlation distribution to a standard normal
* distribution that can be compared across datasets. In addition, SeekMiner provides the following
- * transformations on z-scores:
+ * transformations on z-scores:
- * \li \c --score_cutoff. Cut-off Z-scores at a specified value.
- * \li \c --norm_subavg. Subtracts each gene's average z-score to reduce the influence of hubby genes.
+ * \li \c --score_cutoff. Cuts off z-scores at a specified value. Z-scores that fall below the cut-off are assigned zero.
+ * \li \c --norm_subavg. Subtracts each gene's average z-score to prevent highly connected genes from influencing the z-score of a gene pair.
* \li \c --norm_subavg_plat. Normalizes z-score by subtracting the average across the platform and dividing by its standard deviation. Deals with
+ * potential platform biases.
* \li \c --square_z. Squares the z-score to further boost highly correlated gene pairs.
* It is highly recommended to enable \c --norm_subavg.
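The transformations listed above can be sketched as plain arithmetic on a single z-score. This is a minimal illustration, not SeekMiner's implementation: the struct `ZTransformOptions`, the functions `TransformZ` and `NormalizeByPlatform`, and the order in which the steps are applied are all assumptions made for the example.

```cpp
// Hypothetical option bundle mirroring the command-line flags described above.
struct ZTransformOptions {
    bool subtract_gene_avg = false;  // --norm_subavg
    bool use_cutoff = false;         // --score_cutoff
    float cutoff = 0.0f;             // z-scores below this value become zero
    bool square = false;             // --square_z
};

// Platform normalization as described for --norm_subavg_plat:
// subtract the platform average, then divide by the platform standard deviation.
float NormalizeByPlatform(float z, float plat_avg, float plat_stdev) {
    return (z - plat_avg) / plat_stdev;
}

// Applies the selected transformations to one z-score.
// gene_avg is the gene's average z-score in the dataset.
// The step order (subtract, cut off, square) is an assumption of this sketch.
float TransformZ(float z, float gene_avg, const ZTransformOptions& opt) {
    if (opt.subtract_gene_avg) z -= gene_avg;        // damp highly connected genes
    if (opt.use_cutoff && z < opt.cutoff) z = 0.0f;  // values below the cut-off are zeroed
    if (opt.square) z = z * z;                       // boost strongly correlated pairs
    return z;
}
```

Note that squaring also turns a negative z-score positive, so in a sketch like this it is usually combined with a cut-off that zeroes negative values first.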
* \subsubsection sec_search Search Datasets
* Users may also define the datasets that they wish to use for integrations in a query-specific way, using the \c -D argument.
- * If this argument is absent, all datasets in the compendium will be integrated. If \c -D is used, the search datasets
- * must be selected from the available datasets defined in \c dset_platform_map.
+ * If this argument is absent, all datasets in the compendium will be integrated.
+ * If \c -D is used, the search datasets must be selected from the available
+ * datasets defined in \c dset_platform_map.
* \subsubsection sec_files Query-independent search setting files and directories
* The names of the genes must be selected from the genes in the \c gene_map.
* The maximum length of the query depends on the amount of available memory in the system.
* It is recommended to keep each query to fewer than 100 genes.
- * This file defines the list of datasets to be used for integration. It can be query-specific.
+ * This file defines the list of datasets to be used for the query coexpression search.
+ * The file is defined in a query-specific way.
* An example is provided below:
* GSE15913.GPL570.pcl GSE16122.GPL2005.pcl GSE16836.GPL570.pcl ...
* GSE14933.GPL570.pcl GSE15162.GPL2005.pcl GSE15566.GPL570.pcl ...
- * where each line, corresponding to a query, is a space-separated dataset list.
+ * where each line, corresponding to a query, is a space-separated dataset list for the query.
* The dataset names must be selected from the file \c dset_platform_map.
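The file layout described above (one line per query, dataset names separated by spaces) could be read along the following lines. This is only a sketch of the format; the function name `ReadSearchDatasets` is hypothetical and is not part of Seek.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads a search-dataset list: one line per query, names separated by whitespace.
// Returns one inner vector per line, so the index of the outer vector matches
// the query index. An empty line yields an empty dataset list for that query.
std::vector<std::vector<std::string>> ReadSearchDatasets(std::istream& in) {
    std::vector<std::vector<std::string>> per_query;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<std::string> datasets;
        std::string name;
        while (fields >> name) {
            datasets.push_back(name);
        }
        per_query.push_back(datasets);
    }
    return per_query;
}
```

A caller would still need to check each name against the entries of \c dset_platform_map, since the sketch does no validation.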
* Directory that contains the following 3 files:
* \li \c all_platforms.gplatavg. the platform average z-scores
* \li \c all_platforms.gplatstdev. the platform z-score standard deviation
- * \li \c all_platforms.gplatorder. The order of platforms
+ * \li \c all_platforms.gplatorder. the order of platforms
* These binary files are generated by SeekPrep. The specification of this directory is
* necessary for \c --norm_subavg_plat.
* Directory that contains the gene presence files and the gene average files:
* \li Gene presence (GPRES files): indicates the presence/absence of genes in a dataset
* \li Gene average (GAVG files): indicates the average z-score of each gene in a dataset
* There should be one pair of these files for <b>every</b> dataset that is specified
- * in \c dset_platform_map. Also generated by SeekPrep.
+ * in \c dset_platform_map. Generated by SeekPrep.
- * The quant file specifies how the z-scores are binned. This is necessary for properly reading
+ * The quant file specifies how the z-scores are binned. This is necessary for properly reading
* the z-scores, because the z-scores are stored as binned values on disk. This quant file is used
- * to convert them back to z-scores. Currently, the maximum number of bins supported is 255.
- * An snapshot of the quant file is below:
+ * to convert them back to z-scores when they are read from disk.
+ * Currently, the maximum number of bins supported is 255.
+ * A snapshot of the \c quant file is below:
* -5.00 -4.96 -4.92 -4.88 -4.84 -4.80 -4.76 -4.72 -4.68 -4.64 -4.60 -4.56 -4.52 ...
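To illustrate how binned values might be turned back into z-scores, the sketch below parses a line of quant values and looks up the value for a stored bin index. It assumes that bin index i corresponds to the i-th value in the quant file and that out-of-range indices are clamped to the last bin; both are assumptions of this example, and `ParseQuantLine` and `BinToZScore` are hypothetical names rather than Seek functions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parses whitespace-separated quant values, e.g. "-5.00 -4.96 -4.92".
std::vector<float> ParseQuantLine(const std::string& line) {
    std::istringstream in(line);
    std::vector<float> values;
    float v;
    while (in >> v) {
        values.push_back(v);
    }
    return values;
}

// Maps a stored bin index back to a z-score.
// Assumption: bin i maps to values[i]; indices past the end are clamped.
float BinToZScore(const std::vector<float>& values, unsigned char bin) {
    assert(!values.empty());
    std::size_t i = bin;
    if (i >= values.size()) {
        i = values.size() - 1;
    }
    return values[i];
}
```

Storing the bin index in a single byte is consistent with the stated limit of 255 bins.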
* Directory that will contain the search results.
+ * Directory that contains the SINFO files, which list a dataset's average z-score between all pairs of genes
+ * and the standard deviation. If this directory is provided, there should be one SINFO file for <b>
+ * every</b> dataset in \c dset_platform_map. Generated by SeekPrep.
* \subsection ssec_usage_detailed Detailed Usage