Qian Zhu committed 918b587

Added SeekServer usage doc

Files changed (3)

src/seeknetwork.h

  * client and the Seek server. In order to allow this exchange to occur, all messages
  * must conform to a uniform standard.
  *
+ * \section sec_out Outgoing messages
+ *
  * On the sending end, all outgoing messages must first begin with a message header that specifies the
  * length and the type of the message. Then the body of the message follows.
  *
  * otherwise the size of the array)
  * \li Byte #9 and onward: \a S times \a N bytes specifying the array content
  *
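The outgoing array format can be illustrated with a minimal C++ sketch. It assumes that bytes 1 to 4 of the header hold \a S, the size in bytes of one element, and that bytes 5 to 8 hold \a N, the number of elements, which is what the partially shown list above suggests. It also assumes that integers are written in the host's native byte order. \c EncodeFloatArray and \c DecodeFloatArray are hypothetical helpers, not the CSeekNetwork API, so check Sleipnir::CSeekNetwork for the actual layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical framing of a float array, under the assumed layout:
//   bytes 1-4: S (bytes per element), bytes 5-8: N (number of elements),
//   byte 9 onward: S times N bytes of array content, native byte order.
std::vector<char> EncodeFloatArray(const std::vector<float> &values) {
    const int32_t s = static_cast<int32_t>(sizeof(float));
    const int32_t n = static_cast<int32_t>(values.size());
    const std::size_t header = 2 * sizeof(int32_t);
    std::vector<char> msg(header + static_cast<std::size_t>(s) * values.size());
    std::memcpy(msg.data(), &s, sizeof s);
    std::memcpy(msg.data() + sizeof s, &n, sizeof n);
    if (!values.empty())
        std::memcpy(msg.data() + header, values.data(), static_cast<std::size_t>(s) * values.size());
    return msg;
}

// Inverse of EncodeFloatArray under the same assumed layout.
// Returns an empty vector if the message is not a well-formed float array.
std::vector<float> DecodeFloatArray(const std::vector<char> &msg) {
    const std::size_t header = 2 * sizeof(int32_t);
    if (msg.size() < header) return {};
    int32_t s = 0, n = 0;
    std::memcpy(&s, msg.data(), sizeof s);
    std::memcpy(&n, msg.data() + sizeof s, sizeof n);
    if (s != static_cast<int32_t>(sizeof(float)) || n < 0 ||
        msg.size() < header + static_cast<std::size_t>(s) * static_cast<std::size_t>(n))
        return {};
    std::vector<float> values(static_cast<std::size_t>(n));
    if (n > 0)
        std::memcpy(values.data(), msg.data() + header,
                    static_cast<std::size_t>(s) * static_cast<std::size_t>(n));
    return values;
}
```

A real client would also have to handle partial reads from the socket before decoding. This sketch leaves that out.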
+ * \section sec_in Incoming messages
+ *
  * On the receiving end, CSeekNetwork also supports receiving a \c char array (or a \c string) or a \c float array.
  *
  * In order to be properly recognized, the incoming message should be structured as follows:

tools/SeekMiner/stdafx.cpp

  * weighting, etc. Afterward, it performs a weighted integration of coexpressions
  * using the computed dataset weights.
  * The search algorithms employed by Seek are designed to be quick and efficient, and
- * they support the on-the-fly weight calculations for thousands of microarray 
+ * they support real-time weight calculations for thousands of microarray
  * datasets.
  *
  * \section sec_usage Usage
  * 
- * \subsubsection ssec_usage_basic Basic Usage
+ * \subsection ssec_usage_basic Basic Usage
  * 
  * \code
  * SeekMiner -x <dset_platform_map> -i <gene_map> -q <query> -P <platform_dir> -p <prep_dir> -n <num_db>
  * Users can choose between Pearson correlations (\c -z \c pearson) and z-scores of Pearson correlations (\c -z \c z_score).
  * The z-score is the recommended choice because it normalizes the correlation distribution to a standard normal
  * distribution that can be compared across datasets. In addition, SeekMiner provides the following
- * transformations on z-scores:
+ * transformations on z-scores to allow further boosting of signals:
  *
- * \li \c --score_cutoff. Cut-off Z-scores at a specified value.
- * \li \c --norm_subavg. Subtracts each gene's average z-score to reduce the influence of hubby genes.
+ * \li \c --score_cutoff. Cuts off z-scores at a specified value. Z-scores that fall below the cut-off are assigned zero.
+ * \li \c --norm_subavg. Subtracts each gene's average z-score to reduce the influence of highly connected (hub) genes on the z-scores of gene pairs.
  * \li \c --norm_subavg_plat. Normalizes z-score by subtracting the average across the platform and dividing by its standard deviation.
- * Deals with potential platform biases.
+ * This is designed to handle potential platform biases in the z-scores.
  * \li \c --square_z. Squares the z-score to further boost highly correlated gene pairs.
  * It is highly recommended to enable \c --norm_subavg.
  *
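The following C++ sketch shows how the four transformations might compose for a single z-score between a query gene and a target gene. \c ZTransformOptions and \c TransformZ are hypothetical. Several choices here are assumptions of this sketch and do not describe SeekMiner's internals:
- the order of the steps;
- subtracting the target gene's average z-score, as stored in the GAVG files;
- taking the platform average and standard deviation from the platform files.

```cpp
// Hypothetical switches mirroring the SeekMiner flags described above.
struct ZTransformOptions {
    bool useCutoff = false;          // --score_cutoff
    float cutoff = 0.0f;             // value passed to --score_cutoff
    bool subtractGeneAvg = false;    // --norm_subavg
    bool normalizePlatform = false;  // --norm_subavg_plat
    bool square = false;             // --square_z
};

// Transforms one z-score z(q, g) between query gene q and target gene g.
// geneAvg:   average z-score of g in this dataset (assumed source: GAVG file)
// platAvg, platStdev: platform average and standard deviation (assumed per gene)
float TransformZ(float z, float geneAvg, float platAvg, float platStdev,
                 const ZTransformOptions &opt) {
    if (opt.subtractGeneAvg) z -= geneAvg;
    if (opt.normalizePlatform && platStdev > 0.0f) z = (z - platAvg) / platStdev;
    // Values below the cut-off are assigned zero, as documented.
    if (opt.useCutoff && z < opt.cutoff) z = 0.0f;
    // Squaring discards the sign, so negative values become positive
    // unless a non-negative cut-off was applied first.
    if (opt.square) z = z * z;
    return z;
}
```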
  * \subsubsection sec_search Search Datasets
  *
  * Users may also define the datasets that they wish to use for integrations in a query-specific way, using the \c -D argument.
- * If this argument is absent, all datasets in the compendium will be integrated. If \c -D is used, the search datasets
- * must be selected from the available datasets defined in \c dset_platform_map.
+ * If this argument is absent, all datasets in the compendium will be integrated.
+ * If \c -D is used, the search datasets must be selected from the available
+ * datasets defined in \c dset_platform_map.
  *
  * \subsubsection sec_files Query-independent search setting files and directories
  *
  * \code
  * 10003 10002 10001
  * 634 6265
- * \encode
+ * \endcode
  * The names of the genes must be selected from the genes in the \c gene_map.
  * The maximum length of the query depends on the amount of available memory in the system.
  * It is recommended to keep each query under 100 genes.
  *
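A minimal C++ reader for the query file is sketched below. It assumes one query per line with space-separated gene names, as in the example above. \c ReadQueries and \c FlagQueries are hypothetical names. The gene map is passed in as a set of names because its on-disk format is not shown here.

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader: line i of the query file becomes query i, split on spaces.
// Blank lines are kept as empty queries so that line numbers stay aligned
// with the search-dataset file (an assumption of this sketch).
std::vector<std::vector<std::string>> ReadQueries(std::istream &in) {
    std::vector<std::vector<std::string>> queries;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream tokens(line);
        std::vector<std::string> genes;
        std::string gene;
        while (tokens >> gene) genes.push_back(gene);
        queries.push_back(genes);
    }
    return queries;
}

// Returns the indices of queries that are empty, contain a gene missing from
// gene_map (given here as a set of names), or reach the recommended limit of 100 genes.
std::vector<std::size_t> FlagQueries(const std::vector<std::vector<std::string>> &queries,
                                     const std::set<std::string> &geneMap,
                                     std::size_t maxGenes = 100) {
    std::vector<std::size_t> flagged;
    for (std::size_t i = 0; i < queries.size(); ++i) {
        bool bad = queries[i].empty() || queries[i].size() >= maxGenes;
        for (const std::string &g : queries[i])
            if (geneMap.count(g) == 0) bad = true;
        if (bad) flagged.push_back(i);
    }
    return flagged;
}
```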
  * \c -D \c search_dset
  *
- * This file defines the list of datasets to be used for integration. It can be query-specific.
+ * This file defines the list of datasets to be used for the query coexpression search.
+ * The file is query-specific.
  * An example is provided below:
  * \code
  * GSE15913.GPL570.pcl GSE16122.GPL2005.pcl GSE16836.GPL570.pcl ...
  * GSE14933.GPL570.pcl GSE15162.GPL2005.pcl GSE15566.GPL570.pcl ...
  * ...
- * \encode
- * where each line, corresponding to a query, is a space-separated dataset list.
+ * \endcode
+ * where each line, corresponding to a query, is a space-separated dataset list for the query.
  * The dataset names must be selected from the file \c dset_platform_map.
  *
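The constraints on the search-dataset file can be checked up front, before a search starts. In the C++ sketch below, \c ValidateSearchDatasets is a hypothetical function. It assumes the dataset names from \c dset_platform_map have already been loaded into a set, and it treats a line with no datasets as an error, which is also an assumption.

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical check: one line per query, each line a space-separated list of
// dataset names that must all appear in dset_platform_map (passed in as a set).
bool ValidateSearchDatasets(std::istream &in, std::size_t numQueries,
                            const std::set<std::string> &dsetPlatformMap,
                            std::string &error) {
    std::string line;
    std::size_t lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        std::istringstream tokens(line);
        std::string dset;
        std::size_t count = 0;
        while (tokens >> dset) {
            ++count;
            if (dsetPlatformMap.count(dset) == 0) {
                error = "line " + std::to_string(lineNo) + ": unknown dataset " + dset;
                return false;
            }
        }
        if (count == 0) {
            error = "line " + std::to_string(lineNo) + ": no datasets listed";
            return false;
        }
    }
    // Line i corresponds to query i, so the counts must match.
    if (lineNo != numQueries) {
        error = "expected " + std::to_string(numQueries) + " lines, found " +
                std::to_string(lineNo);
        return false;
    }
    return true;
}
```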
  * \c -P \c platform_dir
  * Directory that contains the following 3 files:
  * \li \c all_platforms.gplatavg. the platform average z-scores
  * \li \c all_platforms.gplatstdev. the platform z-score standard deviation
- * \li \c all_platforms.gplatorder. The order of platforms
+ * \li \c all_platforms.gplatorder. the order of platforms
+ *
  * These binary files are generated by SeekPrep. The specification of this directory is
  * necessary for \c --norm_subavg_plat.
  *
  * Directory that contains the gene presence files and the gene average files:
  * \li Gene presence (GPRES files): indicates the presence/absence of genes in a dataset
  * \li Gene average (GAVG files): indicates the average z-score of each gene in a dataset
+ *
  * There should be one pair of these files for <b>every</b> dataset that is specified
- * in \c dset_platform_map. Also generated by SeekPrep.
+ * in \c dset_platform_map. These files are generated by SeekPrep.
  *
  * \c -d \c db_dir
  *
  *
  * \c -Q \c quant
  *
- * The quant file specifies how the z-scores are binned. This is necessary for properly reading
+ * The \c quant file specifies how the z-scores are binned. This is necessary for properly reading
  * the z-scores, because the z-scores are stored as binned values on disk. This quant file is used
- * to convert them back to z-scores. Currently, the maximum number of bins supported is 255.
- * An snapshot of the quant file is below:
+ * to convert them back to z-scores when they are read from disk.
+ * Currently, the maximum number of bins supported is 255.
+ * A snapshot of the \c quant file is below:
  * \code
  * -5.00 -4.96 -4.92 -4.88 -4.84 -4.80 -4.76 -4.72 -4.68 -4.64 -4.60 -4.56 -4.52 ...
  * \endcode
  *
  * Directory that will contain the search results.
  * 
+ * \c -u \c sinfo_dir
+ *
+ * Directory that contains the SINFO files, which store a dataset's average z-score across all pairs of genes
+ * and the corresponding standard deviation. If this directory is provided, there should be one SINFO file for
+ * <b>every</b> dataset in \c dset_platform_map. These files are generated by SeekPrep.
+ *
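To make the SINFO description concrete, here is a C++ sketch that computes the two statistics from one dataset's symmetric z-score matrix. \c PairwiseZStats is hypothetical. The following are assumptions, and SeekPrep may compute these values differently:
- the matrix layout;
- using NaN to mark missing pairs;
- using the population standard deviation.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Returns {mean, standard deviation} of the z-scores over all distinct gene pairs
// (upper triangle of a symmetric matrix), skipping pairs marked missing with NaN.
std::pair<double, double> PairwiseZStats(const std::vector<std::vector<float>> &z) {
    double sum = 0.0, sumSq = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < z.size(); ++i) {
        for (std::size_t j = i + 1; j < z.size(); ++j) {
            const float v = z[i][j];
            if (std::isnan(v)) continue;
            sum += v;
            sumSq += static_cast<double>(v) * v;
            ++count;
        }
    }
    if (count == 0) return {0.0, 0.0};
    const double mean = sum / static_cast<double>(count);
    const double var = sumSq / static_cast<double>(count) - mean * mean;
    return {mean, std::sqrt(var > 0.0 ? var : 0.0)};
}
```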
  * 
  * \subsection ssec_usage_detailed Detailed Usage
  * 

tools/SeekServer/stdafx.cpp

 /*!
  * \page SeekServer SeekServer
  * 
+ * SeekServer runs the coexpression mining algorithm behind a multithreaded TCP/IP interface.
+ * While running, SeekServer services network requests from multiple connected clients, each asking
+ * for the genes that coexpress with the client's query genes.
+ * SeekServer sends back to the client the genes that the algorithm finds to be coexpressed with the query genes,
+ * along with the datasets in which this coexpression occurs.
  * 
  * \section sec_usage Usage
  * 
  * \subsection ssec_usage_basic Basic Usage
  * 
  * \code
- * SeekServer -i <genes.txt> -x <db list> -d <input directory> -D <output_dir>
+ * SeekServer -t <port> -x <dset_platform_map> -i <gene_map> -d <db_dir> -p <prep_dir> -P <platform_dir>
+ * -Q <quant> -n <num_db> -u <sinfo_dir>
  * \endcode
  * 
- * 
+ * This starts an instance of SeekServer on the indicated port and begins accepting client requests.
+ *
+ * \subsubsection ssec_cl Client Request Format
+ *
+ * When a client request comes in, SeekServer expects the client to send the following 4 strings, in this order:
+ *
+ * \li \c strSearchDataset. Dataset names, as listed in \c dset_platform_map, to be used for the search,
+ * separated by " ".
+ *
+ * \li \c strQuery. Query gene names, as listed in \c gene_map, separated by " ".
+ *
+ * \li \c strOutputDir. Output directory where intermediate results are written. The user running SeekServer
+ * must have write access to it; \c /tmp is recommended.
+ *
+ * \li \c strSearchParameter. A string of the form "1_2_3_4", in which each underscore-separated field denotes the following:
+ * 1 - the search method, one of \c RBP, \c OrderStatistics, \c EqualWeighting <br>
+ * 2 - the RBP parameter \a p (a float between 0.90 and 0.99). Recommended: 0.99. <br>
+ * 3 - the minimum fraction of the query required to score each dataset (0 to 1.0). Recommended: 0 (no minimum). <br>
+ * 4 - the distance measure, one of \c Correlation, \c Zscore, \c ZscoreHubbinessCorrected. <br>
+ *
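The parameter string can be validated with a small C++ parser like the one below. It assumes the four fields are separated by underscores, as in the form 1_2_3_4, so a typical value would be RBP_0.99_0_Zscore. \c SearchParameter and \c ParseSearchParameter are hypothetical. Applying the 0.90 to 0.99 range of \a p only when RBP is selected is also an assumption.

```cpp
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parsed form of strSearchParameter.
struct SearchParameter {
    std::string method;             // RBP, OrderStatistics or EqualWeighting
    float rbpP = 0.99f;             // RBP parameter p
    float minQueryFraction = 0.0f;  // 0 means no minimum
    std::string distance;           // Correlation, Zscore or ZscoreHubbinessCorrected
};

// Splits on '_' and checks each field against the documented values.
// On success, writes the result to out; on failure, leaves out unchanged.
bool ParseSearchParameter(const std::string &s, SearchParameter &out) {
    std::vector<std::string> f;
    std::istringstream in(s);
    std::string field;
    while (std::getline(in, field, '_')) f.push_back(field);
    if (f.size() != 4) return false;

    static const std::set<std::string> kMethods{"RBP", "OrderStatistics", "EqualWeighting"};
    static const std::set<std::string> kDistances{"Correlation", "Zscore", "ZscoreHubbinessCorrected"};
    if (kMethods.count(f[0]) == 0 || kDistances.count(f[3]) == 0) return false;

    SearchParameter p;
    p.method = f[0];
    p.distance = f[3];
    try {
        p.rbpP = std::stof(f[1]);
        p.minQueryFraction = std::stof(f[2]);
    } catch (const std::logic_error &) {  // std::invalid_argument or std::out_of_range
        return false;
    }
    if (p.method == "RBP" && (p.rbpP < 0.90f || p.rbpP > 0.99f)) return false;
    if (p.minQueryFraction < 0.0f || p.minQueryFraction > 1.0f) return false;
    out = p;
    return true;
}
```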
+ * See Sleipnir::CSeekNetwork for the specification of the format of an incoming string message.
+ *
+ * Once SeekServer has received all 4 strings, it starts a search instance on the server side using the provided
+ * search parameters.
+ *
+ *
+ * \subsubsection ssec_out Outgoing Message Format
+ *
+ * An outgoing message is generated once the search for the client's query finishes. In general, if the search succeeds,
+ * the client should expect two arrays from SeekServer, in sequence: a binary float array of dataset weights, followed by a binary float array
+ * of gene scores. The element at index \a i in the dataset array is the weight of the dataset with ID \a i, and
+ * the element at index \a j in the gene array is the score of the gene with ID \a j.
+ *
+ * See Sleipnir::CSeekNetwork for the specification of the format of an outgoing float array.
+ *
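Because array indices double as IDs, a client can rank the results directly. The C++ sketch below returns the IDs of the \a k highest entries of either array. \c TopK is a hypothetical helper. It does not filter out any special values the server might use. Whether these IDs line up with the numbering in \c gene_map is an assumption to check.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the IDs (array indices) of the k largest values, highest first.
// Works for either the gene-score array or the dataset-weight array.
std::vector<std::size_t> TopK(const std::vector<float> &scores, std::size_t k) {
    std::vector<std::size_t> ids(scores.size());
    std::iota(ids.begin(), ids.end(), std::size_t{0});
    k = std::min(k, ids.size());
    std::partial_sort(ids.begin(), ids.begin() + static_cast<std::ptrdiff_t>(k), ids.end(),
                      [&scores](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
    ids.resize(k);
    return ids;
}
```

Using std::partial_sort avoids sorting the full gene array when only the top few genes are displayed. Ties are returned in an unspecified order.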
+ *
+ * \subsubsection ssec_search Query-independent search setting files and directories
+ *
+ * These include the following: \c dset_platform_map, \c gene_map, \c db_dir, \c prep_dir, \c platform_dir, \c quant,
+ * \c sinfo_dir.
+ * For a description of these files and directories, see the section "Query-independent search setting files and directories"
+ * on the SeekMiner page.
+ *
+ *
  * \subsection ssec_usage_detailed Detailed Usage
  * 
  * \include SeekServer/SeekServer.ggo
  * 
- * <table><tr>
- *	<th>Flag</th>
- *	<th>Default</th>
- *	<th>Type</th>
- *	<th>Description</th>
- * </tr><tr>
- *	<td>-i</td>
- *	<td>stdin</td>
- *	<td>Text file</td>
- *	<td>Tab-delimited text file containing two columns, numerical gene IDs (one-based) and unique gene
- *		names (matching those in the input DAT/DAB files).</td>
- * </tr><tr>
- *	<td>-d</td>
- *	<td>.</td>
- *	<td>Directory</td>
- *	<td>Input directory containing DB files</td>
- * </tr><tr>
- *	<td>-D</td>
- *	<td>.</td>
- *	<td>Directory</td>
- *	<td>Output directory in which database files will be stored.</td>
- * </tr><tr>
- *	<td>-x</td>
- *	<td>.</td>
- *	<td>Text file</td>
- *	<td>Input file containing list of CDatabaselets to combine</td>
- * </tr></table>
  */