********               ********
    README file for NetHiKe
********               ********

--- Requirements
64-bit operating system; more than 8 GB of RAM is recommended.
64-bit C++ compiler with Boost libraries (
Python 2.6.x or Python 2.7.x
NetworkX, a Python package providing a collection of complex network algorithms (
(Optional) matplotlib, a Python plotting library (
(Optional) Cytoscape for visualization of analysis results (

--- Obtaining the data file from Pathway Commons

Go to the following URL
or from, go to "Download".

Download the following file (the exact name may differ depending on the release).
Pathway Commons.6.All.BINARY_SIF.tsv.gz
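
If no command-line gunzip tool is available, the downloaded .gz file can also be decompressed from Python. The following is a minimal sketch using only the standard library; the function name gunzip and the paths in the usage comment are illustrative and not part of NetHiKe.

```python
import gzip
import shutil


def gunzip(src_path, dst_path):
    """Decompress the gzip file at src_path and write the result to dst_path."""
    # try/finally is used instead of "with" so that the sketch also runs on
    # Python 2.6, where gzip file objects are not context managers.
    src = gzip.open(src_path, "rb")
    try:
        dst = open(dst_path, "wb")
        try:
            shutil.copyfileobj(src, dst)
        finally:
            dst.close()
    finally:
        src.close()


# Example call (file names are placeholders; use the name of your download):
# gunzip("downloaded_file.tsv.gz", "downloaded_file.tsv")
```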

After unzipping the file, construct the network data files for NetHiKe as follows.
$ python -i PathwayCommons.6.All.BINARY_SIF.tsv -o [output prefix]

This produces two output files, which are later passed to NetHiKe with the -d and -n options.

--- Compile NetHiKe C++ core program
Confirm that utm_array.hpp is in the same directory as the source file, and change the compile command below to match the location of your Boost libraries.
$ g++ -m64 -O3 -I /opt/local/include/boost/ -o nethike  bc_pathcomm.cpp /opt/local/lib/libboost_program_options-mt.a

--- Preparing your input file
NetHiKe's input file has two columns and no header. In each row, the first column is the official gene symbol and the second column is a weight value. If you do not need weight values, you can set them all to 1.0 or omit them.
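
As an illustration of this format, the sketch below writes such a file from a Python list of gene symbols. It assumes the two columns are separated by a tab, which this README does not state explicitly; the function name write_gene_list and the gene symbols in the usage comment are examples only.

```python
def write_gene_list(path, genes, weights=None):
    """Write one row per gene: symbol, tab, weight. No header row is written.

    When weights is None, every gene gets the weight 1.0.
    The tab separator is an assumption, not confirmed by the README.
    """
    if weights is None:
        weights = [1.0] * len(genes)
    if len(weights) != len(genes):
        raise ValueError("genes and weights must have the same length")
    with open(path, "w") as out:
        for symbol, weight in zip(genes, weights):
            out.write("%s\t%s\n" % (symbol, weight))


# Example call (symbols and file name are illustrative):
# write_gene_list("my_genes.txt", ["TP53", "MDM2", "CDKN1A"])
```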


--- Execute NetHiKe
$ python -d [prefix]_data.txt -n [prefix]_attr.txt -i [input gene list] -p [number of permutations] -o [prefix for output files]
(Optional) The -v option generates a boxplot. This requires matplotlib.

Example:
$ python -d PC_data.txt -n PC_attr.txt -i P53_SIGNALING.txt -p 5000 -o p53_out

--- Results
View [prefix for output]_result.txt, a tab-delimited file of the analysis results, with a text editor or spreadsheet software such as Microsoft Excel.
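
The result file can also be loaded in Python for further processing. The sketch below only relies on the file being tab-delimited; it makes no assumption about the column names or whether the first row is a header, since neither is documented here. The function name read_tsv is illustrative.

```python
def read_tsv(path):
    """Return the rows of a tab-delimited text file as lists of strings.

    Empty lines are skipped. The first row may or may not be a header;
    inspect it before relying on column positions.
    """
    rows = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line:
                rows.append(line.split("\t"))
    return rows


# Example call, using the output prefix from the command above:
# rows = read_tsv("p53_out_result.txt")
# print(rows[0])
```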

[prefix for output]_log.txt lists the input genes that were excluded because they are inconsistent with the Pathway Commons data.

[prefix for output].gml is a file for network visualization with Cytoscape.

If you specify the -v option when you execute the script, the following three files are additionally generated.
***_rob_res.txt : contains nlBC values of each node calculated by the leave-one-out method.
***_perm_result.txt : raw data for estimating simulated p-values of ncBC
***_boxplot.png : boxplot of the random background, leave-one-out results, and nlBC values of the top 10 genes (ordered by p-value)
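
For readers unfamiliar with simulated p-values, the sketch below shows one common generic estimator: the fraction of permutation scores at least as large as the observed score, with a +1 correction so the estimate is never exactly zero. This is an assumption for illustration only; NetHiKe's own calculation and the layout of ***_perm_result.txt may differ, and the function name is hypothetical.

```python
def empirical_p_value(observed, permuted_scores):
    """One-sided empirical p-value from permutation scores.

    Counts the permuted scores that are >= the observed score and applies
    the (k + 1) / (n + 1) correction. This is a generic estimator and is
    not taken from the NetHiKe source code.
    """
    n_extreme = sum(1 for score in permuted_scores if score >= observed)
    return (n_extreme + 1.0) / (len(permuted_scores) + 1.0)


# One of four permuted scores is >= 0.8, so the estimate is (1 + 1) / (4 + 1).
print(empirical_p_value(0.8, [0.1, 0.5, 0.9, 0.3]))  # 0.4
```

With 5000 permutations, as in the example command above, the smallest value this estimator can return is 1/5001.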

--- Cytoscape visualization
Open [File]-[Import]-[Network (Multiple File Types)...], and select the [prefix for output].gml
Open [File]-[Import]-[Vismap Property File], and select NetHiKe_viz.props 

In the "Control Panel" (left side), go to "VizMapper(TM)", and set "Current Visual Style" to NetHikeViz.

Finally, choose [Layout]-[yFiles]-[Organic] to lay out the network.

A red node border indicates a key molecule with a p-value less than 0.01, and blue nodes are those in the input list.