How to cite
Aksoy et al. "PiHelper: an open source framework for drug-target and antibody-target data". Bioinformatics (2013) 29 (16): 2071-2072. PMID:23766416
PiHelper is a drug- and antibody-target information aggregator and provider service. The data aggregated by PiHelper can be accessed either programmatically via a Java API or through a web user interface. The framework works in a human-gene centric manner and models drug-target and antibody-target relationships accordingly. PiHelper collectively provides:
- a command line user interface for importing gene-centric drug- and antibody-target data from multiple sources and for exporting data in various formats
- a Java API, for programmers interested in building applications that query, modify or analyze drug- and antibody-target data
- a RESTful web service API, to allow developers to use the framework without coding in Java
- a web-based user interface for easy querying of the data and visualizing it in an interactive manner
PiHelper is structured as a multi-module Maven project, of which three main submodules are described below:
PiHelper is supported by funding from the U.S. National Institutes of Health, National Human Genome Research Institute, grant number U41 HG006623, National Resource for Network Biology, grant number P41 GM103504, and Cancer Technology Discovery and Development Network, grant number U01 CA168409.
- built with Spring Roo
- contains core model classes such as Drug, Gene, Antibody, DrugTarget, etc.
- provides Roo-based JSON/HTML formatted web services
- supports persistence using MySQL and Hibernate technologies
- provides various finder methods in order to ease accessing/querying data
The following two screenshots show the Spring Roo-powered web service component:
- depends on the
- provides main exporter/importer facilities via a command-line interface
PiHelper Administrator: Importer
Provides support for importing data from the following resources:
- KEGG Drug
- CCLE Drugs
- CancerRx Gene Drugs
- NCI Cancer Drugs (via CancerDrugScraper project)
You can also import tabulated drug and/or antibody data from a tab-delimited file -- either from a local file or a URL. Please see the relevant sections within the import section of the document for more details.
Notice about the usage of these data sets
Please be aware of the different data usage policies of each of the data resources we support as part of the framework if you plan to re-distribute the data aggregated by PiHelper.
PiHelper Administrator: Exporter
Provides support for exporting aggregated data in a tab-delimited plain text format.
This exported file can then be used in other tools, such as Cytoscape, for further analyses.
To import drug-target data into Cytoscape 3.x, please use the File > Import > Network > File... menu item and then select the drugtargets.tsv file exported by PiHelper.
The following is a screenshot of the Cytoscape Import Dialog showing the necessary configuration for a successful import:
- built with Twitter Bootstrap, Backbone.js, CytoscapeWeb
- provides a web-based interface for exploring drugs and drug-target interactions through interactive network graphs.
- requires a running instance of the core module for queries.
The following screenshots show how users can query the data and visualize it as an interactive network (via Cytoscape Web):
Getting the source code
PiHelper is a free software project and the code is currently hosted in the following BitBucket repository: https://bitbucket.org/armish/pihelper. The source code is provided as a Maven 3 project, and Maven is required to install PiHelper and invoke the admin interface.
In order to get the latest stable code, please use the following command:
hg clone http://bitbucket.org/armish/pihelper -b stable
Configuring database connection
All database related configuration options can be found in
This provided configuration is an example file.
install and/or preparing WAR/JAR packages, this file should be edited and copied to the following destination:
cp -f core/src/main/resources/META-INF/spring/database.properties.example core/src/main/resources/META-INF/spring/database.properties
In database.properties, please make sure that the username, password, and database name are correct for a successful database connection.
Configuring the cache folder
PiHelper's importer component uses a local cache folder: it tries to download each file only once and stores it in this folder, using the MD5 sum of the file's full URL as its name.
By default, this folder is
/tmp/pihelper_cache/, but it can be configured within the
If the data resources have updated their data sets since the last import, simply removing this folder will reset the PiHelper cache:
rm -rf /tmp/pihelper_cache
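As a rough illustration of the naming scheme described above, the following Python sketch derives a cache location from a URL. It is not PiHelper's own code: the function names are hypothetical, and the UTF-8 encoding and lowercase hex digest are assumptions about details the text does not specify.

```python
import hashlib
from pathlib import Path

# Default cache folder named in the text above.
CACHE_DIR = Path("/tmp/pihelper_cache")


def cache_path_for(url: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Return a cache location named by the MD5 hex digest of the full URL.

    Assumption: UTF-8 encoding and a lowercase hex digest; the file
    naming used by PiHelper itself may differ in detail.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return cache_dir / digest


def is_cached(url: str, cache_dir: Path = CACHE_DIR) -> bool:
    """True if a file for this URL is already present in the cache folder."""
    return cache_path_for(url, cache_dir).is_file()
```

Because the name depends only on the URL, an updated remote file keeps the same cache entry, which is why the folder has to be removed to force a fresh download.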
(Optional) Allocating more memory for the Java Virtual Machine
Importing some of the data resources may require increasing the maximum memory size that the JVM can use.
The importer and exporter scripts mentioned below are run through
so the maximum allocatable memory can be configured by setting the
MAVEN_OPTS environment variable before the importer/exporter utilities are used:
The command above, for example, will set the maximum allowable memory size that maven can use to
(Optional) Configuring the service URL
Users who want to use the web interface should configure the URL at which the core web service will be running.
This value can be modified within
...
var CORE_API_URL = "../pihelper-core/";
var WEB_API_URL = "/pihelper-web/";
...
The default values assume that both the core and the web modules are deployed to the same directory; any other configuration requires adjusting these variables accordingly.
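To see why the defaults only work when both modules sit side by side, the relative core URL can be resolved against the web module's location. The sketch below uses Python's standard urljoin; the server address is a hypothetical example and not part of PiHelper's configuration.

```python
from urllib.parse import urljoin

# Default values from the configuration shown above.
CORE_API_URL = "../pihelper-core/"
WEB_API_URL = "/pihelper-web/"

# Assumption: a hypothetical local servlet container serving both modules.
server = "http://localhost:8080"

# The web module lives at an absolute path on the server.
web_base = urljoin(server, WEB_API_URL)

# The core URL is relative, so it is resolved from the web module's location.
core_base = urljoin(web_base, CORE_API_URL)

print(web_base)   # http://localhost:8080/pihelper-web/
print(core_base)  # http://localhost:8080/pihelper-core/
```

If the core module is deployed on another host or under a different path, the relative default no longer resolves to it and CORE_API_URL needs an explicit value.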
Running unit tests
Before trying to set up an instance of PiHelper on your local machine, please make sure that all the JUnit tests succeed.
This not only confirms that the database setup is correct, but also helps catch potential errors that might arise from the system setup.
In order to run the tests, please use the following command in the main
mvn clean test
In order to use/deploy the PiHelper files, first compile and install the project with the following Maven command:
mvn clean install
This will create the necessary war files both for the
core and the
Users can directly copy these war files for deploying the core and the web modules, e.g. into the Tomcat
cp -f core/target/core-VERSION.war ~/Library/Tomcat/webapps/pihelper-core.war
cp -f web/target/web-VERSION.war ~/Library/Tomcat/webapps/pihelper-web.war
The install command will also compile and prepare the necessary classes for admin operations, including importing and exporting data.
In order to invoke the command-line admin interface, please run the following command in the
bash src/main/scripts/pihelper-admin.sh
usage: pihelper-admin.sh [export|import] [...]
Each of these subcommands, import and export, will give different lists of options along with their help text:
bash src/main/scripts/pihelper-admin.sh import
bash src/main/scripts/pihelper-admin.sh export
Usage: Import interface
The importer admin interface supports multiple data sources, which are listed above, categorized by the data type they provide. To list these resources, please use the following command:
bash src/main/scripts/pihelper-admin.sh import -l
Before running any importers, the admin interface can be used to create the database schema:
bash src/main/scripts/pihelper-admin.sh import -CREATE dbname
The command above will reset the database, if it already exists, and create the necessary tables without any data in them.
The importer interface can be used to import either all the data at once or portions of data. The following command will import data from all supported data resources:
bash src/main/scripts/pihelper-admin.sh import -a
Or, to import only gene-related data, for example, the following command can be used:
bash src/main/scripts/pihelper-admin.sh import -r
Optionally, the importers can be run individually:
bash src/main/scripts/pihelper-admin.sh import -e GeneImporter,DrugBankImporter,KEGGDrugImporter,CancerDrugImporter
All drug and antibody importers should be run after the
GeneImporter because this importer provides the minimal background gene data.
Finally, the importer provides the option to merge the drug-targets that represent the same drug-gene relationship -- or similarly the same antibody-gene relationship.
Normally, each data source will create its own drug-gene relationship and all these will be kept as separate entities in the database.
It is sometimes more advantageous to merge these entities into a single one, and in order to do this, the
-m switch is used:
bash src/main/scripts/pihelper-admin.sh import -m
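The idea behind merging can be sketched in a few lines of Python. This is only an illustration of collapsing records that share the same drug-gene pair; the DrugTarget tuple, the source names, and the choice to keep a list of contributing sources are assumptions, since the text does not describe what the merged entity retains.

```python
from collections import defaultdict
from typing import NamedTuple


class DrugTarget(NamedTuple):
    """Hypothetical record: one drug-gene relationship from one data source."""
    drug: str
    gene: str
    source: str


def merge_drug_targets(records):
    """Collapse records sharing the same (drug, gene) pair into one entry.

    Returns a dict mapping each (drug, gene) pair to the sorted list of
    sources that reported it.
    """
    merged = defaultdict(set)
    for rec in records:
        merged[(rec.drug, rec.gene)].add(rec.source)
    return {pair: sorted(sources) for pair, sources in merged.items()}


records = [
    DrugTarget("Example1", "EGFR", "SourceA"),
    DrugTarget("Example1", "EGFR", "SourceB"),
    DrugTarget("Example2", "BRAF", "SourceA"),
]
merged = merge_drug_targets(records)
print(len(records), "records merged into", len(merged), "relationships")
```

The same grouping applies to antibody-gene relationships by swapping the drug name for the antibody name in the key.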
Custom Data Importers
Additionally, users can import custom drug and/or antibody data from a tabulated text file using the corresponding
For example, the following command will pull a sample drug data sheet from the URL and will import it into the database:
bash src/main/scripts/pihelper-admin.sh import -D "https://docs.google.com/spreadsheet/pub?key=0AlB4d4qUEn-1dEtIVmtRcnZ4N1pmZ0VpZjBlbndfN3c&output=txt"
The same import can also be done via a local file, e.g.:
wget -O /path/to/drugs.tsv "https://docs.google.com/spreadsheet/pub?key=0AlB4d4qUEn-1dEtIVmtRcnZ4N1pmZ0VpZjBlbndfN3c&output=txt"
bash src/main/scripts/pihelper-admin.sh import -D "file:///path/to/drugs.tsv"
The above syntax is also valid for the custom antibody data importer:
bash src/main/scripts/pihelper-admin.sh import -T "file:///path/to/antibodies.tsv"
The custom drug data importer utility assumes the following tab-delimited format for the file it is trying to import:
#Drug Name	Synonym(s)	Description (max 1024 chars)	Target Gene Symbols	is Cancer Drug?	is FDA Approved?	is Nutraceutical?	External References
Example1	S11|S12	Some description 1	AKT*|TP53|EGFR	TRUE	TRUE	FALSE	PubChem:1|NCI Drug:3
Example2		Some description 2	BRAF	FALSE	FALSE	TRUE	KEGG:2
A similar format is expected for antibody data:
#Antibody Name	Description	Target Gene Symbols	External References
Example1	Some description 1	AKT1|AKT2	ProteinAtlas:1
Example2	Some description 2	ACTN	ProteinAtlas:2
For target gene names, wild-card characters are allowed at the end of the gene symbol -- e.g. AKT* will match all gene symbols that start with AKT (AKT1, AKT2, AKT3, ...).
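The wild-card rule can be pictured as a simple prefix match. The Python sketch below is an illustration only: the function name is hypothetical, and case-sensitive matching is an assumption, as the text does not say how PiHelper treats letter case.

```python
def match_gene_symbols(pattern, known_symbols):
    """Expand a target gene pattern against a collection of known symbols.

    A trailing '*' matches every symbol that starts with the preceding
    text; without it, only an exact match is returned.
    Assumption: matching is case-sensitive.
    """
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return sorted(s for s in known_symbols if s.startswith(prefix))
    return [pattern] if pattern in known_symbols else []


genes = {"AKT1", "AKT2", "AKT3", "TP53", "EGFR"}
print(match_gene_symbols("AKT*", genes))  # ['AKT1', 'AKT2', 'AKT3']
print(match_gene_symbols("TP53", genes))  # ['TP53']
```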
All lines starting with the character
# are ignored to leave room for comments in the input file.
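For readers preparing their own files, the following Python sketch shows one way to read the drug format described above and can serve as a quick sanity check before an import. It is not the importer's actual logic: the DrugRow class and function names are hypothetical, the '|' separator for multi-valued fields is inferred from the sample rows, and rejecting rows that do not have exactly eight columns is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DrugRow:
    """Hypothetical container for one row of the custom drug file."""
    name: str
    synonyms: List[str]
    description: str
    targets: List[str]
    is_cancer_drug: bool
    is_fda_approved: bool
    is_nutraceutical: bool
    references: List[str]


def split_multi(field):
    """Split a '|'-separated field, dropping empty entries."""
    return [part for part in field.split("|") if part]


def parse_drug_lines(lines):
    """Parse tab-delimited drug rows, skipping blank and '#' comment lines."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 8:
            raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
        name, syn, desc, targets, cancer, fda, nutra, refs = fields
        rows.append(DrugRow(
            name=name,
            synonyms=split_multi(syn),
            description=desc,
            targets=split_multi(targets),
            is_cancer_drug=cancer.strip().upper() == "TRUE",
            is_fda_approved=fda.strip().upper() == "TRUE",
            is_nutraceutical=nutra.strip().upper() == "TRUE",
            references=split_multi(refs),
        ))
    return rows


sample = [
    "#Drug Name\tSynonym(s)\tDescription\tTargets\tCancer?\tFDA?\tNutraceutical?\tReferences",
    "Example1\tS11|S12\tSome description 1\tAKT*|TP53|EGFR\tTRUE\tTRUE\tFALSE\tPubChem:1|NCI Drug:3",
    "Example2\t\tSome description 2\tBRAF\tFALSE\tFALSE\tTRUE\tKEGG:2",
]
drugs = parse_drug_lines(sample)
print([d.name for d in drugs])  # ['Example1', 'Example2']
```

The antibody format could be handled the same way with four columns instead of eight.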
Additional ID Importer Scripts
Additional Groovy scripts are provided (in scripts/) to retrieve PubChem Compound IDs (CIDs) for Broad Cancer Cell Line Encyclopedia (CCLE) and Sanger Cancer Genome Project (CGP) compounds using the PubChem API (https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html). PubChem is searched for CIDs where a given name matches, through a lowercase/exact match, one of the synonyms listed for a PubChem compound.
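The matching rule itself can be sketched without any network access. The Python example below is not a port of the Groovy scripts: the function name and the synonym table are made up for illustration, and in practice the synonyms would be retrieved from PubChem.

```python
def find_matching_cids(name, synonyms_by_cid):
    """Return the CIDs that have a synonym equal to `name`, ignoring case.

    Only whole-string matches count; a synonym that merely contains the
    name is not a match.
    """
    wanted = name.strip().lower()
    return sorted(
        cid
        for cid, synonyms in synonyms_by_cid.items()
        if any(s.strip().lower() == wanted for s in synonyms)
    )


# Hypothetical synonym table; real data would come from PubChem.
synonyms_by_cid = {
    1: ["Example Compound", "EX-1"],
    2: ["Another Compound"],
    3: ["ex-1 hydrate"],
}
print(find_matching_cids("ex-1", synonyms_by_cid))  # [1]
```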
Usage: Export interface
Once the data has been loaded into the database, it is possible to create tab-delimited export files for different categories.
The -l switch lists the available exporters, but for simplicity the following command can be used to export all aggregated data at once:
bash src/main/scripts/pihelper-admin.sh export -a -o /path/to/output/folder
The default file names for each data type are as follows:
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.