The script takes two inputs: the OMIM Morbid Map, a field-delimited file with OMIM's synopsis of the human gene map (see http://www.omim.org/downloads), and a classification of diseases into 22 primary disorder classes (see the human disease network paper by Goh et al., PNAS 2007, SI Table 1). The classification is based on the physiological system affected by the disorder. The output consists of three networks in Pajek format:
A) the diseasome bipartite network,
and its two biologically relevant projections:
B) the human disease network (HDN),
C) the disease gene network (DGN).
In the HDN, nodes represent disorders, and two disorders are connected if they share at least one gene in which mutations are associated with both disorders.
In the DGN, nodes represent disease genes, and two genes are connected if they are associated with the same disorder. The association of a gene with a disorder is supported by one of four kinds of evidence: (1) a confirmed association with an unknown underlying effect, (2) linkage, (3) a confirmed molecular basis, with a mutation found in the gene, (4) a contiguous gene deletion or duplication syndrome, in which multiple genes are deleted or duplicated, causing the phenotype.
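The two projections described above can be sketched in a few lines. This is a minimal illustration, not the script's actual code: the function name project and the toy disorder and gene labels are invented here, and the input is assumed to be a plain list of (disorder, gene) pairs.

```python
from collections import defaultdict
from itertools import combinations


def project(associations):
    """Return (hdn_edges, dgn_edges) built from (disorder, gene) pairs."""
    genes_of = defaultdict(set)      # disorder -> associated genes
    disorders_of = defaultdict(set)  # gene -> associated disorders
    for disorder, gene in associations:
        genes_of[disorder].add(gene)
        disorders_of[gene].add(disorder)

    # HDN: connect every pair of disorders that share a gene.
    hdn = set()
    for disorders in disorders_of.values():
        hdn.update(combinations(sorted(disorders), 2))

    # DGN: connect every pair of genes associated with the same disorder.
    dgn = set()
    for genes in genes_of.values():
        dgn.update(combinations(sorted(genes), 2))
    return hdn, dgn


# Toy associations, invented for illustration only.
toy = [("D1", "G1"), ("D2", "G1"), ("D2", "G2"), ("D3", "G3")]
hdn_edges, dgn_edges = project(toy)
```

Sorting each group before pairing makes every undirected edge appear once, as (smaller, larger), so the sets contain no duplicates.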
The most complete and best-curated list of known disorder-gene associations is the Morbid Map (MM) of the Online Mendelian Inheritance in Man (OMIM) database. Each MM entry is composed of four fields: the name of the disorder, the associated gene symbols, the corresponding OMIM ID, and the chromosomal location.
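As a rough sketch, one MM entry with these four fields could be parsed as follows. Several details are assumptions rather than facts from this description: the "|" field separator, the comma-separated gene symbols, and the evidence code in parentheses at the end of the disorder name. The names MorbidMapEntry and parse_entry and the sample line are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class MorbidMapEntry(NamedTuple):
    disorder: str
    evidence: Optional[int]  # evidence code 1-4, if present
    genes: list
    omim_id: str
    location: str


# Assumed convention: disorder name ends with an evidence code such as "(3)".
_EVIDENCE = re.compile(r"\s*\((\d)\)\s*$")


def parse_entry(line, sep="|"):
    """Split one entry into its four fields (separator is an assumption)."""
    fields = [f.strip() for f in line.rstrip("\n").split(sep)]
    disorder, genes, omim_id, location = fields
    match = _EVIDENCE.search(disorder)
    evidence = int(match.group(1)) if match else None
    if match:
        disorder = disorder[:match.start()]
    gene_list = [g.strip() for g in genes.split(",")]
    return MorbidMapEntry(disorder, evidence, gene_list, omim_id, location)


# Placeholder line, not a real OMIM record.
entry = parse_entry("Example disorder (3)|GENE1, GENE2|000000|1p36.1")
```

A line with a different number of fields raises a ValueError at the unpacking step, which is a simple way to catch a wrong separator early.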
The script saves the vertices and edges of each network into .net files, and the disease class assignment into a .clu file.
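For orientation, the sketch below builds the text of a simple one-mode Pajek .net file and a matching .clu file. It is an illustration under assumptions, not the script's own writer: the helper names to_pajek_net and to_pajek_clu, the toy labels, and the numeric class codes are all invented here.

```python
def to_pajek_net(labels, edges):
    """Return the text of a one-mode Pajek .net file."""
    # Pajek numbers vertices from 1.
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = ["*Vertices %d" % len(labels)]
    lines += ['%d "%s"' % (index[label], label) for label in labels]
    lines.append("*Edges")
    lines += ["%d %d" % (index[u], index[v]) for u, v in edges]
    return "\n".join(lines) + "\n"


def to_pajek_clu(labels, class_of):
    """Return the text of a Pajek .clu file: one class number per vertex."""
    lines = ["*Vertices %d" % len(labels)]
    lines += [str(class_of[label]) for label in labels]
    return "\n".join(lines) + "\n"


# Toy network and class codes, invented for illustration only.
labels = ["D1", "D2", "D3"]
net_text = to_pajek_net(labels, [("D1", "D2")])
clu_text = to_pajek_clu(labels, {"D1": 1, "D2": 1, "D3": 2})
```

The .clu lines follow the same vertex order as the .net file, which is how Pajek matches each class number to its vertex.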
- OMIM (http://www.omim.org/downloads),
- The human disease network (Goh et al., PNAS 2007).
The script uses the python-Levenshtein module (see https://pypi.python.org/pypi/python-Levenshtein) for fast computation of string similarities. This is a C extension module. If it is not available, the user can fall back on a slower pure-Python implementation contained in the script (_levenshtein).
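One common way to arrange such a fallback is an import guarded by try/except. The sketch below is not the script's _levenshtein function. It is a generic two-row dynamic-programming edit distance under the hypothetical name levenshtein_distance, used only when the C extension cannot be imported.

```python
try:
    # Fast path: C extension from the python-Levenshtein package.
    from Levenshtein import distance as levenshtein_distance
except ImportError:
    def levenshtein_distance(a, b):
        """Pure-Python edit distance, keeping only two rows of the table."""
        if len(a) < len(b):
            a, b = b, a  # keep the shorter string along the row
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            current = [i]
            for j, cb in enumerate(b, start=1):
                current.append(min(
                    previous[j] + 1,               # deletion
                    current[j - 1] + 1,            # insertion
                    previous[j - 1] + (ca != cb),  # substitution
                ))
            previous = current
        return previous[-1]
```

Either branch gives the same result for a call such as levenshtein_distance("kitten", "sitting"), so the rest of the code does not need to know which implementation is in use.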