Issue #17 resolved

Hubber: Incorrect calculation of gene-set 'cohesiveness'

arjunkrish
created an issue

Hubber has two major functionalities: 1. Calculating "cohesiveness" of input gene sets in a functional network; 2. Calculating the association of all the genes in the network to each of the input gene sets.

Functionality (2) appears to be OK, but (1) is not. The cohesiveness calculation is incorrect because edges between genes in the input gene set are counted twice, which leads to wrong values for 'hubbiness'. ['cliquiness', on the other hand, looks fine.]
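To make the double counting concrete, here is a minimal Python sketch. It is not Hubber's code. It assumes that 'hubbiness' is the mean weight of all edges touching at least one gene in the set, and the edge weights are hypothetical (they are not the contents of 'abcd.dat'). Concatenating per-gene edge lists for the set {a, b, c} in a four-gene network gives 9 entries, because a-b, a-c and b-c each appear twice, whereas only 6 distinct edges exist.

```python
# Hypothetical toy network; the weights are made up for illustration only.
weights = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.8,
    frozenset(("b", "c")): 0.7,
    frozenset(("a", "d")): 0.2,
    frozenset(("b", "d")): 0.1,
    frozenset(("c", "d")): 0.3,
}
genes = ["a", "b", "c", "d"]
gene_set = ["a", "b", "c"]


def per_gene_edges(gene):
    """Return the weights of all edges incident to a single gene."""
    return [weights[frozenset((gene, other))] for other in genes if other != gene]


# Buggy pattern: pooling per-gene edge lists visits every within-set edge twice.
naive = [w for g in gene_set for w in per_gene_edges(g)]

# Intended pattern (under the assumption above): each incident edge counted once.
unique = [w for pair, w in weights.items() if pair & set(gene_set)]

print("naive :", len(naive), sum(naive) / len(naive))
print("unique:", len(unique), sum(unique) / len(unique))
```

With these made-up weights the within-set edges are the heavy ones, so counting them twice inflates the mean as well as the edge count.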

Comments (7)

  1. Casey Greene

    Arjun, if this still exists, can you either fix it or provide sample files showing the bug (sample input, current output, and correct output)? I'll look into it, but I'd like to know what the correct result is, as I'm not 100% sure that I understand the issue with the cohesiveness calculation.

  2. arjunkrish reporter

    Please find sample files to illustrate the issue.

    • 'abcd.dat' contains the toy network.

    • 'abc.genes' contains the list of input genes.

    • 'abc.hubber' contains the current hubber output upon running "Hubber -i abcd.dat abc.genes > abc.hubber".

    • The last file also contains my comments on what the calculation should be.

  3. Casey Greene

    This looks like a significant amount of work to update Hubber.

    A background distribution is stored for each gene (lines 319/320).

    To calculate hubbiness of the genes, the background distributions for each gene are summed (line 366), and then calculated upon (lines 644 and 645).

    Maybe there's a quick and easy fix, but it's escaping me at the moment. I don't currently see a simple way to adjust the summed backgrounds to correct for the multiple addition.

  4. Casey Greene

    Ok, I removed the use of the precomputed background for that part (it's still used overall to rank new genes if it outputs a list of genes). Because the overall number of within edges won't be too large, I don't think performance will be heavily affected and it should return correct results. Current output for the test set is:

    name       size  hubbiness  hubbiness std.  hubbiness n  cliquiness  cliquiness std.  cliquiness n
    total      4     0.666667   0.596285        6            0.666667    0.596285         6
    abc.genes  3     0.666667   0.596285        6            1           0.707107         3
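As a rough illustration of computing the statistics directly from the edges rather than from summed per-gene backgrounds, here is a short Python sketch. It is not the actual Hubber change. The function names, the toy weights, and the use of the population standard deviation are all assumptions, and the definitions used (hubbiness over edges touching the set, cliquiness over edges inside the set) are inferred from the edge counts in the output above.

```python
from statistics import mean, pstdev


def edge_stats(values):
    """Return (mean, std, n) for a list of edge weights.

    The population standard deviation is an assumption; the convention
    Hubber uses is not stated in this thread.
    """
    values = list(values)
    return mean(values), pstdev(values), len(values)


def cohesiveness(weights, gene_set):
    """Compute hubbiness and cliquiness by visiting each edge exactly once.

    `weights` maps frozenset({gene1, gene2}) -> edge weight (hypothetical layout).
    """
    members = set(gene_set)
    incident = [w for pair, w in weights.items() if pair & members]   # touches the set
    within = [w for pair, w in weights.items() if pair <= members]    # inside the set
    return {"hubbiness": edge_stats(incident), "cliquiness": edge_stats(within)}


# Hypothetical weights, not the contents of abcd.dat.
toy = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.8,
    frozenset(("b", "c")): 0.7,
    frozenset(("a", "d")): 0.2,
    frozenset(("b", "d")): 0.1,
    frozenset(("c", "d")): 0.3,
}
result = cohesiveness(toy, ["a", "b", "c"])
print(result["hubbiness"][2], result["cliquiness"][2])
```

For a three-gene set in a four-gene network this gives n = 6 for hubbiness and n = 3 for cliquiness, matching the counts in the 'abc.genes' row. The means and standard deviations differ because the weights here are invented.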
    