Increasing contact map filter size doesn't reduce community size

Issue #501 resolved
Jacob Bowman created an issue

Hello,

I have been using cmap.xyz to generate a contact map for my protein of interest from molecular dynamics data. I have been varying dcut from 10 to 30 and pcut from 0.25 to 0.75. I had anticipated that the community size would be reduced as pcut became lower and dcut became higher, but that is not the case. The same number of communities exists going from a pcut of 0.50 and a dcut of 15 up to a pcut of 0.75 and a dcut of 30. Is that supposed to be the case? The DCCM is altered to filter out or include more contacts in the contact map itself, but the communities don't change. I also have another, very similar system with stable MD trajectories in which many residues end up in their own individual communities, even though they sit in a helix and are clearly within the distance cutoff to be considered part of the same community. Is there a way to control how the communities themselves are formed in the analysis? How does the program determine them?

Thank you,

Jacob Bowman

Comments (4)

  1. Jacob Bowman reporter

    Hey Xin-Qiu,

    This answered a couple of my questions about how to modify my contact filters and community methods. However, I'm still confused about how the communities are determined, and why increasing the dcut parameter and decreasing the pcut parameter doesn't decrease the number of communities in my network.

    Thanks,

    Jacob

  2. Xinqiu Yao

    The hierarchical tree, which defines how close each pair of node subsets is and which is used to determine the communities, comes from the Girvan-Newman algorithm (http://www.ncbi.nlm.nih.gov/pubmed/12060727). The final community partition, i.e. where to "cut" the tree, is chosen by maximizing the 'modularity' (https://en.wikipedia.org/wiki/Modularity_(networks)).
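For reference, the modularity of a partition is usually written as below. This is the standard textbook definition for an undirected (optionally weighted) network; that cna() evaluates exactly this form on the filtered correlation network is an assumption here, not something stated in this thread.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),
\qquad k_i = \sum_j A_{ij},
\qquad m = \frac{1}{2} \sum_{i,j} A_{ij}
```

Here A_ij is the weight of the edge between nodes i and j (0 if there is none), c_i is the community of node i, and delta is 1 when both nodes share a community and 0 otherwise. Q compares the fraction of edge weight inside communities with what a random network of the same node degrees would give, so it depends on the whole edge set and not only on local contacts.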

    Although the maximal-modularity solution looks fine in some cases, it is not necessarily the "best" result. For example, a partition with near-maximal modularity but a smaller number of communities sometimes gives a better result in terms of consistency across simulation replicates, etc. This is system dependent, and it takes your own expertise to decide which partition is the most suitable.

    As I said, by default the number of communities is determined by the maximal modularity, which can either increase or decrease when dcut/pcut change, depending on how the new edges are distributed across the network.
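A toy illustration of this point, as a minimal sketch only. It is not how bio3d works internally: instead of cutting a Girvan-Newman tree it brute-forces every partition of a hypothetical 6-node graph and keeps the one with maximal modularity. The graph, the edge lists and all function names are made up for the example. It shows that adding many "long-range" edges changes which partition wins, here collapsing two communities into one.

```python
from itertools import combinations


def modularity(edges, membership):
    """Modularity Q of an unweighted, undirected graph for a given partition."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total_deg = {}  # summed degree of the nodes in each community
    for node, k in degree.items():
        c = membership[node]
        total_deg[c] = total_deg.get(c, 0) + k
    internal = {}  # number of edges with both ends in the same community
    for u, v in edges:
        if membership[u] == membership[v]:
            c = membership[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in total_deg.items())


def partitions(nodes):
    """Yield every partition of `nodes` as a list of blocks (tiny graphs only)."""
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn, or into a new block.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def best_partition(edges, nodes):
    """Exhaustive search for the maximal-modularity partition."""
    best_q, best = float("-inf"), None
    for part in partitions(nodes):
        membership = {n: i for i, block in enumerate(part) for n in block}
        q = modularity(edges, membership)
        if q > best_q:
            best_q, best = q, part
    return best_q, best


nodes = [0, 1, 2, 3, 4, 5]
# Hypothetical "strict filter" network: two triangles joined by one edge.
sparse_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
# Hypothetical "loose filter" network: every pair of nodes is connected.
dense_edges = list(combinations(nodes, 2))

q_sparse, p_sparse = best_partition(sparse_edges, nodes)
q_dense, p_dense = best_partition(dense_edges, nodes)
print(len(p_sparse), round(q_sparse, 3))  # 2 communities, Q = 0.357
print(len(p_dense), round(q_dense, 3))    # 1 community,  Q = 0.0
```

In a real network the extra edges admitted by a looser filter are rarely spread this evenly, which is why the count can move in either direction or stay the same.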

    I recommend inspecting the network using, e.g., pymol.dccm() before running cna(), to make sure you don't have too many "long-range" connections, which usually cause problems in community detection (and also slow down the calculation).

    Hope it helps...
