How are negative cross-correlation values handled in cna()?

Issue #485 resolved

Former user created an issue 2017-07-23

Hi All

I am new to Bio3D, so my question might seem trivial. I want to understand how negative values in the cross-correlation matrix (cij) obtained from dccm() are handled when cutoff.cij in cna() is a positive number, say 0.3. The tutorial describes cij as "A numeric array with 2 dimensions (nXn) containing atomic correlation values, where "n" is the residue number. The matrix elements should be in between 0 and 1 (atomic correlations)." However, cross-correlation values lie between -1 and +1. So how does the function decide whether to place an edge between two nodes when their cross-correlation value is negative? Does it take absolute values, so that, e.g., -0.3 and +0.3 are treated the same? I hope my question is clear. Any information on this would be helpful.

Thanks

Comments (12)

Xinqiu Yao
Hi,

cna() takes absolute values, so '-0.3' and '0.3' would be treated the same. The tutorial description you quoted is likely talking about 'LMI' correlations, which are in the range 0-1. Otherwise, it could be a typo, and we will update it in the next release.
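
To make this concrete, here is a minimal sketch in plain Python (not bio3d's R code) of thresholding on absolute correlation values. The helper name `adjacency_from_correlation`, the toy matrix, and the inclusive `>=` comparison are assumptions for illustration. See help(cna) for the exact rule bio3d applies.

```python
def adjacency_from_correlation(cij, cutoff):
    """Return a 0/1 adjacency matrix with an edge wherever |cij| >= cutoff.

    Hypothetical helper, not part of bio3d. The diagonal (self-correlation)
    is skipped, and the inclusive >= comparison is an assumption.
    """
    n = len(cij)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # abs() discards the sign, so -0.3 and 0.3 behave identically
            if i != j and abs(cij[i][j]) >= cutoff:
                adj[i][j] = 1
    return adj


# Toy 3x3 correlation matrix (made-up values for illustration only).
cij = [
    [1.0, -0.3, 0.1],
    [-0.3, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
adj = adjacency_from_correlation(cij, 0.3)
print(adj)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

The negatively correlated pair (0, 1) and the positively correlated pair (1, 2) both get an edge, while the weak pair (0, 2) does not.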
- 2017-07-24T01:39:41+00:00
Barry Grant
In this case the function takes the absolute values. Let us know if you need anything else here.
- 2017-07-24T02:05:39+00:00
HimaniTN
Hello Xin and Barry

Thank you for the clarification.

As a beginner in network analysis, I am also confused, like others, about which cut-offs to use for correlations and for contact-map filtering. I hope this is the right place to ask such questions. Could you please shed some light on how to choose these cut-offs? Do they depend on the size of the protein? Which approach is better: a 4.5 Å contact map with edges weighted by cross-correlations, without filtering out strong or weak correlations (as in Sethi et al., PNAS 2009), or a 10 Å contact map combined with filtering out low correlations (<0.6), as in your papers (the G-alpha paper and the ensemble NMA paper on allostery)? Or should both ideally give similar results? Any help would be greatly appreciated.
- 2017-07-27T20:43:00+00:00
Xinqiu Yao
Hi,

This is a good question (and not an easy one to answer). From what I have learned, there is no single answer that fits all systems. For MD simulations, we found that values anywhere between 0.3 and 0.7 could be good correlation cutoffs, depending on how flexible your system is. If the correlation is calculated from NMA, the cutoff range shifts down a bit, because NMA generally yields weaker correlations than MD. A similar issue exists for contact maps: although 4.5 Å is a popular distance cutoff, some studies use values ranging from 4 to 6 Å. Unfortunately, the cutoff value is critical and does affect the final results of network analysis, such as the communities and suboptimal paths.

In the method described in PNAS 2009, the network edges are completely determined by contact maps. Using a single distance cutoff for all amino acid pairs is problematic: the interaction range between, e.g., Arg and Glu is evidently longer than that between Ala and Gly. A single-cutoff contact map ignores this difference and may miss some TRUE contacts (or introduce FALSE contacts if a larger cutoff is used). See, for example, Ribeiro and Ortiz, JCTC 2014, 10:1762 for a deeper discussion of this issue. Our method was designed to reduce the dependence of network edges on geometric contacts. If two residues show consistently strong correlation across multiple simulations, an edge is added between them no matter how far apart they are. Conversely, if the correlation is consistently low, there is no edge even when the residues are very close. In this sense, our method may better capture the "heterogeneity" of residue interactions, although we haven't quantitatively tested how much of an improvement it gives.
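
The replica-consensus idea described above can be sketched as follows. This is plain Python, not bio3d's filter.dccm(). The function name `consensus_edges`, the toy data, and especially the rule for pairs that are strong in only some replicas (kept only if in contact) are assumptions made for illustration.

```python
def consensus_edges(cij_replicas, contact, cutoff):
    """Hypothetical edge rule over correlation matrices from several replicas.

    - |cij| >= cutoff in every replica -> edge, regardless of distance
    - |cij| <  cutoff in every replica -> no edge, even if in contact
    - mixed -> edge only if the pair is in contact (ASSUMED rule)
    """
    n = len(contact)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            strong = [abs(c[i][j]) >= cutoff for c in cij_replicas]
            if all(strong):
                edges.add((i, j))
            elif any(strong) and contact[i][j]:
                edges.add((i, j))
    return edges


# Two toy replicas for three residues (made-up values).
rep1 = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.7], [0.2, 0.7, 1.0]]
rep2 = [[1.0, 0.75, 0.1], [0.75, 1.0, 0.2], [0.1, 0.2, 1.0]]
# Residues 0 and 1 are far apart; 0-2 and 1-2 are in contact.
contact = [[1, 0, 1], [0, 1, 1], [1, 1, 1]]

edges = consensus_edges([rep1, rep2], contact, 0.6)
print(sorted(edges))  # [(0, 1), (1, 2)]
```

Pair (0, 1) gets an edge despite not being in contact, because it is strongly correlated in both replicas. Pair (0, 2) gets none despite being in contact, because it is weak in both.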

I would suggest playing around with different methods or cutoff values and seeing which works best, based on your own knowledge of your system. Several tools in bio3d can help with this. For example, with pymol.dccm() you can visually inspect what the network looks like at various correlation cutoff values. Check our tutorial on protein structure networks or the documentation on our website (http://thegrantlab.org/bio3d/) for more useful tools.

Hope it helps.
- 2017-07-28T03:01:07+00:00
HimaniTN
Hi Xin

Thank you for such a detailed answer. It is indeed very helpful for understanding the concept better. Using correlation as a metric to add or remove edges seems an interesting idea. I have a few more naive questions, and I would be obliged if you could answer these too.

You mentioned, "For MD simulations, we found all values between 0.3 to 0.7 could be good cutoffs for correlation depending on how flexible your system is". So is there a direct relation between flexibility and cut-off, such that highly flexible proteins should be given a high cut-off and vice versa? As a side question, how do you define the flexibility of a protein? Is it based on RMSF values? You also mentioned, "If the correlation is calculated from NMA, the cutoff range will shift down a bit because in general NMA generates weaker correlation than MD". For my system, I observe that NMA gives higher correlation values than multiple MD simulations. Is that normal?

Secondly, for my system of interest, the unfiltered cross-correlation map shows very few regions, apart from the diagonal, with correlation values between 0.3 and 0.6; most of the map is white. As you suggested, I tried different correlation cut-offs. With a cut-off of 0.6, most of the network is disconnected, and as I decrease the cut-off, more and more nodes get connected, as expected. I would like to know your view on how well connected a network should be. I understand that if a network is disconnected, some paths cannot be traversed. But if I use a less stringent cut-off, practically everything gets included.

Thirdly, how do these cut-offs change for multi-chain proteins?
- 2017-07-28T06:33:59+00:00
Xinqiu Yao
By flexibility, I mean, for example, a multidomain (or multichain) protein versus a single-domain protein: a multidomain protein has overall higher correlation values because of domain motions. There is no direct relation between flexibility and cutoff, nor an equation that maps one to the other. As I said, the best way is to plot the correlations and visually check which cutoff is most suitable.

Did you do all-atom NMA or elastic network model (ENM) based NMA? When I said "NMA gives weaker correlation", I meant ENM-NMA. Of course, this does not mean the NMA is wrong when it generates stronger correlations than MD; it may be system dependent. The point is that the correlation cutoff varies from system to system, and even between different methods for the same system.

I would stop at the point where the system is well connected but does not yet have too many "long-range" edges. Note that you can set the "step" for increasing/decreasing correlation levels in pymol.dccm(). This may help you find a suitable cutoff within a narrow range of values, so that you don't jump from a disconnected network to one where everything is included.
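
One way to apply the "well connected but not too many long-range edges" criterion numerically is to scan cutoffs in fixed steps and record both quantities. The sketch below is plain Python, not bio3d code. The helpers `count_components` and `scan_cutoffs`, the 10 Å "long-range" threshold, and the toy matrices are all hypothetical choices for illustration.

```python
def count_components(n, edges):
    """Number of connected components, via a small union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)  # merge the two trees
    return len({find(x) for x in range(n)})


def scan_cutoffs(cij, dist, cutoffs, long_range=10.0):
    """For each cutoff: (cutoff, number of components, number of long-range edges).

    An edge exists where |cij| >= cutoff; it counts as "long-range" if the
    residue distance exceeds long_range (an ASSUMED threshold, in Å).
    """
    n = len(cij)
    rows = []
    for cut in cutoffs:
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if abs(cij[i][j]) >= cut]
        n_long = sum(1 for i, j in edges if dist[i][j] > long_range)
        rows.append((cut, count_components(n, edges), n_long))
    return rows


# Toy data for four residues (made-up values).
cij = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.5, -0.4],
    [0.3, 0.5, 1.0, 0.7],
    [0.2, -0.4, 0.7, 1.0],
]
dist = [
    [0.0, 5.0, 8.0, 15.0],
    [5.0, 0.0, 6.0, 12.0],
    [8.0, 6.0, 0.0, 5.0],
    [15.0, 12.0, 5.0, 0.0],
]
# Lower the cutoff from 0.7 in steps of 0.2 (rounded to avoid float drift).
cutoffs = [round(0.7 - 0.2 * k, 2) for k in range(3)]
rows = scan_cutoffs(cij, dist, cutoffs)
for row in rows:
    print(row)
# (0.7, 2, 0)
# (0.5, 1, 0)
# (0.3, 1, 1)
```

In this toy case, 0.5 is the first cutoff at which the network becomes a single component, before any long-range edge appears.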
- 2017-07-29T02:35:52+00:00
HimaniTN
Thank you very much Xin for the insightful answers. I think I understand the matter now. Thanks again.
- 2017-07-29T11:43:43+00:00
Olivia Debnath
Hello! I am Olivia, and I have had the same question for days. I am working on protein structure network analysis, for which choosing the cut-off cij value is crucial. To my knowledge, the cij (cross-correlation) value should not depend on the protein size or folding topology. Also, the "hub frequency", i.e., the preference of amino acids to remain highly connected, must be associated with this interaction cut-off value. Thus, the cij cut-off should be chosen such that only "non-covalent interactions" are selected. I am performing the network analysis from NMA (not from MD trajectories), and it's mentioned above that the cij cut-off goes a little lower for NMA. What value am I supposed to choose?

Also, for network analysis, is it a good idea to use absolute values and ignore the sign of the negative correlations? (Sorry, I am a beginner and not very clear about this.)

Best regards, Olivia
- 2017-10-26T09:58:58+00:00
Xinqiu Yao
As we have discussed several times above and in other issues, you have to play around with different cutoff values and decide for yourself which one is most suitable for your system. I cannot tell which values are good without seeing the "network".

For your second question, negative correlations have to be converted (to absolute values) because the correlations are used to represent "distances" between nodes in the network, so negative values make no sense here. We use the same algorithm as in the 'dynamical network analysis' method (Sethi et al., PNAS 2009).
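
To illustrate why the sign has to go: in the dynamical network analysis of Sethi et al. (2009), a correlation c is turned into an edge length d = -log(|c|), so strongly (anti)correlated pairs are "close". The plain-Python sketch below assumes that weighting. It finds the optimal (shortest) path with Dijkstra's algorithm. The helper names `edge_weight` and `optimal_path` and the toy matrix are hypothetical and are not bio3d functions.

```python
import heapq
import math


def edge_weight(c):
    """Distance-like edge length -log(|c|): a strong |c| gives a short edge.
    Taking abs() means -0.9 and 0.9 give the same length."""
    return -math.log(abs(c))


def optimal_path(cij, cutoff, source, target):
    """Dijkstra shortest path over edges with |cij| >= cutoff (cutoff > 0)."""
    if cutoff <= 0:
        raise ValueError("cutoff must be > 0 so that log(0) is never taken")
    n = len(cij)
    best = {source: 0.0}  # shortest known distance to each node
    prev = {}             # predecessor on the shortest path
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > best[u]:
            continue  # stale heap entry
        for v in range(n):
            if v == u or abs(cij[u][v]) < cutoff:
                continue
            nd = d + edge_weight(cij[u][v])
            if nd < best.get(v, math.inf):
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in best:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]


# Toy matrix: residue 0 is anticorrelated with 1, which correlates with 2.
cij = [
    [1.0, -0.9, 0.5],
    [-0.9, 1.0, 0.9],
    [0.5, 0.9, 1.0],
]
path, length = optimal_path(cij, 0.3, 0, 2)
print(path, round(length, 3))  # [0, 1, 2] 0.211
```

A raw negative value would give log of a negative number, which is undefined. Taking the absolute value first lets the strongly anticorrelated pair (0, 1) serve as a short link.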
- 2017-10-30T23:30:49+00:00
Olivia Debnath
Hello Xin! Thank you so much for the detailed answer. I am tuning the cij values and looking at the networks. However, for the geometric distance between the C-alpha atoms of two residues, I want to consider only distances between 2 and 5 Angstroms. We can set an upper limit by specifying the 'dcut' value in filter.dccm(), but I don't understand how to set the lower limit. I am interested in this distance range because I only want to look at non-covalent interactions among the amino acid residues and eliminate all possible covalent interactions.

And one more question: how do you treat aromatic-aromatic interactions here?

Thanking you, Olivia
- 2017-11-06T05:55:57+00:00
Xinqiu Yao
To exclude covalent pairs, try scut=2 (see help(cmap) for more details), which ignores pairs of residues that are neighbors in sequence.
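
Why a sequence-separation filter rather than a lower distance bound: covalently bonded neighbors (i, i+1) have C-alpha atoms about 3.8 Å apart, so they would fall inside a 2 to 5 Å window anyway. The sketch below is plain Python imitating the idea, not bio3d's cmap(). It assumes, as help(cmap) describes, that pairs with |i - j| < scut are skipped. The function `contact_map`, the inclusive `<=` distance test, and the toy coordinates are assumptions for illustration.

```python
import math


def contact_map(coords, dcut=5.0, scut=2):
    """Boolean contact map from per-residue coordinates (e.g. C-alpha).

    Hypothetical helper. A pair (i, j) is a contact if the residues are at
    least scut apart in sequence (|i - j| >= scut) AND their distance is
    <= dcut. With scut=2 the bonded i, i+1 neighbors are dropped.
    """
    n = len(coords)
    cm = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if abs(i - j) >= scut and math.dist(coords[i], coords[j]) <= dcut:
                cm[i][j] = True
    return cm


# Four toy C-alpha positions on a square with 3.8 Å sides.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]
cm = contact_map(coords, dcut=5.0, scut=2)
pairs = [(i, j) for i in range(len(cm)) for j in range(i + 1, len(cm)) if cm[i][j]]
print(pairs)  # [(0, 3)]
```

Only the non-adjacent pair (0, 3) survives. The sequence neighbors (0, 1), (1, 2) and (2, 3) are just as close but are excluded by scut=2.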

We don't have any particular treatment of aromatic-aromatic interactions. What do you have in mind here?
- 2017-11-06T20:09:14+00:00
Lars Skjærven
- changed status to resolved
reopen if needed
- 2017-12-06T19:51:39+00:00

Assignee: –

Type: task

Priority: trivial

Status: resolved

Component: –

Version: –

Votes: 0

Watchers: 6