correct workflow for weight recalibration

Issue #12 resolved
Elisabetta Manduchi created an issue

Hello, I have 2 biological replicates and would like to use them to recalibrate weights. I use the runChicago.R wrapper and would like to clarify if the below is the correct (and most efficient) workflow:

  1. Run runChicago.R separately on each of the 2 replicates, that is, two runs: one with only rep1.chinput as input and the other with only rep2.chinput as input
  2. Take the 2 resulting rds objects, rep1.rds and rep2.rds, and run fitDistCurve.R with --input rep1.rds,rep2.rds
  3. Update the settings file with the new weights from (2)
  4. Use the updated settings file for one more run of runChicago.R, this time providing the comma-separated list rep1.chinput,rep2.chinput
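
For reference, the four steps above can be written out as a dry-run plan. This is only a sketch under stated assumptions: it builds the command lines as argument lists and runs nothing. The output prefixes ("rep1", "recalibrated", "merged") and the settings file name are hypothetical, the --input spelling is copied from step (2), --settings-file is assumed to be the option runChicago.R uses to read a settings file, and any other options your runs need are omitted. Check each script's help output before relying on it.

```python
def plan_recalibration(chinputs, rds_files, settings_file, final_prefix="merged"):
    """Return the command lines for the workflow as argument lists (nothing is run)."""
    if len(chinputs) != len(rds_files):
        raise ValueError("expected one rds file per chinput replicate")

    commands = []

    # Step 1: one runChicago.R run per replicate.
    # The output prefix is derived from the input file name (hypothetical choice).
    for chinput in chinputs:
        prefix = chinput.rsplit(".", 1)[0]
        commands.append(["Rscript", "runChicago.R", chinput, prefix])

    # Step 2: fit the weight curve on all per-replicate rds objects at once.
    # "recalibrated" is a hypothetical output prefix.
    commands.append(
        ["Rscript", "fitDistCurve.R", "recalibrated", "--input", ",".join(rds_files)]
    )

    # Step 3 is manual: copy the new weights into settings_file.

    # Step 4: a final run on all replicates together with the updated settings.
    # --settings-file is an assumption about the wrapper's option name.
    commands.append(
        ["Rscript", "runChicago.R", "--settings-file", settings_file,
         ",".join(chinputs), final_prefix]
    )
    return commands
```

Calling plan_recalibration(["rep1.chinput", "rep2.chinput"], ["rep1.rds", "rep2.rds"], "updated.settings") returns four commands: two per-replicate runs, one curve fit, and one combined run.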

Also, if the above is valid, can you advise on whether the attached diagnostics indicate that the estimates from (2) are reliable?

Thanks

Comments (7)

  1. Jonathan Cairns

    Hi Elisabetta,

    Yes, what you've done is the correct and most efficient workflow at this time.

    weights_curveFit.pdf is a bit concerning, though not terrible. Most likely this is down to using only 2 replicates, or perhaps undersampling issues, though this could also be down to e.g. an unusual cell type that CHiCAGO's strategy isn't suitable for. For example, contrasting weights_curveFit.pdf with Figures 5A and Supplementary 5A in the CHiCAGO paper, we see a worse fit in your data.

    weights_medianCurveFit.pdf looks fine overall (though some of the data subsets, i.e. the coloured lines, also show an unusual pattern).

  2. Elisabetta Manduchi reporter

    I've run Chicago both with the recalibrated weights as above and with the default weights, and I noticed an increase in the percentage of significant trans interactions when I use the recalibrated weights. I find this somewhat worrisome. What's your take on it? Thanks.

  3. Jonathan Cairns

    Looks like the curve in weights_curveFit.pdf doesn't reach as low as it perhaps should, meaning that distal/trans interactions aren't penalized as much as they should be. I suspect that you'll struggle to get a better curve out of this data set without more replicates or more coverage. You could try decreasing the "--subsets" parameter, but I imagine that will have bad side effects.

  4. Elisabetta Manduchi reporter

    Thanks for your feedback. I'm really only interested in cis interactions, so I wouldn't mind throwing away the trans as long as the cis I get are reliable. Do you think the latter is the case?
