Help understanding diversity tests at a fixed diversity order
Thanks for the package, it’s been helpful to me as I learn to analyse TCR data from the 10X Genomics platform.
I’ve got a few questions about diversity tests at a fixed diversity order, some based on the vignette (https://alakazam.readthedocs.io/en/stable/vignettes/Diversity-Vignette/#view-diversity-tests-at-a-fixed-diversity-order) and some more general questions:
- The q=0 and q=2 plots look identical to my eye; should they be different?
- Do the three points on the plot correspond to (mean - sd, mean, mean + sd), and do the error bars around each point indicate some measure of bootstrap variation?
- Do you have any references for how the significance testing is actually implemented and/or justified? E.g., I understand that the bootstrapping is done to estimate the per-sample variability of the diversity curves, but not how the values are actually compared between samples (e.g., t-test, Wilcoxon test, etc.).
- How do you recommend that the significance testing be extended to a situation with replicates? For example, say I have 2 conditions (infected and uninfected) and 3 biological replicates per condition (infected_1, infected_2, infected_3, uninfected_1, uninfected_2, uninfected_3; samples are not paired). Seemingly, the current implementation allows me to do pairwise tests between each replicate by specifying group = “sample” (e.g., infected_1 vs. infected_2, …, infected_1 vs. infected_6, …, infected_5 vs. infected_6) or to compare the conditions by aggregating the data across the replicates by specifying group = “sample”, but this essentially ignores the biological variability (at least that’s my initial impression coming from a background of analysing gene expression data).
Thanks for any help or guidance you can provide
Pete
Comments (4)
- I fixed the plotting function and alphaDiversity documentation in ea86a07. You can install the latest development version from Bitbucket via:
  library(devtools)
  install_bitbucket("kleinstein/alakazam@master")
  Someone else is currently working on fixing some TCR support bugs in groupGenes, so that may not be entirely stable at the moment.
- reporter Thanks for your reply, Jason.
- changed status to resolved
Fixed the relevant bugs in ea86a07. We will do a release soon with the fixes (likely this week).
Greetings @Peter Hickey ,
1. No, they shouldn’t be the same. That looks like a bug in the plotting function. It looks like it’s plotting all three values of q in the data (0, 1, 2) instead of just the one specified. There should only be one point per isotype. The test results are in the @tests slot of the output object, if you want to plot them manually while we work on fixing the bug.
2. Each point should be the mean diversity and each bar should be +/- 1 sd, based on the variance of the bootstrap. The plotting bug is confusing how this is represented.
3. It looks like we accidentally lost the documentation for how the test is calculated when we deprecated the old testing function (https://alakazam.readthedocs.io/en/stable/topics/testDiversity). In brief, the p-value is calculated using the delta of the bootstrap distributions. The missing methods text is:
4. If you have biological replicates, then I suggest just doing a standard significance test (t-test, Wilcoxon, etc) on the point estimates for each sample (the means of the bootstrap distribution for each replicate/condition). The significance test is really just for dealing with the two sample case.
(I think we experimented with some error propagation methods at one point, but they didn’t perform well. I’d have to dig around to find them.)
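To make point 4 concrete, here is a minimal sketch in base R of testing replicate-level point estimates between two conditions. It is not part of alakazam. The data frame, its column names (sample, condition, d) and the diversity values are all made up for illustration. In practice d would hold one diversity estimate per sample at a single q, taken from your own alphaDiversity output.

```r
# Hypothetical per-sample point estimates of diversity at one value of q
# (e.g. the bootstrap mean for each replicate). The numbers are invented.
estimates <- data.frame(
  sample    = c("infected_1", "infected_2", "infected_3",
                "uninfected_1", "uninfected_2", "uninfected_3"),
  condition = rep(c("infected", "uninfected"), each = 3),
  d         = c(120.4, 98.7, 110.2, 150.9, 162.3, 141.8)
)

# Welch two-sample t-test on the replicate-level estimates (unpaired samples).
tt <- t.test(d ~ condition, data = estimates)

# Rank-based alternative (Wilcoxon rank-sum test).
wt <- wilcox.test(d ~ condition, data = estimates)

print(tt$p.value)
print(wt$p.value)
```

One thing to keep in mind with 3 vs. 3 replicates: the exact two-sided Wilcoxon rank-sum test cannot give a p-value below 0.1, so the t-test (with its normality assumption) may be the only one of the two that can reach conventional significance thresholds.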