Help understanding diversity tests at a fixed diversity order

Comments (4)

Jason Vander Heiden

Greetings @Peter Hickey ,

1. No, they shouldn’t be the same. That looks like a bug in the plotting function. It looks like it’s plotting all three values of q in the data (0, 1, 2) instead of just the one specified. There should only be one point per isotype. The test results are in the @tests slot of the output object, if you want to plot them manually while we work on fixing the bug.

2. Each point should be the mean diversity and each bar should be +/- 1 sd, based on the variance of the bootstrap. The plotting bug is confusing how this is represented.

3. It looks like we accidentally lost the documentation for how the test is calculated when we deprecated the old testing function (https://alakazam.readthedocs.io/en/stable/topics/testDiversity). In brief, the p-value is calculated using the delta of the bootstrap distributions. The missing methods text is:

Significance of the difference in diversity index (D) between groups is tested by constructing a bootstrap delta distribution for each pair of unique values in the group column. The bootstrap delta distribution is built by subtracting the diversity index Da in group-a from the corresponding value Db in group-b, for all bootstrap realizations, yielding a distribution of nboot total deltas; where group-a is the group with the greater mean D. The p-value for hypothesis Da != Db is the value of P(0) from the empirical cumulative distribution function of the bootstrap delta distribution, multiplied by 2 for the two-tailed correction.

This method may inflate statistical significance when clone sizes are uniformly small, such as when most clone sizes are 1, sample size is small, and max_n is near the total count of the smallest data group. Use caution when interpreting the results in such cases. We are currently investigating this potential problem.
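As a rough illustration of the calculation described above, here is a minimal sketch in base R. It is not the alakazam implementation: `bootstrapDeltaP` is a hypothetical helper, the inputs are simulated stand-ins for the per-realization diversity values, and taking the smaller of the two tails (so that it does not matter which group is labeled a) is an assumption on my part.

```r
# Hypothetical helper, not part of alakazam.
# d_a, d_b: numeric vectors of bootstrap diversity estimates for two groups,
#           one value per bootstrap realization (same length, nboot).
bootstrapDeltaP <- function(d_a, d_b) {
    stopifnot(length(d_a) == length(d_b))
    delta <- d_a - d_b             # bootstrap delta distribution (nboot deltas)
    lower <- ecdf(delta)(0)        # fraction of deltas <= 0
    upper <- ecdf(-delta)(0)       # fraction of deltas >= 0
    # Two-tailed correction: double the smaller tail, capped at 1
    min(1, 2 * min(lower, upper))
}

# Simulated bootstrap values, for illustration only
set.seed(1)
d_a <- rnorm(200, mean = 12, sd = 1)
d_b <- rnorm(200, mean = 10, sd = 1)
p_value <- bootstrapDeltaP(d_a, d_b)
```

Note that the resolution of a p-value computed this way is limited by nboot: a result of exactly 0 only means that none of the bootstrap deltas crossed zero.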

4. If you have biological replicates, then I suggest just doing a standard significance test (t-test, Wilcoxon, etc.) on the point estimates for each sample (the means of the bootstrap distribution for each replicate/condition). The built-in significance test is really just for dealing with the two-sample case.

(I think we experimented with some error propagation methods at one point, but they didn’t perform well. I’d have to dig around to find them.)
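For the replicate case, a minimal sketch of what that could look like in base R. The data frame, its column names, and the values are all made up for illustration; in practice `d` would hold the mean of the bootstrap distribution for each replicate.

```r
# Hypothetical per-replicate point estimates (illustrative values only)
div <- data.frame(
    condition = rep(c("control", "treated"), each = 4),
    d = c(10.2, 11.5, 9.8, 10.9, 13.1, 12.4, 14.0, 12.9)
)

# Welch two-sample t-test on the replicate-level estimates
tt <- t.test(d ~ condition, data = div)

# Wilcoxon rank-sum test as a non-parametric alternative
wt <- wilcox.test(d ~ condition, data = div)

tt$p.value
wt$p.value
```

With only a handful of replicates per condition the Wilcoxon test has very few attainable p-values, so which test is more appropriate depends on how many replicates you have.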


2020-07-13T19:46:04+00:00

Jason Vander Heiden

I fixed the plotting function and alphaDiversity documentation in ea86a07. You can install the latest development version from Bitbucket via:

```
library(devtools)
install_bitbucket("kleinstein/alakazam@master")
```

Someone else is currently working on fixing some TCR support bugs in groupGenes, so that may not be entirely stable at the moment.

2020-07-13T21:03:40+00:00

Peter Hickey reporter

Thanks for your reply, Jason.

2020-07-15T03:06:37+00:00

Jason Vander Heiden

changed status to resolved

Fixed the relevant bugs in ea86a07. We will do a release soon with the fixes (likely this week).

2020-07-16T16:29:19+00:00
