Help understanding diversity tests at a fixed diversity order

Issue #84 resolved
Peter Hickey created an issue

Thanks for the package, it’s been helpful to me as I learn to analyse TCR data from the 10X Genomics platform.

I’ve got a few questions about diversity tests at a fixed diversity order, some based on the vignette (https://alakazam.readthedocs.io/en/stable/vignettes/Diversity-Vignette/#view-diversity-tests-at-a-fixed-diversity-order) and some more general questions:

  1. The q=0 and q=2 plots look identical to my eye; should they be different?
  2. Do the three points on the plot correspond to (mean - sd, mean, mean + sd), and do the error bars around each point indicate some measure of bootstrap variation?
  3. Do you have any references for how the significance testing is actually implemented and/or justified? E.g., my understanding is that the bootstrapping is done to estimate the per-sample variability of the diversity curves, but I don't see how the values are actually compared between samples (e.g., t-test, Wilcoxon test, etc.).
  4. How do you recommend that the significance testing be extended to a situation with replicates? For example, say I have 2 conditions (infected and uninfected) and 3 biological replicates per condition (infected_1, infected_2, infected_3, uninfected_1, uninfected_2, uninfected_3; samples are not paired). Seemingly, the current implementation allows me either to do pairwise tests between each replicate by specifying group = “sample” (e.g., infected_1 vs. infected_2, …, infected_1 vs. uninfected_3, …, uninfected_2 vs. uninfected_3) or to compare the conditions by aggregating the data across the replicates by specifying group = “condition”, but the latter essentially ignores the biological variability (at least that’s my initial impression coming from a background of analysing gene expression data).

Thanks for any help or guidance you can provide

Pete

Comments (4)

  1. Jason Vander Heiden

    Greetings @Peter Hickey,

    1. No, they shouldn’t be the same. That looks like a bug in the plotting function. It looks like it’s plotting all three values of q in the data (0, 1, 2) instead of just the one specified. There should only be one point per isotype. The test results are in the @tests slot of the output object, if you want to plot them manually while we work on fixing the bug.

    2. Each point should be the mean diversity and each bar should be +/- 1 sd, based on the variance of the bootstrap. The plotting bug obscures this by drawing a point for every value of q.

    3. It looks like we accidentally lost the documentation for how the test is calculated when we deprecated the old testing function (https://alakazam.readthedocs.io/en/stable/topics/testDiversity). In brief, the p-value is calculated using the delta of the bootstrap distributions. The missing methods text is:

    Significance of the difference in diversity index (D) between groups is tested by constructing a bootstrap delta distribution for each pair of unique values in the group column. The bootstrap delta distribution is built by subtracting the diversity index Da in group-a from the corresponding value Db in group-b, for all bootstrap realizations, yielding a distribution of nboot total deltas; where group-a is the group with the greater mean D. The p-value for hypothesis Da != Db is the value of P(0) from the empirical cumulative distribution function of the bootstrap delta distribution, multiplied by 2 for the two-tailed correction.

    This method may inflate statistical significance when clone sizes are uniformly small (such as when most clone sizes are 1), when sample size is small, and when max_n is near the total count of the smallest data group. Use caution when interpreting the results in such cases. We are currently investigating this potential problem.
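
    For illustration only, the bootstrap delta test described above could be sketched in base R roughly as follows. This is not the package's code: bootstrap_delta_pvalue is a hypothetical helper, the inputs are assumed to be two equal-length vectors holding one diversity value per bootstrap realization, and the sign convention (greater-mean group minus the other, so that P(0) is the small tail) is one reading of the text, not a confirmed detail.

```r
# Hypothetical helper, not part of alakazam: two-sided p-value from the
# bootstrap delta distribution of two groups' diversity values.
# d_x, d_y: numeric vectors with one diversity value per bootstrap realization.
bootstrap_delta_pvalue <- function(d_x, d_y) {
    stopifnot(length(d_x) == length(d_y))

    # Group a is the group with the greater mean diversity
    if (mean(d_x) >= mean(d_y)) {
        d_a <- d_x
        d_b <- d_y
    } else {
        d_a <- d_y
        d_b <- d_x
    }

    # One delta per bootstrap realization (assumed sign: a minus b)
    delta <- d_a - d_b

    # P(0): fraction of deltas at or below zero, from the empirical CDF
    p0 <- ecdf(delta)(0)

    # Two-tailed correction; capped at 1 (the cap is an addition of this sketch)
    min(1, 2 * p0)
}

# Toy values for illustration only, not real diversity estimates
p <- bootstrap_delta_pvalue(c(10, 11, 12, 13), c(9, 12, 10, 11))
print(p)
```

    In this toy case one of the four deltas is at or below zero, so P(0) is 0.25 and the doubled value is 0.5.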

    4. If you have biological replicates, then I suggest just doing a standard significance test (t-test, Wilcoxon, etc.) on the point estimates for each sample (the means of the bootstrap distribution for each replicate/condition). The built-in significance test is really just for dealing with the two-sample case.

    (I think we experimented with some error propagation methods at one point, but they didn’t perform well. I’d have to dig around to find them.)
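
    A minimal sketch of the replicate-level approach from point 4, assuming you have already extracted one mean diversity value per sample at a single q. The data frame layout, the column names, and the numbers are all made up for illustration; only t.test and wilcox.test are real base R functions.

```r
# Made-up per-sample point estimates (mean of the bootstrap distribution at a
# single value of q). Replace with the values from your own analysis.
estimates <- data.frame(
    sample = c("infected_1", "infected_2", "infected_3",
               "uninfected_1", "uninfected_2", "uninfected_3"),
    condition = rep(c("infected", "uninfected"), each = 3),
    d = c(120, 135, 128, 210, 190, 205)
)

# Welch two-sample t-test on the per-sample estimates (unpaired)
tt <- t.test(d ~ condition, data = estimates)

# Wilcoxon rank-sum test as a non-parametric alternative
wt <- wilcox.test(d ~ condition, data = estimates)

print(tt$p.value)
print(wt$p.value)
```

    Note that with only 3 replicates per condition, the exact two-sided Wilcoxon rank-sum test cannot give a p-value below 0.1, even when the groups are completely separated as in these toy numbers, so the t-test may be the more useful choice at this sample size if its assumptions are reasonable.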

  2. Jason Vander Heiden

    I fixed the plotting function and alphaDiversity documentation in ea86a07. You can install the latest development version from Bitbucket via:

    library(devtools)
    install_bitbucket("kleinstein/alakazam@master")
    

    Someone else is currently working on fixing some TCR support bugs in groupGenes, so that may not be entirely stable at the moment.