- edited description
Clonal diversity
Can alakazam be used to quantify clonal diversity after clonal assignment from other packages such as Scirpy ?
**Data: Single-cell BCR sequences (scBCR-seq)
Comments (13)
-
reporter -
reporter - edited description
-
Greetings @Benjamaporn Sriwilai , Yes, it can. There shouldn’t be any problem with that as long as the AIRR tsv contains a column for the clonal assignment IDs. If the clone IDs are in a non-standard column, you’ll just need to specify the column name via the
clone
argument toalphaDiversity
(defaults toclone = "clone_id"
) -
reporter Thanks! So, should each row of the input represent one cell ? Because, my clonal assignment IDs are defined by “both heavy and light chain”. I wonder its during observing the smallest number of sequences in alphaDiversity function, so in my situation, it observe the smallest number of “cell”, right ?
-
Yes, you’ll want to subset to 1 row per cell with the default parameters. I do this by subsetting to
locus="IGH”
because of how clones are defined, but whatever is appropriate for your data is fine. The default behavior is to treat each row as an observation. However, if you have a summary table (count of cells per clone) you can also specify thecopy
argument to define the column with clonal abundance counts, instead of treating each row as a cell.I just looked at the docs and some of this is hidden in the
...
which is passed to estimateAbundance and is therefore documented in that function. Which is a bit confusing, given thatclone
,copy
, andgroup
are pretty important arguments. I think we can fix that. -
reporter Thank you so much, I will choose the most appropiate parameter to my data. Next, I am curious about alphaDiversity at q=1. In your coding, you use
D <- sapply(q, function(x) sum(p^x)^(1 / (1 - x)))
to calculate diversity index, right? However, we cannot use this equation to calculate it at q=1, so you use q=0.9999 to approximate the diversity index of q=1. After that, I try to check the result manually calculated by the mentioned equation at q=0.9999, but the results represent the very small value of D, and unequal to the result from alphaDiversity function (q=1). Why is it ? Feel free to let me know if I am missunderstanding. -
There’s a few other steps that happen in
alphaDiversity
. It’s first passed toestimateAbundance
to infer unobserved clonal abundance, smooth the distribution, and uniformly downsample all of the input groups so they can be compared. You should see some decrease in D compared to a raw calculation on an uncorrected abundance distribution, but it shouldn’t be major. If you’re seeing a large increase it is likely the downsampling, particularly if you have one group with small number of observations. That’s the first thing I would check. If so, you can correct it my upping themin_n
value, which will selectively filter out groups with too few observations. The diversity calculations tend to suffer from discreteness artifacts below ~1000 observations anyway. (I can’t recall why we set the default atmin_n=30
though - that seems rather low.)And, yes,
q=0.9999
is because of the division by zero problem at q=1. -
reporter So, the D value at q = 0.9999 is then calculated following this equation, right ?
-
It should be pretty close, but not identical because we just plug
0.9999
into the Hill calculation instead of using the Shannon entropy formula:> p <- c(1/15, 1/15, 1/5, 2/3) > q <- .9999 > sapply(q, function(x) sum(p^x)^(1 / (1 - x))) [1] 2.594272 > exp(-sum(p*log(p))) [1] 2.594181
-
reporter Thank you so much from all clear explanations!
-
reporter Dear Jason,
I have a more question. If I estimate abundance by choosing your default (copy=null), and my data structure is “one row represents one cell”. During resampling by bootstrapping, n is then drawn from the estimate complete abundance distribution. So, in my situation I draw n from all cells (detected and unseen cell). Is it correct ? Finally, I can calculate relative abundance of them (clone_id).
-
Yes, though
n
will be restricted to the size of the smallest group in the data. -
- changed status to resolved
This seems resolved. Open if needed.
- Log in to comment