support lognormal model fed by exp(mu) instead of mean of unlogged counts
the original lognormal models (used for variance fitting and pvalue calling) assume that they are passed unlogged observed values together with expected values that are true expected values of those unlogged observed values. they then use simple properties of the lognormal distribution to convert these to the mean and variance of the related normal distribution lnX ~ N(mu, sigma2), using the functions in lib5c.algorithms.variance.lognorm_dispersion during variance fitting and lib5c.util.distributions.log_parameters() during pvalue calling (these functions differ only in what parameters they take as input and what they return)
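the lognormal <-> normal identities these functions rely on can be sketched as follows (lognorm_to_norm is a hypothetical name for illustration, not a real lib5c function):

```python
import numpy as np


def lognorm_to_norm(exp, var):
    """Recover (mu, sigma2) of lnX ~ N(mu, sigma2) from the true mean and
    variance of the unlogged lognormal variable X, using the identities
        exp = e^(mu + sigma2/2)
        var = (e^sigma2 - 1) * e^(2*mu + sigma2)
    """
    # var / exp**2 simplifies to e^sigma2 - 1, so sigma2 falls out first
    sigma2 = np.log(1 + var / exp ** 2)
    # then invert the mean identity for mu
    mu = np.log(exp) - sigma2 / 2
    return mu, sigma2
```

note that these inversions are only valid when exp really is the mean of the unlogged counts, which is exactly the assumption this issue is about.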
when the expected value comes from a model that is fitted in log space (e.g., log-counts 1-D expected models and/or expected models corrected with the donut, passing donut=True and log_donut=True, or -D/--donut_correction and -O/--log_donut), the expected is not actually the expected value of the unlogged counts; instead it is exp(mu), the exponential of the mean of the logged counts (equivalently, the geometric mean and the median of the unlogged counts). in this case, the lognormal models (both for variance and pvalue calling) incorrectly assume that the expected being passed to them is a true expected value and incorrectly apply the lognormal <-> normal conversion, when they should instead just compute mu = ln(exp(mu))
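a minimal sketch of the discrepancy, assuming lognormal counts with parameters mu and sigma2 (all names here are illustrative):

```python
import numpy as np

mu, sigma2 = 5.0, 1.0

# what a log-space expected model produces: exp(mu), the geometric mean
geometric_mean = np.exp(mu)

# what the lognormal code assumes it receives: the true mean E[X],
# which exceeds the geometric mean by a factor exp(sigma2 / 2)
true_mean = np.exp(mu + sigma2 / 2)

# treating exp(mu) as a true mean and applying the standard inversion
# yields a mu that is biased low by sigma2 / 2 ...
wrong_mu = np.log(geometric_mean) - sigma2 / 2

# ... whereas the correct recovery is simply ln(exp(mu)) = mu
correct_mu = np.log(geometric_mean)
```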
the short-term solution is to never feed a log-space expected into the variance or pvalue steps when using the lognormal model. in the short run, the workaround is to log the counts up-front (before the expected model computation), compute the expected model in unlogged space (but on the logged counts), and then finish by computing variance and pvalues using the normal model; this is equivalent and simpler in the sense that it avoids the "unusual" conversion via exp(mu) and also avoids performing e.g. the donut expected in log space (which has no easy interpretation)
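the workaround pipeline can be sketched as follows (the expected model here is a trivial stand-in for any real 1-D or donut expected; log1p is assumed to mirror lib5c's log-counts handling):

```python
import numpy as np

obs = np.array([12., 85., 430., 1100.])

# step 1: log the counts up-front, before any expected model is computed
log_obs = np.log1p(obs)

# step 2: compute the expected model "in unlogged space" but on the logged
# counts (here just a constant mean as a stand-in for a real expected model)
log_exp = np.full_like(log_obs, log_obs.mean())

# step 3: variance and z-scores under a plain normal model in log space;
# no lognormal <-> normal conversion (and no exp(mu) ambiguity) is needed
resid = log_obs - log_exp
sigma2 = resid.var(ddof=1)
z = resid / np.sqrt(sigma2)
```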
in the long run, it might be nice to allow the variance and pvalue steps to accept either exp(mu) or the true expected value as input (probably triggered by the addition of a new kwarg/flag on these functions/tools)
the base functionality to accomplish this could use this function:
import numpy as np


def convert(mu=None, sigma2=None, exp=None, var=None, emu=None):
    """
    Convert parameters between normal and lognormal models.

    Parameters
    ----------
    mu : float or np.ndarray, optional
        Mean of normal distribution.
    sigma2 : float or np.ndarray, optional
        Variance of normal distribution.
    exp : float or np.ndarray, optional
        Mean of lognormal distribution.
    var : float or np.ndarray, optional
        Variance of lognormal distribution.
    emu : float or np.ndarray, optional
        Geometric mean of the lognormal distribution, ``np.expm1(mu)``.

    Returns
    -------
    float or np.ndarray
        The variance of the lognormal distribution if ``sigma2`` was passed,
        or the variance of the normal distribution if ``var`` was passed.

    Examples
    --------
    >>> import numpy as np
    >>> import scipy.stats as stats
    >>> from lib5c.algorithms.variance.lognorm_dispersion import convert
    >>> mu = 5
    >>> sigma2 = 1
    >>> emu = np.expm1(mu)
    >>> dist_norm = stats.norm(mu, np.sqrt(sigma2))
    >>> dist_lognorm = stats.lognorm(s=np.sqrt(sigma2), scale=np.exp(mu))
    >>> exp, var = dist_lognorm.stats('mv')
    >>> np.allclose(convert(mu=mu, sigma2=sigma2), var, rtol=0, atol=1e-10)
    True
    >>> np.allclose(convert(emu=emu, sigma2=sigma2), var, rtol=0, atol=1e-10)
    True
    >>> np.allclose(convert(exp=exp, var=var), sigma2, rtol=0, atol=1e-10)
    True
    >>> np.allclose(convert(emu=emu, var=var), sigma2, rtol=0, atol=1e-10)
    True
    >>> np.allclose(convert(emu=emu, var=convert(emu=emu, sigma2=sigma2)),
    ...             sigma2, rtol=0, atol=1e-10)
    True
    """
    if exp is not None and sigma2 is not None:
        mu = np.log(exp) - sigma2 / 2
    if emu is not None:
        mu = np.log1p(emu)
    if mu is not None and sigma2 is not None:
        return (np.exp(sigma2) - 1) * np.exp(2 * mu + sigma2)
    if var is not None and mu is not None:
        # mu presumably available from emu, no access to exp
        # https://www.wolframalpha.com/
        # input/?i=solve+v%3D(exp(s)-1)*exp(2*m%2Bs)+for+s
        return np.log(
            0.5 * (np.sqrt(np.exp(2*mu) + 4*var) / np.sqrt(np.exp(2*mu)) + 1))
    if var is not None and exp is not None:
        return np.log(1 + var / exp ** 2)
    raise ValueError('insufficient parameters to perform a conversion')
an incomplete list of code steps that would need to be adapted to handle either exp(mu) or a real expected value:
- conversion from disp to var under lognormal/loglogistic models in estimate_variance(), local_variance(), mle_variance(), cross_rep_variance()
- conversion from exp to mu under lognormal/loglogistic models in local_variance()
- convert_parameters(..., log=True) calls log_parameters(), but when exp(mu) is passed it should instead use sigma_2 = convert(emu=mu, sigma2=sigma_2); mu = np.log1p(mu)
- all entrypoints would need to accept and pass through a flag to signal whether their exp parameter should be interpreted as exp(mu) or as a real expected value
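the flag pass-through could be sketched like this (exp_to_mu and the log_space kwarg are hypothetical names for illustration, not existing lib5c API):

```python
import numpy as np


def exp_to_mu(exp, sigma2, log_space=False):
    """Recover mu of lnX ~ N(mu, sigma2) from an expected value whose
    interpretation is controlled by the hypothetical log_space flag."""
    if log_space:
        # exp came from a log-space model: it is already exp(mu) (in the
        # expm1-style convention used above, hence log1p to invert)
        return np.log1p(exp)
    # exp is a true mean of the unlogged counts: standard lognormal inversion
    return np.log(exp) - sigma2 / 2
```

entrypoints would then just thread log_space down to wherever this conversion happens, leaving the default (False) behavior unchanged.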
Comments (2)

reporter - marked as trivial

reporter -
this is unlikely to get a fix since this code path is not currently used and isn’t expected to be in demand in the future
if this does get an overhaul in the future, it’s likely that we will have to think through the implications of using log-counts expected models with non-logged observed counts from the ground up