support lognormal model fed by exp(mu) instead of mean of unlogged counts

the original lognormal models (used for variance fitting and pvalue calling) assume that they will be passed unlogged observed values and expected values that are true expected values of those unlogged observed values

they then use simple properties of the lognormal distribution to convert these to the mean and variance of the related normal distribution lnX ~ N(mu, sigma2), using the functions in lib5c.algorithms.variance.lognorm_dispersion during variance fitting, and lib5c.util.distributions.log_parameters() during pvalue calling (these functions differ only in what parameters they take as input and what they return)

when the expected value comes from a model that is fitted in log space (e.g., log-counts 1-D expected models and/or expected models corrected with the donut passing donut=True and log_donut=True or -D/--donut_correction and -O/--log_donut), the expected is not actually the expected value of the unlogged counts; instead it is exp(mu), the exponential of the mean of the unlogged counts (also the geometric mean and the median of the unlogged counts)

in this case, the lognormal models (both for variance and pvalue calling) incorrectly assume that the expected being passed to them is actually a true expected value, and incorrectly apply the lognormal <-> normal conversion, when they should instead compute mu = ln(exp(mu)) instead

the short term solution is to never feed a log-space expected into the variance or pvalue steps when using the lognormal model

in the short run, the workaround is to log the counts up-front (before the expected model computation) and then compute the expected model in unlogged space (but on the logged counts), and then finally finish by computing variance and pvalues using the normal model; this is equivalent and simpler in the sense that it avoids "unusual" conversion via exp(mu) and also avoids performing e.g. the donut expected in log space (no easy interpretation)

in the long run, it might be nice to allow the variance and pvalue steps to accept either exp(mu) or the true expected value as input (probably triggered by the addition of a new kwarg/flag on these functions/tools)

the base functionality to accomplish this could use this function:

def convert(mu=None, sigma2=None, exp=None, var=None, emu=None):
    """
    Convert parameters between normal and lognormal models.

    Parameters
    ----------
    mu : float or np.ndarray, optional
        Mean of normal distribution.
    sigma2 : float or np.ndarray, optional
        Variance of normal distribution.
    exp : float or np.ndarray, optional
        Mean of lognormal distribution.
    var : float or np.ndarray, optional
        Variance of lognormal distribution.
    emu : float or np.ndarray, optional
        Geometric mean of the lognormal distribution, ``np.expm1(mu)``.

    Returns
    -------
    float or np.ndarray
        The variance of the normal distribution if ``var`` was passed, or the
        variance of the lognormal distribution if ``disp`` was passed.

    Examples
    --------
    >>> import numpy as np
    >>> import scipy.stats as stats
    >>> from lib5c.algorithms.variance.lognorm_dispersion import convert
    >>> mu = 5
    >>> sigma2 = 1
    >>> emu = np.expm1(mu)
    >>> dist_norm = stats.norm(mu, np.sqrt(sigma2))
    >>> dist_lognorm = stats.lognorm(s=np.sqrt(sigma2), scale=np.exp(mu))
    >>> exp, var = dist_lognorm.stats('mv')
    >>> np.allclose(convert(mu=mu, sigma2=sigma2), var, rtol=0, atol=1e-10)
    True
    >>> np.allclose(convert(emu=emu, sigma2=sigma2), var, rtol=0, atol=1e-10)
    True
    >>> np.allclose(convert(exp=exp, var=var), sigma2, rtol=0, atol=1e-15)
    True
    >>> np.equal(convert(emu=emu, var=var), sigma2)
    True
    >>> np.equal(convert(emu=emu, var=convert(emu=emu, sigma2=sigma2)), sigma2)
    True
    """
    if exp is not None and sigma2 is not None:
        mu = np.log(exp) - sigma2 / 2
    if emu is not None:
        mu = np.log1p(emu)
    if mu is not None and sigma2 is not None:
        return (np.exp(sigma2) - 1) * np.exp(2 * mu + sigma2)
    if var is not None and mu is not None:
        # mu presumably available from emu, no access to exp
        # https://www.wolframalpha.com/
        # input/?i=solve+v%3D(exp(s)-1)*exp(2*m%2Bs)+for+s
        return np.log(
            0.5 * (np.sqrt(np.exp(2*mu) + 4*var) / np.sqrt(np.exp(2*mu)) + 1))
    if var is not None and exp is not None:
        return np.log(1 + var / exp ** 2)

an incomplete list of code steps that would need to be adapted to handle either exp(mu) or a real expected value are

conversion from disp to var under lognormal/loglogistic models in estimate_variance(), local_variance(), mle_variance(), local_variance(), cross_rep_variance()
conversion from exp to mu under lognormal/loglogistic models in local_variance()
convert_parameters(..., log=True) calls log_parameters(), but when exp(mu) is passed it should use sigma_2 = convert(emu=mu, sigma2=sigma_2); mu = np.log1p(mu)
all entrypoints would need to accept and pass through a flag to signal whether their exp parameter should be interpreted as exp(mu) or as a real expected value

Comments (2)