Check for Independence in Joint Distributions; Two independent variables are not independent?

Issue #50 (invalid)
Former user created an issue

Hello,

I don't know if this has been covered before, but from searching I could not find anything.

I'm playing around with Lea (which is great!) and I'm trying to determine whether some of the variables in a dataset I'm working with are independent.

Right now, I'm just trying to apply the formula P(AB) = P(A)*P(B) to check for independence: if it returns True, the two variables are independent; otherwise they are not.

Here is the example I'm using, which should return True because the two dice are independent.

Am I approaching this correctly, or am I missing something?

Thanks,

import lea

die_1 = lea.vals(1,2,3,4,5,6)
die_2 = lea.vals(1,2,3,4,5,6)

lea.joint(die_1, die_2) == die_1 * die_2

>> False : 1.0


  Pierre Denis (repo owner)

    Hi, sorry for the late answer. For some unknown reason, I was not notified by e-mail about this issue.

    Actually, the formula P(AB) = P(A)*P(B) compares probability values, where A and B are events. Your last statement compares probability distributions, which is something else. Furthermore, the two distributions you compare have no support value in common:

    • lea.joint(die_1, die_2) is the probability distribution of the 36 combinations (1,1), (1,2), …, (6,6), each having probability 1/36
    • die_1 * die_2 is the probability distribution of the product of the two dice, whose values range from 1x1=1 to 6x6=36, with a non-uniform probability distribution

    Since the values in the two distributions have different types (tuples vs. integers), they can never be equal, hence the answer False : 1.0.
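    To make this concrete without depending on Lea, here is a plain-Python sketch (standard library only; the dictionaries are a hypothetical stand-in, not Lea's internal representation) that builds both distributions with exact fractions. Because both sides derive from the same two dice, the comparison amounts to testing (a, b) == a * b for every outcome, which never holds.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Joint distribution: each of the 36 (a, b) pairs has probability 1/36.
joint = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}

# Distribution of the product a*b: probabilities of equal products add up.
prod_dist = {}
for (a, b), p in joint.items():
    prod_dist[a * b] = prod_dist.get(a * b, 0) + p

# The supports are disjoint: tuples never compare equal to integers.
common = set(joint) & set(prod_dist)

# Probability that the joint value equals the product value, outcome by outcome.
p_equal = sum(p for (a, b), p in joint.items() if (a, b) == a * b)

print(len(joint), len(prod_dist), common, p_equal)
```

    So the "equal" event has probability 0, and its complement (False) has probability 1.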

    Now, there are several ways to gather evidence of independence. Here are some checks based on specific values (you can change these values as you wish):

    >>> P = lea.P
    >>> P((die_1==5)&(die_2==6)) == P(die_1==5) * P(die_2==6)
    True
    >>> P((die_1==4).given(die_2==3)) == P(die_1==4)
    True
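    The two checks above each test a single pair of values. Independence of two random variables requires the product rule for every pair (x, y). The following plain-Python sketch (standard library only; the helper names are hypothetical, not part of Lea) runs that exhaustive check with exact fractions, both for the two independent dice and for the dependent variant die_2 = 7 - die_1.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def marginals(joint):
    """Marginal distributions of X and Y from a dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return px, py

def is_independent(joint):
    """True if P(X=x, Y=y) == P(X=x) * P(Y=y) for every pair (x, y)."""
    px, py = marginals(joint)
    # Pairs missing from the dict have probability 0 and must be checked too.
    return all(joint.get((x, y), 0) == px[x] * py[y] for x in px for y in py)

indep = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}
dep = {(a, 7 - a): Fraction(1, 6) for a in faces}  # die_2 = 7 - die_1

print(is_independent(indep))  # True
print(is_independent(dep))    # False

# Conditional form, as in the second check above: P(X=4 | Y=3) == P(X=4).
px, py = marginals(indep)
print(indep[(4, 3)] / py[3] == px[4])  # True
```

    Using Fraction keeps every comparison exact, so no rounding tolerance is needed here.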
    

    To avoid such tedious tests, you can also use information theory (see wiki page 3):

    >>> lea.mutual_information(die_1,die_2)
    2.6645352591003757e-15
    >>> lea.joint_entropy(die_1,die_2) - die_1.entropy - die_2.entropy
    -2.6645352591003757e-15
    

    These tiny values should be interpreted as 0.0 (rounding errors are unavoidable with floating-point representation). As an exercise, to see what happens with dependent variables, you could redefine die_2 as

    die_2 = 7 - die_1
    

    Re-evaluating the previous expressions will then give different results, since the two dice are now (strongly) interdependent.
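    For readers who want to see where those numbers come from, here is a plain-Python sketch (standard library only; not Lea's implementation, and the helper names are hypothetical) that computes mutual information from its definition for both cases. It also checks the identity I(X;Y) = H(X) + H(Y) - H(X,Y), which is why the second Lea expression above printed the negative of the first. Computing the ratio p(x,y) / (p(x) p(y)) with exact fractions gives exactly 1 for the independent dice, so the result is exactly 0.0 instead of a tiny rounding residue; for die_2 = 7 - die_1 the mutual information is log2(6), about 2.585 bits.

```python
import math
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def marginals(joint):
    """Marginal distributions of X and Y from a dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return px, py

def entropy(dist):
    """Shannon entropy in bits of a dict {value: probability}."""
    return -sum(float(p) * math.log2(float(p)) for p in dist.values() if p > 0)

def mutual_information(joint):
    """I(X;Y) in bits, summing p(x,y) * log2(p(x,y) / (p(x) p(y)))."""
    px, py = marginals(joint)
    # The ratio is an exact Fraction: it equals 1 for independent variables,
    # so log2 returns exactly 0.0 and no rounding residue appears.
    return sum(float(p) * math.log2(float(p / (px[x] * py[y])))
               for (x, y), p in joint.items() if p > 0)

indep = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}
dep = {(a, 7 - a): Fraction(1, 6) for a in faces}  # die_2 = 7 - die_1

for name, joint in (("independent", indep), ("dependent", dep)):
    px, py = marginals(joint)
    # Identity: H(X) + H(Y) - H(X,Y) == I(X;Y), up to float rounding.
    print(name, mutual_information(joint), entropy(px) + entropy(py) - entropy(joint))
```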

    Hope this helps, despite the delay! Do not hesitate to submit more complex cases.
