Mutual information doesn't seem to work for Markov Chain
Please see the simple example below.
=====
import lea
from lea import markov
weather = markov.chain_from_matrix(('sunny','rainy'),
                                   ('sunny',(0.95, 0.05)),
                                   ('rainy',(0.8, 0.2)))
today = weather.next_state('sunny')  # weather today, yesterday was sunny
tomorrow = today.next_state()        # weather tomorrow
lea.mutual_information(today, tomorrow)
=====
The computed mutual information is nearly zero, but today's and tomorrow's weather should be highly correlated.
Comments (13)
-
repo owner Hi Samuel,
Interesting issue! I agree that this is not the expected result. The root cause is that, in the lea.markov design, the calculated distributions have no interdependency. In the test above, today and tomorrow are independent distributions, so their mutual information is zero (or something close to zero, due to float rounding errors). I agree that this is undocumented and counterintuitive, considering Lea's other constructs. Here is a provisional workaround:
>>> tomorrow2 = weather._next_state_tlea.given(weather.state==today)
>>> tomorrow2
rainy : 0.0575
sunny : 0.9425
This is actually the same distribution as tomorrow, but with an internal representation that keeps the dependency with today. You now get:
>>> lea.mutual_information(today, tomorrow2)
0.00926634104120394
This is still low, but much higher than the previous result. The value is a bit surprising, but it may be correct given that there is little uncertainty about tomorrow being sunny, whatever today's weather. The expanded formula is the following:
>>> today.entropy + tomorrow2.entropy - lea.joint(today,tomorrow2).entropy
0.00926634104120394
Could you please verify this value on your side, by other means?
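One possible cross-check by other means is a plain-Python calculation that does not use Lea at all. The sketch below builds the joint distribution of (today, tomorrow) from the transition matrix in this issue and applies the same expanded formula. The names TRANSITIONS, entropy and mutual_information are hypothetical helpers introduced for illustration, and entropies are assumed to be in bits.

```python
from math import log2

# Transition probabilities copied from the weather chain in this issue.
TRANSITIONS = {
    'sunny': {'sunny': 0.95, 'rainy': 0.05},
    'rainy': {'sunny': 0.80, 'rainy': 0.20},
}

def entropy(probs):
    """Shannon entropy, in bits, of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(today):
    """I(today; tomorrow) = H(today) + H(tomorrow) - H(today, tomorrow)."""
    # Joint distribution: P(today = s0) * P(tomorrow = s1 | today = s0).
    joint = {(s0, s1): p0 * p1
             for s0, p0 in today.items()
             for s1, p1 in TRANSITIONS[s0].items()}
    # Marginal distribution of tomorrow, obtained by summing over today.
    tomorrow = {}
    for (_, s1), p in joint.items():
        tomorrow[s1] = tomorrow.get(s1, 0.0) + p
    return (entropy(today.values()) + entropy(tomorrow.values())
            - entropy(joint.values()))

# Weather today, given that yesterday was sunny.
today = TRANSITIONS['sunny']
mi_one_step = mutual_information(today)
print(mi_one_step)
```

Under these assumptions the printed value should land close to the 0.00926... reported by the workaround, which would suggest that the low figure is a property of the chain rather than a remaining bug.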
I need some time to think about a sensible fix in Lea. This is not that easy because it involves some design choices in the lea.markov module. By the way, the tomorrow2 object above is not aware of any Markov chain and, in particular, it does not support the next_state method.
-
reporter Pierre, thanks for the quick reply!
It seems the number is somewhat off. I tried to create a steady weather state with
today=weather.next_state('sunny',1000)
Then explicitly compute tomorrow’s weather as
tomorrow = today.switch({'sunny': lea.pmf({'sunny': 0.95, 'rainy': 0.05}),
                         'rainy': lea.pmf({'sunny': 0.8, 'rainy': 0.2})})
Then
>>> lea.mutual_information(today,tomorrow)
0.010740523089006304
I also computed the conditional entropy explicitly and derived the mutual information I(W1;W0) as H(W1) - H(W1|W0). The result is consistent with the one above.
from lea import P
today = weather.next_state('sunny',1000)
# H(W1|W0) = P('sunny')H(W1|W0='sunny') + P('rainy')H(W1|W0='rainy')
cond_entropy = P(today=='sunny')*lea.event(0.95).entropy \
             + P(today=='rainy')*lea.event(0.2).entropy
# I(W1;W0) = H(W1) - H(W1|W0)
print(f'Mutual information = {today.entropy - cond_entropy}')
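For comparison, the same identity I(W1;W0) = H(W1) - H(W1|W0) can be sketched without Lea. The block below iterates the chain 1000 steps from a sunny day to approach the steady state, then applies the formula. P_SUNNY_GIVEN and binary_entropy are hypothetical names used only for this illustration, and entropies are assumed to be in bits.

```python
from math import log2

# P(tomorrow is sunny | today), taken from the transition matrix in this issue.
P_SUNNY_GIVEN = {'sunny': 0.95, 'rainy': 0.80}

def binary_entropy(p):
    """Entropy, in bits, of a two-outcome distribution (p, 1 - p)."""
    return -sum(q * log2(q) for q in (p, 1.0 - p) if q > 0)

# Iterate the chain 1000 steps from a sunny day to approach the steady state.
p_sunny = 1.0
for _ in range(1000):
    p_sunny = (p_sunny * P_SUNNY_GIVEN['sunny']
               + (1.0 - p_sunny) * P_SUNNY_GIVEN['rainy'])

# H(W1|W0) = P(sunny) H(W1|W0=sunny) + P(rainy) H(W1|W0=rainy)
cond_entropy = (p_sunny * binary_entropy(P_SUNNY_GIVEN['sunny'])
                + (1.0 - p_sunny) * binary_entropy(P_SUNNY_GIVEN['rainy']))

# In the steady state W1 has the same distribution as W0, so H(W1) = H(W0).
mi_steady = binary_entropy(p_sunny) - cond_entropy
print(p_sunny, mi_steady)
```

Using H(W0) in place of H(W1) is only valid because the chain has reached its steady state. For a non-stationary today, the marginal of tomorrow would have to be computed separately.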
-
repo owner Great! You found another workaround and also a different way to calculate the MI (I was not aware of this formula). BTW, in case you are not aware of it, Lea provides a conditional entropy method out of the box.
>>> today.entropy - today.cond_entropy(tomorrow)
0.010740523089006304
All these results are consistent, which is good news! As stated in my reply above, I'm currently thinking about a definitive fix, without side effects or backward incompatibilities. Stay tuned...
-
reporter That’s great! Looking forward to the fix.
-
repo owner Redesign markov module to keep dependencies in next_state methods; add StateLea, StateTlea and StateIlea classes (refs #63) → <<cset 17ed1a137071>>
-
repo owner Add toolbox.gen_all_slots function and change Lea.internal method to visit all slots of Lea instance (refs #63) → <<cset 94a528a9d61f>>
-
repo owner Add/update tests in markov_test.py (refs #63) → <<cset 07eb9c25f584>>
-
repo owner This is now fixed… after quite deep changes in the lea.markov module (this was expected).
weather = markov.chain_from_matrix(('sunny','rainy'),
                                   ('sunny',( 0.95 , 0.05 )),
                                   ('rainy',( 0.80 , 0.20 )))
today = weather.next_state('sunny',1000)
tomorrow = today.next_state()
lea.mutual_information(today,tomorrow)
# -> 0.010740523089006304
The two variables now keep their dependency. This can also be seen, for example, by calculating the joint distribution of transitions:
today + " -> " + tomorrow
#-> rainy -> rainy : 0.011764705882352943
#   rainy -> sunny : 0.04705882352941177
#   sunny -> rainy : 0.04705882352941177
#   sunny -> sunny : 0.8941176470588236
(This expression gives a different result in Lea 3.4.0 or below.)
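To illustrate why keeping the dependency changes this joint distribution, here is a minimal plain-Python sketch (no Lea involved) that compares the joint built from the transition matrix with the plain product of the two marginals. The product is what one would expect when today and tomorrow are treated as independent, as described earlier in this thread. It is an assumption, not a verified claim, that this is exactly what Lea 3.4.0 prints. TRANSITIONS, dependent and independent are hypothetical names.

```python
# Transition probabilities copied from the weather chain in this issue.
TRANSITIONS = {
    'sunny': {'sunny': 0.95, 'rainy': 0.05},
    'rainy': {'sunny': 0.80, 'rainy': 0.20},
}

# Steady state of the chain: P(sunny) = 0.8 / (0.8 + 0.05) = 16/17.
today = {'sunny': 16 / 17, 'rainy': 1 / 17}

# Joint distribution that keeps the dependency between the two days.
dependent = {f'{s0} -> {s1}': p0 * TRANSITIONS[s0][s1]
             for s0, p0 in today.items()
             for s1 in TRANSITIONS[s0]}

# Marginal distribution of tomorrow, obtained by summing over today.
tomorrow = {s1: sum(today[s0] * TRANSITIONS[s0][s1] for s0 in today)
            for s1 in ('sunny', 'rainy')}

# Joint distribution obtained when the two days are treated as independent.
independent = {f'{s0} -> {s1}': today[s0] * tomorrow[s1]
               for s0 in today for s1 in tomorrow}

for key in sorted(dependent):
    print(f'{key} : dependent={dependent[key]:.6f} '
          f'independent={independent[key]:.6f}')
```

The two marginals are identical in both cases. Only the joint differs, which is why mutual information, joints and conditional probabilities are the places where the missing dependency shows up.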
This fix will be included in Lea 3.4.1, to be released very soon.
-
repo owner - changed status to resolved
-
repo owner Note, however, that the fix introduces a side effect. For some next_state calls, if n is too large, a RecursionError may be raised.
weather.next_state(weather.state,1000)
#-> lea.lea.Error: RecursionError raised - HINT: decrease the value of n in next_state() call or add argument keeps_dependency=False, provided that keeping dependency with initial state is not required
There are two algorithms for next_state; the default one keeps dependencies when required, which is more demanding in resources. The previous example with today started from a certain initial state, 'sunny', which allows a simpler algorithm because no dependency is involved with a certain event. Here, we have weather.state as the initial state, which may require keeping the dependency (as for calculating tomorrow above).
If the distribution just has to be calculated, without keeping track of dependencies (i.e. no need to calculate mutual information, joints, conditional probabilities, etc.), the workaround is to add the argument keeps_dependency=False:
weather.next_state(weather.state,1000,keeps_dependency=False)
#-> sunny : 0.9411764705882353
#   rainy : 0.058823529411764705
or simply:
weather.next_state(weather.state,1000,False)
This gives the correct result… but calling mutual_information afterwards with this distribution will give 0 (or close to it), as before the bug fix.
-
repo owner Add tests for information theory (refs #63) → <<cset 1b874639460b>>
-
repo owner Lea 3.4.1 just released on https://pypi.org/project/lea/3.4.1/