- edited description
Scaling exact inference (optimization)
Lea scales poorly on several queries. Consider the following example, adapted from this paper:
N = 20
x = event(0.1)
y = if_(x, event(0.2), event(0.3))
for _ in range(N):
y = lea.if_(y, event(0.4), event(0.5))
P(y)
# -> 0.45454545454924317
This takes more than 10 s on a recent CPU, with exponential time on N. Actually, Lea's exact algorithm ("Statues") generates atomic probabilities for each case in the sample space, which is quite big here; then, it aggregates all these probabilities to get final probabilities of True
and False
. This algorithm, which treats probabilistic DAG in full generality, takes the query as a whole and does not perform factorization (see above-cited paper). This can be emphasized also by using symbolic computation on a smaller problem:
x = event('a')
y = if_(x, event('b'), event('c'))
z = if_(y, event('d'), event('e'))
P(z)
# -> a*b*d - a*e*(b - 1) - c*d*(a - 1) + e*(a - 1)*(c - 1)
The obtained formula is indicative of calculations made on usual probability numbers. A factorized formula could be
# -> d*(a*b - c*(a - 1)) - e*(a*(b - 1) - (a - 1)*(c - 1))
which contains 6 multiplications instead of 8. The benefit could be much larger for bigger problems, as the one above: the number of multiplications could grow linearly with N, instead of exponentially.
The calculations in such examples could be dramatically reduced by using a divide-and-conquer approach, calculating each y
one by one. This is feasible here due to the tree structure of the graph (a tree is a special case of DAG). More generally, any DAG which contains independent sub-DAG could be pre-processed to evaluate such independent sub-DAG one by one. Lea’s exact algorithm could be optimized by making such preprocessing, which could result on faster calculations, as well as factorized formula in the context of symbolic computing.
Comments (7)
-
reporter -
reporter - edited description
-
reporter Add methods in Lea class for optimisation: determination of pivotal, anti-pivotal, dependent and independent nodes, pre-calculation of independent nodes (refs
#60)→ <<cset ee16e36a7478>>
-
reporter Change sympy simplification function from 'factor' to 'simplify' and updates simplification strategy (refs
#60)Signed-off-by: Cocoden7 corentin.denis@live.be
→ <<cset 66b8d478bc3e>>
-
reporter - edited description
-
reporter Minor editorial updates (refs
#60)→ <<cset 949433a5a63c>>
-
reporter - changed status to resolved
- Log in to comment