py3 compatibility and consistency bug

Merged
#9 · Created  · Last updated

Merged in vene/best_evaluation/py3 (pull request #9)

7cf1fd3·Author: ·Closed by: ·2017-07-11

Description

When trying to run the best evaluator on py3, after making the minor changes needed (in the first commit of this PR), I found that the scores differed slightly between py2 and py3.

Tracing it down, I found that the cause is a hidden reliance on the order of the pst lists. In particular, when the same gold tuple appears in multiple matches_for_pst with the same priority, it can end up assigned to different predicted tuples in the optimal_match_for_predict dictionary, depending on the order of iteration.
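To illustrate the kind of order dependence described above, here is a minimal toy sketch. It is not the scorer's actual code: `greedy_assign`, `candidates`, and the `pred_*` / `gold_*` labels are hypothetical, and the greedy "first come, first served" rule is an assumption made only to show how iteration order can change which predicted tuple gets a contested gold tuple.

```python
def greedy_assign(predicted_order, candidates):
    """Give each gold tuple to the first predicted tuple that claims it.

    Toy stand-in for the matching step; the real scorer is more involved.
    """
    optimal_match_for_predict = {}
    taken = set()
    for pred in predicted_order:
        for gold in candidates[pred]:
            if gold not in taken:
                optimal_match_for_predict[pred] = gold
                taken.add(gold)
                break
    return optimal_match_for_predict


# Both predicted tuples match the same gold tuple with equal priority.
candidates = {"pred_a": ["gold_1"], "pred_b": ["gold_1"]}

first = greedy_assign(["pred_a", "pred_b"], candidates)
second = greedy_assign(["pred_b", "pred_a"], candidates)

print(first)   # pred_a wins the contested gold tuple
print(second)  # pred_b wins it instead
```

If the two predicted tuples contribute differently to the final score, the two iteration orders produce different totals, which is consistent with the py2/py3 discrepancy.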

The order of the generated private state tuple lists is currently undefined, because each list is first built as a dictionary. Prior to Python 3.6, dictionaries iterate in an undefined order, and even in Python 3.6 the insertion-ordered behaviour is an implementation detail that should not be relied upon, as it may change.

In this PR I sort every private state tuple list, upon creation, by the string representation of its tuples' first three fields. This leads to slightly different scores compared to master when the issue is hit, but, in exchange, the scores should now be consistent across all architectures, operating systems, and Python versions.
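A rough sketch of the sorting idea, under stated assumptions: `pst_sort_key` and `sorted_pst_list` are hypothetical names, the tuple layout is invented for the example, and the key compares the three fields as a tuple of strings, which may differ in detail from what the commits do.

```python
def pst_sort_key(pst):
    """Sort key built from the string form of the first three fields.

    Using str() avoids py3 TypeErrors when fields mix types (e.g. None and str).
    """
    return tuple(str(field) for field in pst[:3])


def sorted_pst_list(pst_by_key):
    """Turn a dict of private state tuples into a deterministically ordered list."""
    return sorted(pst_by_key.values(), key=pst_sort_key)


# Hypothetical tuples: (document, sentence, source, confidence).
a = ("doc2", 1, "agent", 0.9)
b = ("doc1", 10, None, 0.4)
c = ("doc1", 2, "writer", 0.7)

# The same content inserted in two different orders.
one = sorted_pst_list({"a": a, "b": b, "c": c})
two = sorted_pst_list({"c": c, "b": b, "a": a})

print(one == two)  # True: the result no longer depends on dict order
```

Two caveats with this sketch: the comparison is lexicographic on strings, so `"10"` sorts before `"2"`, and tuples whose first three fields are identical keep their incoming relative order because `sorted` is stable.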

A better solution might be to explore all branches when such cases occur, and make sure that the highest score is returned, but I am not familiar enough with the scorer to implement this yet.
