When running the tests on Python 3.3, the number of test failures fluctuates from run to run (anywhere from no failures up to 2 failures).

This kind of fluctuation is usually caused by differing iteration order, so I guess some place in the code relies on the order of data structures that have no defined order (like dicts or sets).
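
To illustrate the kind of pattern I suspect, here is a minimal sketch. The function names and the `tags` data are hypothetical and not taken from this project's code. It shows output that depends on set iteration order next to a variant that sorts first:

```python
# Hypothetical example, not from the actual code base.

def render_fragile(tags):
    # Joins in set iteration order. For strings this order depends on the
    # hash seed, which Python 3.3 randomizes per process by default.
    return ",".join(tags)


def render_stable(tags):
    # Sorting first makes the result independent of iteration order.
    return ",".join(sorted(tags))


tags = {"alpha", "beta", "gamma"}

# A test comparing render_fragile(tags) to a fixed string may pass or fail
# depending on the run; comparing the sorted output is deterministic.
stable = render_stable(tags)
fragile = render_fragile(tags)
```

If this is the cause, running the test suite several times with different fixed values of the `PYTHONHASHSEED` environment variable should make the failures reproducible for a given seed.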

