CartPole Learning visualization changed

The figures in the tutorial do not match the current code. I actually prefer the old visualization, which

- chooses an optimal action at random instead of the one with the lowest index
- has a higher resolution than the discretization

The old version made it very easy to see which areas of the value function / policy are the same for all actions and which are not. I suggest trying to restore the old behavior, but I am open to other proposals.
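
To make the request concrete, here is a rough sketch of the two behaviors described above. It is not the actual plotting code; `random_argmax`, `policy_image`, the nested-list `q_table` layout, and the `upsample` parameter are all hypothetical names I made up for illustration. The idea is that each pixel draws its own tie-break, so cells where several actions are optimal show up speckled and cells with a unique best action stay uniform.

```python
import random


def random_argmax(values, rng, tol=1e-9):
    """Return the index of a maximal value, chosen uniformly among ties."""
    best = max(values)
    ties = [a for a, v in enumerate(values) if best - v <= tol]
    return rng.choice(ties)


def policy_image(q_table, upsample, rng):
    """Render a greedy policy with `upsample` x `upsample` pixels per grid cell.

    q_table[r][c] is the list of action values for discretized state (r, c).
    (Hypothetical layout, assumed only for this sketch.)
    """
    image = []
    for row in q_table:
        for _ in range(upsample):
            # Every pixel inside a cell gets an independent tie-break.
            image.append([random_argmax(cell, rng)
                          for cell in row
                          for _ in range(upsample)])
    return image


rng = random.Random(0)
q_table = [
    [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]],  # unique best action | tie between 0 and 1
    [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]],  # all actions tie    | unique best action
]
image = policy_image(q_table, upsample=8, rng=rng)
```

The resulting `image` could be passed to something like matplotlib's `imshow` with a discrete colormap. A plain argmax would instead color the tied cells with the lowest-index action, which hides the ties.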
