Unify Pendulum and Cartpole base classes
The Pendulum* and Cartpole* classes should be unified to use a single Cartpole base class.
Comments (6)
-
reporter
Will address this by Sunday.
-
reporter What about the following class structure?
CartPoleBase contains:
- all the dynamics (step, all constants, etc.)
- the functions plot_policy(pi) and plot_valfun(pi), which take 2-dimensional inputs
- a plot_cart(state, action) function, which takes a 4-dimensional state and plots it
We use the following subclasses:
InfTrackCartPole:
- a state property which omits 2 dimensions
- showLearning, which computes V and pi and calls the helper functions to plot
- a redefined GROUND_VERTS that does not show boundaries
- showDomain, which calls plot_cart with the omitted state dimensions set to 0 (the view is always centered on the cart on the infinite track)
FiniteTrackCartPole:
- showLearning, which computes V and pi for a centered cart with 0 velocity, calls the helper functions, and adds text to the plot stating which slice of V and pi is shown
- showDomain, which calls the helper functions to plot
The actual classes then just inherit from InfTrackCartPole or FiniteTrackCartPole.
What do you think? This way we avoid lots of code duplication, and it should be clearer what the Pendulum domain actually is.
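A rough sketch of how the proposed hierarchy might look, only to make the inheritance concrete. The state ordering (theta, theta_dot, x, x_dot), the `_full_state` attribute, the placeholder `GROUND_VERTS` values, and the dictionary returned by `plot_cart` in place of real drawing are all assumptions for illustration, not the actual implementation.

```python
class CartPoleBase:
    """Hypothetical base class: shared constants, full 4-dim state, plot helpers."""

    # Placeholder track boundaries (assumed values, not taken from the real code).
    GROUND_VERTS = [(-3.0, 0.0), (3.0, 0.0)]

    def __init__(self):
        # Assumed ordering: (theta, theta_dot, x, x_dot).
        self._full_state = [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        # The shared dynamics would live here; omitted in this sketch.
        raise NotImplementedError

    def plot_cart(self, state, action):
        # Always takes the full 4-dim state, as in the proposal.
        if len(state) != 4:
            raise ValueError("plot_cart expects a 4-dimensional state")
        # Stand-in for drawing: return what would be drawn.
        return {"state": tuple(state), "action": action}


class InfTrackCartPole(CartPoleBase):
    GROUND_VERTS = []  # infinite track: no boundaries to show

    @property
    def state(self):
        # Omit the two cart dimensions.
        return self._full_state[:2]

    def showDomain(self, action=0):
        theta, theta_dot = self.state
        # Omitted dimensions are set to 0, so the view stays centered on the cart.
        return self.plot_cart([theta, theta_dot, 0.0, 0.0], action)


class FiniteTrackCartPole(CartPoleBase):
    @property
    def state(self):
        return list(self._full_state)

    def showDomain(self, action=0):
        return self.plot_cart(self.state, action)


class Pendulum(InfTrackCartPole):
    """A concrete domain just inherits from one of the two track variants."""
```

With this layout, a concrete domain such as the hypothetical `Pendulum` above carries no plotting or dynamics code of its own.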
-
That's possible. The original reason we split into separate domains though was that different sources actually used different constants for cartpole vs. pendulum.
E.g., the "Cartpole" domain from Lagoudakis + Parr (2003) specifies a 50 N force on the cart http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.118.8770&rep=rep1&type=pdf while the "pendulum" domain from Sutton + Barto (1998) specifies 1 N*m torques http://code.google.com/p/rl-library/wiki/CartpoleJava
These 'constants' are arbitrarily different for the different tasks (swingup vs inverted balance) as well.
This has been an ongoing headache with these domains.
For now, how about we go with your scheme and override domain constants for Inf vs. Finite using the 2 standards above, ignoring the arbitrary differences for the different tasks?
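To illustrate the override idea, here is a minimal sketch using class-level constants. The attribute names `ACTION_LIMIT` and `ACTION_UNITS` and the `clip_action` helper are hypothetical, and which standard goes with which subclass is an assumption; only the 50 N and 1 N*m figures come from the sources quoted above.

```python
class CartPoleBase:
    # Hypothetical constant names; the real attributes may be named differently.
    ACTION_LIMIT = 1.0
    ACTION_UNITS = "unspecified"

    def clip_action(self, u):
        # Shared logic reads the constants through self, so subclass overrides apply.
        return max(-self.ACTION_LIMIT, min(self.ACTION_LIMIT, u))


class FiniteTrackCartPole(CartPoleBase):
    # Assumed mapping: the 50 N force on the cart quoted from Lagoudakis + Parr (2003).
    ACTION_LIMIT = 50.0
    ACTION_UNITS = "N"


class InfTrackCartPole(CartPoleBase):
    # Assumed mapping: the 1 N*m torque quoted for the Sutton + Barto pendulum.
    ACTION_LIMIT = 1.0
    ACTION_UNITS = "N*m"
```

Because `clip_action` is defined once in the base class, the two standards differ only in the constants each subclass declares.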
reporter I think this is a good idea, Bobby. We can override constants when necessary in the second or third inheritance level. We can even override the integration method (see Cartpole.int_type) without touching the dynamics implementation itself.
Btw: we should refer to Rich Sutton's original code at http://webdocs.cs.ualberta.ca/~sutton/book/code/pole.c instead of the rl-library implementation.
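A sketch of how an `int_type` switch could keep the integrator separate from the dynamics. The accepted values "euler" and "rk4", the `dt` value, the `_dsdt` and `_integrate` names, and the toy dynamics ds/dt = -s + a are assumptions for illustration; they are not the cart-pole equations and not necessarily the options the real code supports.

```python
class CartPoleBase:
    int_type = "euler"  # assumed default; subclasses may override
    dt = 0.1            # assumed step size

    def _dsdt(self, s, a):
        # Placeholder dynamics (NOT the cart-pole equations): ds/dt = -s + a.
        return [-x + a for x in s]

    def _integrate(self, s, a):
        f, dt = self._dsdt, self.dt
        if self.int_type == "euler":
            k1 = f(s, a)
            return [x + dt * k for x, k in zip(s, k1)]
        if self.int_type == "rk4":
            # Classical fourth-order Runge-Kutta step.
            k1 = f(s, a)
            k2 = f([x + 0.5 * dt * k for x, k in zip(s, k1)], a)
            k3 = f([x + 0.5 * dt * k for x, k in zip(s, k2)], a)
            k4 = f([x + dt * k for x, k in zip(s, k3)], a)
            return [
                x + dt / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
                for x, a1, a2, a3, a4 in zip(s, k1, k2, k3, k4)
            ]
        raise ValueError("unknown int_type: %r" % (self.int_type,))


class AccurateCartPole(CartPoleBase):
    # Hypothetical subclass: changes only the integrator, not the dynamics.
    int_type = "rk4"
```

The subclass changes a single attribute, and `_dsdt` is left untouched, which is the separation described above.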
-
- changed status to resolved
addressed by merge of cartpole_unified