Commits

Robert Klein committed ff5adce

added domain, policy, rep tutorials and examples

Comments (0)

Files changed (9)

    install
    tutorial
    make_agent
+   make_rep
+   make_domain
+   make_policy
    faq
    api/index
 

doc/make_agent.rst

 timesteps.
 At each Experiment timestep the Agent receives some observations from the Domain
 which it uses to update the value function Representation of the Domain
-(ie, on each call to its :py:meth:`~Agents.Agent.Agent.learn` function).
+(ie, on each call to its :func:`~Agents.Agent.Agent.learn` function).
 The Policy is used to select an action to perform.
 This process (observe, update, act) repeats until some goal or fail state,
 determined by the Domain, is reached. At this point the
     __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann", 
                     "William Dabney", "Jonathan P. How"]
     __license__ = "BSD 3-Clause"
-    __author__ = "Christoph Dann"
+    __author__ = "Tim Beaver"
 
 Fill in the appropriate ``__author__`` name and ``__credits__`` as needed.
 Note that RLPy requires the BSD 3-Clause license.
   Policy, and Domain XX Remove additional params eg boyan? XX in the 
   ``__init__()`` function
 
-* Your code should be appropriately handle the case where ``logger=None`` is 
+* Your code should appropriately handle the case where ``logger=None`` is 
   passed to ``__init__()``.
 
-* The new learning agent need only define the :func:`~Agents.Agent.Agent.learn` function, (see
-  linked documentation) which is called on every timestep.
+* Once completed, the className of the new agent must be added to the
+  ``__init__.py`` file in the ``Agents/`` directory.
+  (This allows other files to import the new agent).
+
+* After your agent is complete, you should define a unit test XX Add info here XX
+
+REQUIRED Instance Variables
+"""""""""""""""""""""""""""
+
+None.
+
+REQUIRED Functions
+""""""""""""""""""
+:func:`~Agents.Agent.Agent.learn` - called on every timestep (see documentation)
 
   .. Note:: 
 
       at its end.  This allows adaptive representations to add new features
       (no effect on fixed ones).
 
-* Once completed, the className of the new agent must be added to the
-  ``__init__.py`` file in the ``Agents/`` directory.
-  (This allows other files to import the new agent).
-
-* After your agent is complete, you should define a unit test XX Add info here XX
-
 
 Additional Information
 ----------------------
   Your code should appropriately handle the case where ``logger=None`` is 
   passed to ``__init__()``.
 
-* You should write values assigned to custom parameters when ``__init__()`` is called.
+* You should log values assigned to custom parameters when ``__init__()`` is called.
 
-* See :class:`~Agents.Agent.Agent` for functions provided by the ``Agent`` superclass.
+* See :class:`~Agents.Agent.Agent` for functions provided by the superclass.
 
 
 
 --------------------------------------
 In this example, we will create the standard SARSA learning agent (without 
 eligibility traces (ie the λ parameter= 0 always)).
-This algorithm first computes the Temporal Difference Error
-(see, `Sutton and Barto's *Reinforcement Learning* (1998) <http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node60.html>`_ 
-or `Wikipedia: <http://en.wikipedia.org/wiki/Temporal_difference_learning>`_),
+This algorithm first computes the Temporal Difference Error,
 essentially the difference between the prediction under the current 
-value function and what was actually observed.
+value function and what was actually observed
+(see e.g. `Sutton and Barto's *Reinforcement Learning* (1998) <http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node60.html>`_ 
+or `Wikipedia <http://en.wikipedia.org/wiki/Temporal_difference_learning>`_).
 It then updates the representation by summing the current function with 
 this TD error, weighted by a factor called the *learning rate*.
 
        from Agent import Agent
        import numpy
 
-#. Declare the class, create needed members variables, and write a 
-   docstring description::
+#. Declare the class, create needed member variables (here a learning rate,
+   described above), and write a docstring description::
+
        class SARSA0(Agent):
            """
            Standard SARSA algorithm without eligibility trace (ie lambda=0)
            """
            learning_rate = 0 # The weight on TD updates ('alpha' in the paper)
-#. Copy the __init__ declaration from ``Agent.py``, add the learning_rate
-   parameter, and log the passed value::
+
+#. Copy the __init__ declaration from ``Agent.py``, add needed parameters
+   (here the learning_rate) and log them.  Then call the superclass constructor::
+
        def __init__(self, logger, representation, policy, domain, learning_rate=0.1):
            self.learning_rate = learning_rate
            super(SARSA0,self).__init__(representation,policy,domain,logger,initial_alpha,alpha_decay_mode, boyan_N0)
            if logger:
                self.logger.log("Learning rate:\t\t%0.2f" % learning_rate)
 
-#. Copy the learn() declaration, compute the td-error, and use it to update
+#. Copy the learn() declaration and implement it accordingly
+   (a conceptual sketch of the update follows the code block).
+   Here, compute the td-error and use it to update
    the value function estimate (by adjusting feature weights)::
+
       def learn(self,s,p_actions, a, r, ns, np_actions, na,terminal):
    
            # The previous state could never be terminal
            if terminal:
                self.episodeTerminated()
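+
+Conceptually, the temporal-difference step inside ``learn()`` has the following
+shape (a sketch only, written as if ``Q`` were a simple table indexed by state
+and action; the actual implementation works on the feature weights held by the
+Representation)::
+
+    # difference between the observed one-step target and the current estimate
+    delta = r + gamma * Q[ns, na] - Q[s, a]
+    # move the estimate toward the target, scaled by the learning rate
+    Q[s, a] = Q[s, a] + self.learning_rate * delta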
 
+.. note::
+
+    You can and should define helper functions in your agents as needed, and 
+    arrange the class hierarchy as appropriate (see eg TDControlAgent.py).
+
 
 That's it! Now add your new agent to ``Agents/__init__.py``::
 
-    ``from SARSA0 import SARSA0``
+    from SARSA0 import SARSA0
 
 Finally, create a unit test for your agent XX XX.
 

doc/make_domain.rst

+.. _make_domain:
+
+.. this is a comment. see http://sphinx-doc.org/rest.html for markup instructions
+
+Creating a New Domain
+=====================
+
+This tutorial describes the standard RLPy 
+:class:`~Domains.Domain.Domain` interface,
+and illustrates a brief example of creating a new problem domain.
+
+.. Below taken directly from Domain.py
+
+The Domain controls the environment in which the
+:class:`~Agents.Agent.Agent` resides as well as the reward function the
+Agent is subject to.
+
+The Agent interacts with the Domain over a sequence of discrete timesteps
+(see :func:`~Domains.Domain.Domain.step`); a complete sequence of interactions,
+from an initial state to a terminal state, is called an *episode*.
+At each step, the Agent informs the Domain what indexed action it wants to
+perform.  The Domain then calculates the effects this action has on the
+environment and updates its internal state accordingly.
+It also returns the new state (*ns*) to the agent, along with a reward/penalty (*r*)
+and whether or not the episode is over (*terminal*), in which case the agent
+is reset to its initial state.
+
+This process repeats until the Domain determines that the Agent has either
+completed its goal or failed.
+The :py:class:`~Experiments.Experiment.Experiment` controls this cycle.
+
+Because Agents are designed to be agnostic to the Domain that they are
+acting within and the problem they are trying to solve, the Domain needs
+to completely describe everything related to the task. Therefore, the
+Domain must not only define the observations that the Agent receives,
+but also the states it can be in, the actions that it can perform, and the
+relationships between the three.
+
+.. warning::
+    While each dimension of the state *s* is either *continuous* or *discrete*,
+    discrete dimensions are assumed to take nonnegative **integer** values 
+    (ie, the index of the discrete state).
+        
+.. note ::
+    You may want to review the namespace / inheritance / scoping 
+    `rules in Python <https://docs.python.org/2/tutorial/classes.html>`_.
+
+
+Requirements 
+------------
+
+* At the top of the file (before the class definition), include the heading::
+
+    __copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
+    __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann", 
+                    "William Dabney", "Jonathan P. How"]
+    __license__ = "BSD 3-Clause"
+    __author__ = "Tim Beaver"
+
+Fill in the appropriate ``__author__`` name and ``__credits__`` as needed.
+Note that RLPy requires the BSD 3-Clause license.
+
+* If available, please include a link or reference to the publication associated 
+  with this implementation (and note differences, if any).
+
+* Each Domain must be a subclass of 
+  :class:`~Domains.Domain.Domain` and call the 
+  :func:`~Domains.Domain.Domain.__init__` function of the 
+  Domain superclass.
+
+* Accordingly, each Domain must be instantiated with a Logger (or None)
+  in the ``__init__()`` function. Your code should appropriately handle 
+  the case where ``logger=None`` is passed to ``__init__()``.
+
+* Once completed, the className of the new Domain must be added to the
+  ``__init__.py`` file in the ``Domains/`` directory.
+  (This allows other files to import the new Domain).
+
+* After your Domain is complete, you should define a unit test XX Add info here XX
+
+
+REQUIRED Instance Variables
+"""""""""""""""""""""""""""
+The new Domain *MUST* set these variables *BEFORE* calling the
+superclass ``__init__()`` function (a short example follows this list):
+
+#. ``self.statespace_limits`` - Bounds on each dimension of the state space. 
+   Each row corresponds to one dimension and has two elements [min, max].
+   Used for discretization of continuous dimensions.
+
+#. ``self.continuous_dims`` - array of integers; each element is the index 
+   (eg, row in ``statespace_limits`` above) of a continuous-valued dimension.
+   This array is empty if all states are discrete.
+
+#. ``self.DimNames`` - array of strings, a name corresponding to each dimension
+   (eg one for each row in ``statespace_limits`` above)
+
+#. ``self.episodeCap`` - integer, maximum number of steps before an episode
+   is terminated (even if not in a terminal state).
+
+#. ``self.actions_num`` - integer, the total number of possible actions (ie, the size
+   of the action space).  This number **MUST** be a finite integer - continuous action
+   spaces are not currently supported.
+
+#. ``self.gamma`` - float, the discount factor by which rewards are reduced.
+
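+For example, a domain with one continuous dimension, one discrete dimension,
+and three actions might set these variables as follows (an illustrative sketch
+only, not an actual RLPy domain)::
+
+    self.statespace_limits = np.array([[-1.0, 1.0],  # position (continuous)
+                                       [0, 9]])      # cell index (discrete)
+    self.continuous_dims   = [0]                     # dimension 0 is continuous
+    self.DimNames          = ['position', 'cell']
+    self.episodeCap        = 100                     # at most 100 steps per episode
+    self.actions_num       = 3
+    self.gamma             = 0.95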
+
+REQUIRED Functions
+""""""""""""""""""
+#. :func:`~Domains.Domain.Domain.s0`,
+   (see linked documentation), which returns a (possibly random) state in the 
+   domain, to be used at the start of an *episode*.
+
+#. :func:`~Domains.Domain.Domain.step`,
+   (see linked documentation), which returns the tuple ``(r,ns,terminal, pa)`` 
+   that results from taking action *a* from the current state (internal to the Domain).
+
+   * *r* is the reward obtained during the transition
+   * *ns* is the new state after the transition
+   * *terminal*, a boolean, is True if the new state *ns* is terminal, ending the episode
+   * *pa*, an array of possible actions to take from the new state *ns*.
+
+
+SPECIAL Functions
+"""""""""""""""""
+In many cases, the Domain will also override the functions:
+
+#. :func:`~Domains.Domain.Domain.isTerminal` - returns a boolean indicating whether or
+   not the current (internal) state is terminal. The default implementation always returns False.
+#. :func:`~Domains.Domain.Domain.possibleActions` - returns an array of
+   possible action indices, which often depend on the current state
+   (a sketch follows this list).
+   The default is to enumerate **every** possible action, regardless of the current state.
+
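+For example, a (hypothetical) domain in which action 1 is only legal while some
+internal flag is set might override ``possibleActions()`` as follows
+(``self.some_flag`` is an invented member variable, used purely for illustration)::
+
+    def possibleActions(self):
+        if self.some_flag:
+            return np.arange(self.actions_num)  # every action is allowed
+        return np.array([0])                    # otherwise only action 0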
+
+OPTIONAL Functions
+""""""""""""""""""
+Optionally, define / override the following functions, used for visualization:
+
+#. :func:`~Domains.Domain.Domain.showDomain` - Visualization of domain based
+   on current internal state and an action, *a*.
+   Often the header will include an optional argument *s* to display instead 
+   of the current internal state.
+   RLPy frequently uses `matplotlib <http://matplotlib.org/>`_
+   to accomplish this - see the example below.
+#. :func:`~Domains.Domain.Domain.showLearning` - Visualization of the "learning"
+   obtained so far on this domain, usually a value function plot and policy plot.
+   See the introductory tutorial for an example on :class:`~Domains.Gridworld.GridWorld`
+
+XX expectedStep(), XX
+
+
+Additional Information
+----------------------
+
+* As always, the Domain can log messages using ``self.logger.log(<str>)``, see 
+  :func:`Tools.Logger.log`. 
+  Your code should appropriately handle the case where ``logger=None`` is 
+  passed to ``__init__()``.
+
+* You should log values assigned to custom parameters when ``__init__()`` is called.
+
+* See :class:`~Domains.Domain.Domain` for functions 
+  provided by the superclass, especially before defining 
+  helper functions which might be redundant.
+
+
+
+Example: Creating the ``ChainMDP`` Domain
+-----------------------------------------------------------
+In this example we will recreate the simple ``ChainMDP`` Domain, which consists
+of *n* states that can only transition to *n-1* or *n+1*:
+``s0 <-> s1 <-> ... <-> sn``.
+The goal is to reach state ``sn`` from ``s0``, after which the episode terminates.
+The agent can select from two actions: left [0] and right [1] (it never remains in the same state).
+But the transitions are noisy, and the opposite of the desired action is taken 
+instead with some probability.
+Note that the optimal policy is to always go right.
+
+#. Create a new file in the ``Domains/`` directory, ``ChainMDPTut.py``.
+   Add the header block at the top::
+
+       __copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
+       __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
+                      "William Dabney", "Jonathan P. How"]
+       __license__ = "BSD 3-Clause"
+       __author__ = "Ray N. Forcement"
+       
+       from Tools import *
+       from Domain import Domain
+       import numpy as np
+
+#. Declare the class, create needed member variables (here several objects to
+   be used for visualization and a few domain reward parameters), and write a 
+   docstring description::
+
+       class ChainMDPTut(Domain):
+           """
+           Tutorial Domain - nearly identical to ChainMDP.py
+           """
+           #: Reward for each timestep spent in the goal region
+           GOAL_REWARD = 0
+           #: Reward for each timestep
+           STEP_REWARD = -1
+           # Used for graphical normalization
+           MAX_RETURN  = 1
+           # Used for graphical normalization
+           MIN_RETURN  = 0
+           # Used for graphical shifting of arrows
+           SHIFT       = .3
+           #: Used for graphical radius of states
+           RADIUS      = .5
+           # Stores the graphical patches for states so that we can later change their colors
+           circles     = None
+           #: Number of states in the chain
+           chainSize   = 0
+           # Y values used for drawing circles
+           Y           = 1
+
+#. Copy the __init__ declaration from ``Domain.py``, add needed parameters
+   (here the number of states in the chain, ``chainSize``), and log them.
+   Assign ``self.statespace_limits, self.episodeCap, self.continuous_dims, self.DimNames, self.actions_num,`` 
+   and ``self.gamma``.
+   Then call the superclass constructor::
+
+       def __init__(self, chainSize=2,logger = None):
+           """
+           :param chainSize: Number of states \'n\' in the chain.
+           """
+           self.chainSize          = chainSize
+           self.start              = 0
+           self.goal               = chainSize - 1
+           self.statespace_limits  = array([[0,chainSize-1]])
+           self.episodeCap         = 2*chainSize
+           self.continuous_dims    = []
+           self.DimNames           = ['State']
+           self.actions_num        = 2
+           self.gamma              = 0.9
+           super(ChainMDPTut,self).__init__(logger)
+
+#. Copy the ``step()`` function declaration and implement it accordingly
+   to return the tuple (r,ns,isTerminal,possibleActions); do the same for ``s0()``.
+   We want the agent to always start at state *[0]*, and to receive a reward and
+   terminate only when *s = [n-1]*::
+
+       def step(self,a):
+           s = self.state[0]
+           if a == 0: #left
+               ns = max(0,s-1)
+           if a == 1: #right
+               ns = min(self.chainSize-1,s+1)
+           self.state = array([ns])
+
+           terminal = self.isTerminal()
+           r = self.GOAL_REWARD if terminal else self.STEP_REWARD
+           return r, ns, terminal, self.possibleActions()
+
+       def s0(self):
+           self.state = np.array([0])
+           return self.state, self.isTerminal(), self.possibleActions()
+
+#. In accordance with the above termination condition, override the ``isTerminal()``
+   function by copying its declaration from ``Domain.py``::
+
+       def isTerminal(self):
+           s = self.state
+           return (s[0] == self.chainSize - 1)
+
+#. For debugging convenience, demonstration, and entertainment, create a domain
+   visualization by overriding the default (which is to do nothing).
+   With matplotlib, generally this involves first performing a check to see if
+   the figure object needs to be created (and adding objects accordingly),
+   otherwise merely updating existing plot objects based on the current ``self.state``
+   and action *a*::
+
+       def showDomain(self, a=0):
+           # Draw the environment
+           s = self.state
+           s = s[0]
+           if self.circles is None: # We need to draw the figure for the first time
+               fig = pl.figure(1, (self.chainSize*2, 2))
+               ax = fig.add_axes([0, 0, 1, 1], frameon=False, aspect=1.)
+               ax.set_xlim(0, self.chainSize*2)
+               ax.set_ylim(0, 2)
+               ax.add_patch(mpatches.Circle((1+2*(self.chainSize-1), self.Y), self.RADIUS*1.1, fc="w")) # Make the last one double circle
+               ax.xaxis.set_visible(False)
+               ax.yaxis.set_visible(False)
+               self.circles = [mpatches.Circle((1+2*i, self.Y), self.RADIUS, fc="w") for i in arange(self.chainSize)]
+               for i in arange(self.chainSize):
+                   ax.add_patch(self.circles[i])
+                   if i != self.chainSize-1:
+                       fromAtoB(1+2*i+self.SHIFT,self.Y+self.SHIFT,1+2*(i+1)-self.SHIFT, self.Y+self.SHIFT)
+                       if i != self.chainSize-2:
+                           fromAtoB(1+2*(i+1)-self.SHIFT,self.Y-self.SHIFT,1+2*i+self.SHIFT, self.Y-self.SHIFT, 'r')
+                   fromAtoB(.75,self.Y-1.5*self.SHIFT,.75,self.Y+1.5*self.SHIFT,'r',connectionstyle='arc3,rad=-1.2')
+                   pl.show()
+
+           [p.set_facecolor('w') for p in self.circles]
+           self.circles[s].set_facecolor('k')
+           pl.draw()
+
+.. note::
+
+    When first creating a matplotlib figure, you must call ``pl.show()``; when
+    updating the figure on subsequent steps, use ``pl.draw()``.
+
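+A generic version of this pattern, independent of RLPy (``self.fig`` and
+``self.marker`` here are hypothetical member variables, used only for illustration)::
+
+    def showDomain(self, a=0):
+        x, y = self.state
+        if self.fig is None:                   # first call: build the figure once
+            self.fig = pl.figure(1)
+            self.marker, = pl.plot([x], [y], 'ko')
+            pl.show()
+        self.marker.set_data([x], [y])         # later calls: update artists in place
+        pl.draw()
+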
+That's it! Now add your new Domain to ``Domains/__init__.py``::
+
+    from ChainMDPTut import ChainMDPTut
+
+Finally, create a unit test for your Domain XX XX.
+
+Now test it by creating a simple settings file that uses your new Domain.
+An example experiment is given below:
+
+.. literalinclude:: ../examples/tutorial/ChainMDPTut_example.py
+   :language: python
+   :linenos:
+
+What to do next?
+----------------
+
+In this Domain tutorial, we have seen how to 
+
+* Write a Domain that inherits from the RLPy base ``Domain`` class
+* Override several base functions
+* Create a visualization
+* Add the Domain to RLPy and test it
+
+If you would like to add your new Domain to the RLPy project, email ``rlpy@mit.edu``
+or create a pull request to the 
+`RLPy repository <https://bitbucket.org/rlpy/rlpy>`_.
+

doc/make_policy.rst

+.. _make_policy:
+
+.. this is a comment. see http://sphinx-doc.org/rest.html for markup instructions
+
+Creating a New Policy
+=====================
+
+This tutorial describes the standard RLPy 
+:class:`~Policies.Policy.Policy` interface,
+and illustrates a brief example of creating a new Policy.
+
+.. Below taken directly from Policy.py
+
+The Policy determines the discrete action that an
+:py:class:`~Agents.Agent.Agent` will take  given its current value function
+:py:class:`~Representations.Representation.Representation`.
+
+The Agent learns about the :py:class:`~Domains.Domain.Domain`
+as the two interact.
+At each step, the Agent passes information about its current state
+to the Policy; the Policy uses this to decide what discrete action the
+Agent should perform next (see :py:meth:`~Policies.Policy.Policy.pi`).
+
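+From the Agent's point of view, action selection therefore looks roughly like
+the following call (a sketch)::
+
+    a = self.policy.pi(s, terminal, p_actions)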
+
+.. warning::
+    While each dimension of the state *s* is either *continuous* or *discrete*,
+    discrete dimensions are assumed to take nonnegative **integer** values 
+    (ie, the index of the discrete state).
+        
+.. note ::
+    You may want to review the namespace / inheritance / scoping 
+    `rules in Python <https://docs.python.org/2/tutorial/classes.html>`_.
+
+
+Requirements 
+------------
+
+* At the top of the file (before the class definition), include the heading::
+
+    __copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
+    __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann", 
+                    "William Dabney", "Jonathan P. How"]
+    __license__ = "BSD 3-Clause"
+    __author__ = "Tim Beaver"
+
+Fill in the appropriate ``__author__`` name and ``__credits__`` as needed.
+Note that RLPy requires the BSD 3-Clause license.
+
+* If available, please include a link or reference to the publication associated 
+  with this implementation (and note differences, if any).
+
+* Each Policy must be a subclass of :class:`~Policies.Policy.Policy` and call 
+  the :func:`~Policies.Policy.Policy.__init__` function of the 
+  Policy superclass.
+
+* Accordingly, each Policy must be instantiated with a Logger (or None)
+  in the ``__init__()`` function. Your code should appropriately handle 
+  the case where ``logger=None`` is passed to ``__init__()``.
+
+* Once completed, the className of the new Policy must be added to the
+  ``__init__.py`` file in the ``Policies/`` directory.
+  (This allows other files to import the new Policy).
+
+* After your Policy is complete, you should define a unit test XX Add info here XX
+
+REQUIRED Instance Variables
+"""""""""""""""""""""""""""
+
+None.
+
+REQUIRED Functions
+""""""""""""""""""
+#. :py:meth:`~Policies.Policy.Policy.pi` - accepts the current state *s*,
+   whether or not *s* is *terminal*, and an array of possible action
+   indices *p_actions*, and returns an action index for the Agent to take
+   (a minimal sketch follows this list).
+
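+As a minimal illustration, a (hypothetical) uniform-random policy could
+implement it as::
+
+    def pi(self, s, terminal, p_actions):
+        # ignore the value function entirely and act uniformly at random
+        return np.random.choice(p_actions)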
+
+SPECIAL Functions
+"""""""""""""""""
+Policies which have an explicit exploratory component (eg epsilon-greedy)
+**MUST** override the functions below to prevent exploratory behavior
+when evaluating the policy (which would skew results):
+
+#. :py:meth:`~Policies.Policy.Policy.turnOffExploration`
+#. :py:meth:`~Policies.Policy.Policy.turnOnExploration`
+
+
+Additional Information
+----------------------
+
+* As always, the Policy can log messages using ``self.logger.log(<str>)``, see 
+  :func:`Tools.Logger.log`. 
+  Your code should appropriately handle the case where ``logger=None`` is 
+  passed to ``__init__()``.
+
+* You should log values assigned to custom parameters when ``__init__()`` is called.
+
+* See :class:`~Policies.Policy.Policy` for functions 
+  provided by the superclass, especially before defining 
+  helper functions which might be redundant.
+
+* Note the useful functions provided by 
+  the :class:`~Representations.Representation.Representation`,
+  e.g. :func:`~Representations.Representation.Representation.bestActions` 
+  and :func:`~Representations.Representation.Representation.bestAction`,
+  to get the best action(s) with respect to the value function (greedy);
+  see the snippet below.
+
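+For instance, a purely greedy choice inside ``pi()`` could be sketched as
+(the ``eGreedyTut`` example below does exactly this in its non-random branch)::
+
+    b_actions = self.representation.bestActions(s, terminal, p_actions)
+    return np.random.choice(b_actions)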
+
+
+Example: Creating the ``Epsilon-Greedy`` Policy
+-----------------------------------------------------------
+In this example we will recreate the ``eGreedy`` Policy.
+From a given state, it selects the action with the highest expected value
+(greedy with respect to value function), but with some probability ``epsilon``,
+takes a random action instead.  This explicitly balances the exploration/exploitation
+tradeoff, and ensures that in the limit of infinite samples, the agent will
+have explored the entire domain.
+
+#. Create a new file in the ``Policies/`` directory, ``eGreedyTut.py``.
+   Add the header block at the top::
+
+       __copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
+       __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
+                      "William Dabney", "Jonathan P. How"]
+       __license__ = "BSD 3-Clause"
+       __author__ = "Ray N. Forcement"
+       
+       from Policy import *
+       import numpy as np
+
+#. Declare the class, create needed member variables, and write a 
+   docstring description.  The role of each member variable is described in the comments::
+
+       class eGreedyTut(Policy):
+           """
+           From the tutorial in policy creation.  Identical to eGreedy.py.
+           """
+
+           # Probability of selecting a random action instead of greedy
+           epsilon         = None
+           # Temporarily stores value of ``epsilon`` when exploration disabled
+           old_epsilon     = None 
+           # bool, used to avoid random selection among actions with the same values
+           forcedDeterministicAmongBestActions = None
+
+#. Copy the ``__init__()`` declaration from ``Policy.py`` and add needed parameters. 
+   In the function body, assign them and log them.
+   Then call the superclass constructor.
+   Here the parameters are the probability of 
+   selecting a random action, ``epsilon``, and how to handle the case where 
+   multiple best actions exist, ie with the same 
+   value, ``forcedDeterministicAmongBestActions``::
+
+       def __init__(self,representation,logger,epsilon = .1,
+                     forcedDeterministicAmongBestActions = False):
+           self.epsilon = epsilon
+           self.forcedDeterministicAmongBestActions = forcedDeterministicAmongBestActions
+           super(eGreedyTut,self).__init__(representation,logger)
+           if self.logger:
+               self.logger.log("=" * 60)
+               self.logger.log("Policy: eGreedy")
+               self.logger.log("Epsilon\t\t{0}".format(self.epsilon))
+
+
+#. Copy the ``pi()`` declaration from ``Policy.py`` and implement it to return
+   an action index for any given state and possible action inputs.
+   Here, with probability epsilon, take a random action among the possible.
+   Otherwise, pick an action with the highest expected value (depending on
+   ``self.forcedDeterministicAmongBestActions``, either pick randomly from among
+   the best actions or always select the one with the lowest index)::
+
+       def pi(self,s, terminal, p_actions):
+           coin = np.random.rand()
+           #print "coin=",coin
+           if coin < self.epsilon:
+               return np.random.choice(p_actions)
+           else:
+               b_actions = self.representation.bestActions(s, terminal, p_actions)
+               if self.forcedDeterministicAmongBestActions:
+                   return b_actions[0]
+               else:
+                   return np.random.choice(b_actions)
+
+#. Because this policy has an exploratory component, we must override the
+   ``turnOffExploration()`` and ``turnOnExploration()`` functions, so that when
+   evaluating the policy's performance the exploratory component may be
+   automatically disabled so as not to influence results::
+
+       def turnOffExploration(self):
+           self.old_epsilon = self.epsilon
+           self.epsilon = 0
+       def turnOnExploration(self):
+           self.epsilon = self.old_epsilon
+
+
+.. warning::
+
+    If you fail to define ``turnOffExploration()`` and ``turnOnExploration()``
+    for policies with exploratory components, measured algorithm performance
+    will be worse, since exploratory actions by definition are suboptimal based
+    on the current model.
+
+That's it! Now add your new Policy to ``Policies/__init__.py``::
+
+    from eGreedyTut import eGreedyTut
+
+Finally, create a unit test for your Policy XX XX.
+
+Now test it by creating a simple settings file on the domain of your choice.
+An example experiment is given below:
+
+.. literalinclude:: ../examples/tutorial/eGreedyTut_example.py
+   :language: python
+   :linenos:
+
+What to do next?
+----------------
+
+In this Policy tutorial, we have seen how to 
+
+* Write a Policy that inherits from the RLPy base ``Policy`` class
+* Override several base functions, including those that manage exploration/exploitation
+* Add the Policy to RLPy and test it
+
+If you would like to add your new Policy to the RLPy project, email ``rlpy@mit.edu``
+or create a pull request to the 
+`RLPy repository <https://bitbucket.org/rlpy/rlpy>`_.
+

doc/make_rep.rst

+.. _make_rep:
+
+.. this is a comment. see http://sphinx-doc.org/rest.html for markup instructions
+
+Creating a New Representation
+=============================
+
+This tutorial describes the standard RLPy 
+:class:`~Representations.Representation.Representation` interface,
+and illustrates a brief example of creating a new value function representation.
+
+.. Below taken directly from Representation.py
+
+The Representation is the approximation of the
+value function associated with a :py:class:`~Domains.Domain.Domain`,
+usually in some lower-dimensional feature space.
+
+The Agent receives observations from the Domain on each step and calls 
+its :func:`~Agents.Agent.Agent.learn` function, which is responsible for updating the
+Representation accordingly.
+Agents can later query the Representation for the value of being in a state
+*V(s)* or the value of taking an action in a particular state
+(known as the Q-function, *Q(s,a)*).
+
+.. note::
+    At present, it is assumed that the Linear Function approximator
+    family of representations is being used.
+        
+.. note ::
+    You may want to review the namespace / inheritance / scoping 
+    `rules in Python <https://docs.python.org/2/tutorial/classes.html>`_.
+
+
+Requirements 
+------------
+
+* At the top of the file (before the class definition), include the following 
+  heading. Fill in the appropriate ``__author__`` name and ``__credits__`` as needed.
+  Note that RLPy requires the BSD 3-Clause license::
+
+    __copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
+    __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann", 
+                    "William Dabney", "Jonathan P. How"]
+    __license__ = "BSD 3-Clause"
+    __author__ = "Tim Beaver"
+
+
+
+* If available, please include a link or reference to the publication associated 
+  with this implementation (and note differences, if any).
+
+* Each Representation must be a subclass of 
+  :class:`~Representations.Representation.Representation` and call the 
+  :func:`~Representations.Representation.Representation.__init__` function of the 
+  Representation superclass.
+
+* Accordingly, each Representation must be instantiated with a Logger (or None)
+  and a Domain in the ``__init__()`` function.  Note that an optional
+  ``discretization`` parameter may be used by discrete Representations 
+  attempting to represent a value function over a continuous space.
+  It is ignored for discrete dimensions.
+
+* Your code should appropriately handle the case where ``logger=None`` is 
+  passed to ``__init__()``.
+
+* Once completed, the className of the new Representation **must be added** to the
+  ``__init__.py`` file in the ``Representations/`` directory.
+  (This allows other files to import the new Representation).
+
+* After your Representation is complete, you should define a unit test XX Add info here XX
+
+
+REQUIRED Instance Variables
+"""""""""""""""""""""""""""
+
+The new Representation *MUST* set these variables *BEFORE* calling the
+superclass ``__init__()`` function:
+
+#. ``self.isDynamic`` - bool: True if this Representation can add or 
+   remove features during execution
+
+#. ``self.features_num`` - int: The (initial) number of features in the representation
+
+
+REQUIRED Functions
+""""""""""""""""""
+The new Representation *MUST* define two functions
+(a brief illustration follows this list):
+
+#. :func:`~Representations.Representation.Representation.phi_nonTerminal`,
+   (see linked documentation), which returns a vector of feature function 
+   values associated with a particular state.
+
+#. :func:`~Representations.Representation.Representation.featureType`,
+   (see linked documentation), which returns the data type of the underlying
+   feature functions (eg "float" or "bool").
+
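+For instance, a tabular-style Representation of four discrete states might behave
+as follows once the third state has been encountered (illustrative values only)::
+
+    phi_nonTerminal(s)   # -> np.array([False, False, True, False])
+    featureType()        # -> bool
+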
+SPECIAL Functions
+"""""""""""""""""
+Representations whose feature functions may change over the course of execution
+(termed **adaptive** or **dynamic** Representations) should override 
+one or both functions below as needed.
+Note that ``self.isDynamic`` should be set to ``True`` in this case.
+
+#. :func:`~Representations.Representation.Representation.pre_discover`
+
+#. :func:`~Representations.Representation.Representation.post_discover`
+
+Additional Information
+----------------------
+
+* As always, the Representation can log messages using ``self.logger.log(<str>)``, see 
+  :func:`Tools.Logger.log`. 
+  Your code should appropriately handle the case where ``logger=None`` is 
+  passed to ``__init__()``.
+
+* You should log values assigned to custom parameters when ``__init__()`` is called.
+
+* See :class:`~Representations.Representation.Representation` for functions 
+  provided by the superclass, especially before defining 
+  helper functions which might be redundant.
+
+
+
+Example: Creating the ``IncrementalTabular`` Representation
+-----------------------------------------------------------
+In this example we will recreate the simple :class:`~Representations.IncrementalTabular.IncrementalTabular`  Representation, which 
+merely creates a binary feature function f\ :sub:`d`\ () that is associated with each
+discrete state ``d`` we have encountered so far.
+f\ :sub:`d`\ (s) = 1 when *d=s*, 0 elsewhere, ie, the vector of feature 
+functions evaluated at *s* will have all zero elements except one.
+Note that this is identical to the :class:`~Representations.Tabular.Tabular` 
+Representation, except that feature functions are only created as needed, not 
+instantiated for every single state at the outset.
+Though simple, neither the ``Tabular`` nor ``IncrementalTabular`` representations
+generalize to nearby
+states in the domain, and can be intractable to use on large domains (as there
+are as many feature functions as there are states in the entire space).
+Continuous dimensions of ``s`` (assumed to be bounded in this Representation) 
+are discretized.
+
+#. Create a new file in the ``Representations/`` directory, ``IncrTabularTut.py``.
+   Add the header block at the top::
+
+       __copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
+       __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
+                      "William Dabney", "Jonathan P. How"]
+       __license__ = "BSD 3-Clause"
+       __author__ = "Ray N. Forcement"
+       
+       from Representation import *
+
+#. Declare the class, create needed member variables (here an optional hash
+   table to look up previously computed feature function values), and write a 
+   docstring description::
+
+       class IncrTabularTut(Representation):
+           """
+           Tutorial representation: identical to IncrementalTabular
+
+           """
+           hash = None
+
+#. Copy the ``__init__()`` declaration from ``Representation.py`` and add needed
+   parameters (here there are none to add).
+   Assign ``self.features_num`` and ``self.isDynamic``, then
+   call the superclass constructor::
+
+       def __init__(self, domain, logger, discretization=20):
+           self.hash           = {}
+           self.features_num   = 0
+           self.isDynamic      = True
+           super(IncrTabularTut, self).__init__(domain, logger, discretization)
+
+#. Copy the ``phi_nonTerminal()`` function declaration and implement it accordingly
+   to return the vector of feature function values for a given state.
+   Here, look up the feature function values using ``self.hashState(s)``, provided by the 
+   parent class.
+   Note that ``self.hash`` should always contain ``hash_id`` if ``pre_discover()``
+   is called as required::
+
+       def phi_nonTerminal(self, s):
+           hash_id = self.hashState(s)
+           id  = self.hash.get(hash_id)
+           F_s = np.zeros(self.features_num, bool)
+           if id is not None:
+               F_s[id] = 1
+           return F_s
+
+#. Copy the ``featureType()`` function declaration and implement it accordingly
+   to return the datatype returned by each feature function.
+   Here, feature functions are binary, so the datatype is boolean::
+
+       def featureType(self):
+           return bool
+
+#. Override parent functions as necessary; here we require a ``pre_discover()``
+   function to populate the hash table for each newly encountered state::
+
+       def pre_discover(self, s, terminal, a, sn, terminaln):
+           return self._add_state(s) + self._add_state(sn)
+
+#. Finally, define any needed helper functions::
+
+       def _add_state(self, s):
+           hash_id = self.hashState(s)
+           id  = self.hash.get(hash_id)
+           if id is None:
+               #New State
+               self.features_num += 1
+               #New id = feature_num - 1
+               id = self.features_num - 1
+               self.hash[hash_id] = id
+               #Add a new element to the feature weight vector
+               self.addNewWeight()
+               return 1
+           return 0
+
+       def __deepcopy__(self, memo):
+           new_copy = IncrTabularTut(self.domain, self.logger, self.discretization)
+           new_copy.hash = deepcopy(self.hash)
+           return new_copy
+
+That's it! Now add your new Representation to ``Representations/__init__.py``::
+
+    from IncrTabularTut import IncrTabularTut
+
+Finally, create a unit test for your representation XX XX.
+
+Now test it by creating a simple settings file on the domain of your choice.
+An example experiment is given below:
+
+.. literalinclude:: ../examples/tutorial/IncrTabularTut_example.py
+   :language: python
+   :linenos:
+
+What to do next?
+----------------
+
+In this Representation tutorial, we have seen how to 
+
+* Write an adaptive Representation that inherits from the RLPy base 
+  ``Representation`` class
+* Add the Representation to RLPy and test it
+
+If you would like to add your new Representation to the RLPy project, email ``rlpy@mit.edu``
+or create a pull request to the 
+`RLPy repository <https://bitbucket.org/rlpy/rlpy>`_.
+

examples/tutorial/ChainMDPTut_example.py

+#!/usr/bin/env python
+"""
+Domain Tutorial for RLPy
+=================================
+
+Assumes you have created the ChainMDPTut.py domain according to the
+tutorial and placed it in the Domains/ directory.
+Tests the Domain using a SARSA agent with a Tabular representation.
+"""
+__author__ = "Robert H. Klein"
+from Domains import ChainMDPTut
+from Tools import Logger
+from Agents import SARSA
+from Representations import Tabular
+from Policies import eGreedy
+from Experiments import Experiment
+import os
+
+
+def make_experiment(id=1, path="./Results/Tutorial/ChainMDPTut-SARSA"):
+    """
+    Each file specifying an experimental setup should contain a
+    make_experiment function which returns an instance of the Experiment
+    class with everything set up.
+
+    @param id: number used to seed the random number generators
+    @param path: output directory where logs and results are stored
+    """
+    logger = Logger()
+
+    ## Domain:
+    chainSize = 50
+    domain = ChainMDPTut(chainSize=chainSize, logger=logger)
+
+    ## Representation
+    # discretization only needed for continuous state spaces, discarded otherwise
+    representation  = Tabular(domain, logger)
+
+    ## Policy
+    policy = eGreedy(representation, logger, epsilon=0.2)
+
+    ## Agent
+    agent = SARSA(representation=representation, policy=policy,
+                       domain=domain, logger=logger,
+                       learning_rate=0.1)
+    checks_per_policy = 100
+    max_steps = 2000
+    num_policy_checks = 10
+    experiment = Experiment(**locals())
+    return experiment
+
+if __name__ == '__main__':
+    experiment = make_experiment(1)
+    experiment.run(visualize_steps=False,  # should each learning step be shown?
+                   visualize_learning=True,  # show policy / value function?
+                   visualize_performance=1)  # show performance runs?
+    experiment.plot()
+    experiment.save()

examples/tutorial/IncrTabularTut_example.py

+#!/usr/bin/env python
+"""
+Representation Tutorial for RLPy
+================================
+
+Assumes you have created the IncrTabularTut.py Representation according to the
+tutorial and placed it in the Representations/ directory.
+Tests the Representation on the GridWorld domain using SARSA.
+"""
+__author__ = "Robert H. Klein"
+from Domains import GridWorld
+from Tools import Logger
+from Agents import SARSA
+from Representations import IncrTabularTut
+from Policies import eGreedy
+from Experiments import Experiment
+import os
+
+
+def make_experiment(id=1, path="./Results/Tutorial/gridworld-IncrTabularTut"):
+    """
+    Each file specifying an experimental setup should contain a
+    make_experiment function which returns an instance of the Experiment
+    class with everything set up.
+
+    @param id: number used to seed the random number generators
+    @param path: output directory where logs and results are stored
+    """
+    logger = Logger()
+
+    ## Domain:
+    maze = os.path.join(GridWorld.default_map_dir, '4x5.txt')
+    domain = GridWorld(maze, noise=0.3, logger=logger)
+
+    ## Representation
+    # discretization only needed for continuous state spaces, discarded otherwise
+    representation  = IncrTabularTut(domain, logger)
+
+    ## Policy
+    policy = eGreedy(representation, logger, epsilon=0.2)
+
+    ## Agent
+    agent = SARSA(representation=representation, policy=policy,
+                       domain=domain, logger=logger,
+                       learning_rate=0.1)
+    checks_per_policy = 100
+    max_steps = 2000
+    num_policy_checks = 10
+    experiment = Experiment(**locals())
+    return experiment
+
+if __name__ == '__main__':
+    experiment = make_experiment(1)
+    experiment.run(visualize_steps=False,  # should each learning step be shown?
+                   visualize_learning=True,  # show policy / value function?
+                   visualize_performance=1)  # show performance runs?
+    experiment.plot()
+    experiment.save()

examples/tutorial/SARSA0_example.py

 import os
 
 
-def make_experiment(id=1, path="./Results/Tutorial/gridworld-qlearning"):
+def make_experiment(id=1, path="./Results/Tutorial/gridworld-sarsa0"):
     """
     Each file specifying an experimental setup should contain a
     make_experiment function which returns an instance of the Experiment
     domain = GridWorld(maze, noise=0.3, logger=logger)
 
     ## Representation
+    # discretization only needed for continuous state spaces, discarded otherwise
     representation  = Tabular(domain, logger, discretization=20)
 
     ## Policy

examples/tutorial/eGreedyTut_example.py

+#!/usr/bin/env python
+"""
+Policy Tutorial for RLPy
+=================================
+
+Assumes you have created the eGreedyTut.py Policy according to the tutorial and
+placed it in the Policies/ directory.
+Tests the policy on the GridWorld domain, with the policy and value function
+visualized.
+"""
+__author__ = "Robert H. Klein"
+from Domains import GridWorld
+from Tools import Logger
+from Agents import SARSA
+from Representations import Tabular
+from Policies import eGreedyTut
+from Experiments import Experiment
+import os
+
+
+def make_experiment(id=1, path="./Results/Tutorial/gridworld-eGreedyTut"):
+    """
+    Each file specifying an experimental setup should contain a
+    make_experiment function which returns an instance of the Experiment
+    class with everything set up.
+
+    @param id: number used to seed the random number generators
+    @param path: output directory where logs and results are stored
+    """
+    logger = Logger()
+
+    ## Domain:
+    maze = os.path.join(GridWorld.default_map_dir, '4x5.txt')
+    domain = GridWorld(maze, noise=0.3, logger=logger)
+
+    ## Representation
+    # discretization only needed for continuous state spaces, discarded otherwise
+    representation  = Tabular(domain, logger, discretization=20)
+
+    ## Policy
+    policy = eGreedyTut(representation, logger, epsilon=0.2)
+
+    ## Agent
+    agent = SARSA(representation=representation, policy=policy,
+                       domain=domain, logger=logger,
+                       learning_rate=0.1)
+    checks_per_policy = 100
+    max_steps = 2000
+    num_policy_checks = 10
+    experiment = Experiment(**locals())
+    return experiment
+
+if __name__ == '__main__':
+    experiment = make_experiment(1)
+    experiment.run(visualize_steps=False,  # should each learning step be shown?
+                   visualize_learning=True,  # show policy / value function?
+                   visualize_performance=1)  # show performance runs?
+    experiment.plot()
+    experiment.save()