#The Sigma Cognitive Architecture: Online Tutorial#
Sigma (Σ) is a nascent cognitive system intended to support the real-time needs of intelligent agents, robots and virtual humans. Generally speaking, much of the field of AI can be classified into the branches of "low level pattern recognition" or "high level symbolic reasoning", with very few systems addressing both sides. Sigma is an attempt to model both cognitive and graphical architectures. These cognitive and graphical architectures form the two halves of the cognitive architecture hypothesis. The dynamic tension, interplay and constraint between these two architectures, in both their development and execution, is the source of many of Sigma’s unique contributions to cognitive architectures, artificial intelligence, and cognitive science.
In this tutorial, we are going to use the structure of Sigma to accomplish some basic tasks. Many basic operations for a virtual agent may be defined on a grid structure. Here, we are going to slowly build up a Sigma model that controls the movements of a virtual agent on a grid. We will start with a single dimensional empty grid and will introduce new Sigma operations and concepts as we build up the model cumulatively. The goal of this tutorial is to enable you to accomplish progressively harder/greater tasks. This tutorial involves:
- Initializing and creating a Sigma agent
- Initializing and defining a world, including the actions which can be taken
- Initializing perceptual information
- Learning in Sigma
- Templates in Sigma
At the end of this tutorial, you will have an agent which can, albeit simply, perceive the world, model the world in memory, learn about the world, and act. This isn’t a lightweight task, as what is going on behind the scenes (probabilistic models, SLAM, etc.) is complex, but for this tutorial much of this complexity is abstracted away from you.
As a final note before we get started with the exciting stuff, this tutorial will define, redefine, and re-re-re-redefine a function called random-walk. The goal is for each random-walk to be self-contained and for each iteration to teach you a little more about Sigma and how it operates. Generally speaking, you can copy-and-paste the “(defun random-walk …” text wherever you see it or download the lisp file, which contains all of the code.
Setup
Lispworks
- This tutorial requires that you have a Common Lisp implementation installed. The preferred system is Lispworks. There is a free trial version available for download here which is sufficient for all the exercises in this tutorial.
- The source code needed for this tutorial can be downloaded here
- Next, start up Lispworks, select 'open' & navigate to the location of sigma-tutorial.lisp on your filesystem and double click to open.
- From the top menu select buffers -> compile
- You will now have all of the sigma functionality & the tutorial code loaded into your system so you can run any of the commands listed below.
- For help or to report issues: sigma@ict.usc.edu
CLISP/SLIME
- An alternative, non-proprietary Common Lisp implementation can be installed using the instructions here
- The tutorial version of sigma will NOT run on an open-source implementation; you will need to download the source code for the official sigma release
- After installation, fire up the SLIME REPL. You'll need to tell quicklisp where to find the sigma source code you just downloaded; this will then allow you to load the sigma package:
CL-USER> (pushnew #p"/PATH/TO/sigma-release/Sigma38/src/" asdf:*central-registry*)
CL-USER> (ql:quickload :sigma)
CL-USER> (in-package :sigma)
#<PACKAGE SIGMA>
SIGMA> (load "/PATH/TO/rwtutorial.lisp")
Operators (+ conditionals & functions)
First, let's introduce the `(init)` Sigma function. `(init)` is required to initialize a Sigma model and needs to be called each time a new Sigma model is initiated. Check the reference sheet for the full definition of `(init)`. In general, clicking on any name or phrase in a blue-outline box like this one will take you to the reference sheet entry for it. At this point, we are only interested in one of the optional arguments for `(init)`, which is `<operator-names>`. This argument allows us to define the operators that are available to our virtual agent. Let's assume there are only three operators available to the virtual agent to move on the 1D grid. These operators are left, right, and none (stand still). Consequently, the call `(init '(left right none))` initializes a Sigma model with these operators.
Let's assume that the virtual agent is going to randomly navigate this 1D grid. Then the virtual agent needs to randomly select and apply these operators. `Actions` are used here to provide input for operator selection. Actions are one part of the Sigma construct `conditional`, which structures the long-term memory. For example, the conditional `acceptable` below makes all three operators equally likely to be selected:
(conditional 'acceptable :actions '((selected (operator left)) (selected (operator right)) (selected (operator none)) ) )
`selected` is a system-generated `predicate` that frames the decision process among the defined operators. `Predicates` specify relations among sets of arguments. They will be introduced in more depth shortly.
A single cognitive (or decision) cycle is run by calling `(decide 1)` or `(d 1)`. Putting these together, the simple Sigma model is initialized, created and run by the function:
(defun random-walk-1 ()
  (init '(left right none))
  (conditional 'acceptable
               :actions '((selected (operator left))
                          (selected (operator right))
                          (selected (operator none))))
  (d 1))
The Sigma function `(print-pred-function)` or `(ppfn)` prints the contents of a single predicate working memory function. If we call `(ppfn 'selected)` after running the `(random-walk-1)` function above, we will see an output similar to:
WM for SELECTED
Factor [2_SELECTED-WM-FN] Function:
WM-STATE x WM-OPERATOR:
          [0:100>
[LEFT]    0
[RIGHT]   1
[NONE]    0
In this case, the operator right had been selected. As all of the operators are equally likely, different calls to the function `(random-walk-1)` may yield different selected operators. The `state` argument exists to support possible reflective processing. The base level is state 0, with each higher metalevel (or lower subgoal) assigned a number one more than its predecessor. The `state` argument will be discussed in detail when we provide examples of reflective processing.
Operator selection
The Sigma parameter `post-d` can be used to set the forms to evaluate after the end of a cognitive cycle. In `random-walk-2`, defined below, `(ppfn 'selected)` is automatically called after each decision.
In the Sigma cognitive language, `*` denotes the entire domain of an argument. Therefore, the modified `acceptable` conditional here behaves exactly as in the previous case, where separate actions were provided for each element of the domain.
By default, Sigma selects randomly among all operators with the highest rating (here all three operators share a default rating of 1) and maintains the one selected until either it is rejected or a more highly rated operator becomes available. This is the `best` selection rule. If we want the virtual agent to perform a true random walk, we must change this selection rule to either `prob-match` (probability-match) or `boltzmann` using Sigma's `(operator-selection)` function. The former chooses randomly each cycle with a probability proportional to the relative ratings, while the latter does something similar after first exponentially transforming the ratings. With either of these, the modified Sigma model selects a random operator at each decision cycle:
(defun random-walk-2 ()
  (init '(left right none))
  (operator-selection 'boltzmann)
  (setq post-d '((ppfn 'selected)))
  (conditional 'acceptable
               :actions '((selected (operator *))))
  (d 1))
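To make the difference between the two stochastic selection rules concrete, here is a minimal sketch in plain Common Lisp that does not depend on Sigma. The helpers `scale-to-unit-sum`, `prob-match-probs`, `boltzmann-probs`, and `sample-index` are hypothetical names introduced for illustration, and the plain `exp` transform (with no temperature parameter) is an assumption; Sigma's internal implementation may differ.

```lisp
;; Hypothetical helpers for illustration only; none of these are Sigma functions.

(defun scale-to-unit-sum (weights)
  "Scale a list of non-negative WEIGHTS so that they sum to 1."
  (let ((total (reduce #'+ weights)))
    (mapcar (lambda (w) (/ w total)) weights)))

(defun prob-match-probs (ratings)
  "Selection probabilities proportional to the relative RATINGS."
  (scale-to-unit-sum ratings))

(defun boltzmann-probs (ratings)
  "Selection probabilities after exponentially transforming RATINGS."
  (scale-to-unit-sum (mapcar #'exp ratings)))

(defun sample-index (probs)
  "Draw an index into PROBS at random, weighted by the probabilities."
  (let ((r (random 1.0))
        (acc 0))
    (loop for p in probs
          for i from 0
          do (incf acc p)
          when (< r acc) return i
          finally (return (1- (length probs))))))

;; With equal ratings, both rules give each operator the same probability.
(prob-match-probs '(1 1 1)) ; => (1/3 1/3 1/3)
(nth (sample-index (boltzmann-probs '(1 1 1))) '(left right none))
```

With unequal ratings the two rules diverge: the exponential transform widens the gap between a higher and a lower rating compared with simple proportional matching.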
Internal action execution (+ types & predicates)
None of the models seen so far actually performs any actions; all they do is select operators to be executed. Now, let's start with the case where all the movements are mentally simulated on a 1-D grid model. For this mental simulation, we need a representation for the 1-D grid and a data structure that points to our current location on the grid.
Types
The representation of the 1-D grid in this example is captured by defining a `type` named `1D-grid`: `(new-type '1D-grid :numeric t :discrete t :min 1 :max 8)`. Type 1D-grid is discrete numeric and its scope is from 1 to 8 (8 not included). Each of these 7 values corresponds to a cell in our hypothetical 1-D grid.
The agent's current location is captured via a `predicate`. `Predicates` specify relations among sets of typed arguments. In this case, we only need a single argument `x` of type `1D-grid` for the `predicate` `location`:
(predicate 'location :world 'closed :arguments '((x 1D-grid !)))
One key distinction among types of predicates is whether they are unique or universal, which is itself grounded in whether the predicate’s arguments are unique or universal. Universal arguments are like variables in rule systems, where any or all of the elements in the variable’s domain may be valid. Unique arguments are like random variables in probabilistic systems, where a distribution is provided over all of the elements of the variable’s domain but only a single value is actually correct. The `location` `predicate` is a unique predicate, as its only argument `x` is annotated with `!`. Use of `!` implies that a best alternative (i.e., the most likely grid location) is to be selected.
Another distinction among types of predicates concerns whether they are open world versus closed world, and thus whether unspecified values are assumed to be unknown (as in probabilistic networks and many logics) or false (as in rules). The `location` `predicate` is defined as a closed world `predicate`. Explanation: non-persistent predicates don't do selection.
Now, we need to define how the selected operators are applied to the mental representation of the grid. `Conditionals` are the appropriate constructs for making the necessary modifications to the `location` `predicate`. For example, the `conditional` `move-left` performs the action `left` by decreasing the value of location by 1: the condition binds the variable `value` to the current location, and the action pattern `(value -1)` offsets that variable by -1.
(conditional 'move-left :conditions '( (selected (operator left)) (location (x (value))) ) :actions '( (location (x (value -1))) ) )
Similarly,
(conditional 'move-right :conditions '( (selected (operator right)) (location (x (value))) ) :actions '( (location (x (value 1))) ) )
The initial location of the virtual agent can be specified by defining `evidence`, which is used for closed world predicates. Let's assume that the agent is initially located at grid location 4:
(evidence '((location (x 4)) ))
(defun random-walk-3 ()
  (init '(left right none))
  (operator-selection 'boltzmann)
  (new-type '1D-grid :numeric t :discrete t :min 1 :max 8)
  (predicate 'location :world 'closed :arguments '((x 1D-grid !)))
  (setq post-d '((ppfn 'selected)
                 (ppfn 'location)))
  (conditional 'move-left
               :conditions '((selected (operator left))
                             (location (x (value))))
               :actions '((location (x (value -1)))))
  (conditional 'move-right
               :conditions '((selected (operator right))
                             (location (x (value))))
               :actions '((location (x (value 1)))))
  (conditional 'acceptable
               :actions '((selected (operator *))))
  (evidence '((location (x 4))))
  (d 5))
Trials
Next, let's introduce the concept of running `trials`. A `trial` continues to run until the Sigma model is terminated by a `halt` call. We can modify our model such that it terminates when the agent reaches one of the end locations, 1 or 7, of the grid. For example:
(conditional 'halt-at-location-1 :conditions '( (location (x 1)) ) :actions '( (halt) ) )
Here, we introduce a new predicate pattern: `conditions`. This is conceptually similar to the conditions and actions found in rule-based systems. In the example above, the `conditional` tries to match the condition where the agent's location is the 1st grid location. If that is the case, `halt` is invoked as an action.
The Sigma parameter `pre-t` can be used to set the forms to evaluate before a trial starts. We can use the `pre-t` parameter to set the initial location of the agent. Another setting applied here is that the Sigma variable `max-fraction-pa-regions` is set to 0. This variable affects the print functions, changing the region-based representation to a more explicit representation. After these modifications, our Sigma model looks like:
(defun random-walk-4 ()
  (init '(left right none))
  (operator-selection 'boltzmann)
  (setf max-fraction-pa-regions 0)
  (new-type '1D-grid :numeric t :discrete t :min 1 :max 8)
  (predicate 'location :world 'closed :arguments '((x 1D-grid !)))
  (setq pre-t '((evidence '((location (x 4))))))
  (setq post-d '((ppfn 'selected)
                 (ppfn 'location)))
  (conditional 'move-left
               :conditions '((selected (state 0) (operator left))
                             (location (x (value))))
               :actions '((location (x (value -1)))))
  (conditional 'move-right
               :conditions '((selected (state 0) (operator right))
                             (location (x (value))))
               :actions '((location (x (value 1)))))
  (conditional 'halt-at-location-1
               :conditions '((location (x 1)))
               :actions '((halt)))
  (conditional 'halt-at-location-7
               :conditions '((location (x 7)))
               :actions '((halt)))
  (conditional 'acceptable
               :actions '((selected (operator *))))
  (trials 1))
A sample output for the above model is provided below. The first decision cycle selects the `left` operator and the agent is at location 4. At the beginning of the 2nd decision, the agent is at location 4 and the `left` operator is applied, so at the end of the 2nd decision cycle the agent is at location 3. The new operator selected is `left`. In the 3rd decision cycle, the `left` operator is applied, so the agent moves to location 2, and now the `none` operator is selected. In the 4th decision cycle, the agent stays at location 2 and another `left` operator is selected. In the 5th decision cycle, the agent moves to location 1 and halt is called. The halt call overrides any other operator, so the model terminates at the end of the 6th decision cycle.
>>> Trial 1 <<<
<<< Decision 1 >>>
WM for SELECTED
Factor [2_SELECTED-WM-FN] Function:
WM-STATE x WM-OPERATOR:
          [0:>
[LEFT]    1
[RIGHT]   0
[NONE]    0
WM for LOCATION
Factor [4_LOCATION-WM-FN] Function:
WM-X:
[1] [2] [3] [4] [5] [6] [7]
0   0   0   1   0   0   0
<<< Decision 2 >>>
WM for SELECTED
Factor [2_SELECTED-WM-FN] Function:
WM-STATE x WM-OPERATOR:
          [0:>
[LEFT]    1
[RIGHT]   0
[NONE]    0
WM for LOCATION
Factor [4_LOCATION-WM-FN] Function:
WM-X:
[1] [2] [3] [4] [5] [6] [7]
0   0   1   0   0   0   0
<<< Decision 3 >>>
WM for SELECTED
Factor [2_SELECTED-WM-FN] Function:
WM-STATE x WM-OPERATOR:
          [0:>
[LEFT]    0
[RIGHT]   0
[NONE]    1
WM for LOCATION
Factor [4_LOCATION-WM-FN] Function:
WM-X:
[1] [2] [3] [4] [5] [6] [7]
0   1   0   0   0   0   0
<<< Decision 4 >>>
WM for SELECTED
Factor [2_SELECTED-WM-FN] Function:
WM-STATE x WM-OPERATOR:
          [0:>
[LEFT]    1
[RIGHT]   0
[NONE]    0
WM for LOCATION
Factor [4_LOCATION-WM-FN] Function:
WM-X:
[1] [2] [3] [4] [5] [6] [7]
0   1   0   0   0   0   0
<<< Decision 5 >>>
WM for SELECTED
Factor [2_SELECTED-WM-FN] Function:
WM-STATE x WM-OPERATOR:
          [0:>
[LEFT]    1
[RIGHT]   0
[NONE]    0
WM for LOCATION
Factor [4_LOCATION-WM-FN] Function:
WM-X:
[1] [2] [3] [4] [5] [6] [7]
1   0   0   0   0   0   0
<<< Decision 6 >>>
WM for SELECTED
Factor [2_SELECTED-WM-FN] Function:
WM-STATE x WM-OPERATOR:
          [0:>
[LEFT]    1
[RIGHT]   0
[NONE]    0
WM for LOCATION
Factor [4_LOCATION-WM-FN] Function:
WM-X:
[1] [2] [3] [4] [5] [6] [7]
1   0   0   0   0   0   0
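For comparison, the trial logic (apply a move, then halt at an end cell) can be sketched in plain Common Lisp without Sigma. The function `walk-until-halt` is a hypothetical helper written for this illustration; it takes the operator sequence as input rather than sampling it, so the result is reproducible, and it ignores Sigma's one-cycle delay between selecting and applying an operator.

```lisp
;; Hypothetical helper; not part of Sigma.
(defun walk-until-halt (start moves)
  "Apply MOVES (a list of LEFT, RIGHT, or NONE) from START on cells 1..7.
Returns the list of visited locations, stopping once cell 1 or 7 is reached."
  (let ((location start)
        (visited (list start)))
    (dolist (move moves (nreverse visited))
      ;; Halt condition: the agent has reached one of the end cells.
      (when (or (= location 1) (= location 7))
        (return (nreverse visited)))
      (case move
        (left (decf location))
        (right (incf location)))
      (push location visited))))

;; The operator sequence applied in the sample trial above.
(walk-until-halt 4 '(left left none left left)) ; => (4 3 2 2 1)
```

A random walk is obtained by drawing each move with something like `(nth (random 3) '(left right none))` instead of passing a fixed list.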
##External action execution (+ perception & action) ##
Until now, all the operators were applied to a mental representation. Let's define a world external to the model and make our Sigma model interact with this world through perceptions and actions. The location of the agent in this external world is captured by the Lisp variable `1d-grid-location`.
; True location in world
(defvar 1d-grid-location)
`perceive-location` provides information about the agent's location to Sigma's perception function, and the function `execute-operator` executes Sigma's selected operators and modifies the location of the agent in the external world.
The `perceive-location` function utilizes Sigma's `perceive` function. In general, the `perceive` function is used to provide probabilistic evidence to open-world `predicates`. The `perceive-location` function takes two parameters: (1) correct-prob is the probability of perceiving the correct location of the agent and (2) correct-mass is the probability mass assigned to the perceived location. Sigma's `perceive` function is called from the `perceive-location` function. For example,
(perceive '((location 0.9 (x 4))))
is the perception of the agent's location being position 4 with a probability of 0.9. The `perceive-location` function is:
; Perceive results of operator with some noise
; Correct-prob is probability that peak is at correct location
; Correct-mass is how much of the probability mass is at the perceived location
(defun perceive-location (&optional correct-prob correct-mass)
  (unless correct-prob (setq correct-prob 1.0))
  (unless correct-mass (setq correct-mass 1.0))
  (let ((rand (random 1.0))
        location ; Perceived location
        1-cm)
    (setq 1-cm (- 1 correct-mass))
    (setq location 1d-grid-location)
    ; Perceive new location with correct-prob of getting right (and otherwise on one side)
    (cond ((= 1d-grid-location 1)
           (when (>= rand correct-prob) (setq location 2)))
          ((= 1d-grid-location 7)
           (when (>= rand correct-prob) (setq location 6)))
          (t
           (when (>= rand correct-prob)
             (if (< (random 1.0) .5)
                 (setq location (1- 1d-grid-location))
                 (setq location (1+ 1d-grid-location))))))
    ; Zero out all perception for predicate location
    (perceive '((location 0)))
    ; Generate noisy perceptions based on correct-mass
    (perceive `((location ,correct-mass (x ,location)))) ; Correct-mass at location
    ; Divide incorrect mass among adjacent locations when they exist
    (cond ((= location 1)
           (perceive `((location ,1-cm (x 2)))))
          ((= location 7)
           (perceive `((location ,1-cm (x 6)))))
          (t
           (perceive `((location ,(/ 1-cm 2) (x ,(1- location)))
                       (location ,(/ 1-cm 2) (x ,(1+ location)))))))))
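The way `perceive-location` spreads probability mass can be checked outside Sigma. The sketch below is plain Common Lisp; `perceived-distribution` is a hypothetical helper that returns the `(cell . mass)` pairs corresponding to the `perceive` calls above, assuming the same 7-cell grid. It covers only the mass assignment, not the random choice of the perceived location.

```lisp
;; Hypothetical helper mirroring the mass assignment in perceive-location.
(defun perceived-distribution (location correct-mass)
  "Return an alist of (cell . mass) for a perceived LOCATION on cells 1..7."
  (let ((rest-mass (- 1 correct-mass)))
    (cons (cons location correct-mass)
          (cond ((= location 1) (list (cons 2 rest-mass)))
                ((= location 7) (list (cons 6 rest-mass)))
                ;; Interior cell: split the remaining mass between both neighbors.
                (t (list (cons (1- location) (/ rest-mass 2))
                         (cons (1+ location) (/ rest-mass 2))))))))

(perceived-distribution 4 8/10) ; => ((4 . 4/5) (3 . 1/10) (5 . 1/10))
(perceived-distribution 1 8/10) ; => ((1 . 4/5) (2 . 1/5))
```

Rationals are used here so that the masses visibly sum to 1; the tutorial code works with floats such as 0.8.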
The `execute-operator` function reads the selected Sigma operator and applies it to the agent's external world. The correct-prob parameter determines the probability that the operators left or right are successfully applied, moving the agent left or right. If the operator is not applied successfully, the agent stays at its current location.
; Execute operator with some noise
(defun execute-operator (&optional correct-prob)
  (unless correct-prob (setq correct-prob 1.0))
  (let ((operator (operator-in-state base-level-state))
        (rand (random 1.0)))
    (when (and operator (not (haltp)))
      ; Make correct move action-noise percent of time
      (case operator
        (left (when (< rand correct-prob)
                (setq 1d-grid-location (max (- 1d-grid-location 1) 1))))
        (right (when (< rand correct-prob)
                 (setq 1d-grid-location (min (+ 1d-grid-location 1) 7))))))
    (format trace-stream "~&~&Operator:~S Next location: ~S~%" operator 1d-grid-location)))
There are also changes required in the Sigma model. First, the `perceive-location` and `execute-operator` functions need to be introduced to the Sigma model. This is achieved through two predefined Sigma lists: (1) perceive-list and (2) action-list. Adding the `perceive-location` and `execute-operator` functions to the appropriate lists results in these functions being called within Sigma's decision cycle. `Perceive` functions are called within the elaboration phase and the `action` functions are called within the adaptation phase.
The other change that needs to be made is to change the `location` predicate from closed-world to open-world. Since `location` refers to an external representation, it is captured probabilistically. Open-world predicates do not have working memory function nodes, so `ppfn` cannot be used. `(ppvn)` prints out the posterior for variable nodes, and the call `(ppvn 'location)` provides the desired functionality. Finally, the conditionals `move-right` and `move-left` are dropped from the Sigma model, as the operators are applied to the external world.
(defun random-walk-5 (&optional perception-prob perception-mass action-prob)
  (init '(left right none))
  (operator-selection 'boltzmann)
  (setf max-fraction-pa-regions 0)
  (new-type '1D-grid :numeric t :discrete t :min 1 :max 8)
  (predicate 'location :perception t :arguments '((x 1D-grid %)))
  (setq perceive-list `((perceive-location ,perception-prob ,perception-mass)))
  (setq action-list `((execute-operator ,action-prob)))
  (setq pre-t '((setf 1d-grid-location 4)))
  (setq post-d '((ppvn 'location)
                 (ppfn 'selected)))
  (conditional 'halt-1
               :conditions '((location (x 1)))
               :actions '((halt)))
  (conditional 'halt-7
               :conditions '((location (x 7)))
               :actions '((halt)))
  (conditional 'acceptable
               :actions '((selected (operator *))))
  (trials 1))
Value selection
One addition that we can make to the Sigma model is a `location-selected` predicate, which selects the most probable location as the current location. The `location-selected` predicate can be defined as closed-world with ! (select best) as the unique symbol:
(predicate 'location-selected :world 'closed :arguments '((x 1D-grid !)))
The `conditional` `select-location` manages the interaction between the `location` and `location-selected` predicates.
(conditional 'select-location :conditions '((location (x (location)))) :actions '((location-selected (x (location)))) )
`halt` conditions can be defined in the forms evaluated in `pre-run`, so that halt conditions are checked in the external world. `pre-run` forms are evaluated before messages are sent within a decision.
(setq pre-run '((when (or (= 1d-grid-location 1) (= 1d-grid-location 7)) (halt))))
So our model looks like:
(defun random-walk-6 (&optional perception-prob perception-mass action-prob)
  (init '(left right none))
  (operator-selection 'boltzmann)
  (setf max-fraction-pa-regions 0)
  (new-type '1D-grid :numeric t :discrete t :min 1 :max 8)
  (predicate 'location :perception t :arguments '((x 1D-grid %)))
  (predicate 'location-selected :world 'closed :arguments '((x 1D-grid !)))
  (setq perceive-list `((perceive-location ,perception-prob ,perception-mass)))
  (setq action-list `((execute-operator ,action-prob)))
  (setq pre-t '((setf 1d-grid-location 4)))
  (setq pre-run '((when (or (= 1d-grid-location 1) (= 1d-grid-location 7))
                    (halt))))
  (setq post-d '((ppvn 'location)
                 (ppfn 'location-selected)
                 (ppfn 'selected)))
  (conditional 'select-location
               :conditions '((location (x (location))))
               :actions '((location-selected (x (location)))))
  (conditional 'acceptable
               :actions '((selected (operator *))))
  (trials 1))
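The effect of the `!` (select best) annotation on `location-selected` can be pictured as an argmax over the perceived location distribution. A minimal sketch in plain Common Lisp follows; `select-best` is a hypothetical helper, not a Sigma function, and breaking ties by taking the first maximum is an assumption that may differ from Sigma's behavior.

```lisp
;; Hypothetical helper; not part of Sigma.
(defun select-best (distribution)
  "Return the cell with the largest mass in DISTRIBUTION, an alist of (cell . mass)."
  (let ((best (first distribution)))
    (dolist (entry (rest distribution) (car best))
      ;; Strict comparison keeps the earliest entry when masses are tied.
      (when (> (cdr entry) (cdr best))
        (setf best entry)))))

(select-best '((3 . 0.05) (4 . 0.9) (5 . 0.05))) ; => 4
```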
External objects
Next, let's assume there is an object at each grid location and the agent perceives these objects. First, let's define a `type` named `obj-type`: `(new-type 'obj-type :constants '(walker table dog human))`. Type obj-type is symbolic and its scope includes the objects walker, table, dog, and human.
The `object` `predicate` and the `object-perceived` `predicate` specify the relation between the grid locations and object types:
(predicate 'object :perception t :arguments '((object obj-type %)))
(predicate 'object-perceived :world 'closed :arguments '((location 1D-grid) (object obj-type !)))
This relation is established by the following conditional:
(conditional 'perceived-objects :conditions '( (object (object (obj))) (location (x (loc))) ) :actions '((object-perceived (object (obj)) (location (loc)))) )
As the objects are situated in the external world, we need a perceptual function that interfaces with the external world. The `perceive-object` function below perceives objects depending on the actual location of the agent on the grid. The layout of the objects on the grid is 1-dog, 2-human, 3-walker, 4-table, 5-dog, 6-table, and 7-walker.
(defun perceive-object ()
  (case 1d-grid-location
    (1 (perceive '((object (object dog)))))
    (2 (perceive '((object (object human)))))
    (3 (perceive '((object (object walker)))))
    (4 (perceive '((object (object table)))))
    (5 (perceive '((object (object dog)))))
    (6 (perceive '((object (object table)))))
    (7 (perceive '((object (object walker)))))))
The `perceive-list` then needs to be updated for perceptions about objects. The updated model is:
(defun random-walk-7 (&optional perception-prob perception-mass action-prob)
  (init '(left right none))
  (operator-selection 'boltzmann)
  (setf max-fraction-pa-regions 0)
  (new-type '1D-grid :numeric t :discrete t :min 1 :max 8)
  (new-type 'obj-type :constants '(walker table dog human))
  (predicate 'location :perception t :arguments '((x 1D-grid %)))
  (predicate 'location-selected :world 'closed :arguments '((x 1D-grid !)))
  (predicate 'object :perception t :arguments '((object obj-type %)))
  (predicate 'object-perceived :world 'closed
             :arguments '((location 1D-grid) (object obj-type !)))
  (setq pre-t '((setf 1d-grid-location 4)))
  (setq pre-run '((when (or (= 1d-grid-location 1) (= 1d-grid-location 7))
                    (halt))))
  (setq post-d '((ppvn 'location)
                 (ppvn 'object)
                 (ppfn 'location-selected)
                 (ppfn 'object-perceived)
                 (ppfn 'selected)))
  (setq perceive-list `((perceive-location ,perception-prob ,perception-mass)
                        (perceive-object)))
  (setq action-list `((execute-operator ,action-prob)))
  (conditional 'perceived-objects
               :conditions '((object (object (obj)))
                             (location (x (loc))))
               :actions '((object-perceived (object (obj)) (location (loc)))))
  (conditional 'select-location
               :conditions '((location (x (location))))
               :actions '((location-selected (x (location)))))
  (conditional 'acceptable
               :actions '((selected (operator *))))
  (trials 1))
SIGMA 44 > (random-walk-7) >>> Trial 1 <<< <<< Decision 1 >>> (1.0: WM-X(1D-GRID)[4]) (1: WM-OBJECT(OBJ-TYPE)[TABLE]) WM for LOCATION-SELECTED Factor [5_LOCATION-SELECTED-WM-FN] Function: WM-X: [1] [2] [3] [4] [5] [6] [7] 0 0 0 1 0 0 0 WM for OBJECT-PERCEIVED Factor [7_OBJECT-PERCEIVED-WM-FN] Function: WM-LOCATION x WM-OBJECT: [1] [2] [3] [4] [5] [6] [7] [WALKER] 0 0 0 0 0 0 0 [TABLE] 0 0 0 1 0 0 0 [DOG] 0 0 0 0 0 0 0 [HUMAN] 0 0 0 0 0 0 0 WM for SELECTED Factor [2_SELECTED-WM-FN] Function: WM-STATE x WM-OPERATOR: [0:> [LEFT] 0 [RIGHT] 0 [NONE] 1 <<< Decision 2 >>> (1.0: WM-X(1D-GRID)[4]) (1: WM-OBJECT(OBJ-TYPE)[TABLE]) WM for LOCATION-SELECTED Factor [5_LOCATION-SELECTED-WM-FN] Function: WM-X: [1] [2] [3] [4] [5] [6] [7] 0 0 0 1 0 0 0 WM for OBJECT-PERCEIVED Factor [7_OBJECT-PERCEIVED-WM-FN] Function: WM-LOCATION x WM-OBJECT: [1] [2] [3] [4] [5] [6] [7] [WALKER] 0 0 0 0 0 0 0 [TABLE] 0 0 0 1 0 0 0 [DOG] 0 0 0 0 0 0 0 [HUMAN] 0 0 0 0 0 0 0 WM for SELECTED Factor [2_SELECTED-WM-FN] Function: WM-STATE x WM-OPERATOR: [0:> [LEFT] 0 [RIGHT] 0 [NONE] 1 <<< Decision 3 >>> (1.0: WM-X(1D-GRID)[4]) (1: WM-OBJECT(OBJ-TYPE)[TABLE]) WM for LOCATION-SELECTED Factor [5_LOCATION-SELECTED-WM-FN] Function: WM-X: [1] [2] [3] [4] [5] [6] [7] 0 0 0 1 0 0 0 WM for OBJECT-PERCEIVED Factor [7_OBJECT-PERCEIVED-WM-FN] Function: WM-LOCATION x WM-OBJECT: [1] [2] [3] [4] [5] [6] [7] [WALKER] 0 0 0 0 0 0 0 [TABLE] 0 0 0 1 0 0 0 [DOG] 0 0 0 0 0 0 0 [HUMAN] 0 0 0 0 0 0 0 WM for SELECTED Factor [2_SELECTED-WM-FN] Function: WM-STATE x WM-OPERATOR: [0:> [LEFT] 1 [RIGHT] 0 [NONE] 0 <<< Decision 4 >>> (1.0: WM-X(1D-GRID)[3]) (1: WM-OBJECT(OBJ-TYPE)[WALKER]) WM for LOCATION-SELECTED Factor [5_LOCATION-SELECTED-WM-FN] Function: WM-X: [1] [2] [3] [4] [5] [6] [7] 0 0 1 0 0 0 0 WM for OBJECT-PERCEIVED Factor [7_OBJECT-PERCEIVED-WM-FN] Function: WM-LOCATION x WM-OBJECT: [1] [2] [3] [4] [5] [6] [7] [WALKER] 0 0 1 0 0 0 0 [TABLE] 0 0 0 1 0 0 0 [DOG] 0 0 0 0 0 0 0 [HUMAN] 0 0 0 0 0 0 0 WM for SELECTED Factor 
[2_SELECTED-WM-FN] Function: WM-STATE x WM-OPERATOR: [0:> [LEFT] 1 [RIGHT] 0 [NONE] 0 <<< Decision 5 >>> (1.0: WM-X(1D-GRID)[2]) (1: WM-OBJECT(OBJ-TYPE)[HUMAN]) WM for LOCATION-SELECTED Factor [5_LOCATION-SELECTED-WM-FN] Function: WM-X: [1] [2] [3] [4] [5] [6] [7] 0 1 0 0 0 0 0 WM for OBJECT-PERCEIVED Factor [7_OBJECT-PERCEIVED-WM-FN] Function: WM-LOCATION x WM-OBJECT: [1] [2] [3] [4] [5] [6] [7] [WALKER] 0 0 1 0 0 0 0 [TABLE] 0 0 0 1 0 0 0 [DOG] 0 0 0 0 0 0 0 [HUMAN] 0 1 0 0 0 0 0 WM for SELECTED Factor [2_SELECTED-WM-FN] Function: WM-STATE x WM-OPERATOR: [0:> [LEFT] 1 [RIGHT] 0 [NONE] 0 <<< Decision 6 >>> (1.0: WM-X(1D-GRID)[1]) (1: WM-OBJECT(OBJ-TYPE)[DOG]) WM for LOCATION-SELECTED Factor [5_LOCATION-SELECTED-WM-FN] Function: WM-X: [1] [2] [3] [4] [5] [6] [7] 1 0 0 0 0 0 0 WM for OBJECT-PERCEIVED Factor [7_OBJECT-PERCEIVED-WM-FN] Function: WM-LOCATION x WM-OBJECT: [1] [2] [3] [4] [5] [6] [7] [WALKER] 0 0 1 0 0 0 0 [TABLE] 0 0 0 1 0 0 0 [DOG] 1 0 0 0 0 0 0 [HUMAN] 0 1 0 0 0 0 0 WM for SELECTED Factor [2_SELECTED-WM-FN] Function: WM-STATE x WM-OPERATOR: [0:> [LEFT] 0 [RIGHT] 1 [NONE] 0 Total time 0.01 sec Trials: 1; Msec per trial: 7.0 Decision cycles: 6; Msec per decision cycle: 1 (init: 0, messages: 0, decision: 0, learn: 0) Total messages: 162; Messages per decision: 27; Msec per message: 0.02
Learning (of Maps)
In the above output, the working memory factor node of the predicate `object-perceived` appears to learn the object-location relation, but this is not a stable representation. Learning occurs in Sigma via a process of gradient descent over functions defined in predicates or conditionals. For instance, learning a map of objects in the one-dimensional grid requires defining a function representative of the concept being learned. Let's modify the `object-perceived` predicate by changing its name to map and adding a function to it:
(predicate 'map :arguments '( (location 1D-grid) (object obj-type %)) :function 1)
This `map` predicate is now an open-world predicate, and the `object` parameter is a distribution parameter rather than a select-best parameter, as denoted by the %. The function of this predicate is defined over the parameters of the predicate: location and object. Here, object is the parameter over which we keep a distribution, so we are defining a distribution for each location on the grid. The 1 used in the function definition simply indicates that we start with a uniform prior; that is, every object is equally likely at the beginning for each location.
Turning on the learning in Sigma is simple:
(learn '(:gd))
Two parameters are adjusted in the model below. The first is `learning-rate`, which, as the name implies, sets the learning rate. The default learning rate is 0.05 and in the model below, it is set to 0.01. The second parameter that we are going to modify is `max-gdl-increment`, which determines the maximum change that can be applied to a parameter. The default value is 1 but we are going to set it to 0.2 to further lower the cap on the maximum gradient. (This prevents a certain type of potential problem in learning with this model, but we are not going to discuss the background in this part of the tutorial.)
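How a learning rate and an increment cap interact can be sketched generically. The following plain Common Lisp is not Sigma's learning code; it assumes a simple update in which each entry of a distribution moves by the learning rate times a gradient value, the step is clipped to a maximum increment, and the result is renormalized. The names `capped-step` and `update-distribution` are hypothetical.

```lisp
;; Generic illustration under the assumptions stated above; not Sigma's algorithm.
(defun capped-step (gradient rate max-increment)
  "Scale GRADIENT by RATE and clip the result to +/- MAX-INCREMENT."
  (max (- max-increment) (min max-increment (* rate gradient))))

(defun update-distribution (probs gradients rate max-increment)
  "Apply capped steps to PROBS, keep entries non-negative, and renormalize."
  (let* ((stepped (mapcar (lambda (p g)
                            (max 0 (+ p (capped-step g rate max-increment))))
                          probs gradients))
         (total (reduce #'+ stepped)))
    (mapcar (lambda (p) (/ p total)) stepped)))

;; A raw step of 50 * 1/100 = 1/2 exceeds the cap and is clipped to 1/5.
(capped-step 50 1/100 2/10) ; => 1/5

;; Starting from a uniform prior over four objects, one small step
;; toward the first object.
(update-distribution '(1/4 1/4 1/4 1/4) '(10 0 0 0) 1/100 2/10)
;; => (7/22 5/22 5/22 5/22)
```

A smaller rate or a lower cap makes each observation move the distribution less, which trades learning speed for stability.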
One other change that we have made to the model is that we run it for a fixed number of decisions rather than waiting for the agent to arrive at a particular location. The main reason for this change is to get better coverage of the map learned by the agent.
So the updated model is:
(defun random-walk-8 (number-of-decisions &optional perception-prob perception-mass action-prob)
  (init '(left right none))
  (learn '(:gd))
  (setf learning-rate 0.01)
  (setf max-gdl-increment 0.2)
  (operator-selection 'boltzmann)
  (setf max-fraction-pa-regions 0)
  (setf MAX-DECISIONS number-of-decisions)
  (new-type '1D-grid :numeric t :discrete t :min 1 :max 8)
  (new-type 'obj-type :constants '(walker table dog human))
  (predicate 'location :perception t :arguments '((x 1D-grid %)))
  (predicate 'location-selected :world 'closed :arguments '((x 1D-grid !)))
  (predicate 'object :perception t :arguments '((object obj-type %)))
  (predicate 'map
             :arguments '((location 1D-grid) (object obj-type %))
             :function 1)
  (setq pre-t '((setf 1d-grid-location 4)))
  (setq post-run `((when (= decision-count ,number-of-decisions) (halt))))
  (setq post-d '((format trace-stream "~&~&Current location: ~S~%" 1d-grid-location)
                 (ppvn 'location)
                 (ppvn 'object)
                 (ppfn 'location-selected)
                 (ppfs 'map)
                 (ppfn 'selected)))
  (setq perceive-list `((perceive-location ,perception-prob ,perception-mass)
                        (perceive-object)))
  (setq action-list `((execute-operator ,action-prob)))
  (conditional 'perceived-objects
               :conditions '((object (object (obj)))
                             (location (x (loc))))
               :condacts '((map (object (obj)) (location (loc)))))
  (conditional 'select-location
               :conditions '((location (x (location))))
               :actions '((location-selected (x (location)))))
  (conditional 'acceptable
               :actions '((selected (operator *))))
  (trials 1))
If we want to run the model for 100 decision cycles, the required function call is:
(random-walk-8 100)
The function to print the map predicate's function is (ppf 'map 'array).
After 100 decision cycles, this function looks like:
SIGMA 91 > (ppf 'map 'array)
WM-LOCATION x WM-OBJECT:
          [1]            [2]            [3]            [4]            [5]            [6]            [7]
[WALKER]  0.1724305      0.16033        0.49504992     0.14520198     0.15644807     0.20907025     0.30710384
[TABLE]   0.1724305      0.16033        0.1683167      0.564394       0.15644807     0.37278926     0.23096539
[DOG]     0.48270845     0.16033        0.1683167      0.14520198     0.53065587     0.20907025     0.23096539
[HUMAN]   0.1724305      0.51901007     0.1683167      0.14520198     0.15644807     0.20907025     0.23096539
SIGMA 98 > (ppf 'map 'array)
WM-LOCATION x WM-OBJECT:
          [1]            [2]            [3]            [4]            [5]            [6]            [7]
[WALKER]  2.2884786E-4   2.283254E-4    0.942032       0.04620318     0.048757505    0.019322614    0.8224507
[TABLE]   2.2884786E-4   2.283254E-4    0.019322677    0.8613905      0.048757505    0.94203216     0.059183095
[DOG]     0.9993135      2.283254E-4    0.019322677    0.04620318     0.8537275      0.019322614    0.059183095
[HUMAN]   2.2884786E-4   0.999315       0.019322677    0.04620318     0.048757505    0.019322614    0.059183095
(setf learning-rate 0.01)
)and run the model for 100 decision cycles:
SIGMA 100 > (ppf 'map 'array)
WM-LOCATION x WM-OBJECT:
          [1]            [2]            [3]            [4]            [5]            [6]            [7]
[WALKER]  9.708738E-4    9.6153847E-4   0.9969999      1.0100713E-3   1.0752688E-3   1.1494253E-3   0.6166667
[TABLE]   9.708738E-4    9.6153847E-4   1.0000448E-3   0.9969698      1.0752688E-3   0.99655176     0.12777779
[DOG]     0.99708736     9.6153847E-4   1.0000448E-3   1.0100713E-3   0.9967742      1.1494253E-3   0.12777779
[HUMAN]   9.708738E-4    0.9971154      1.0000448E-3   1.0100713E-3   1.0752688E-3   1.1494253E-3   0.12777779
We are not going to further discuss parameter tuning in this tutorial but there are a number of Sigma variables that can be used to control learning.
Simultaneous Localization and Mapping (SLAM)
Now, let's consider a simple change to the perceived-objects conditional: moving the predicate pattern (location (x (loc))) from conditions to condacts. As you may recall, messages propagate away from conditions, whereas message flow for condacts is bidirectional. In other words, moving the predicate pattern (location (x (loc))) from conditions to condacts allows the map predicate to influence the posterior on the location predicate. The perceived-objects conditional with the proposed change is:
(conditional 'perceived-objects
             :conditions '((object (object (obj))))
             :condacts '((location (x (loc)))
                         (map (object (obj)) (location (loc)))))
To make things more concrete, let's assume we run the original model, where the location predicate pattern was a condition, in a probabilistic setting. In this setting, there is a 60% chance that the agent correctly perceives the location. Furthermore, 60% of the probability mass is on the perceived location and the remaining 40% is distributed over the neighboring locations. Such a scenario can be activated by:
(random-walk-8 1000 0.6 0.6)
Correct location: 4
(0.19999999: WM-X(1D-GRID)[4])
(0.6: WM-X(1D-GRID)[5])
(0.19999999: WM-X(1D-GRID)[6])
(1: WM-OBJECT(OBJ-TYPE)[TABLE])
WM for LOCATION-SELECTED
Factor [5_LOCATION-SELECTED-WM-FN] Function:
WM-X:
[1]  [2]  [3]  [4]  [5]  [6]  [7]
0    0    0    0    1    0    0
MAP:
WM-LOCATION x WM-OBJECT:
          [1]            [2]            [3]            [4]            [5]            [6]            [7]
[WALKER]  0.05167313     0.34530607     0.7832272      0.11975537     0.050506998    0.11340284     0.79965735
[TABLE]   1.55521E-4     1.55521E-4     0.21646221     0.87993443     0.2564895      0.8417874      0.20002768
[DOG]     0.40140295     0.19412153     1.552944E-4    1.5503877E-4   0.69284845     0.04465457     1.5746542E-4
[HUMAN]   0.5467684      0.46041688     1.552944E-4    1.5503877E-4   1.5503877E-4   1.5503877E-4   1.5746542E-4
The perceived object, table, has no influence on the location in this case. When we look at the learned map function, we see that the agent is more likely to perceive a dog than a table at location 5, whereas it is more likely to perceive a table at location 4. So the information in the learned map function can help the agent recover from errors in its perception of the location. Consider the case below, where the model is run with the location predicate pattern as a condact in the perceived-objects conditional:
Current location: 5
(1.0623143E-4: WM-X(1D-GRID)[3])
(0.2195618: WM-X(1D-GRID)[4])
(0.78033197: WM-X(1D-GRID)[5])
(1: WM-OBJECT(OBJ-TYPE)[DOG])
WM for LOCATION-SELECTED
Factor [5_LOCATION-SELECTED-WM-FN] Function:
WM-X:
[1]  [2]  [3]  [4]  [5]  [6]  [7]
0    0    0    0    1    0    0
MAP:
WM-LOCATION x WM-OBJECT:
          [1]            [2]            [3]            [4]            [5]            [6]            [7]
[WALKER]  1.00908175E-4  0.1106981      0.8140824      0.02834832     0.06153891     0.3477502      0.9556432
[TABLE]   1.00908175E-4  0.039249763    0.114756346    0.72501874     0.13430768     0.6173503      0.044159383
[DOG]     0.9138201      0.34931234     0.071062826    0.2465345      0.804055       0.034800887    9.8716686E-5
[HUMAN]   0.085978076    0.5007398      9.852217E-5    9.852217E-5    9.852217E-5    9.861933E-5    9.8716686E-5
Here, the agent perceives a dog at the current location. The learned map indicates that a dog is more likely at location 5 than at location 4, and this information reaches the location predicate because its pattern is used as a condact rather than a condition in the perceived-objects conditional. As a result, the posterior on the location predicate shows that the agent believes it is at location 5 even though the perception of the location is centered on location 4.
The phenomenon described above is a simple illustration of simultaneous localization and mapping (SLAM), commonly used in the robotics literature.
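The effect described above can be illustrated outside Sigma with a single Bayes-rule step. The sketch below is plain Python, not Sigma code, and it does not reproduce Sigma's message passing (so its numbers will not match the posterior printed above). It assumes a location perception of 0.6 on location 4 with 0.2 on each neighbor, and uses P(dog | location) values rounded from the DOG row of the learned map; the function name location_posterior is hypothetical.

```python
def location_posterior(perception, object_given_location):
    """Multiply the location perception by P(object | location) and normalize."""
    unnormalized = {
        loc: mass * object_given_location.get(loc, 0.0)
        for loc, mass in perception.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("perception and map share no probability mass")
    return {loc: mass / total for loc, mass in unnormalized.items()}


# Assumed perception: centered on location 4, remaining mass on the neighbors.
perception = {3: 0.2, 4: 0.6, 5: 0.2}

# P(dog | location), rounded from the DOG row of the learned map above.
dog_given_location = {3: 0.071, 4: 0.247, 5: 0.804}

posterior = location_posterior(perception, dog_given_location)
best_location = max(posterior, key=posterior.get)
print(posterior)
print("Most likely location:", best_location)
```

Even though the perception favors location 4, the map evidence for a dog shifts the most likely location to 5, which is the same qualitative behavior the condact produces in the model.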
The updated model for SLAM is provided below:
(defun random-walk-9 (number-of-decisions &optional perception-prob perception-mass action-prob)
  (init '(left right none))
  (learn '(:gd))
  (setf learning-rate 0.01)
  (setf max-gdl-increment 0.2)
  (operator-selection 'boltzmann)
  (setf max-fraction-pa-regions 0)
  (setf MAX-DECISIONS number-of-decisions)
  (new-type '1D-grid :numeric t :discrete t :min 1 :max 8)
  (new-type 'obj-type :constants '(walker table dog human))
  (predicate 'location :perception t :arguments '((x 1D-grid %)))
  (predicate 'location-selected :world 'closed :arguments '((x 1D-grid !)))
  (predicate 'object :perception t :arguments '((object obj-type %)))
  (predicate 'map :arguments '((location 1D-grid) (object obj-type %)) :function 1)
  (setq pre-t '((setf 1d-grid-location 4)))
  (setq post-run `((when (= decision-count ,number-of-decisions) (halt))))
  (setq post-d '((format trace-stream "~&~&Current location: ~S~%" 1d-grid-location)
                 (ppvn 'location)
                 (ppvn 'object)
                 (ppfn 'location-selected)
                 (ppfs 'map)
                 (ppfn 'selected)))
  (setq perceive-list `((perceive-location ,perception-prob ,perception-mass)
                        (perceive-object)))
  (setq action-list `((execute-operator ,action-prob)))
  (conditional 'perceived-objects
               :conditions '((object (object (obj))))
               :condacts '((location (x (loc)))
                           (map (object (obj)) (location (loc)))))
  (conditional 'select-location
               :conditions '((location (x (location))))
               :actions '((location-selected (x (location)))))
  (conditional 'acceptable
               :actions '((selected (operator *))))
  (trials 1))
Semantic Memory (& Learning)
Now, we are going to step back from the agent on a grid example to explain the basic semantic memory concept in Sigma. Later, we are going to reformulate this concept within the agent on a virtual grid model.
Now, let's assume that there are four different types of objects: dog, human, table, and walker. There are also three distinct features associated with these objects: (1) whether they are alive, (2) the number of legs they have, and (3) what color they are. The underlying assumption in this example is that it isn't always possible to perceive the objects directly, but we can observe their features to make predictions about the objects.
The perception function is defined below. It takes one input, perceive-object, which determines whether the object is perceived or not. We are going to use this to switch between training and testing.
In this example, all objects are equally likely to be perceived. Dog and human are alive, whereas table and walker are not. A walker definitely has 1 leg; there is an 80% chance that a dog has 4 legs and a 20% chance that it has 3 legs; there is a 90% chance that a table has 4 legs and a 10% chance that it has 3 legs; and there is a 97% chance that a human has 2 legs, a 2% chance that s/he has 1 leg, and a 1% chance that s/he has 0 legs. Humans are either brown or white with equal probability; a dog is brown with 70% chance, white with 25% chance, and silver with 5% chance; a walker is silver with 95% chance and brown with 5% chance; and a table is brown with 40% chance and silver with 60% chance. The perceive-object-features function generates perceptions by sampling from these probability distributions.
(defun perceive-object-features (perceive-object)
  (let ((rand (random 1.0))
        (rand2 (random 1.0))
        (rand3 (random 4))
        object)
    (case rand3
      (0 (setf object 'dog))
      (1 (setf object 'human))
      (2 (setf object 'table))
      (3 (setf object 'walker)))
    (if (not perceive-object)
        (format trace-stream "~&~&Correct object: ~S~%" object))
    (if perceive-object
        (perceive `((object 1 (object ,object))))
        (perceive '((object 1 (object *)))))
    (cond ((eq object 'dog)
           (perceive '((alive 1 (value true))))
           (cond ((< rand 0.7) (perceive '((color 1 (value brown)))))
                 ((< rand 0.95) (perceive '((color 1 (value white)))))
                 ((< rand 1) (perceive '((color 1 (value silver))))))
           (cond ((< rand2 0.8) (perceive '((legs 1 (value 4)))))
                 ((< rand2 1) (perceive '((legs 1 (value 3)))))))
          ((eq object 'walker)
           (perceive '((alive 1 (value false))))
           (cond ((< rand 0.95) (perceive '((color 1 (value silver)))))
                 ((< rand 1) (perceive '((color 1 (value brown))))))
           (cond ((< rand2 1) (perceive '((legs 1 (value 1)))))))
          ((eq object 'human)
           (perceive '((alive 1 (value true))))
           (cond ((< rand 0.5) (perceive '((color 1 (value brown)))))
                 ((< rand 1) (perceive '((color 1 (value white))))))
           (cond ((< rand2 0.97) (perceive '((legs 1 (value 2)))))
                 ((< rand2 0.99) (perceive '((legs 1 (value 1)))))
                 ((< rand2 1.0) (perceive '((legs 1 (value 0)))))))
          ((eq object 'table)
           (perceive '((alive 1 (value false))))
           (cond ((< rand 0.4) (perceive '((color 1 (value brown)))))
                 ((< rand 1) (perceive '((color 1 (value silver))))))
           (cond ((< rand2 0.9) (perceive '((legs 1 (value 4)))))
                 ((< rand2 1) (perceive '((legs 1 (value 3))))))))))
object-prior is a functional predicate and its function learns the prior distribution on objects.
(predicate 'object-prior :arguments '((object obj-type %)) :function 1)
The object-color predicate's function is an example of capturing conditional probabilities. The color argument of the object-color predicate is distributional, whereas the object argument is universal. This means that this function will learn a probability distribution over color for each object using the gradient descent learning algorithm of Sigma.
(predicate 'object-color :arguments '((object obj-type) (color color %)) :function 1)
We need to define the conditionals that specify the interaction between these function predicates and the rest of the model. For example, the perceived-objects conditional below establishes a bidirectional flow between the object perceptual predicate and the object-prior functional predicate. In practice, when there is a perception for the object predicate, this data is used to modify the object-prior function via gradient descent. In the reverse direction, the functional value coming out of object-prior sets the prior on the object predicate.
(conditional 'perceived-objects :condacts '( (object (object (obj))) (object-prior (object (obj))) ) )
The object-color*join conditional establishes the relation between the object and color perceptual predicates and the object-color functional predicate. As with the perceived-objects conditional, the function of the object-color predicate is updated when there are perceptions for either (or both) of the object and color predicates. The functional values of the object-color predicate are used when generating the posteriors on the object and/or color predicates.
(conditional 'object-color*join :condacts '((object (object (obj))) (color (value (color))) (object-color (object (obj)) (color (color)))) )
The basic semantic memory model is provided below. This model defines the corresponding conditionals for the legs and alive features, similar to the object-color*join conditional above.
(defun random-walk-10 (number-of-decisions &optional number-of-test-decisions)
  (init)
  (unless number-of-test-decisions (setf number-of-test-decisions 10))
  (setf MAX-DECISIONS number-of-decisions)
  (learn '(:gd))
  (setf max-gdl-increment 0.2)
  (setf learning-rate 0.01)
  (setf max-fraction-pa-regions 0)
  (new-type 'obj-type :constants '(walker table dog human))
  (new-type 'color :constants '(silver brown white))
  (new-type 'i04 :numeric t :discrete t :min 0 :max 5)
  (predicate 'object :perception t :arguments '((object obj-type %)))
  (predicate 'legs :perception t :arguments '((value i04 %)))
  (predicate 'color :perception t :arguments '((value color %)))
  (predicate 'alive :perception t :arguments '((value boolean %)))
  ; Function predicates
  (predicate 'object-prior :arguments '((object obj-type %)) :function 1)
  (predicate 'object-color :arguments '((object obj-type) (color color %)) :function 1)
  (predicate 'object-legs :arguments '((object obj-type) (legs i04 %)) :function 1)
  (predicate 'object-alive :arguments '((object obj-type) (alive boolean %)) :function 1)
  (setq post-run `((when (= decision-count ,number-of-decisions) (halt))))
  (setf post-t '((ppf 'object-prior 'array)
                 (ppf 'object-color 'array)
                 (ppf 'object-legs 'array)
                 (ppf 'object-alive 'array)))
  (setq perceive-list `((perceive-object-features t)))
  (conditional 'perceived-objects
               :condacts '((object (object (obj)))
                           (object-prior (object (obj)))))
  (conditional 'object-color*join
               :condacts '((object (object (obj)))
                           (color (value (color)))
                           (object-color (object (obj)) (color (color)))))
  (conditional 'object-legs*join
               :condacts '((object (object (obj)))
                           (legs (value (legs)))
                           (object-legs (object (obj)) (legs (legs)))))
  (conditional 'object-alive*join
               :condacts '((object (object (obj)))
                           (alive (value (alive)))
                           (object-alive (object (obj)) (alive (alive)))))
  (format trace-stream "~%~&~&********************Training Phase ~%")
  (trials 1)
  (learn)
  (setq post-run `((when (= decision-count (+ ,number-of-decisions ,number-of-test-decisions)) (halt))))
  (setf post-t '())
  (setq perceive-list `((perceive-object-features nil)))
  (setq post-d '((format trace-stream "~&~&Guess object: ~S~%" (best-in-plm (vnp 'object)))))
  (format trace-stream "~%~&~&******************Testing Phase ~%")
  (trials 1))
The above model has two parts: training and testing. In a cognitive architecture setting, differentiating between training and testing is not meaningful. Nonetheless, we have created such a split in this model for demonstrative purposes. In the above model, the first part is used for training, and then learning is turned off by (learn); calling learn with no arguments turns off gradient descent learning. The major distinction between the two phases is that objects are perceived in the training phase but not in the testing phase. This is achieved by setting the parameter passed to the perceive-object-features function to nil. So in the testing phase, perceive-list is updated via:
(setq perceive-list `( (perceive-object-features nil) ))
(random-walk-10 200)
The function learned for the object-prior predicate is ((ppwm)):
WM-OBJECT:
[WALKER]       [TABLE]        [DOG]          [HUMAN]
0.38091165     0.18750984     0.13409288     0.29748565
This differs from the true distribution, which is uniform, but 200 is a relatively small sample size.
The learned conditional probabilities for the features look like:
WM-OBJECT x WM-COLOR:
          [WALKER]       [TABLE]        [DOG]          [HUMAN]
[SILVER]  0.9226708      0.55758697     5.102041E-4    4.950495E-4
[BROWN]   0.038664583    0.4419131      0.7008815      0.53610927
[WHITE]   0.038664583    5.0E-4         0.29860833     0.46339563

WM-OBJECT x WM-LEGS:
          [WALKER]       [TABLE]        [DOG]          [HUMAN]
[0]       8.5179005E-3   4.950495E-4    5.050505E-4    4.9020804E-4
[1]       0.9659284      4.950495E-4    5.050505E-4    0.15040179
[2]       8.5179005E-3   4.950495E-4    5.050505E-4    0.8481276
[3]       8.5179005E-3   0.2946807      0.3414362      4.9020804E-4
[4]       8.5179005E-3   0.7038342      0.6570487      4.9020804E-4

WM-OBJECT x WM-ALIVE:
          [WALKER]       [TABLE]        [DOG]          [HUMAN]
[FALSE]   0.89606596     0.83220387     0.13817707     0.12088767
[TRUE]    0.10393404     0.1677961      0.86182297     0.8791123
These are again different than the true probability distributions but they capture what is seen in the perceived samples.
In the testing phase, the object is not perceived. So the task here is to predict the object using the perceptions on the features.
The model’s best prediction on the object can be extracted via:
(best-in-plm (vnp 'object))
The Sigma function vnp accesses the posterior of a predicate in plm form. Another Sigma function, best-in-plm, works only if there is a single variable (other than the state variable) defined in the predicate. It finds the alternative with the highest probability and extracts it. So in this case, the model's guess on the object is the alternative that has the highest probability in the posterior distribution.
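To make the prediction step concrete, here is a minimal sketch in plain Python (not Sigma code) of how a posterior over objects can be formed from feature observations and how the highest-probability alternative is then picked. It assumes the true distributions stated earlier in this section rather than the learned functions, treats the features as independent given the object, and uses hypothetical names (object_posterior, best_guess); Sigma's own computation runs through its graph and learned functions instead.

```python
# True distributions from the description of perceive-object-features.
PRIOR = {"dog": 0.25, "human": 0.25, "table": 0.25, "walker": 0.25}
COLOR = {
    "dog": {"brown": 0.70, "white": 0.25, "silver": 0.05},
    "human": {"brown": 0.50, "white": 0.50},
    "table": {"brown": 0.40, "silver": 0.60},
    "walker": {"silver": 0.95, "brown": 0.05},
}
LEGS = {
    "dog": {4: 0.80, 3: 0.20},
    "human": {2: 0.97, 1: 0.02, 0: 0.01},
    "table": {4: 0.90, 3: 0.10},
    "walker": {1: 1.0},
}
ALIVE = {
    "dog": {True: 1.0},
    "human": {True: 1.0},
    "table": {False: 1.0},
    "walker": {False: 1.0},
}


def object_posterior(color, legs, alive):
    """P(object | features), assuming features are independent given the object."""
    scores = {
        obj: PRIOR[obj]
        * COLOR[obj].get(color, 0.0)
        * LEGS[obj].get(legs, 0.0)
        * ALIVE[obj].get(alive, 0.0)
        for obj in PRIOR
    }
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("no object is consistent with these features")
    return {obj: score / total for obj, score in scores.items()}


def best_guess(posterior):
    """Return the alternative with the highest posterior probability."""
    return max(posterior, key=posterior.get)


print(best_guess(object_posterior("silver", 1, False)))
print(best_guess(object_posterior("brown", 4, True)))
```

The best_guess helper plays the role that best-in-plm plays in the model: it only selects the most probable alternative and discards the rest of the distribution.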
When we look at the testing phase, we see that in all 10 cases the model was able to generate the correct answer:
******************Testing Phase
>>> Trial 1 <<<
<<< Decision 201 >>>
Correct object: WALKER
Guess object: WALKER
<<< Decision 202 >>>
Correct object: DOG
Guess object: DOG
<<< Decision 203 >>>
Correct object: WALKER
Guess object: WALKER
<<< Decision 204 >>>
Correct object: WALKER
Guess object: WALKER
<<< Decision 205 >>>
Correct object: HUMAN
Guess object: HUMAN
<<< Decision 206 >>>
Correct object: TABLE
Guess object: TABLE
<<< Decision 207 >>>
Correct object: WALKER
Guess object: WALKER
<<< Decision 208 >>>
Correct object: HUMAN
Guess object: HUMAN
<<< Decision 209 >>>
Correct object: TABLE
Guess object: TABLE
<<< Decision 210 >>>
Correct object: WALKER
Guess object: WALKER
The perception function perceive-object-features-grid, which samples the object according to the agent's current grid location, is provided below:
(defun perceive-object-features-grid (perceive-object)
  (let ((rand0 (random 1.0))
        (rand (random 1.0))
        (rand2 (random 1.0))
        object)
    (case 1d-grid-location
      (1 (cond ((< rand0 0.5) (setf object 'dog))
               ((< rand0 1) (setf object 'human))))
      (2 (cond ((< rand0 0.5) (setf object 'human))
               ((< rand0 1) (setf object 'table))))
      (3 (cond ((< rand0 0.5) (setf object 'walker))
               ((< rand0 1) (setf object 'human))))
      (4 (cond ((< rand0 0.5) (setf object 'table))
               ((< rand0 1) (setf object 'walker))))
      (5 (cond ((< rand0 0.5) (setf object 'dog))
               ((< rand0 1) (setf object 'walker))))
      (6 (cond ((< rand0 0.5) (setf object 'table))
               ((< rand0 1) (setf object 'dog))))
      (7 (cond ((< rand0 0.5) (setf object 'walker))
               ((< rand0 1) (setf object 'table)))))
    (unless perceive-object
      (format trace-stream "~&~&Correct object: ~S~%" object))
    (if perceive-object
        (perceive `((object 1 (object ,object))))
        (perceive '((object 1 (object *)))))
    (cond ((eq object 'dog)
           (perceive '((alive 1 (value true))))
           (cond ((< rand 0.7) (perceive '((color 1 (value brown)))))
                 ((< rand 0.95) (perceive '((color 1 (value white)))))
                 ((< rand 1) (perceive '((color 1 (value silver))))))
           (cond ((< rand2 0.8) (perceive '((legs 1 (value 4)))))
                 ((< rand2 1) (perceive '((legs 1 (value 3)))))))
          ((eq object 'walker)
           (perceive '((alive 1 (value false))))
           (cond ((< rand 0.95) (perceive '((color 1 (value silver)))))
                 ((< rand 1) (perceive '((color 1 (value brown))))))
           (cond ((< rand2 1) (perceive '((legs 1 (value 1)))))))
          ((eq object 'human)
           (perceive '((alive 1 (value true))))
           (cond ((< rand 0.5) (perceive '((color 1 (value brown)))))
                 ((< rand 1) (perceive '((color 1 (value white))))))
           (cond ((< rand2 0.97) (perceive '((legs 1 (value 2)))))
                 ((< rand2 0.99) (perceive '((legs 1 (value 1)))))
                 ((< rand2 1.0) (perceive '((legs 1 (value 0)))))))
          ((eq object 'table)
           (perceive '((alive 1 (value false))))
           (cond ((< rand 0.4) (perceive '((color 1 (value brown)))))
                 ((< rand 1) (perceive '((color 1 (value silver))))))
           (cond ((< rand2 0.9) (perceive '((legs 1 (value 4)))))
                 ((< rand2 1) (perceive '((legs 1 (value 3))))))))))
Integrating random-walk-9 and random-walk-10 is relatively straightforward. The main conceptual point is that the combination of the map and location predicates in random-walk-9 replaces the object-prior predicate in random-walk-10. So the priors on the objects originate from the agent's belief about its current location and the map that the agent is building up. The perceived-objects conditional needs to be updated so that all the predicate patterns appear under condacts:
(conditional 'perceived-objects :condacts '( (object (object (obj))) (location (x (loc))) (map (object (obj)) (location (loc))) ) )
This update achieves bidirectional information flow between the map, object and location predicates and hence, the combination of map and location can be used as a prior on the object.
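As a rough illustration of how a map and a location belief can together act as a prior on the object, the plain-Python sketch below (not Sigma code) marginalizes over locations. It assumes the true per-location object distributions used by perceive-object-features-grid for locations 3 to 5 and an illustrative location belief of 0.2 / 0.6 / 0.2; the name object_prior is hypothetical, and in the model the map is learned rather than given.

```python
def object_prior(location_belief, object_given_location):
    """Prior over objects: sum over locations of P(location) * P(object | location)."""
    prior = {}
    for loc, p_loc in location_belief.items():
        for obj, p_obj in object_given_location[loc].items():
            prior[obj] = prior.get(obj, 0.0) + p_loc * p_obj
    return prior


# True object distributions per location (from perceive-object-features-grid).
grid_map = {
    3: {"walker": 0.5, "human": 0.5},
    4: {"table": 0.5, "walker": 0.5},
    5: {"dog": 0.5, "walker": 0.5},
}

# Assumed belief about the current location, centered on location 4.
belief = {3: 0.2, 4: 0.6, 5: 0.2}

prior = object_prior(belief, grid_map)
print(prior)
```

With this belief, walker receives the most prior mass because it can appear at all three candidate locations, while table is supported only by location 4.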
(defun random-walk-11 (number-of-decisions &optional perception-prob perception-mass action-prob)
  (init '(left right none))
  (setf MAX-DECISIONS number-of-decisions)
  (learn '(:gd))
  ;(setf trace-decisions nil)
  (setf max-gdl-increment 0.2)
  (setf learning-rate 0.01)
  (operator-selection 'boltzmann)
  (setf max-fraction-pa-regions 0)
  (new-type '1d-grid :numeric t :discrete t :min 1 :max 8)
  (new-type 'obj-type :constants '(walker table dog human))
  (new-type 'color :constants '(silver brown white))
  (new-type 'i04 :numeric t :discrete t :min 0 :max 5)
  (predicate 'location :perception t :arguments '((x 1D-grid %)))
  (predicate 'location-selected :world 'closed :arguments '((x 1D-grid !)))
  (predicate 'object :perception t :arguments '((object obj-type %)))
  (predicate 'map :arguments '((location 1D-grid) (object obj-type %)) :function 1)
  (predicate 'legs :perception t :arguments '((value i04 %)))
  (predicate 'color :perception t :arguments '((value color %)))
  (predicate 'alive :perception t :arguments '((value boolean %)))
  ; Function predicates
  (predicate 'object-color :arguments '((object obj-type) (color color %)) :function 1)
  (predicate 'object-legs :arguments '((object obj-type) (legs i04 %)) :function 1)
  (predicate 'object-alive :arguments '((object obj-type) (alive boolean %)) :function 1)
  (setq pre-t '((setf 1d-grid-location 4)))
  (setq post-run `((when (= decision-count ,number-of-decisions) (halt))))
  (setq pre-d '((format trace-stream "~&~&Current location pre-d: ~S~%" 1d-grid-location)))
  (setf post-t '((ppf 'object-color 'array)
                 (ppf 'object-legs 'array)
                 (ppf 'object-alive 'array)
                 (ppf 'map 'array)))
  (setq perceive-list `((perceive-location ,perception-prob ,perception-mass)
                        (perceive-object-features-grid t)))
  (setq action-list `((execute-operator ,action-prob)))
  (conditional 'perceived-objects
               :condacts '((object (object (obj)))
                           (location (x (loc)))
                           (map (object (obj)) (location (loc)))))
  (conditional 'object-color*join
               :condacts '((object (object (obj)))
                           (color (value (color)))
                           (object-color (object (obj)) (color (color)))))
  (conditional 'object-legs*join
               :condacts '((object (object (obj)))
                           (legs (value (legs)))
                           (object-legs (object (obj)) (legs (legs)))))
  (conditional 'object-alive*join
               :condacts '((object (object (obj)))
                           (alive (value (alive)))
                           (object-alive (object (obj)) (alive (alive)))))
  (conditional 'select-location
               :conditions '((location (x (location))))
               :actions '((location-selected (x (location)))))
  (conditional 'acceptable
               :actions '((selected (operator *))))
  (trials 1)
  (learn)
  (setq post-run `((when (= decision-count (+ ,number-of-decisions 10)) (halt))))
  (setq perceive-list `((perceive-location ,perception-prob ,perception-mass)
                        (perceive-object-features-grid nil)))
  (setq post-d '(
                 ;(format trace-stream "~&~&Current location: ~S~%" 1d-grid-location)
                 (format trace-stream "~&~&Guess object: ~S~%" (best-in-plm (vnp 'object)))
                 (format trace-stream "~&~&Guess location: ~S~%" (best-in-plm (vnp 'location)))))
  (trials 1))
We run this model with
(random-walk-11 200)
WM-LOCATION x WM-OBJECT:
          [1]            [2]            [3]            [4]            [5]            [6]            [7]
[WALKER]  0.07021035     0.045923394    0.4813388      0.38056886     0.41011244     0.047849633    0.6011289
[TABLE]   0.07021035     0.498842       0.028358559    0.48085094     0.06240859     0.5249279      0.39765155
[DOG]     0.36692062     0.045923394    0.028358559    0.06929011     0.4650704      0.37937284     6.097859E-4
[HUMAN]   0.4926587      0.40931124     0.46194407     0.06929011     0.06240859     0.047849633    6.097859E-4
In the testing phase, the model correctly predicts the objects at different locations when we don’t have noise in location perceptions:
******************Testing Phase
>>> Trial 1 <<<
<<< Decision 201 >>>
Correct object: TABLE
Current location pre-d: 4
Guess object: TABLE
Guess location: 4
<<< Decision 202 >>>
Correct object: WALKER
Current location pre-d: 5
Guess object: WALKER
Guess location: 5
<<< Decision 203 >>>
Correct object: DOG
Current location pre-d: 6
Guess object: DOG
Guess location: 6
<<< Decision 204 >>>
Correct object: DOG
Current location pre-d: 5
Guess object: DOG
Guess location: 5
<<< Decision 205 >>>
Correct object: TABLE
Current location pre-d: 6
Guess object: TABLE
Guess location: 6
<<< Decision 206 >>>
Correct object: DOG
Current location pre-d: 5
Guess object: DOG
Guess location: 5
<<< Decision 207 >>>
Correct object: WALKER
Current location pre-d: 5
Guess object: WALKER
Guess location: 5
<<< Decision 208 >>>
Correct object: WALKER
Current location pre-d: 4
Guess object: WALKER
Guess location: 4
<<< Decision 209 >>>
Correct object: WALKER
Current location pre-d: 3
Guess object: WALKER
Guess location: 3
<<< Decision 210 >>>
Correct object: TABLE
Current location pre-d: 2
Guess object: TABLE
Guess location: 2
Action modeling (& templates)
In this part of the tutorial (for random walk models 12, 13, and 14), we are going to discuss templates in Sigma and how they can be leveraged to achieve complex functionality. Before getting into the details of templates, let's first discuss diachronic processing in Sigma. For our random walk model, the actual states (the current location) are latent, as we discussed in the SLAM example. In such models, the system state evolves through time (diachronic processing), and Sigma defines a prediction mode to architecturally distinguish the current state from the previous one. This architectural distinction enables both the current state and the previous state to be accessed simultaneously. Setting the Sigma parameter diachronic-processing to true enables the prediction mode. In this mode, for any closed world state predicate, an open world *next predicate is created automatically by the architecture. For example, let's assume that a closed world state predicate named location is defined when the prediction mode is on (diachronic processing is set to true):
(predicate 'location :world 'closed :perception t :arguments '((state state) (x 1D-grid !)))
(PREDICATE 'SELECTED :WORLD 'CLOSED :PERSISTENT T :UNIQUE '(OPERATOR) :SELECT 'BOLTZMANN :ARGUMENTS '((STATE STATE) (OPERATOR OPERATOR !)))
(PREDICATE 'TIME :WORLD 'CLOSED :PERSISTENT T :UNIQUE '(VALUE) :SELECT 'BEST :ARGUMENTS '((VALUE TIME !)))
(PREDICATE 'STATE :WORLD 'CLOSED :PERSISTENT T :ARGUMENTS '((STATE STATE)))
(PREDICATE 'HALT :WORLD 'CLOSED :PERSISTENT T)
(PREDICATE 'LOCATION :WORLD 'CLOSED :PERSISTENT T :UNIQUE '(X) :SELECT 'BEST :PERCEPTION T :ARGUMENTS '((STATE STATE) (X 1D-GRID !)))
(PREDICATE 'LOCATION*NEXT :WORLD 'OPEN :UNIQUE '(X) :PERCEPTION T :ARGUMENTS '((STATE STATE) (X 1D-GRID %)))
T
The underlying assumption is that the previous state is captured by the closed world predicate and the current state is captured by the open world predicate. For our current example, the location predicate now captures the previous state and the location*next predicate captures the current state. As time elapses (or new decisions are made), the architecture automatically shifts the contents of the location*next predicate to the location predicate.
So now we are going to adjust our random walk model to work with diachronic processing in Sigma to better illustrate the use of templates. First, the perception function is updated. We changed the function name to perceive-location*next. The only change in the function is that perceptions are now read into the location*next predicate (as it captures the current state):
(defun perceive-location*next (&optional correct-prob correct-mass)
  (unless correct-prob (setq correct-prob 1.0))
  (unless correct-mass (setq correct-mass 1.0))
  (let ((rand (random 1.0))
        location ; Perceived location
        1-cm)
    (setq 1-cm (- 1 correct-mass))
    (setq location 1d-grid-location)
    (format trace-stream "~&~&Perceive Current location: ~S~%" 1d-grid-location)
    ; Perceive new location with correct-prob of getting right (and otherwise on one side)
    (cond ((= 1d-grid-location 1)
           (when (>= rand correct-prob) (setq location 2)))
          ((= 1d-grid-location 7)
           (when (>= rand correct-prob) (setq location 6)))
          (t
           (when (>= rand correct-prob)
             (if (< (random 1.0) .5)
                 (setq location (1- 1d-grid-location))
                 (setq location (1+ 1d-grid-location))))))
    ; Zero out all perception for predicate location
    (perceive '((location*next 0)))
    ; Generate noisy perceptions based on correct-mass
    (perceive `((location*next ,correct-mass (x ,location)))) ; Correct-mass at location
    ; Divide incorrect mass among adjacent locations when they exist
    (cond ((= location 1)
           (perceive `((location*next ,1-cm (x 2)))))
          ((= location 7)
           (perceive `((location*next ,1-cm (x 6)))))
          (t
           (perceive `((location*next ,(/ 1-cm 2) (x ,(1- location)))
                       (location*next ,(/ 1-cm 2) (x ,(1+ location)))))))))
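The mass-splitting step at the end of the function above can be looked at in isolation. The following plain-Python sketch (not Sigma code; the name location_perception and its lo/hi parameters are hypothetical) returns the perception distribution for a given perceived location and correct-mass on the 7-cell grid: the perceived cell gets correct-mass, and the remainder goes to the single neighbor at a grid edge or is split evenly between both neighbors otherwise.

```python
def location_perception(perceived, correct_mass=1.0, lo=1, hi=7):
    """Distribute perception mass around the perceived grid location."""
    rest = 1.0 - correct_mass
    dist = {perceived: correct_mass}
    if perceived == lo:
        # Left edge: the only neighbor receives all remaining mass.
        dist[lo + 1] = rest
    elif perceived == hi:
        # Right edge: the only neighbor receives all remaining mass.
        dist[hi - 1] = rest
    else:
        # Interior cell: split the remaining mass between both neighbors.
        dist[perceived - 1] = rest / 2
        dist[perceived + 1] = rest / 2
    return dist


print(location_perception(4, 0.6))
print(location_perception(1, 0.6))
```

The sketch leaves out the random choice of the perceived location (controlled by correct-prob in the Lisp function) and only covers how mass is laid out once that location is fixed.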
Templates automatically create any additional predicates, conditionals and other structures that are needed from the predicates that have been explicitly specified. Prediction mode supports the learning of transition functions, where access to successive pairs of states is essential. A template has been defined in Sigma for probabilistic transition functions; it automatically turns on prediction mode if it isn't already on, to yield *next predicates, and creates a transition conditional for each unique closed-world state predicate that is defined (where a state predicate is a predicate that includes an argument for the state). If the Selected predicate is defined, a condition for it is also included in the conditional, to convert the transition function into an action model.
So let's put this into action within our random walk example. Action modeling is turned on via the (learn '(:am)) instruction. The :am option does multiple things: (1) turns on gradient descent learning, (2) turns on the prediction mode (diachronic processing) and creates the *next predicates, and (3) creates the required predicate and conditional for action learning. The stripped-down random-walk model with only action modeling looks like:
(defun random-walk-12 (number-of-decisions &optional perception-prob perception-mass action-prob)
  (init '(left right none))
  (learn '(:am))
  (setf learning-rate 0.05)
  (operator-selection 'boltzmann)
  (setf max-fraction-pa-regions 0)
  (setf max-decisions 10000)
  (new-type '1D-grid :numeric t :discrete t :min 1 :max 8)
  (predicate 'location :world 'closed :perception t :arguments '((state state) (x 1D-grid !)))
  (setq pre-t '((setf 1d-grid-location 4)))
  (setq post-run `((when (= decision-count ,number-of-decisions) (halt))))
  (setq perceive-list `((perceive-location*next ,perception-prob ,perception-mass)))
  (setq action-list `((execute-operator ,action-prob)))
  (conditional 'acceptable
               :actions '((selected (operator *))))
  (trials 1))
(PREDICATE 'LOCATION*NEXT :WORLD 'OPEN :UNIQUE '(X) :PERCEPTION T :ARGUMENTS '((STATE STATE) (X 1D-GRID %)))
(PREDICATE 'ACTION-1212 :WORLD 'OPEN :UNIQUE '(X-2) :ARGUMENTS '((X-0 1D-GRID) (OPERATOR-1 OPERATOR) (X-2 1D-GRID)) :FUNCTION 1)
The predicate ACTION-1212 captures the state transitions and stores them in its function. The name is generated automatically, so we need to check the name of this predicate if we want to access it. The state transition probabilities are learned with the help of the architecture-generated conditional:
(CONDITIONAL 'LOCATION-PREDICTION
             :CONDITIONS '((STATE (STATE (S)))
                           (LOCATION (STATE (S)) (X (X-0)))
                           (SELECTED (STATE (S)) (OPERATOR (OPERATOR-1))))
             :CONDACTS '((LOCATION*NEXT (STATE (S)) (X (X-2)))
                         (ACTION-1212 (X-0 (X-0)) (OPERATOR-1 (OPERATOR-1)) (X-2 (X-2)))))
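To get a feel for what such a transition function represents, the plain-Python sketch below (not Sigma code) estimates P(next location | location, operator) from a simulated random walk. It is a count-based stand-in, not Sigma's gradient-descent learning, and it makes several assumptions: operators are chosen uniformly at random, actions and perceptions are noise-free, and moving off either end of the grid leaves the agent in place. All names (step, learn_transitions) are hypothetical.

```python
import random


def step(location, operator, lo=1, hi=7):
    """Deterministic grid dynamics; moves off the grid leave the agent in place."""
    if operator == "left":
        return max(lo, location - 1)
    if operator == "right":
        return min(hi, location + 1)
    return location


def learn_transitions(num_decisions, seed=0):
    """Estimate P(next | location, operator) by normalizing observed counts."""
    rng = random.Random(seed)
    counts = {}
    location = 4  # same starting cell as the tutorial model
    for _ in range(num_decisions):
        operator = rng.choice(["left", "right", "none"])
        nxt = step(location, operator)
        row = counts.setdefault((location, operator), {})
        row[nxt] = row.get(nxt, 0) + 1
        location = nxt
    return {
        key: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for key, row in counts.items()
    }


transitions = learn_transitions(500)
for (location, operator), dist in sorted(transitions.items()):
    print(location, operator, dist)
```

Because the simulated world is deterministic, every estimated row puts all of its mass on one successor; the Sigma function learned by gradient descent approaches the same structure but keeps small nonzero values elsewhere.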
After running the model for 500 decisions via `(random-walk-12 500)`, we can print the learned transition function with `(ppf 'action-1212 'array)` (check the predicate name first via `(pps)`, since Sigma generates the name and it can differ):

    WM-OPERATOR-1 x [WM-X-0 x WM-X-2]:

    [LEFT]
         [1]           [2]           [3]           [4]           [5]           [6]           [7]
    [1]  0.9105056     0.99538446    3.076923E-4   3.1248297E-4  3.2258066E-4  3.236246E-4   3.2469237E-4
    [2]  0.014915756   7.6924777E-4  0.9981539     3.1248297E-4  3.2258066E-4  3.236246E-4   3.2469237E-4
    [3]  0.014915756   7.6924777E-4  3.076923E-4   0.99812514    3.2258066E-4  3.236246E-4   3.2469237E-4
    [4]  0.014915756   7.6924777E-4  3.076923E-4   3.1248297E-4  0.9980645     3.236246E-4   3.2469237E-4
    [5]  0.014915756   7.6924777E-4  3.076923E-4   3.1248297E-4  3.2258066E-4  0.9980582     3.2469237E-4
    [6]  0.014915756   7.6924777E-4  3.076923E-4   3.1248297E-4  3.2258066E-4  3.236246E-4   0.9980519
    [7]  0.014915756   7.6924777E-4  3.076923E-4   3.1248297E-4  3.2258066E-4  3.236246E-4   3.2469237E-4

    [RIGHT]
         [1]           [2]           [3]           [4]           [5]           [6]           [7]
    [1]  7.3529414E-4  3.0674847E-4  3.144654E-4   3.311514E-4   3.3005004E-4  3.2894738E-4  3.2574142E-4
    [2]  0.9955883     3.0674847E-4  3.144654E-4   3.311514E-4   3.3005004E-4  3.2894738E-4  3.2574142E-4
    [3]  7.3529414E-4  0.99815965    3.144654E-4   3.311514E-4   3.3005004E-4  3.2894738E-4  3.2574142E-4
    [4]  7.3529414E-4  3.0674847E-4  0.99811316    3.311514E-4   3.3005004E-4  3.2894738E-4  3.2574142E-4
    [5]  7.3529414E-4  3.0674847E-4  3.144654E-4   0.99801314    3.3005004E-4  3.2894738E-4  3.2574142E-4
    [6]  7.3529414E-4  3.0674847E-4  3.144654E-4   3.311514E-4   0.9980197     3.2894738E-4  3.2574142E-4
    [7]  7.3529414E-4  3.0674847E-4  3.144654E-4   3.311514E-4   3.3005004E-4  0.9980264     0.99804557

    [NONE]
         [1]           [2]           [3]           [4]           [5]           [6]           [7]
    [1]  0.9955554     7.2992704E-4  3.1054198E-4  0.014915708   7.0707677E-3  3.3783785E-4  3.4129692E-4
    [2]  7.4075774E-4  0.9956205     3.1054198E-4  0.014915708   7.0707677E-3  3.3783785E-4  3.4129692E-4
    [3]  7.4075774E-4  7.2992704E-4  0.99813676    0.014915708   7.0707677E-3  3.3783785E-4  3.4129692E-4
    [4]  7.4075774E-4  7.2992704E-4  3.1054198E-4  0.9105058     7.0707677E-3  3.3783785E-4  3.4129692E-4
    [5]  7.4075774E-4  7.2992704E-4  3.1054198E-4  0.014915708   0.9575754     3.3783785E-4  3.4129692E-4
    [6]  7.4075774E-4  7.2992704E-4  3.1054198E-4  0.014915708   7.0707677E-3  0.99797297    3.4129692E-4
    [7]  7.4075774E-4  7.2992704E-4  3.1054198E-4  0.014915708   7.0707677E-3  3.3783785E-4  0.9979522

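In each block, the columns index the current location (X-0) and the rows index the next location (X-2). So in this noise-free model the transition probabilities are learned almost perfectly: nearly all of the probability mass in each column sits on the correct next location.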
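To make the meaning of this table concrete, here is a minimal sketch in plain Common Lisp that estimates the same kind of function, P(next location | operator, current location), by simply counting transitions during a random walk. It is not how Sigma learns (Sigma uses gradient descent over the predicate function), and every name in it (`*operators*`, `next-location`, `estimate-transitions`) is hypothetical. The edge behavior, where moving off the grid leaves the agent in place, is an assumption read off the table above.

```lisp
;; Plain Common Lisp sketch; it does not call any Sigma functions.
;; All names here are hypothetical and exist only for illustration.
(defparameter *operators* '(left right none))

(defun next-location (x op)
  "Noise-free move on a 1D grid with cells 1..7; moves off the grid are clamped."
  (ecase op
    (left  (max 1 (- x 1)))
    (right (min 7 (+ x 1)))
    (none  x)))

(defun estimate-transitions (steps &optional (start 4))
  "Random-walk for STEPS decisions and return a 3x7x7 array of P(x-next | op, x)."
  (let ((counts (make-array '(3 7 7) :initial-element 0))
        (probs  (make-array '(3 7 7) :initial-element 0.0))
        (x start))
    ;; Count every observed (operator, current, next) triple.
    (loop repeat steps
          do (let* ((op-index (random 3))
                    (x-next (next-location x (nth op-index *operators*))))
               (incf (aref counts op-index (- x 1) (- x-next 1)))
               (setf x x-next)))
    ;; Normalize each visited (operator, current) pair into a distribution.
    (dotimes (o 3 probs)
      (dotimes (src 7)
        (let ((total (loop for dest below 7 sum (aref counts o src dest))))
          (when (plusp total)
            (dotimes (dest 7)
              (setf (aref probs o src dest)
                    (/ (aref counts o src dest) (float total))))))))))
```

For example, `(aref (estimate-transitions 500) 0 3 2)` reads off the estimate for moving left from location 4 to location 3. Pairs that were never visited keep a probability of 0.0 in this sketch, whereas the learned Sigma function above spreads a small amount of mass over every entry.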
##Perception modeling##
As with transition functions, a template has been defined for perceptual memories. It enables prediction mode if it isn’t already enabled, and then creates for each perceptual predicate a conditional that has a condition for the state plus condacts for: (1) the perceptual predicate; (2) all of the Next predicates; and (3) a new memorial predicate whose function stores the conditional probability of the unique variables in the perceptual predicate given the variables in the Next predicates.
The instruction `(learn '(:am :pm))` turns on both action modeling and perception modeling. As in the action-modeling-only case, it also turns on prediction mode (diachronic processing) and gradient descent learning. Compared to the `random-walk-12` model, this model adds the `object` predicate, a perceptual predicate similar to the external object concept introduced earlier.

    (defun random-walk-13 (number-of-decisions &optional perception-prob perception-mass action-prob)
      (init '(left right none))
      (learn '(:pm :am))
      (setf learning-rate 0.05)
      (operator-selection 'boltzmann)
      (setf max-fraction-pa-regions 0)
      (setf max-decisions 10000)
      (new-type '1D-grid :numeric t :discrete t :min 1 :max 8)
      (new-type 'obj-type :constants '(walker table dog human))
      (predicate 'location :world 'closed :perception t
                 :arguments '((state state) (x 1D-grid !)))
      (predicate 'object :perception t
                 :arguments '((state state) (object obj-type %)))
      (setq pre-t '((setf 1d-grid-location 4)))
      (setq post-run `((when (= decision-count ,number-of-decisions) (halt))))
      (setq perceive-list `((perceive-location*next ,perception-prob ,perception-mass)
                            (perceive-object)))
      (setq action-list `((execute-operator ,action-prob)))
      (conditional 'acceptable :actions '((selected (operator *))))
      (trials 1))

This model introduces two architecture generated predicates:

    (PREDICATE 'ACTION-1344 :WORLD 'OPEN :UNIQUE '(X-2)
               :ARGUMENTS '((X-0 1D-GRID) (OPERATOR-1 OPERATOR) (X-2 1D-GRID))
               :FUNCTION 1)
    (PREDICATE 'PERCEPTION-1345 :WORLD 'OPEN :UNIQUE '(OBJECT-1)
               :ARGUMENTS '((OBJECT-1 OBJ-TYPE) (X-0 1D-GRID))
               :FUNCTION 1)

and two architecture generated conditionals:

    (CONDITIONAL 'LOCATION-PREDICTION
                 :CONDITIONS '((STATE (STATE (S)))
                               (LOCATION (STATE (S)) (X (X-0)))
                               (SELECTED (STATE (S)) (OPERATOR (OPERATOR-1))))
                 :CONDACTS '((LOCATION*NEXT (STATE (S)) (X (X-2)))
                             (ACTION-1344 (X-0 (X-0)) (OPERATOR-1 (OPERATOR-1)) (X-2 (X-2)))))
    (CONDITIONAL 'OBJECT-PERCEPTION-PREDICTION
                 :CONDITIONS '((STATE (STATE (S))))
                 :CONDACTS '((OBJECT (STATE (S)) (OBJECT (OBJECT-1)))
                             (LOCATION*NEXT (STATE (S)) (X (X-0)))
                             (PERCEPTION-1345 (OBJECT-1 (OBJECT-1)) (X-0 (X-0)))))

After running the model, calling `(ppf 'perception-1345 'array)` shows that the perception model learned the right function for this noise-free model:

    WM-OBJECT-1 x WM-X-0:
         [WALKER]      [TABLE]       [DOG]         [HUMAN]
    [1]  4.608295E-4   4.608295E-4   0.99861753    4.608295E-4
    [2]  0.103523      0.103523      0.103523      0.6894311
    [3]  0.71101916    0.09632693    0.09632693    0.09632693
    [4]  0.09632695    0.71101916    0.09632695    0.09632695
    [5]  0.06952423    0.06952423    0.7914274     0.06952423
    [6]  0.012252189   0.9632434     0.012252189   0.012252189
    [7]  0.99906266    3.124702E-4   3.124702E-4   3.124702E-4

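Each row of this table is a distribution over objects given a location. As a rough illustration of how such a perception model could be used, the sketch below inverts it with Bayes' rule under a uniform prior over locations to ask where the agent probably is, given the object it sees. This is plain Common Lisp, not a Sigma call: the names `*objects*`, `*perception-model*` and `location-posterior` are hypothetical, the uniform prior is an assumption, and the numbers are rounded copies of the table above.

```lisp
;; Plain Common Lisp sketch; names are hypothetical, values are rounded
;; from the printed perception table (row i is location i+1).
(defparameter *objects* '(walker table dog human))

(defparameter *perception-model*
  #2A((0.0005 0.0005 0.9986 0.0005)
      (0.1035 0.1035 0.1035 0.6894)
      (0.7110 0.0963 0.0963 0.0963)
      (0.0963 0.7110 0.0963 0.0963)
      (0.0695 0.0695 0.7914 0.0695)
      (0.0123 0.9632 0.0123 0.0123)
      (0.9991 0.0003 0.0003 0.0003)))

(defun location-posterior (object)
  "Return a list of P(location | OBJECT) for locations 1..7, assuming a uniform prior."
  (let* ((col (position object *objects*))
         (likelihoods (loop for row below 7
                            collect (aref *perception-model* row col)))
         (total (reduce #'+ likelihoods)))
    (mapcar (lambda (p) (/ p total)) likelihoods)))
```

Calling `(location-posterior 'human)` puts most of the mass on location 2, while `(location-posterior 'dog)` splits it mainly between locations 1 and 5, since the dog is perceived at both.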
##Reinforcement learning ##
Now, let's assume that the agent is looking for the human, which is located at location 2 in the 1D-grid. What we want is to learn how to get to the human location as quickly as possible. This could be done by introducing a reward structure and leveraging reinforcement learning in Sigma. Let's define a reward function, which has a reward of 9 at location 2 but no reward anywhere else.

    ; Fixed vector of rewards for use in assign-reward
    (defparameter rewards-rw (vector 0 9 0 0 0 0 0 ))

Let's also assume we have a reward predicate that utilizes the rewards (This predicate will be automatically defined by the template):

    (PREDICATE 'REWARD :WORLD 'OPEN :UNIQUE '(VALUE) :PERCEPTION T
               :ARGUMENTS '((LOCATION-X 1D-GRID) (VALUE UTILITY %))
               :FUNCTION '((0 * (0 20)) (0.1 * (0 10))))


    ; Assign a fixed reward based on location
    (defun assign-reward-rw (rewards-rw)
      (let ((cl 1d-grid-location))
        (eval `(perceive (quote ((reward .1 (location-x *) (value *)) ; Empty WM of any previous rewards
                                 (reward (location-x ,cl) (value ,(aref rewards-rw (- cl 1)))))))) ; Add reward for current state
        )
      )

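To see what `assign-reward-rw` hands to `perceive`, the following sketch builds the same list without evaluating it. It is plain Common Lisp with hypothetical names (`*rewards*`, `reward-perception-form`) and only mirrors the backquote expression above. Note the index shift: grid locations start at 1, while the reward vector is indexed from 0.

```lisp
;; Plain Common Lisp sketch; names are hypothetical and no Sigma function is called.
(defparameter *rewards* (vector 0 9 0 0 0 0 0))

(defun reward-perception-form (rewards location)
  "Build the perception list used by assign-reward-rw for a 1-based LOCATION."
  `((reward .1 (location-x *) (value *))          ; empty WM of any previous rewards
    (reward (location-x ,location)                ; add reward for the given location
            (value ,(aref rewards (- location 1))))))

;; (reward-perception-form *rewards* 2)
;; => ((REWARD 0.1 (LOCATION-X *) (VALUE *)) (REWARD (LOCATION-X 2) (VALUE 9)))
```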
The reward vector and the `assign-reward-rw` function defined above provide the basic external mechanism for running a reinforcement learning model in Sigma. The reinforcement learning model is initialized exactly like the perception and action modeling one; the only difference is that the instruction is now `(learn '(:pm :am :rl))`. The basic model is:

    ; 1D Grid for RL with automatic RL structure generation
    (defun random-walk-14 (number-of-decisions &optional perception-prob perception-mass action-prob)
      (init '(left right none))
      ;(init-temporal-conditional)
      (learn '(:pm :am :rl))
      (setf learning-rate 0.05)
      (operator-selection 'boltzmann)
      (setf max-fraction-pa-regions 0)
      (setf max-decisions 10000)
      (new-type '1D-grid :numeric t :discrete t :min 1 :max 8)
      (new-type 'obj-type :constants '(walker table dog human))
      (predicate 'location :world 'closed :perception t
                 :arguments '((state state) (x 1D-grid !)))
      (predicate 'object :perception t
                 :arguments '((state state) (object obj-type %)))
      (setq pre-t '((setf 1d-grid-location 4)))
      (setq post-run `((when (= decision-count ,number-of-decisions) (halt))))
      (setq post-d '((format trace-stream "~&~&Perceived object post-d: ~S~%"
                             (best-in-plm (vnp 'object) 0))))
      (setq post-t '((test-rl-print-tutorial trace-stream)))
      (setq perceive-list `((assign-reward-rw rewards-rw)
                            (format trace-stream "~&~&Perceived object at perception: ~S~%"
                                    (best-in-plm (vnp 'object) 0))
                            (perceive-location*next ,perception-prob ,perception-mass)
                            (perceive-object)))
      (setq action-list `((execute-operator ,action-prob)))
      (conditional 'acceptable :actions '((selected (operator *))))
      (trials 1))

So the new predicates defined in the model in addition to the ones defined in action and perception modeling are:

    (PREDICATE 'PROJECTED :WORLD 'OPEN :UNIQUE '(VALUE)
               :ARGUMENTS '((LOCATION-X 1D-GRID) (VALUE UTILITY %))
               :FUNCTION 1)
    (PREDICATE 'PROJECTED*NEXT :WORLD 'OPEN :UNIQUE '(VALUE)
               :ARGUMENTS '((LOCATION-X 1D-GRID) (VALUE UTILITY %))
               :FUNCTION 'PROJECTED)
    (PREDICATE 'REWARD :WORLD 'OPEN :UNIQUE '(VALUE) :PERCEPTION T
               :ARGUMENTS '((LOCATION-X 1D-GRID) (VALUE UTILITY %))
               :FUNCTION '((0 * (0 20)) (0.1 * (0 10))))
    (PREDICATE 'Q :WORLD 'OPEN :UNIQUE '(VALUE)
               :ARGUMENTS '((LOCATION-X 1D-GRID) (OPERATOR OPERATOR) (VALUE UTILITY %))
               :FUNCTION 1)

Template-driven reinforcement learning in Sigma starts with perceptual learning of a reward function. This is then used as the basis for jointly learning the projected future utility of states, in the function of the `Projected` predicate, and the policy, in the function of the `Q` predicate. The projected future value of the next state is handled via the `projected*next` predicate. The important point is that the functions of the `projected` and `projected*next` predicates are tied; in other words, a single learned function is shared by the two predicates. The `BACKUP-PROJECTED` conditional drives Projected learning by backing up the sum of the reward and the discounted projected value for the next state (from `Projected*Next`). It uses an affine transform to compute the Projected distribution from which it will learn, and weights this by the filtered Q distribution, which converts the explicit distribution over utilities into an implicit distribution of functional values. The conditional for backing up the policy, `BACKUP-Q`, is much like this one but has an action for the Q predicate instead. These conditionals are:

    (CONDITIONAL 'BACKUP-PROJECTED
                 :CONDITIONS '((STATE (STATE (S)))
                               (LOCATION (STATE (S)) (X (X-0)))
                               (LOCATION*NEXT (STATE (S)) (X (X-1)))
                               (SELECTED (STATE (S)) (OPERATOR (OPERATOR-2)))
                               (PROJECTED*NEXT (LOCATION-X (X-1)) (VALUE (VALUE-4)))
                               (REWARD (LOCATION-X (X-1)) (VALUE (VALUE-5)))
                               (Q (OPERATOR (OPERATOR-2)) (LOCATION-X (X-0)) (VALUE (Q (:FILTER #)))))
                 :ACTIONS '((PROJECTED (LOCATION-X (X-0))
                                       (VALUE (VALUE-4 (:COEFFICIENT 0.95 :OFFSET VALUE-5 :PAD 0 :APPLY-COEFFICIENT-TO-OFFSET T))))))
    (CONDITIONAL 'BACKUP-Q
                 :CONDITIONS '((STATE (STATE (S)))
                               (LOCATION (STATE (S)) (X (X-0)))
                               (LOCATION*NEXT (STATE (S)) (X (X-1)))
                               (SELECTED (STATE (S)) (OPERATOR (OPERATOR-2)))
                               (PROJECTED*NEXT (LOCATION-X (X-1)) (VALUE (VALUE-4)))
                               (REWARD (LOCATION-X (X-1)) (VALUE (VALUE-5)))
                               (Q (OPERATOR (OPERATOR-2)) (LOCATION-X (X-0)) (VALUE (Q (:FILTER #)))))
                 :ACTIONS '((Q (LOCATION-X (X-0)) (OPERATOR (OPERATOR-2))
                               (VALUE (VALUE-4 (:COEFFICIENT 0.95 :OFFSET VALUE-5 :PAD 0 :APPLY-COEFFICIENT-TO-OFFSET T))))))

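Read informally, the two conditionals resemble a standard temporal-difference backup. The sketch below uses notation that is not part of Sigma: $x_0$ is the current location (`X-0`), $x_1$ the next location (`X-1`), $a$ the selected operator, $r$ the reward (the `:OFFSET`), and $\gamma$ the discount (the `:COEFFICIENT`, here 0.95). It follows the prose description of "the sum of the reward and the discounted projected value for the next state" and ignores the weighting by the filtered Q distribution.

```latex
\begin{aligned}
\mathrm{Projected}(x_0) &\leftarrow r(x_1) + \gamma\,\mathrm{Projected}(x_1), \\
Q(x_0, a) &\leftarrow r(x_1) + \gamma\,\mathrm{Projected}(x_1), \qquad \gamma = 0.95
\end{aligned}
```

The `:APPLY-COEFFICIENT-TO-OFFSET T` setting suggests that the coefficient may also scale the reward term, which would give $\gamma\,(r(x_1) + \mathrm{Projected}(x_1))$ instead; this sketch does not settle that detail.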
The model can be run by the command: (random-walk-14 1000)
To get expected values (EV) for the learned functions, a separate print function is defined and called after each trial (via `post-t`) in the `random-walk-14` model. This function is:

    (defun test-rl-print-tutorial (stream)
      (format stream "~&~%PROJECTED FF (EV):~&")
      (pa 'projected nil '((expected wm-value)) stream)
      (format stream "~&~%Q FF (EV):~&")
      (pa 'q nil '((expected wm-value)) stream)
      (format stream "~&~%REWARD FF (EV):~&")
      (pa 'reward nil '((expected wm-value)) stream)
      (format stream "~&~%"))

After running the model for 1000 decision cycles, these are the numbers that we get:

    PROJECTED FF (EV):
    Predicate function for predicate PROJECTED:
    WM-LOCATION-X:
    [1]         [2]         [3]         [4]         [5]         [6]         [7]
    14.06555    14.565967   10.645658   7.827994    7.6288076   7.505014    7.505039

    Q FF (EV):
    Predicate function for predicate Q:
    WM-LOCATION-X x WM-OPERATOR:
             [1]         [2]         [3]         [4]         [5]         [6]         [7]
    [LEFT]   14.065821   14.416634   16.098381   9.194201    8.088579    7.9278145   7.7437873
    [RIGHT]  15.949139   13.478781   8.110692    8.046615    7.718419    7.6289535   7.489125
    [NONE]   14.38163    16.170727   11.662604   8.114301    8.000571    7.572637    7.6661134

    REWARD FF (EV):
    Predicate function for predicate REWARD:
    WM-LOCATION-X:
    [1]         [2]         [3]         [4]         [5]         [6]         [7]
    0.5039216   9.486249    0.50406987  0.50408686  0.5041055   0.5041229   0.50416816

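The learned reward is highest at location 2, as expected. The Q values are consistent with this: the highest-valued operator is right at location 1, none at location 2, and left at locations 3 through 7, so each choice either moves the agent toward location 2 or keeps it there. The projected value also gets higher as the agent gets closer to location 2.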
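The policy implied by the Q table can be read off mechanically by picking, for each location, the operator with the highest expected value. The sketch below does this in plain Common Lisp on the numbers printed above; `*q-operators*`, `*q-values*` and `greedy-policy` are hypothetical names for illustration, not Sigma functions.

```lisp
;; Plain Common Lisp sketch; names are hypothetical.
;; Expected Q values copied from the output above:
;; one row per operator, one column per location 1..7.
(defparameter *q-operators* '(left right none))

(defparameter *q-values*
  #2A((14.065821 14.416634 16.098381 9.194201 8.088579 7.9278145 7.7437873)
      (15.949139 13.478781 8.110692 8.046615 7.718419 7.6289535 7.489125)
      (14.38163 16.170727 11.662604 8.114301 8.000571 7.572637 7.6661134)))

(defun greedy-policy (q operators)
  "Return, for each location, the operator with the highest expected Q value."
  (loop for loc below (array-dimension q 1)
        collect (let ((best 0))
                  (loop for op from 1 below (array-dimension q 0)
                        when (> (aref q op loc) (aref q best loc))
                          do (setf best op))
                  (nth best operators))))

;; (greedy-policy *q-values* *q-operators*)
;; => (RIGHT NONE LEFT LEFT LEFT LEFT LEFT)
```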