pyPOMDP is a POMDP (partially observable Markov decision process) implementation for Python 2.x.


The following example shows how the controller and the simulator can be used.

First, the POMDP model is read from file (either Cassandra's file format or a JSON file).

The controller returns a control action when sent an observation (senseact()). The action is selected according to its current belief state and policy. The policy is also read from a file (so far there is no online planning).
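
The belief update behind such a controller can be sketched as a standard Bayes filter. The function and the toy model below are illustrative only, not part of the pypomdp API; they just show how a belief vector would be re-weighted after an action/observation pair:

```python
def belief_update(belief, action, observ, T, O):
    """Bayes filter: b'(s') is proportional to
    O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    n = len(belief)
    new_belief = [O[action][s2][observ] *
                  sum(T[action][s][s2] * belief[s] for s in range(n))
                  for s2 in range(n)]
    norm = sum(new_belief)  # normalize so the belief sums to 1
    return [p / norm for p in new_belief]

# hypothetical toy model: 2 states, 1 action, 2 observations
T = [[[0.9, 0.1],
      [0.2, 0.8]]]          # T[action][source][sink]
O = [[[0.8, 0.2],
      [0.3, 0.7]]]          # O[action][state][observ]

b = belief_update([0.5, 0.5], action=0, observ=0, T=T, O=O)
```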

The simulator samples the current system state internally from the process model with respect to the applied action. Based on that, it records the reward and emits a sampled observation to the controller.
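
One such simulator step can be sketched as follows. This is a hand-rolled illustration, not the pypomdp simulator itself; the deterministic toy model at the bottom is made up so the step is reproducible:

```python
import random

def sample(dist):
    """Draw an index from a discrete probability distribution."""
    r = random.random()
    acc = 0.0
    for i, p in enumerate(dist):
        acc += p
        if r < acc:
            return i
    return len(dist) - 1

def simulate_step(state, action, T, O, R):
    """One simulator step: sample successor state, observation, reward."""
    next_state = sample(T[action][state])            # [action][source][sink]
    observ = sample(O[action][next_state])           # [action][state][observ]
    reward = R[action][state][next_state][observ]    # [action][source][sink][observ]
    return next_state, reward, observ

# hypothetical deterministic toy model: 2 states, 1 action, 2 observations
T = [[[0.0, 1.0], [1.0, 0.0]]]
O = [[[1.0, 0.0], [0.0, 1.0]]]
R = [[[[0.0, 0.0], [0.0, 5.0]],
      [[0.0, 0.0], [0.0, 0.0]]]]

next_state, reward, observ = simulate_step(0, 0, T, O, R)
```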

See the following example for implementation details:

import time
import pypomdp

def main():
    # model
    model = None

    # read from JSON
    model_filename = './examples/models/shuttle.95.JSON'
    with open(model_filename, 'r') as fh:
        model = pypomdp.model.POMDP.read_json(fh)

    # simulator
    simulator = pypomdp.simulator.Simulator(pomdp=model)

    # controller
    policy_filename = './examples/models/shuttle.95.alpha'
    policy = pypomdp.policy.AlphaVectorValueFunction.\
            initFromCassandraFile(policy_filename, model)
    controller = pypomdp.control.ProdController(model=model, policy=policy)

    # run system
    action = controller.last_action
    for _ in range(100):  # fixed number of steps; horizon chosen arbitrarily
        reward, observ = simulator.simulate(action)
        action = controller.senseact(observ)

if __name__ == "__main__":
    main()

Development Notes:

Indexing for transition probabilities, observation probabilities and rewards:

  • transition probabilities: [action][source][sink]
  • observation probabilities: [action][state][observ]
  • rewards: [action][source][sink][observ]
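
The indexing convention above can be demonstrated on a small hypothetical model (the numbers are illustrative only):

```python
# toy model: 2 states, 1 action, 2 observations (illustrative data)
T = [[[0.9, 0.1],
      [0.2, 0.8]]]              # T[action][source][sink]
O = [[[0.8, 0.2],
      [0.3, 0.7]]]              # O[action][state][observ]
R = [[[[1.0, 1.0],
       [0.0, 0.0]],
      [[0.0, 0.0],
       [2.0, 2.0]]]]            # R[action][source][sink][observ]

p_next = T[0][0][1]    # P(sink=1 | action=0, source=0)
p_obs  = O[0][1][0]    # P(observ=0 | action=0, state=1)
r      = R[0][0][1][0]  # reward for (a=0, s=0, s'=1, o=0)
```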