pyPOMDP is a POMDP implementation for Python 2.x.
The following example shows how the controller and the simulator can be used.
First, the POMDP model is read from a file (either Cassandra's file format or a JSON file).
The controller returns a control action when it is sent an observation (senseact()). The action is selected according to its current belief state and policy. The policy is also read from a file (so far there is no online planning).
The simulator internally samples the current system state from the process model with respect to the applied action. Based on that, it records the reward and emits a sampled observation to the controller.
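The sampling step described above can be sketched in plain Python. This is not pyPOMDP's actual internals, just an illustration with a made-up one-action, two-state model; the helper names (`sample`, `simulate_step`) and all probability values are assumptions for the example:

```python
import random

# Hypothetical toy model, using the indexing conventions documented below:
# transition[action][source][sink], observation[action][state][observ],
# reward[action][source][sink][observ].
transition = [[[0.9, 0.1],
               [0.2, 0.8]]]
observation = [[[0.8, 0.2],
                [0.3, 0.7]]]
reward = [[[[1.0, 1.0], [0.0, 0.0]],
           [[0.0, 0.0], [2.0, 2.0]]]]

def sample(dist):
    """Draw an index from a discrete probability distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if r < acc:
            return i
    return len(dist) - 1

def simulate_step(state, action):
    """One simulator step: sample the sink state from the process model,
    sample an observation in the sink state, and look up the reward."""
    sink = sample(transition[action][state])
    observ = sample(observation[action][sink])
    rew = reward[action][state][sink][observ]
    return sink, rew, observ
```

The controller never sees `sink` directly; only `rew` is recorded and `observ` is passed on, which is what makes the process partially observable.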
See the following example for implementation details:
```python
import time

import pypomdp


def main():
    # model: read from JSON
    model = None
    model_filename = './examples/models/shuttle.95.JSON'
    with open(model_filename, 'r') as fh:
        model = pypomdp.model.POMDP.read_json(fh)

    # simulator
    simulator = pypomdp.simulator.Simulator(pomdp=model)

    # controller
    policy_filename = './examples/models/shuttle.95.alpha'
    policy = pypomdp.policy.AlphaVectorValueFunction.\
        initFromCassandraFile(policy_filename, model)
    controller = pypomdp.control.ProdController(model=model, policy=policy)

    # run system
    action = controller.last_action
    while True:
        time.sleep(1)
        reward, observ = simulator.simulate(action)
        action = controller.senseact(observ)


if __name__ == "__main__":
    main()
```
Indexing conventions for transition probabilities, observation probabilities, and rewards:
- transition probabilities: [action][source][sink]
- observation probabilities: [action][state][observ]
- rewards: [action][source][sink][observ]