Source

RLPy / Domains / PST.py

Full commit
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
"""Persistent search and track mission domain."""

from Tools import *
from Domain import Domain, BigDiscreteDomain
import copy
__copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
__credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
            "William Dabney", "Jonathan P. How"]
__license__ = "BSD 3-Clause"
__author__ = ["Robert H. Klein", "Alborz Geramifard"]


class PST(Domain, BigDiscreteDomain):
    """
    Persistent Search and Track Mission with multiple Unmanned Aerial Vehicle
    (UAV) agents.

    Goal is to perform surveillance and communicate it back to base
    in the presence of stochastic communication and \"health\"
    (overall system functionality) constraints, all without
    without losing any UAVs because of running out of fuel.

    | **STATE:**
    Each UAV has 4 state dimensions:

    - LOC: position of a UAV: BASE (0),  REFUEL (1), COMMS (2), SURVEIL (3).
    - FUEL: integer fuel qty remaining.
    - ACT_STATUS: Actuator status: see description for info.
    - SENS_STATUS: Sensor status: see description for info.

    Domain state vector consists of 4 blocks of states,
    each corresponding to a property of the UAVs (listed above)

    So for example:

        >>> state = [1,2,9,3,1,0,1,1]

    corresponds to blocks

        >>> loc, fuel, act_status, sens_status = [1,2], [9,3], [1,0], [1,1]

    which has the meaning:

    UAV 1 in location 1, with 9 fuel units remaining, and
    sensor + actuator with status 1 (functioning).
    UAV 2 in location 2, 3 fuel units remaining, actuator
    with status 0 and sensor with status 1.

    | **ACTIONS:**
    Each UAV can take one of 3 actions: {*RETREAT, LOITER, ADVANCE*}
    Thus, the action space is :math:`3^n`, where n is the number of UAVs.

    | **Detailed Description**
    The objective of the mission is to fly to the surveillance node and perform
    surveillance on a target, while ensuring that a communication link with the
    base is maintained by having a UAV with a working actuator loitering on
    the communication node.

    Movement of each UAV is deterministic with 5% failure rate for both the
    actuator and sensor of each UAV on each step.
    A penalty is applied for each unit of fuel consumed,
    which occurs when a UAV moves between locations or when it is loitering
    above a COMMS or SURVEIL location
    (ie, no penalty when loitering at REFUEL or BASE).

    A UAV with a failed sensor cannot perform surveillance.
    A UAV with a failed actuator cannot perform surveillance or communication,
    and can only take actions leading it back to the REFUEL or BASE states,
    where it may loiter.

    Loitering for 1 timestep at REFUEL assigns fuel of 10 to that UAV.

    Loitering for 1 timestep at BASE assigns status 1 (functioning) to
    Actuator and Sensor.

    Finally, if any UAV has fuel 0, the episode terminates with large penalty.


    | **REWARD**
    The objective of the mission is to fly to the surveillance node and perform
    surveillance on a target, while ensuring that a communication link with the
    base is maintained by having a UAV with a working actuator loitering on
    the communication node.

    The agent receives: + 20 if an ally with a working sensor is at surveillance
    node while an ally with a working motor is at the communication node,
    apenalty of - 50 if any UAV crashes and always some small penalty for burned fuel.

    **REFERENCE:**

    .. seealso::
        J. D. Redding, T. Toksoz, N. Ure, A. Geramifard, J. P. How, M. Vavrina,
        and J. Vian. Distributed Multi-Agent Persistent Surveillance and
        Tracking With Health Management.
        AIAA Guidance, Navigation, and Control Conference (2011).

    """
    episodeCap          = 1000 #: Maximum number of steps per episode
    gamma               = 0.9  #: Discount factor

    # Domain constants
    FULL_FUEL           = 10   #: Number of fuel units at start
    P_ACT_FAIL          = 0.05 #: Probability that an actuator fails on this timestep for a given UAV
    P_SENSOR_FAIL       = 0.05 #: Probability that a sensor fails on this timestep for a given UAV
#    CRASH_REWARD_COEFF  = -2.0 # Negative reward coefficient for running out of fuel (applied on every step) [C_crash] [-2.0 in tutorial]
    CRASH_REWARD        = -50 #: Reward for a crashed UAV (terminates episode)
    SURVEIL_REWARD      = 20  #: Per-step, per-UAV reward coefficient for performing surveillance on each step [C_cov]
    FUEL_BURN_REWARD_COEFF = -1 #: Negative reward coefficient: for fuel burn penalty [not mentioned in MDP Tutorial]
    MOVE_REWARD_COEFF   = 0   #: Reward (negative) coefficient for movement (i.e., fuel burned while loitering might be penalized above, but no movement cost)
    NUM_TARGET          = 1   #: Number of targets in surveillance region; SURVEIL_REWARD is multiplied by the number of targets successfully observed
    NUM_UAV             = None # Number of UAVs present in the mission [3 in tutorial
    NOM_FUEL_BURN       = 1 # Nominal rate of fuel depletion selected with probability P_NOM_FUEL_BURN

    # Domain variables
    motionNoise         = 0    #: Noise in action (with some probability, loiter rather than move)
    LIMITS              = []   # Limits on action indices

    # Plotting constants
    UAV_RADIUS = 0.3
    SENSOR_REL_X        = 0.2 # Location of the sensor image relative to the uav
    SENSOR_LENGTH       = 0.2 # Length of the sensor surveillance image
    ACTUATOR_REL_Y      = 0.2 # Location of the actuator image relative to the uav
    ACTUATOR_HEIGHT     = 0.2 # Height of the actuator comms image
    domain_fig          = None
    subplot_axes        = None
    location_rect_vis   = None # List of rectangle objects used in the plot
    uav_circ_vis        = None # List of circles used to represent UAVs in plot
    uav_text_vis        = None # List of fuel text used in plot
    uav_sensor_vis      = None # List of sensor wedges used in plot
    uav_actuator_vis    = None # List of actuator wedges used in plot
    comms_line          = None # List of communication lines used in plot
    location_coord      = None # Coordinates of the center of each rectangle
#    uav_vis_list = None # List of UAV objects used in plot
    LOCATION_WIDTH      = 1.0 # Width of each rectangle used to represent a location
    RECT_GAP            = 0.9   # Gap to leave between rectangles
    dist_between_locations = 0 # Total distance between location rectangles in plot, computed in init()

    possible_action_cache = {}
    ###
    def __init__(self, NUM_UAV = 3, motionNoise = 0, logger = None):
        """
        :param NUM_UAV: the number of UAVs in the domain
        :param motionNoise: probability of taking the LOITER action instead of the one selected.

        """

        self.NUM_UAV                = NUM_UAV
        self.states_num             = NUM_UAV * UAVIndex.SIZE       # Number of states (LOC, FUEL...) * NUM_UAV
        self.actions_num            = pow(UAVAction.SIZE,NUM_UAV)    # Number of Actions: ADVANCE, RETREAT, LOITER
        locations_lim = array(tile([0,UAVLocation.SIZE-1],(NUM_UAV,1)))
        fuel_lim =      array(tile([0,self.FULL_FUEL],(NUM_UAV,1)))
        actuator_lim =  array(tile([0,ActuatorState.SIZE-1],(NUM_UAV,1)))
        sensor_lim =    array(tile([0,SensorState.SIZE-1],(NUM_UAV,1)))
        self.statespace_limits      = vstack([locations_lim, fuel_lim, actuator_lim, sensor_lim])# Limits of each dimension of the state space. Each row corresponds to one dimension and has two elements [min, max]
        self.motionNoise            = motionNoise # with some noise, when uav desires to transition to new state, remains where it is (loiter)
        self.LIMITS                 = UAVAction.SIZE * ones(NUM_UAV, dtype='int') # eg [3,3,3,3], number of possible actions
        self.location_rect_vis      = None #
        self.location_coord         = None
        self.uav_circ_vis           = None
        self.uav_text_vis           = None
        self.uav_sensor_vis         = None
        self.uav_actuator_vis       = None
        self.comms_line             = None
        self.dist_between_locations = self.RECT_GAP + self.LOCATION_WIDTH
        self.DimNames = []
        [self.DimNames.append('UAV%d-loc' % i) for i in arange(NUM_UAV)]
        [self.DimNames.append('UAV%d-fuel' % i) for i in arange(NUM_UAV)]
        [self.DimNames.append('UAV%d-act' % i) for i in arange(NUM_UAV)]
        [self.DimNames.append('UAV%d-sen' % i) for i in arange(NUM_UAV)]
        super(PST,self).__init__(logger)
        if self.logger: self.logger.log("NUM_UAV:\t\t%d" % self.NUM_UAV)
        def gk(key):
            key["policy"] = (key["policy"].__class__.__name__,
                            copy.copy(key["policy"].__getstate__()))
            if "domain" in key["policy"][1]:
                del key["policy"][1]["domain"]
            if "random_state" in key["policy"][1]:
                del key["policy"][1]["random_state"]

            key["self"] = copy.copy(key["self"].__getstate__())
            lst = [
                "random_state", "V", "domain_fig", "logger", "DirNames", "state",
                "transition_samples", "_all_states", "chain_model", "policy",
                "possible_action_cache", "location_rect_vis", "location_coord",
                "uav_circ_vis", "uav_text_vis", "uav_sensor_vis",
                "uav_actuator_vis", "comms_line", "DimNames", "stationary_distribution"]
            for l in lst:
                if l in key["self"]:
                    del key["self"][l]
            return key

        self.V = memory.cache(self.V, get_key=gk)
        self.stationary_distribution = memory.cache(self.stationary_distribution, get_key=gk)
        self.s0()

    def showDomain(self, a=0):
        s = self.state
        if self.domain_fig is None:
            self.domain_fig = pl.figure(1, (UAVLocation.SIZE * self.dist_between_locations + 1, self.NUM_UAV + 1))
            pl.show()
        pl.clf()
         #Draw the environment
         # Allocate horizontal 'lanes' for UAVs to traverse

        # Formerly, we checked if this was the first time plotting; wedge shapes cannot be removed from
        # matplotlib environment, nor can their properties be changed, without clearing the figure
        # Thus, we must redraw the figure on each timestep
        #        if self.location_rect_vis is None:
        # Figure with x width corresponding to number of location states, UAVLocation.SIZE
        # and rows (lanes) set aside in y for each UAV (NUM_UAV total lanes).  Add buffer of 1
        self.subplot_axes = self.domain_fig.add_axes([0, 0, 1, 1], frameon=False, aspect=1.)
        crashLocationX = 2*(self.dist_between_locations)*(UAVLocation.SIZE-1)
        self.subplot_axes.set_xlim(0, 1 + crashLocationX + self.RECT_GAP)
        self.subplot_axes.set_ylim(0, 1 + self.NUM_UAV)
        self.subplot_axes.xaxis.set_visible(False)
        self.subplot_axes.yaxis.set_visible(False)

        # Assign coordinates of each possible uav location on figure
        self.location_coord = [0.5 + (self.LOCATION_WIDTH / 2) + (self.dist_between_locations)*i for i in range(UAVLocation.SIZE-1)]
        self.location_coord.append(crashLocationX + self.LOCATION_WIDTH / 2)

         # Create rectangular patches at each of those locations
        self.location_rect_vis = [mpatches.Rectangle([0.5 + (self.dist_between_locations)*i, 0], self.LOCATION_WIDTH, self.NUM_UAV * 2, fc = 'w') for i in range(UAVLocation.SIZE-1)]
        self.location_rect_vis.append(mpatches.Rectangle([crashLocationX, 0], self.LOCATION_WIDTH, self.NUM_UAV * 2, fc = 'w'))
        [self.subplot_axes.add_patch(self.location_rect_vis[i]) for i in range(4)]
        self.comms_line = [lines.Line2D([0.5 + self.LOCATION_WIDTH + (self.dist_between_locations)*i, 0.5 + self.LOCATION_WIDTH + (self.dist_between_locations)*i + self.RECT_GAP],[self.NUM_UAV*0.5 + 0.5, self.NUM_UAV * 0.5 + 0.5], linewidth = 3, color='black', visible=False) for i in range(UAVLocation.SIZE-2)]
        self.comms_line.append(lines.Line2D([0.5 + self.LOCATION_WIDTH + (self.dist_between_locations)*2, crashLocationX],[self.NUM_UAV*0.5 + 0.5, self.NUM_UAV * 0.5 + 0.5], linewidth = 3, color='black', visible=False))

        # Create location text below rectangles
        locText = ["Base","Refuel","Communication","Surveillance"]
        self.location_rect_txt = [pl.text(0.5 + self.dist_between_locations*i + 0.5*self.LOCATION_WIDTH,-0.3,locText[i], ha = 'center') for i in range(UAVLocation.SIZE-1)]
        self.location_rect_txt.append(pl.text(crashLocationX + 0.5*self.LOCATION_WIDTH,-0.3,locText[UAVLocation.SIZE-1], ha = 'center'))

        # Initialize list of circle objects

        uav_x = self.location_coord[UAVLocation.BASE]

        # Update the member variables storing all the figure objects
        self.uav_circ_vis = [mpatches.Circle((uav_x,1+uav_id), self.UAV_RADIUS, fc="w") for uav_id in range(0,self.NUM_UAV)]
        self.uav_text_vis = [None for uav_id in range(0,self.NUM_UAV)] # fuck
        self.uav_sensor_vis = [mpatches.Wedge((uav_x+self.SENSOR_REL_X, 1+uav_id),self.SENSOR_LENGTH, -30, 30) for uav_id in range(0,self.NUM_UAV)]
        self.uav_actuator_vis =[mpatches.Wedge((uav_x, 1+uav_id + self.ACTUATOR_REL_Y),self.ACTUATOR_HEIGHT, 60, 120) for uav_id in range(0,self.NUM_UAV)]

        # The following was executed when we used to check if the environment needed re-drawing: see above.
                # Remove all UAV circle objects from visualization
        #        else:
        #            [self.uav_circ_vis[uav_id].remove() for uav_id in range(0,self.NUM_UAV)]
        #            [self.uav_text_vis[uav_id].remove() for uav_id in range(0,self.NUM_UAV)]
        #            [self.uav_sensor_vis[uav_id].remove() for uav_id in range(0,self.NUM_UAV)]


        # For each UAV:
        # Draw a circle, with text inside = amt fuel remaining
        # Triangle on top of UAV for comms, black = good, red = bad
        # Triangle in front of UAV for surveillance

        sStruct = self.state2Struct(s)

        for uav_id in range(0,self.NUM_UAV):
            # Assign all the variables corresponding to this UAV for this iteration;
            # this could alternately be done with a UAV class whose objects keep track
            # of these variables.  Elect to use lists here since ultimately the state
            # must be a vector anyway.
            uav_location = sStruct.locations[uav_id] # State index corresponding to the location of this uav
            uav_fuel = sStruct.fuel[uav_id]
            uav_sensor = sStruct.sensor[uav_id]
            uav_actuator = sStruct.actuator[uav_id]

            # Assign coordinates on figure where UAV should be drawn
            uav_x = self.location_coord[uav_location]
            uav_y = 1 + uav_id

            # Update plot wit this UAV
            self.uav_circ_vis[uav_id] = mpatches.Circle((uav_x,uav_y), self.UAV_RADIUS, fc="w")
            self.uav_text_vis[uav_id] = pl.text(uav_x-0.05, uav_y-0.05, uav_fuel)
            if uav_sensor == SensorState.RUNNING: objColor = 'black'
            else: objColor = 'red'
            self.uav_sensor_vis[uav_id] = mpatches.Wedge((uav_x+self.SENSOR_REL_X,uav_y),self.SENSOR_LENGTH, -30, 30, color=objColor)

            if uav_actuator == ActuatorState.RUNNING: objColor = 'black'
            else: objColor = 'red'
            self.uav_actuator_vis[uav_id] = mpatches.Wedge((uav_x,uav_y + self.ACTUATOR_REL_Y),self.ACTUATOR_HEIGHT, 60, 120, color=objColor)

            self.subplot_axes.add_patch(self.uav_circ_vis[uav_id])
            self.subplot_axes.add_patch(self.uav_sensor_vis[uav_id])
            self.subplot_axes.add_patch(self.uav_actuator_vis[uav_id])

        numHealthySurveil = sum(logical_and(sStruct.locations == UAVLocation.SURVEIL, sStruct.sensor))
        if (any(sStruct.locations == UAVLocation.COMMS)): # We have comms coverage: draw a line between comms states to show this
            [self.comms_line[i].set_visible(True) for i in range(len(self.comms_line))]
            [self.comms_line[i].set_color('black') for i in range(len(self.comms_line))]
            if numHealthySurveil > 0: # We also have UAVs in surveillance; color the comms line black
                self.location_rect_vis[len(self.location_rect_vis)-1].set_color('green')
                #self.comms_line[len(self.comms_line)-1].set_color('red')
                #self.comms_line[len(self.comms_line)-1].set_visible(True)
                #midpoint = (0.5 + self.LOCATION_WIDTH + (self.dist_between_locations)*2 + crashLocationX)/2
                #self.subplot_axes.add_line(lines.Line2D([midpoint, midpoint],[self.NUM_UAV + 0.75, self.NUM_UAV + 0.25], linewidth = 3, color='red', visible=True))
        [self.subplot_axes.add_line(self.comms_line[i]) for i in range(len(self.comms_line))] # Only visible lines actually appear
        pl.draw()
        sleep(0.5)

    def showLearning(self,representation):
        pass

    def step(self,a):
        # Note below that we pass the structure by reference to save time; ie, components of sStruct refer directly to s
        ns = self.state.copy()
        sStruct = self.state2Struct(self.state)
        nsStruct = self.state2Struct(ns)
        # Subtract 1 below to give -1,0,1, easily sum actions
        actionVector = np.array(id2vec(a,self.LIMITS)) # returns list of form [0,1,0,2] corresponding to action of each uav
        nsStruct.locations += (actionVector-1)

        #TODO - incorporate cost graph as in matlab.
        fuelBurnedBool = [(actionVector[i] == UAVAction.LOITER and (nsStruct.locations[i] == UAVLocation.REFUEL or nsStruct.locations[i] == UAVLocation.BASE)) for i in arange(self.NUM_UAV)]
        fuelBurnedBool = array(fuelBurnedBool) == False
        nsStruct.fuel = array([sStruct.fuel[i] - self.NOM_FUEL_BURN * fuelBurnedBool[i] for i in arange(self.NUM_UAV)]) # if fuel
        fuelUnitsBurned = sum(fuelBurnedBool)
        distanceTraveled = sum(logical_and(nsStruct.locations, sStruct.locations))

        # Actuator failure transition
        randomFails = array([self.random_state.random_sample() for dummy in arange(self.NUM_UAV)])
        randomFails = randomFails > self.P_ACT_FAIL
        nsStruct.actuator = logical_and(sStruct.actuator, randomFails)

        # Sensor failure transition
        randomFails = array([self.random_state.random_sample() for dummy in arange(self.NUM_UAV)])
        randomFails = randomFails > self.P_SENSOR_FAIL
        nsStruct.sensor = logical_and(sStruct.sensor, randomFails)

        #Refuel those in refuel node
        refuelIndices = nonzero(logical_and(sStruct.locations == UAVLocation.REFUEL, nsStruct.locations == UAVLocation.REFUEL))
        nsStruct.fuel[refuelIndices] = self.FULL_FUEL

        # Fix sensors and motors in base state
        baseIndices = nonzero(logical_and(sStruct.locations == UAVLocation.BASE, nsStruct.locations == UAVLocation.BASE))
        nsStruct.actuator[baseIndices] = ActuatorState.RUNNING
        nsStruct.sensor[baseIndices] = SensorState.RUNNING

        # Test if have communication
        isCommStatesCovered = any(sStruct.locations == UAVLocation.COMMS)

        surveillanceBool = (sStruct.locations == UAVLocation.SURVEIL)
        numHealthySurveil = sum(logical_and(surveillanceBool, sStruct.sensor))

        totalStepReward = 0

        ns = self.struct2State(nsStruct)
        self.state = ns.copy()

        ##### Compute reward #####
        if isCommStatesCovered:
            totalStepReward += self.SURVEIL_REWARD * min(self.NUM_TARGET, numHealthySurveil)
        if self.isTerminal(): totalStepReward += self.CRASH_REWARD
        totalStepReward += self.FUEL_BURN_REWARD_COEFF * fuelUnitsBurned + self.MOVE_REWARD_COEFF * distanceTraveled # Presently movement penalty is set to 0
        return totalStepReward,ns,self.isTerminal(), self.possibleActions()

    def s0(self):
        locations   = ones(self.NUM_UAV, dtype='int') * UAVLocation.BASE
        fuel        = ones(self.NUM_UAV, dtype='int') * self.FULL_FUEL
        actuator    = ones(self.NUM_UAV, dtype='int') * ActuatorState.RUNNING
        sensor      = ones(self.NUM_UAV, dtype='int') * SensorState.RUNNING

        self.state = self.properties2StateVec(locations, fuel, actuator, sensor)
        return self.state.copy(), self.isTerminal(), self.possibleActions()

    def state2Struct(self,s):
        """
        Convert generic RLPy state ``s`` to generic

        :param s: RLPy state
        :returns: PST.StateStruct -- the custom structure used by this domain.

        """
        # Only perform multiplication once to save time
        fuelEndInd      = 2*self.NUM_UAV
        actuatorEndInd  = 3*self.NUM_UAV
        sensorEndInd    = 4*self.NUM_UAV
        locations = s[0             :self.NUM_UAV]
        fuel =      s[self.NUM_UAV  :fuelEndInd]
        actuator =  s[fuelEndInd    :actuatorEndInd]
        sensor =    s[actuatorEndInd:sensorEndInd]

        return StateStruct(locations, fuel, actuator, sensor)

    def properties2StateVec(self,locations, fuel, actuator, sensor):
        """
        Appends the arguments into an nparray to create an RLPy state vector.

        """
        return hstack([locations, fuel, actuator, sensor])

    def struct2State(self,sState):
        """
        Converts a custom PST.StateStruct to an RLPy state vector.

        :param sState: the PST.StateStruct object
        :returns: RLPy state vector

        """
        return hstack([sState.locations, sState.fuel, sState.actuator, sState.sensor])

    def possibleActions(self):
        s = self.state
        pp = self.possible_action_cache.get(tuple(s), None)
        if pp is not None:
            return pp

        # return the id of possible actions
        validActions = [] # Contains a list of uav_actions lists, e.g. [[0,1,2],[0,1],[1,2]] with the index corresponding to a uav.
        # First, enumerate the possible actions for each uav
        sStruct = self.state2Struct(s)
        for uav_id in range(0,self.NUM_UAV):
            uav_actions = []
            # Only allowed to loiter if have working actuator or are
            # in refuel/repair location
            if(sStruct.actuator[uav_id] == ActuatorState.RUNNING or \
            sStruct.locations[uav_id] == UAVLocation.REFUEL or \
            sStruct.locations[uav_id] == UAVLocation.BASE):
                uav_actions.append(UAVAction.LOITER)

            if(sStruct.fuel[uav_id] > 0): # This UAV is not crashed
                # Can advance to right as long as have working actuator and
                # are not in 'rightmost' state, surveillance
                if(sStruct.locations[uav_id] != UAVLocation.SURVEIL and \
                sStruct.actuator[uav_id] == ActuatorState.RUNNING):
                    uav_actions.append(UAVAction.ADVANCE)

                # Can retreat left anytime as long as aren't already in
                # 'leftmost' state, base
                if(sStruct.locations[uav_id] != UAVLocation.BASE):
                    uav_actions.append(UAVAction.RETREAT)
            else: # This UAV is crashed; give it a dummy action for now
                if(len(uav_actions) < 1): uav_actions.append(UAVAction.LOITER)
            if(isinstance(uav_actions,int)): # Test for single-UAV case
                validActions.append([uav_actions])
            else:
                validActions.append(uav_actions)
        pa = np.array(self.vecList2id(validActions, UAVAction.SIZE))
        self.possible_action_cache[tuple(s)] = pa
        return pa

    # TODO place this in Tools
    def vecList2id(self,x,maxValue):
        """
        Returns a list of unique id's based on possible permutations of a list of integer lists.
        The length of the integer lists need not be the same.

        :param x: A list of varying-length lists
        :param maxValue: the largest value a cell of ``x`` can take.
        :returns: int -- unique value associated with a list of lists of this length.

        Given a list of lists of the form [[0,1,2],[0,1],[1,2],[0,1]]... return
        unique id for each permutation between lists; eg above, would return 3*2*2*2 values
        ranging from 0 to 3^4 -1 (3 is max value possible in each of the lists, maxValue)

        """
        _id = 0
        actionIDs = [] # This variable is MODIFIED by vecList2idHelper() below.
        curActionList = []
        lenX = len(x)
        limits = tile(maxValue, (1,lenX))[0] # eg [3,3,3,3] # TODO redundant computation
        self.vecList2idHelper(x,actionIDs,0, curActionList, maxValue,limits) # TODO remove self

        return actionIDs

    def vecList2idHelper(self,x,actionIDs,ind,curActionList, maxValue,limits):
        """
        Helper method for vecList2id().

        :returns: a list of unique id's based on possible permutations of this list of lists.

        See vecList2id()

        """
        for curAction in x[ind]: # x[ind] is one of the lists, e.g [0, 2] or [1,2]
            partialActionAssignment = curActionList[:]
            partialActionAssignment.append(curAction)
            if(ind == len(x) - 1): # We have reached the final list, assignment is complete
                actionIDs.append(vec2id(partialActionAssignment, limits)) # eg [0,1,0,2] and [3,3,3,3]
            else:
                self.vecList2idHelper(x,actionIDs,ind+1,partialActionAssignment, maxValue,limits) # TODO remove self

    def isTerminal(self, s=None):
        if s is None:
            s = self.state
        sStruct = self.state2Struct(s)
        return any(logical_and(sStruct.fuel <= 0, sStruct.locations != UAVLocation.REFUEL))

class UAVLocation:
    """
    Enumerated type for possible UAV Locations

    """
    BASE        = 0
    REFUEL      = 1
    COMMS       = 2
    SURVEIL     = 3
    SIZE        = 4

class StateStruct:
    """
    Custom internal state structure

    """
    def __init__(self, locations, fuel, actuator, sensor):
        self.locations = locations
        self.fuel       = fuel
        self.actuator   = actuator
        self.sensor     = sensor

class ActuatorState:
    """
    Enumerated type for individual actuator state.

    """
    FAILED, RUNNING = 0,1 # Status of actuator
    SIZE = 2
class SensorState:
    """
    Enumerated type for individual sensor state.

    """
    FAILED, RUNNING = 0,1 # Status of sensor
    SIZE = 2

# Future code (in 'step') assumes that these can be summed, ie, 0,1,2 retreat, advance, loiter
class UAVAction:
    """
    Enumerated type for individual UAV actions.

    """
    ADVANCE,RETREAT,LOITER = 2,0,1 # Available actions
    SIZE = 3 # number of states above

# NOTE: properties2StateVec assumes the order loc,fuel,actuator,sensor
class UAVIndex:
    """
    Enumerated type for individual UAV Locations

    """
    LOC, FUEL, ACT_STATUS, SENS_STATUS = 0,1,2,3
    SIZE = 4 # Number of indices required for state of each UAV