Adaptive update through Q-learning

As people search through the graph, we should update the weights to reflect the paths they actually take, using the Q-learning update rule:

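Q'(s, a) = (1 - alpha) * Q(s, a) + alpha * (r_a + gamma * max_a'(Q(s', a')))

Here alpha is the learning rate, gamma is the discount factor, r_a is the reward for taking action a in state s, s' is the state reached, and the max runs over the actions a' available from s'.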

The graph is already set up; we just need to implement the update hooks.
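
A minimal sketch of what such an update hook might look like, in Python. Everything named here is hypothetical and not taken from the existing graph code: the QGraph class, the on_traverse hook, the nested-dict storage, and the default alpha and gamma values. It also assumes that states are nodes, that an action is the choice of a neighboring node (so taking action a lands on node a), and that a node with no outgoing edges contributes 0 to the max term.

```python
from collections import defaultdict


class QGraph:
    """Hypothetical stand-in for the existing graph, with Q-values stored as weights."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha = alpha  # learning rate (assumed default)
        self.gamma = gamma  # discount factor (assumed default)
        # q[s][a] is the weight on the edge from node s to neighbor a
        self.q = defaultdict(dict)

    def add_edge(self, s, a, weight=0.0):
        self.q[s][a] = weight

    def on_traverse(self, s, a, reward):
        """Update hook: call when a user moves from node s to neighbor a."""
        s_next = a  # assumption: taking action a lands on node a
        # Best weight leaving the next node; 0.0 if it has no outgoing edges
        best_next = max(self.q.get(s_next, {}).values(), default=0.0)
        old = self.q[s].get(a, 0.0)
        # Q'(s, a) = (1 - alpha) * Q(s, a) + alpha * (r_a + gamma * max_a'(Q(s', a')))
        self.q[s][a] = (1 - self.alpha) * old + self.alpha * (reward + self.gamma * best_next)
        return self.q[s][a]


g = QGraph(alpha=0.5, gamma=0.9)
g.add_edge("A", "B", 0.0)
g.add_edge("B", "C", 2.0)
updated = g.on_traverse("A", "B", reward=1.0)
print(updated)
```

With these values the A-to-B weight moves from 0.0 toward 1.0 + 0.9 * 2.0 = 2.8, landing halfway because alpha is 0.5. How the reward is defined (for example, whether a search ended successfully) is left open here and would need to come from the search code.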

