Elman Net

Dynamics of the RNN

\begin{align*}
\bm{x}_o^{(t)} &= \bm{a}_o( \bm{u}_o^{(t)} ) \\
\bm{x}_c^{(t)} &= \bm{a}_c( \bm{u}_c^{(t)} ) \\
\bm{u}_o^{(t)} &= W_{oc}\, \bm{x}_c^{(t)} + \bm{b}_o \\
\bm{u}_c^{(t)} &= (\bm{1}-\bm{e}_c)\bm{u}_c^{(t-1)} + \bm{e}_c (W_{cc}\, \bm{x}_c^{(t-1)} + W_{ci}\, \bm{x}_i^{(t)} + \bm{b}_c)
\end{align*}

Note that the product of two bold vectors is taken element-wise, \((\bm{x}\bm{u})_i = x_i u_i\), and \(\bm{1} = (1,1,\cdots)^T\).

Network states and biases:

=======  ================  ============  ================  ============
Layer    Activated values  Potentials    Activation func.  Bias
=======  ================  ============  ================  ============
Output   \(\bm{x}_o\)      \(\bm{u}_o\)  \(\bm{a}_o\)      \(\bm{b}_o\)
Context  \(\bm{x}_c\)      \(\bm{u}_c\)  \(\bm{a}_c\)      \(\bm{b}_c\)
Input    \(\bm{x}_i\)
=======  ================  ============  ================  ============

Network weights:

=======  =======  ============
To       From     Weights
=======  =======  ============
Output   Context  \(W_{oc}\)
Context  Context  \(W_{cc}\)
Context  Input    \(W_{ci}\)
=======  =======  ============

\(\bm{e}_c\) is a time-constant-like variable. For a typical Elman RNN, set \(\bm{e}_c = \bm{1}\), so that the context update reduces to \(\bm{u}_c^{(t)} = W_{cc}\, \bm{x}_c^{(t-1)} + W_{ci}\, \bm{x}_i^{(t)} + \bm{b}_c\).
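
As a sanity check, the forward dynamics above can be written down directly in NumPy. This is only a minimal sketch, not the PyRNN implementation; the variable names and the choice of \(\tanh\) for \(\bm{a}_c\) and the identity for \(\bm{a}_o\) are assumptions made for illustration::

    import numpy as np

    def elman_step(x_i, x_c_prev, u_c_prev, W_oc, W_cc, W_ci, b_o, b_c, e_c,
                   a_c=np.tanh, a_o=lambda u: u):
        """One forward step of the leaky-integrator Elman net (illustrative only)."""
        # u_c^(t) = (1 - e_c) * u_c^(t-1) + e_c * (W_cc x_c^(t-1) + W_ci x_i^(t) + b_c)
        u_c = (1.0 - e_c) * u_c_prev + e_c * (W_cc @ x_c_prev + W_ci @ x_i + b_c)
        x_c = a_c(u_c)                  # x_c^(t) = a_c(u_c^(t))
        u_o = W_oc @ x_c + b_o          # u_o^(t) = W_oc x_c^(t) + b_o
        x_o = a_o(u_o)                  # x_o^(t) = a_o(u_o^(t))
        return x_o, u_o, x_c, u_c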

BPTT

The backward propagation of the delta error \(dE/d u_{c}^{(t)}\) through time can be written as follows:

\begin{align*}
\frac{dE}{d u_c^{(t-1)}} &= \sum_{o} \frac{dE}{d u_{o}^{(t-1)}}\, w_{oc}\, a_c'(u_c^{(t-1)}) + \sum_{c'} \frac{dE}{d u_{c'}^{(t)}} \frac{d u_{c'}^{(t)}}{d u_c^{(t-1)}} \\
\frac{d u_{c'}^{(t)}}{d u_c^{(t-1)}} &= (1 - e_{c'})\, \delta_{c' c} + e_{c'}\, w_{c' c}\, a_c'(u_c^{(t-1)})
\end{align*}

Here I assume \(d a_c(u_c)/d u_{c'} = 0\) if \(c \neq c'\), i.e. the activation function \(\bm{a}_c\) acts element-wise.
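
In vector form, and under the same element-wise assumption, the delta recursion can be sketched as follows. Again the names are illustrative only, and \(\tanh\) is assumed for \(\bm{a}_c\)::

    import numpy as np

    def backprop_delta_c(delta_o_prev, delta_c_next, u_c_prev, W_oc, W_cc, e_c,
                         a_c_prime=lambda u: 1.0 - np.tanh(u) ** 2):
        """Compute dE/du_c^(t-1) from dE/du_o^(t-1) and dE/du_c^(t) (illustrative only)."""
        d_ac = a_c_prime(u_c_prev)                    # a_c'(u_c^(t-1))
        from_output = (W_oc.T @ delta_o_prev) * d_ac  # output-layer term at t-1
        leak_term   = (1.0 - e_c) * delta_c_next      # (1 - e_c) * dE/du_c^(t)
        recur_term  = (W_cc.T @ (e_c * delta_c_next)) * d_ac
        return from_output + leak_term + recur_term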

Gradients of the parameters:

\begin{align*}
\frac{dE}{d w_{oc}} &= \sum_{t} \frac{dE}{d u_{o}^{(t)}}\, x_c^{(t)} \\
\frac{dE}{d b_{o}} &= \sum_{t} \frac{dE}{d u_{o}^{(t)}} \\
\frac{dE}{d w_{cc'}} &= \sum_{t} \frac{dE}{d u_{c}^{(t)}}\, e_c\, x_{c'}^{(t-1)} \\
\frac{dE}{d w_{ci}} &= \sum_{t} \frac{dE}{d u_{c}^{(t)}}\, e_c\, x_{i}^{(t)} \\
\frac{dE}{d b_{c}} &= \sum_{t} \frac{dE}{d u_{c}^{(t)}}\, e_c
\end{align*}
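
Summing these per-time-step contributions over a sequence gives the full BPTT gradients. A minimal NumPy sketch, assuming list-of-vectors inputs with illustrative names; the \(t = 0\) term of \(dE/dw_{cc'}\), which would need the initial context activation, is omitted::

    import numpy as np

    def accumulate_gradients(delta_o_seq, delta_c_seq, x_c_seq, x_i_seq, e_c):
        """Sum the gradient contributions over time (illustrative only).

        delta_o_seq[t] = dE/du_o^(t), delta_c_seq[t] = dE/du_c^(t),
        x_c_seq[t] = x_c^(t), x_i_seq[t] = x_i^(t).
        """
        dW_oc = sum(np.outer(d_o, x_c) for d_o, x_c in zip(delta_o_seq, x_c_seq))
        db_o  = sum(delta_o_seq)
        # dE/dw_cc' pairs dE/du_c^(t) with x_c^(t-1); the t = 0 term is dropped here.
        dW_cc = sum(np.outer(e_c * d_c, x_c_prev)
                    for d_c, x_c_prev in zip(delta_c_seq[1:], x_c_seq[:-1]))
        dW_ci = sum(np.outer(e_c * d_c, x_i) for d_c, x_i in zip(delta_c_seq, x_i_seq))
        db_c  = sum(e_c * d_c for d_c in delta_c_seq)
        return dW_oc, db_o, dW_cc, dW_ci, db_c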