I would say we train them with two FFNNs (feed-forward neural networks).

We train these FFNNs with Q-learning from reinforcement learning. The first FFNN learns to select the weight to be altered, and the second FFNN learns the delta for that weight. The RNN is then executed, and we can compute a reward for this run. We then update the two FFNNs with this reward. Q(0), SARSA(0), Q(lambda), or SARSA(lambda) should do for the two action-selection FFNNs.

The RNN itself consists of two rectangular matrices. The first is the matrix of weights w(i,j), the weight from neuron-output i to neuron-input j; the second is iw(i,j), the weight from state-input i to neuron-input j.
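A minimal sketch of this scheme, with several simplifications: tabular Q-values stand in for the two action-selection FFNNs, the delta is discretised to a few values, and the RNN task and reward are made-up placeholders. All sizes, the tanh activation, and the hill-climbing revert step are my assumptions, not part of the scheme above.

```python
import numpy as np

rng = np.random.default_rng(0)

# The RNN as two rectangular weight matrices (sizes are assumptions):
n_neurons, n_inputs = 4, 3
W  = rng.normal(scale=0.1, size=(n_neurons, n_neurons))  # w(i,j): neuron-output i -> neuron-input j
IW = rng.normal(scale=0.1, size=(n_inputs,  n_neurons))  # iw(i,j): state-input i -> neuron-input j

def run_rnn(W, IW, x, steps=5):
    """Execute the RNN on input x and return a scalar reward (toy task, assumed)."""
    h = np.zeros(n_neurons)
    for _ in range(steps):
        h = np.tanh(x @ IW + h @ W)
    # Toy reward: negative distance of the first neuron's output to a target of 0.5
    return -abs(h[0] - 0.5)

# Tabular Q(0) over discretised actions, standing in for the two FFNNs:
#   agent 1 picks WHICH weight of W to alter, agent 2 picks the DELTA.
n_weights = W.size
deltas = np.array([-0.1, -0.01, 0.01, 0.1])
Q_weight = np.zeros(n_weights)     # Q-values for "select weight k"
Q_delta  = np.zeros(len(deltas))   # Q-values for "apply delta d"

alpha, eps = 0.5, 0.2
x = rng.normal(size=n_inputs)
baseline = run_rnn(W, IW, x)
start = baseline

for episode in range(200):
    # epsilon-greedy action selection for both agents
    k = int(rng.integers(n_weights)) if rng.random() < eps else int(np.argmax(Q_weight))
    d = int(rng.integers(len(deltas))) if rng.random() < eps else int(np.argmax(Q_delta))

    # alter the chosen weight, execute the RNN, observe the reward
    i, j = divmod(k, n_neurons)
    W[i, j] += deltas[d]
    reward = run_rnn(W, IW, x)
    if reward < baseline:
        W[i, j] -= deltas[d]   # revert changes that made things worse (assumption)
    else:
        baseline = reward

    # Q(0) update, treating each alteration as a one-step episode (no successor state)
    Q_weight[k] += alpha * (reward - Q_weight[k])
    Q_delta[d]  += alpha * (reward - Q_delta[d])
```

Replacing the two Q-tables with two small FFNNs (each mapping the current RNN state to Q-values over its action set) recovers the scheme described above; SARSA or eligibility-trace variants only change the update rule at the bottom of the loop.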