SampleNMT computes incorrect attention vectors in TensorRT

It seems that sampleNMT incorrectly computes the attention vectors from the context vectors. There should be a tanh non-linear activation after W[c;h], as shown in Equation 3. To fix the bug, you should add a tanh activation at the end of SLPAttention::addToModel.
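For reference, here is a minimal NumPy sketch of the attentional hidden state being described (the function name, dimensions, and weights are illustrative, not taken from sampleNMT). It computes tanh(W_c[c;h]); dropping the tanh call reproduces the reported bug:

```python
import numpy as np

def attention_vector(context, hidden, W_c):
    """Luong-style attentional hidden state: a = tanh(W_c [c; h]).

    context: (d_c,) context vector c
    hidden:  (d_h,) decoder hidden state h
    W_c:     (d_out, d_c + d_h) projection weights
    The tanh is the activation missing from SLPAttention::addToModel.
    """
    concat = np.concatenate([context, hidden])
    return np.tanh(W_c @ concat)

rng = np.random.default_rng(0)
c, h = rng.normal(size=8), rng.normal(size=8)
W = rng.normal(size=(8, 16))
a = attention_vector(c, h, W)
print(a.shape)                    # (8,)
print(np.all(np.abs(a) <= 1.0))   # True: tanh bounds every component
```

Without the tanh, the output is an unbounded linear projection, which is what the sample currently produces.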

Thank you for the feedback. I’ll bring this to our engineering team’s attention.


Our engineers have committed the fix ("Add missing tanh to sampleNMT attention"); it should be available in a future release.

Thank you.