It seems that sampleNMT incorrectly computes the attention vectors from the context vectors. There should be a tanh non-linear activation after the linear layer W_c[c_t; h_t], as shown in Equation 3 of https://github.com/tensorflow/nmt#background-on-the-attention-mechanism. To fix the bug, a tanh activation should be added at the end of SLPAttention::addToModel.
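A minimal sketch of what the fixed layer construction could look like, using the TensorRT network-definition API. The class name SLPAttention comes from the report above, but the exact signature and the member names (mAttentionSize, mKernelWeights, mBiasWeights) are assumptions for illustration and may not match the actual sampleNMT sources:

```cpp
#include "NvInfer.h"

// Sketch: single-layer-perceptron attention that concatenates the context vector
// c_t with the decoder hidden state h_t, applies a linear projection, and then the
// (previously missing) tanh, so the output is a_t = tanh(W_c [c_t ; h_t]).
void SLPAttention::addToModel(nvinfer1::INetworkDefinition* network,
                              nvinfer1::ITensor* context,      // c_t
                              nvinfer1::ITensor* decoderState, // h_t
                              nvinfer1::ITensor** attentionOutput)
{
    // Concatenate [c_t ; h_t] along the feature dimension.
    nvinfer1::ITensor* concatInputs[] = {context, decoderState};
    auto* concatLayer = network->addConcatenation(concatInputs, 2);

    // Linear projection W_c [c_t ; h_t] (weights/bias are assumed class members).
    auto* fcLayer = network->addFullyConnected(*concatLayer->getOutput(0),
                                               mAttentionSize,
                                               mKernelWeights,
                                               mBiasWeights);

    // Proposed fix: apply the tanh non-linearity from Equation 3.
    auto* tanhLayer = network->addActivation(*fcLayer->getOutput(0),
                                             nvinfer1::ActivationType::kTANH);

    *attentionOutput = tanhLayer->getOutput(0);
}
```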
Thank you for the feedback. I’ll bring this to our engineering team’s attention.
Our engineers have committed the fix ("Add missing tanh to sampleNMT attention"), which should be available in a future release.