Introduction to Neural Machine Translation with GPUs (part 1)

Note: This is the first part of a detailed three-part series on machine translation with neural networks by Kyunghyun Cho. You may enjoy part 2 and part 3. Neural machine translation is a recently proposed framework for machine translation based purely on neural networks. This post is the first of a series in which I will explain a simple…

One equation is missing. Looks like a LaTeX error.

Thanks, I've fixed it. WordPress LaTeX is tricky...

I had one question: towards the end of the post, a function g_theta is specified to model the conditional probability p(x_t | x_{<t}), but it is not defined anywhere. Is g_theta the softmax function? Also, is g_theta used at any point in training?

I found this post so interesting, thank you for sharing! If I want to cite this (or the next two posts), what's the best thing to do?

Cite the NVIDIA Developer Blog, along with authors and title, as you would any source.