Neural Machine Translation Inference with TensorRT 4

Originally published at:

Neural machine translation is used across a wide variety of consumer applications, including websites, road signs, foreign-language subtitle generation, and more. TensorRT, NVIDIA’s programmable inference accelerator, helps optimize and generate runtime engines for deploying deep learning inference apps to production environments. NVIDIA released TensorRT 4 with new features to accelerate inference of neural machine…

Hi, which GPU did this blog use? The blog says at the beginning that 'Google’s Neural Machine Translation (GNMT) model performed inference 60x faster using TensorRT on Tesla V100 GPUs compared to CPU-only platforms', but at the end it lists '1 GPU: Tesla P4 (GP104), Driver=r384.125, CPU = E5-2690 v4@2.60GHz 3.5GHz Turbo (Broadwell), HT On, Threads=56, Sockets=2, FP32. CPU-only configuration: Skylake Gold 6140@2.30GHz 3.7GHz Turbo (Skylake); HT Off; Sockets: 2; Threads: 36, FP32'.

The detailed machine spec at the end of the blog corresponds to the sampleNMT measurement, which was performed on a Tesla P4 GPU.
The GNMT performance figure was measured on a Tesla V100 GPU, as you mentioned above. Hope that clarifies.