Deploying GPT-J and T5 with FasterTransformer and Triton Inference Server

Originally published at: https://developer.nvidia.com/blog/deploying-gpt-j-and-t5-with-fastertransformer-and-triton-inference-server/

Learn step by step how to use the FasterTransformer library and Triton Inference Server to serve the T5-3B and GPT-J 6B models optimally with tensor parallelism.
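As a quick taste of what the article walks through, below is a minimal, hypothetical Python sketch that queries a GPT-J model already deployed behind Triton's HTTP endpoint with the FasterTransformer backend. The model name `fastertransformer` and the tensor names `input_ids`, `input_lengths`, `request_output_len`, and `output_ids` are assumptions based on the fastertransformer_backend example configs, so adjust them to match your own deployment.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to the Triton HTTP endpoint (default port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder token ids for a short prompt; in practice, produce these
# with the GPT-J tokenizer before sending the request.
input_ids = np.array([[818, 262, 3726]], dtype=np.uint32)          # [batch, seq_len]
input_lengths = np.array([[input_ids.shape[1]]], dtype=np.uint32)  # [batch, 1]
request_output_len = np.array([[32]], dtype=np.uint32)             # [batch, 1]

# Wrap each array as a Triton input tensor. The tensor names here are
# assumptions taken from the fastertransformer_backend GPT-J example.
inputs = []
for name, data in [("input_ids", input_ids),
                   ("input_lengths", input_lengths),
                   ("request_output_len", request_output_len)]:
    tensor = httpclient.InferInput(name, list(data.shape), "UINT32")
    tensor.set_data_from_numpy(data)
    inputs.append(tensor)

# "fastertransformer" is the assumed model name in Triton's model repository.
result = client.infer("fastertransformer", inputs)

# Generated token ids; decode them with the same tokenizer used above.
print(result.as_numpy("output_ids"))
```

Tensor parallelism itself is a server-side concern: it is set when the model is converted and configured for the FasterTransformer backend, so client code like the above stays the same whether the model runs on one GPU or is sharded across several.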

Hi @jwitsoe,
I am from the Chinese developer community.

There seems to be a picture mismatch in the results section of the article: Figure 5 should show the T5-3B model inference speed-up comparison, but it currently shows GPT-J 6B.