Originally published at: Deploying GPT-J and T5 with NVIDIA Triton Inference Server | NVIDIA Technical Blog
Learn step by step how to use the FasterTransformer library and Triton Inference Server to serve the T5-3B and GPT-J 6B models efficiently with tensor parallelism.
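For context, here is a minimal sketch of querying a GPT-J model served by Triton with the FasterTransformer backend, as the article describes. The model name `fastertransformer` and the tensor names (`input_ids`, `input_lengths`, `request_output_len`, `output_ids`) follow the fastertransformer_backend examples and are assumptions here; verify them against your deployment's `config.pbtxt`, which is also where tensor parallelism is set via the `tensor_para_size` parameter.

```python
# Minimal sketch: query a GPT-J model served by Triton with the
# FasterTransformer backend. Tensor names, dtypes, and the model name
# are assumptions based on the fastertransformer_backend examples;
# check them against your model's config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Pre-tokenized prompt (token IDs from your tokenizer); batch size 1.
input_ids = np.array([[818, 257, 1445]], dtype=np.uint32)
input_lengths = np.array([[input_ids.shape[1]]], dtype=np.uint32)
request_output_len = np.array([[32]], dtype=np.uint32)  # tokens to generate

def to_input(name, arr):
    # Wrap a numpy array as a Triton InferInput.
    inp = httpclient.InferInput(name, arr.shape, "UINT32")
    inp.set_data_from_numpy(arr)
    return inp

inputs = [
    to_input("input_ids", input_ids),
    to_input("input_lengths", input_lengths),
    to_input("request_output_len", request_output_len),
]

result = client.infer(model_name="fastertransformer", inputs=inputs)
print(result.as_numpy("output_ids"))  # generated token IDs, to detokenize
```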
Hi @jwitsoe,
I am from the Chinese developer community.
There seems to be an image mismatch in the results section of the article: Figure 5 should show the T5-3B model inference speed-up comparison, but it shows GPT-J 6B instead.