Accelerated Inference for Large Transformer Models Using FasterTransformer and Triton Inference Server

Originally published at: https://developer.nvidia.com/blog/accelerated-inference-for-large-transformer-models-using-nvidia-fastertransformer-and-nvidia-triton-inference-server/

Learn about FasterTransformer, one of the fastest libraries for distributed inference of transformers of any size, and the benefits of using the library.