Originally published at: Accelerated Inference for Large Transformer Models Using NVIDIA Triton Inference Server | NVIDIA Technical Blog
Learn about FasterTransformer, one of the fastest libraries for distributed inference of transformers of any size, and the benefits of using the library.
Which NVIDIA hardware resources do I need to deploy GPT-J?