Originally published at the NVIDIA Technical Blog: Optimizing Transformer-Based Diffusion Models for Video Generation with NVIDIA TensorRT
State-of-the-art image diffusion models can take tens of seconds to process a single image. Video diffusion is even more challenging, demanding significant computational resources and incurring high costs. By leveraging the latest FP8 quantization features on NVIDIA Hopper GPUs with NVIDIA TensorRT, it's possible to significantly reduce inference costs and serve more users with fewer GPUs…
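The full pipeline from the original post is not included in this excerpt. As a rough illustration only, the sketch below shows how an FP8 TensorRT engine can be built from an ONNX model using the TensorRT Python API (version 8.6 or later). The model path is hypothetical, and the sketch assumes the ONNX graph already contains FP8 quantization (Q/DQ) nodes; it is not the blog's actual code.

```python
# Minimal sketch: build a TensorRT engine with FP8 (and FP16 fallback) enabled.
# Assumes TensorRT >= 8.6 Python bindings on an NVIDIA Hopper GPU and an
# already-quantized ONNX model at "model.onnx" (hypothetical path).
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model (expected to carry FP8 Q/DQ nodes) into the network.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # FP16 fallback for layers without FP8 support
config.set_flag(trt.BuilderFlag.FP8)   # allow FP8 kernels on Hopper GPUs

# Serialize the optimized engine to disk for later deployment.
serialized_engine = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(serialized_engine)
```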