Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT

Originally published at: Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT | NVIDIA Developer Blog

TensorRT is an SDK for high-performance deep learning inference. TensorRT 8.0 introduces support for sparsity, using the Sparse Tensor Cores on NVIDIA Ampere architecture GPUs. Sparsity accelerates a network by skipping the multiplications by zero that would otherwise be computed in its GEMM operations. Following the steps in this post can give you a performance gain over the equivalent dense network.
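To make the idea of "zeros in GEMM operations" concrete, here is a minimal NumPy sketch of the 2:4 structured sparsity pattern that Ampere Sparse Tensor Cores accelerate, in which at most two of every four consecutive weights are nonzero. The 2:4 pattern is background that the excerpt above does not spell out, and the helper name `prune_2_to_4` and the magnitude-based selection are assumptions made for illustration. This is not the TensorRT API, and it does not replace the workflow described in the post.

```python
import numpy as np

def prune_2_to_4(weights):
    """Zero the two smallest-magnitude values in every group of four.

    Hypothetical helper for illustration only; not part of TensorRT.
    """
    w = np.asarray(weights, dtype=np.float32)
    if w.shape[-1] % 4 != 0:
        raise ValueError("last dimension must be a multiple of 4")
    groups = w.reshape(-1, 4)
    # Column indices of the two smallest |w| within each group of four.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(w.shape)

rng = np.random.default_rng(0)
dense = rng.standard_normal((8, 16)).astype(np.float32)
sparse = prune_2_to_4(dense)

# Every group of four now holds at least two zeros.
zeros_per_group = (sparse.reshape(-1, 4) == 0).sum(axis=1)
print(zeros_per_group.min())
```

Pruning like this changes the weights, so in practice it is usually followed by fine-tuning to recover accuracy before the network is deployed.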