The latest release of the high-performance deep learning inference SDK, TensorRT 8 GA, is now available for download. This release includes:
- BERT inference in 1.2 ms with new transformer optimizations
- Accuracy equivalent to FP32 with INT8 precision, using quantization-aware training
- Support for sparsity for faster inference on NVIDIA Ampere GPUs
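The INT8 bullet above rests on "fake quantization": during quantization-aware training, weights and activations are rounded to INT8 and immediately dequantized in the forward pass, so the network learns to tolerate the rounding error before deployment. A minimal pure-Python sketch of that symmetric quantize-dequantize arithmetic (an illustration of the idea, not the TensorRT or QAT toolkit API):

```python
def quantize_dequantize(values, num_bits=8):
    """Symmetric per-tensor fake quantization: float -> int -> float.

    Hypothetical helper for illustration; real QAT applies this per layer
    inside the training graph with learned or calibrated scales.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for INT8
    scale = max(abs(v) for v in values) / qmax  # map largest weight to 127
    quantized = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized]       # dequantize back to float

weights = [0.9, -0.42, 0.0037, -1.27]
print(quantize_dequantize(weights))  # large weights survive; tiny ones snap toward zero
```

Training against this rounded view of the weights is what lets the deployed INT8 engine match FP32 accuracy, rather than quantizing a finished model after the fact.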
Learn more about the new features and resources here.