The latest release of the high-performance deep learning inference SDK, TensorRT 8 GA, is now available for download. This release includes:
- BERT inference in 1.2 ms with new transformer optimizations
- Accuracy equivalent to FP32 with INT8 precision, using quantization-aware training
- Support for sparsity for faster inference on NVIDIA Ampere GPUs
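The INT8 bullet above rests on "fake quantization": during quantization-aware training, weights and activations are rounded to INT8 and immediately dequantized in the forward pass, so the network learns to tolerate the rounding error before deployment. A minimal pure-Python sketch of that symmetric quantize-dequantize arithmetic (an illustration of the idea, not the TensorRT or QAT toolkit API):

```python
def quantize_dequantize(values, num_bits=8):
    """Symmetric per-tensor fake quantization: float -> int -> float.

    Hypothetical helper for illustration; real QAT applies this per layer
    inside the training graph with learned or calibrated scales.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for INT8
    scale = max(abs(v) for v in values) / qmax  # map largest weight to 127
    quantized = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized]       # dequantize back to float

weights = [0.9, -0.42, 0.0037, -1.27]
print(quantize_dequantize(weights))  # large weights survive; tiny ones snap toward zero
```

Training against this rounded view of the weights is what lets the deployed INT8 engine match FP32 accuracy, rather than quantizing a finished model after the fact.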
Learn more about the new features and resources here.