NVIDIA Announces TensorRT 8 Slashing BERT-Large Inference Down to 1 Millisecond

Originally published at: https://developer.nvidia.com/blog/nvidia-announces-tensorrt-8-slashing-bert-large-inference-down-to-1-millisecond/

NVIDIA announced TensorRT 8.0, which brings BERT-Large inference latency down to 1.2 ms with new optimizations.
