NVIDIA Announces TensorRT 8 Slashing BERT-Large Inference Down to 1 Millisecond

Originally published at: https://developer.nvidia.com/blog/nvidia-announces-tensorrt-8-slashing-bert-large-inference-down-to-1-millisecond/

NVIDIA announced TensorRT 8.0, which brings BERT-Large inference latency down to 1.2 ms with new optimizations.
