NVIDIA Announces TensorRT 8 Slashing BERT-Large Inference Down to 1 Millisecond

Originally published at: NVIDIA Announces TensorRT 8 Slashing BERT-Large Inference Down to 1 Millisecond | NVIDIA Developer Blog

NVIDIA announced TensorRT 8.0, which brings BERT-Large inference latency down to 1.2 ms with new optimizations.