Description
I am using the Hugging Face `bert-large-cased` model, which I converted to ONNX format using the `transformers[onnx]` library.
When I convert the ONNX model to a TensorRT engine, I don't see an improvement in latency as the batch size increases. Can you please help with this?
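For reference, the export was done roughly along these lines (exact output path is assumed; the `transformers.onnx` module is installed by the `transformers[onnx]` extra):

```shell
pip install "transformers[onnx]"
# Export the bert-large-cased checkpoint to onnx/model.onnx
python -m transformers.onnx --model=bert-large-cased onnx/
```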
Command:
/usr/src/tensorrt/bin/trtexec --onnx=/git/notebooks/onnx/model.onnx --saveEngine=bert_base.trt --shapes=input_ids:1x512,attention_mask:1x512,token_type_ids:1x512 --workspace=4096
Output:
[05/26/2022-16:43:12] [I] === Performance summary ===
[05/26/2022-16:43:12] [I] Throughput: 31.895 qps
[05/26/2022-16:43:12] [I] Latency: min = 31.0613 ms, max = 31.6614 ms, mean = 31.327 ms, median = 31.2918 ms, percentile(99%) = 31.6614 ms
Command:
/usr/src/tensorrt/bin/trtexec --onnx=/git/notebooks/onnx/model.onnx --saveEngine=bert_base.trt --shapes=input_ids:8x512,attention_mask:8x512,token_type_ids:8x512 --workspace=4096
Output:
[05/26/2022-16:48:24] [I] === Performance summary ===
[05/26/2022-16:48:24] [I] Throughput: 4.42512 qps
[05/26/2022-16:48:24] [I] Latency: min = 224.912 ms, max = 226.356 ms, mean = 225.977 ms, median = 226.124 ms, percentile(99%) = 226.356 ms
Command:
/usr/src/tensorrt/bin/trtexec --onnx=/git/notebooks/onnx/model.onnx --saveEngine=bert_base.trt --shapes=input_ids:32x512,attention_mask:32x512,token_type_ids:32x512 --workspace=4096
Output:
[05/26/2022-16:53:20] [I] === Performance summary ===
[05/26/2022-16:53:20] [I] Throughput: 1.13289 qps
[05/26/2022-16:53:20] [I] Latency: min = 879.309 ms, max = 884.625 ms, mean = 882.779 ms, median = 882.981 ms, percentile(99%) = 884.625 ms
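As a sanity check on the numbers above: `trtexec` reports throughput in queries (i.e. batches) per second, so the effective per-sample throughput is `qps * batch_size`. Deriving it from the three reported runs:

```python
# trtexec's "Throughput" counts one query as one batch, so the effective
# samples/sec is qps * batch_size. qps values below are copied from the
# three performance summaries above.
runs = {1: 31.895, 8: 4.42512, 32: 1.13289}  # batch_size -> reported qps

for batch, qps in runs.items():
    samples_per_sec = qps * batch
    per_sample_latency_ms = 1000.0 / samples_per_sec
    print(f"batch={batch:2d}  samples/s={samples_per_sec:6.2f}  "
          f"per-sample latency={per_sample_latency_ms:5.2f} ms")
```

This works out to roughly 31.9, 35.4, and 36.3 samples/s for batch sizes 1, 8, and 32, i.e. only a modest per-sample gain from batching.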
Environment
TensorRT Version: 8.2.4.2
GPU Type: V100-SXM2
Nvidia Driver Version: 460.73.01
CUDA Version: 11.2.2
CUDNN Version: 8.2.1.32
Operating System + Version: ubuntu-20.04.1
Python Version (if applicable): 3.7
TensorFlow Version (if applicable): 2.7
PyTorch Version (if applicable): n/a
Baremetal or Container (if container which image + tag): container
Relevant Files
n/a (the model is the stock `bert-large-cased` checkpoint from the Hugging Face hub; no custom files are involved)
Steps To Reproduce
1. Export `bert-large-cased` to ONNX with `transformers[onnx]`.
2. Run the three trtexec commands above with batch sizes 1, 8, and 32.
3. Compare the reported latency and throughput; no errors are raised.