Hi, I tried with 8.4 EA and there is no change in the results. Could you please suggest where I should upload the ONNX file? Can I attach it to this response?
command:-
/usr/src/tensorrt/bin/trtexec --onnx=/git/notebooks/onnx/model.onnx --saveEngine=bert_base.trt --shapes=input_ids:1x512,attention_mask:1x512,token_type_ids:1x512 --workspace=4096
output:-
[05/27/2022-12:50:36] [I] === Performance summary ===
[05/27/2022-12:50:36] [I] Throughput: 32.5049 qps
[05/27/2022-12:50:36] [I] Latency: min = 30.4623 ms, max = 30.9917 ms, mean = 30.7379 ms, median = 30.7373 ms, percentile(99%) = 30.9917 ms
[05/27/2022-12:50:36] [I] End-to-End Host Latency: min = 30.4785 ms, max = 31.0029 ms, mean = 30.7483 ms, median = 30.7476 ms, percentile(99%) = 31.0029 ms
command:-
/usr/src/tensorrt/bin/trtexec --onnx=/git/notebooks/onnx/model.onnx --saveEngine=bert_base.trt --shapes=input_ids:8x512,attention_mask:8x512,token_type_ids:8x512 --workspace=4096
output:-
[05/27/2022-12:52:36] [I] === Performance summary ===
[05/27/2022-12:52:36] [I] Throughput: 4.56083 qps
[05/27/2022-12:52:36] [I] Latency: min = 218.283 ms, max = 220.419 ms, mean = 219.258 ms, median = 219.195 ms, percentile(99%) = 220.419 ms
[05/27/2022-12:52:36] [I] End-to-End Host Latency: min = 218.291 ms, max = 220.43 ms, mean = 219.27 ms, median = 219.206 ms, percentile(99%) = 220.43 ms
command:-
/usr/src/tensorrt/bin/trtexec --onnx=/git/notebooks/onnx/model.onnx --saveEngine=bert_base.trt --shapes=input_ids:32x512,attention_mask:32x512,token_type_ids:32x512 --workspace=4096
output:-
[05/27/2022-12:56:47] [I] === Performance summary ===
[05/27/2022-12:56:47] [I] Throughput: 1.13098 qps
[05/27/2022-12:56:47] [I] Latency: min = 881.25 ms, max = 887.634 ms, mean = 884.279 ms, median = 884.527 ms, percentile(99%) = 887.634 ms
[05/27/2022-12:56:47] [I] End-to-End Host Latency: min = 881.266 ms, max = 887.651 ms, mean = 884.295 ms, median = 884.543 ms, percentile(99%) = 887.651 ms
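To make the three runs easier to compare, here is a rough sketch that converts the reported numbers into per-sample figures. It assumes that one trtexec "query" processes the whole batch given in --shapes, so samples per second is batch size times qps; that is my reading of the output rather than something confirmed in this thread. The helper name per_sample_stats is made up for illustration, and the values are copied from the logs above.

```python
# (batch size, reported throughput in qps, mean latency in ms), copied from the logs above
runs = [
    (1, 32.5049, 30.7379),
    (8, 4.56083, 219.258),
    (32, 1.13098, 884.279),
]


def per_sample_stats(batch, qps, mean_latency_ms):
    """Return (samples per second, mean latency per sample in ms).

    Assumption: one query equals one full batch, so the per-sample
    throughput is batch * qps and the per-sample latency is latency / batch.
    """
    return batch * qps, mean_latency_ms / batch


for batch, qps, latency_ms in runs:
    samples_per_s, ms_per_sample = per_sample_stats(batch, qps, latency_ms)
    print(f"batch={batch:2d}  samples/s={samples_per_s:6.2f}  ms/sample={ms_per_sample:5.2f}")
```

If that assumption holds, batches 8 and 32 both land at roughly 36 samples per second against about 32.5 at batch 1, so batching gives only a small gain in per-sample throughput here.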