Hi,
I am running some benchmark tests on a Jetson AGX Orin DevKit configured to run as an Orin NX 16GB.
For benchmarking I am using the trtexec command:
/usr/src/tensorrt/bin/trtexec --onnx=resnet50-v1-12.onnx --fp16
As the model I am using the most recent ResNet50 model from the ONNX Model Zoo.
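For completeness: this was a plain run, and I did not lock the clocks or change the power mode beforehand. A variant I could try first looks like this (a sketch; I am assuming MAXN is nvpmodel mode 0 on this board):

sudo nvpmodel -m 0      # switch to the maximum power mode (the mode number is board-specific)
sudo jetson_clocks      # lock CPU/GPU/EMC clocks at their maximum
/usr/src/tensorrt/bin/trtexec --onnx=resnet50-v1-12.onnx --fp16 --useCudaGraph --useSpinWait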
For the inference benchmark, the performance summary reports a maximum throughput of only 466 qps, which is only a fraction of the official benchmark results.
Why does trtexec perform so poorly on my ONNX model?
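I also wondered whether the official numbers assume INT8 and a larger batch size. If the model's batch dimension is dynamic and the input is named data (both are assumptions on my side), a throughput-oriented run could look like:

/usr/src/tensorrt/bin/trtexec --onnx=resnet50-v1-12.onnx --int8 --shapes=data:32x3x224x224

(As I understand it, trtexec uses random scales for --int8 when no calibration cache is given, which should be acceptable for a pure performance measurement.)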
Attached is the printout from my shell session.
resnet50_trtexec.log (34.5 KB)
For reference, I am using NVIDIA JetPack 5.1.2 with TensorRT 8.5.2.2, CUDA 11.4, and L4T 35.4.1.
Thanks in advance