High detection latency on JetPack 5.1

We have upgraded some of our Jetsons from JetPack 4.6 to JetPack 5.1 and noticed substantial latency spikes when running our PyTorch-based object detector.

Since we are unable to share our detector, we used YOLOv7 for this post.
detector.zip (204.9 KB)

The base Docker image on JetPack 4.6 was nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3.
On JetPack 5.1 we tried nvcr.io/nvidia/l4t-ml:r35.2.1-py3 and nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3, as well as building PyTorch for JetPack 5.1.

We also tried YOLOv5, as well as converting the YOLOv7 model to a TensorRT .engine file, but all of these approaches resulted in latency spikes. Please note that we would like to keep using PyTorch for the time being rather than migrate to DeepStream.
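
For anyone who wants to reproduce the comparison, here is a minimal sketch of how per-call latency and its spikes could be measured. It is not the benchmark code from the attached archive; the names `measure_latency`, `summarize`, `infer`, and `sync` are hypothetical and chosen only for illustration. The sketch assumes the detector is wrapped in a zero-argument callable and, when running on the GPU, that a synchronization function such as `torch.cuda.synchronize` is passed as `sync` so that asynchronous kernel launches are included in the timing.

```python
import statistics
import time


def measure_latency(infer, n_iters=200, warmup=20, sync=None):
    """Time `infer()` n_iters times and return latencies in milliseconds.

    `infer` is a hypothetical zero-argument callable wrapping one detector call.
    `sync` is an optional callable that blocks until pending device work is done.
    """
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        infer()
    if sync is not None:
        sync()

    latencies_ms = []
    for _ in range(n_iters):
        start = time.perf_counter()
        infer()
        if sync is not None:
            sync()  # wait for device work before stopping the clock
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Return mean, median, 99th percentile and maximum of the samples."""
    ordered = sorted(latencies_ms)

    def pct(p):
        # Nearest-rank style lookup on the sorted samples.
        idx = min(len(ordered) - 1, int(round(p / 100.0 * (len(ordered) - 1))))
        return ordered[idx]

    return {
        "mean": statistics.fmean(ordered),
        "p50": pct(50),
        "p99": pct(99),
        "max": ordered[-1],
    }
```

A call such as `summarize(measure_latency(lambda: model(img), sync=torch.cuda.synchronize))` would then give comparable numbers on both JetPack versions, where `model` and `img` stand in for the detector and a preloaded input tensor. Comparing `p99` and `max` against `p50` makes spikes visible even when the mean looks similar.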

Finally, here are the latency comparison results:

Duplicate of topic 244772.