Inference time is longer on new GPU (Quadro RTX 4000 vs Quadro M4000)


After developing an inference sample on my NVIDIA Quadro M4000 and getting quite good throughput, I wanted to develop a product with a more recent (and, I assumed, more efficient) GPU.

However, I got surprising results: the inference times I measured on the new GPU using “trtexec.exe” were much longer than the ones I had with the old GPU.

Please find the results I had on trtexec in the linked file.

Here are my specs:

  • OLD GPU: Quadro M4000 / TensorRT- / CUDA 11.6
  • NEW GPU: Quadro RTX 4000 / TensorRT- / CUDA 11.8

To be more precise: the old GPU's results were measured on an older computer. When I put the old GPU into the new computer in place of the new GPU, the old GPU's inference times became much worse than before (GPU latency: 300 ms). I don’t know what could be causing this poor performance: the GPU isn’t overheating, I have 64 GB of RAM, and the CPU is an i5-10500 @ 3.1 GHz.
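
To put the 300 ms GPU latency in perspective, here is a minimal sketch that converts a latency into a throughput. It assumes batch size 1 and inferences running back to back with no overlap between them, which may not match how trtexec actually schedules work. The function name `throughput_from_latency` is my own and is not part of trtexec or TensorRT.

```python
def throughput_from_latency(latency_ms: float, batch_size: int = 1) -> float:
    """Return inferences per second for a given per-batch latency.

    Assumption: batches run strictly one after another (no overlap),
    so throughput is simply batch_size / latency.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return batch_size * 1000.0 / latency_ms


# 300 ms per inference at batch size 1
print(round(throughput_from_latency(300.0), 2))  # 3.33
```

So under these assumptions, 300 ms of latency is only about 3.3 inferences per second.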

If you have any idea what is causing this difference, I would be very interested.