Hello,
After developing an inference sample on my NVIDIA Quadro M4000 and getting quite good throughput, I wanted to develop a product with a more recent (and, I assumed, more efficient) GPU.
However, the results were surprising: the inference times measured on the new GPU with "trtexec.exe" were much longer than the ones I got with the old GPU.
Please find the results I had on trtexec in the linked file.
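For reference, the benchmark was run with a command along these lines (the model file name and the flag values below are only an illustrative assumption; the exact command and full output are in the linked file):

rem model.onnx is a placeholder for my actual network
trtexec.exe --onnx=model.onnx --fp16 --iterations=100 --avgRuns=10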
Here are my specs:
- Old GPU: Quadro M4000 / TensorRT-8.4.1.5 / CUDA 11.6
- New GPU: Quadro RTX 4000 / TensorRT-8.5.1.7 / CUDA 11.8
To be more precise: the old GPU's results were measured on an older computer. When I swapped the old GPU into the new computer in place of the new GPU, the inference times on the old GPU were far worse (GPU latency: 300 ms). I don't know what could be causing this poor performance: the GPU isn't overheating, I have 64 GB of RAM, and the CPU is an i5-10500 @ 3.1 GHz.
If you have any idea what is causing these differences, I would really appreciate your input.