The time of cudaMemcpyAsync() in cudaMemcpyDeviceToHost mode is unstable after upgrading the graphics card driver to version 512

I have been using TensorRT for my work for a while, and the inference time has been stable.
However, on some computers the same code shows great instability in TensorRT inference speed.
For example, an image that should normally be detected in 30 ms can take 120 ms on the next detection with the same model.
After some conjecture and experimental verification, I found the key factor: the graphics card driver version. All the computers that show this great instability have one feature in common: their driver version is above 512. The driver versions on the other computers are below 500, such as 471.41 and 472.88.
I downgraded one computer's graphics driver to 472, and it detected stably, so my guess was correct.
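For reference, this is roughly how I isolate the device-to-host copy when timing it, using CUDA events around cudaMemcpyAsync. The buffer size, stream setup, and iteration count here are illustrative, not taken from my actual TensorRT pipeline:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 1 << 24;      // 16 MiB, illustrative size
    float *dDev = nullptr, *hPinned = nullptr;
    cudaMalloc(&dDev, bytes);
    cudaMallocHost(&hPinned, bytes);   // pinned host memory so the async copy is truly asynchronous

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time the device-to-host copy alone, several times, to expose jitter.
    for (int i = 0; i < 10; ++i) {
        cudaEventRecord(start, stream);
        cudaMemcpyAsync(hPinned, dDev, bytes, cudaMemcpyDeviceToHost, stream);
        cudaEventRecord(stop, stream);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("iteration %d: %.3f ms\n", i, ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFreeHost(hPinned);
    cudaFree(dDev);
    return 0;
}
```

With a loop like this, it is easy to see whether the per-copy time itself fluctuates between driver versions, or whether the variance comes from elsewhere in the inference pipeline.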
So why is the TensorRT inference time related to the graphics card driver version?
