- OS : Windows 10
- CUDA version : 11.5
- CUDNN version : 8.4.5.1
- TensorRT version: 8.4.3.1
- NVIDIA GPU : GeForce RTX 2080 Ti
- NVIDIA Driver : 545.84
In my test application I run the inference process 500 times. I expect the inference time to stay stable at around 1 or 2 milliseconds, and that is indeed the case for the first ~50 iterations. Even though my test image and all other variables stay unchanged during the run, the execution time strangely keeps growing, up to 12 milliseconds, which is too much for me: once my other operations are added on top, the final end-to-end time is disappointing.
This is the code snippet I use for the inference part :
// Start the host-side timer for this iteration
auto now = std::chrono::high_resolution_clock::now();
// Enqueue the inference work on the dedicated CUDA stream (TensorRT 8.x API)
bool status = mContext->enqueueV2(&mBindingDataHolder[0], *inferenceCudaStream, nullptr);
// Block until all enqueued work has finished before stopping the timer
CUDA_CHECK(cudaStreamSynchronize(*inferenceCudaStream));
auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - now);
spdlog::info("inference time {} ms", duration.count());
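In case the host-side clock is suspect, I could also time the same call with CUDA events on the GPU side. This is only a minimal, untested sketch, reusing the same member names as above:

// Untested sketch: GPU-event timing of the same enqueueV2 call, to rule out
// host-side timer/scheduling noise. Error handling trimmed for brevity.
cudaEvent_t start, stop;
CUDA_CHECK(cudaEventCreate(&start));
CUDA_CHECK(cudaEventCreate(&stop));

CUDA_CHECK(cudaEventRecord(start, *inferenceCudaStream));
bool status = mContext->enqueueV2(&mBindingDataHolder[0], *inferenceCudaStream, nullptr);
CUDA_CHECK(cudaEventRecord(stop, *inferenceCudaStream));
CUDA_CHECK(cudaEventSynchronize(stop));

float gpuMs = 0.0f; // elapsed GPU time in milliseconds
CUDA_CHECK(cudaEventElapsedTime(&gpuMs, start, stop));
spdlog::info("inference time (GPU events) {:.3f} ms", gpuMs);

CUDA_CHECK(cudaEventDestroy(start));
CUDA_CHECK(cudaEventDestroy(stop));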
And this is the log I get on my console :
[2023-12-04 13:56:07.101] [info] inference time 1 ms
[2023-12-04 13:56:07.134] [info] inference time 1 ms
[2023-12-04 13:56:07.166] [info] inference time 1 ms
[2023-12-04 13:56:07.198] [info] inference time 1 ms
[2023-12-04 13:56:07.231] [info] inference time 1 ms
[2023-12-04 13:56:07.264] [info] inference time 1 ms
[2023-12-04 13:56:07.296] [info] inference time 1 ms
[2023-12-04 13:56:07.327] [info] inference time 1 ms
[2023-12-04 13:56:07.358] [info] inference time 5 ms
[2023-12-04 13:56:07.517] [info] inference time 6 ms
[2023-12-04 13:56:07.570] [info] inference time 5 ms
[2023-12-04 13:56:07.607] [info] inference time 6 ms
[2023-12-04 13:56:07.645] [info] inference time 6 ms
[2023-12-04 13:56:07.764] [info] inference time 6 ms
[2023-12-04 13:56:07.875] [info] inference time 6 ms
[2023-12-04 13:56:07.911] [info] inference time 6 ms
[2023-12-04 13:56:07.956] [info] inference time 6 ms
[2023-12-04 13:56:08.020] [info] inference time 6 ms
[2023-12-04 13:56:08.053] [info] inference time 6 ms
[2023-12-04 13:56:08.108] [info] inference time 6 ms
[2023-12-04 13:56:08.147] [info] inference time 6 ms
[2023-12-04 13:56:08.185] [info] inference time 6 ms
[2023-12-04 13:56:08.218] [info] inference time 6 ms
[2023-12-04 13:56:08.259] [info] inference time 6 ms
[2023-12-04 13:56:08.299] [info] inference time 6 ms
[2023-12-04 13:56:08.339] [info] inference time 6 ms
[2023-12-04 13:56:08.380] [info] inference time 6 ms
[2023-12-04 13:56:08.413] [info] inference time 6 ms
[2023-12-04 13:56:08.451] [info] inference time 6 ms
[2023-12-04 13:56:08.491] [info] inference time 6 ms
[2023-12-04 13:56:08.531] [info] inference time 6 ms
[2023-12-04 13:56:08.565] [info] inference time 6 ms
[2023-12-04 13:56:08.645] [info] inference time 7 ms
[2023-12-04 13:56:08.789] [info] inference time 7 ms
[2023-12-04 13:56:08.853] [info] inference time 7 ms
[2023-12-04 13:56:08.917] [info] inference time 7 ms
[2023-12-04 13:56:08.958] [info] inference time 7 ms
[2023-12-04 13:56:09.080] [info] inference time 10 ms
[2023-12-04 13:56:09.170] [info] inference time 11 ms
[2023-12-04 13:56:09.298] [info] inference time 11 ms
[2023-12-04 13:56:09.441] [info] inference time 11 ms
I emphasize again that the test image and all other variables stay unchanged during the whole run, and that mBindingDataHolder and inferenceCudaStream are allocated only once.
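For context, the one-time setup looks roughly like this (simplified sketch; mEngine and bindingSizeInBytes stand in for my actual engine handle and per-binding size computation):

// One-time setup, executed once before the 500 iterations
inferenceCudaStream = std::make_unique<cudaStream_t>();
CUDA_CHECK(cudaStreamCreate(inferenceCudaStream.get()));

// One device buffer per engine binding, allocated once and reused every iteration
mBindingDataHolder.resize(mEngine->getNbBindings());
for (int i = 0; i < mEngine->getNbBindings(); ++i) {
    // bindingSizeInBytes(i) is a placeholder for the real per-binding size
    CUDA_CHECK(cudaMalloc(&mBindingDataHolder[i], bindingSizeInBytes(i)));
}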
I searched the Internet for similar issues. The only suggestion I came across was to call cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync); unfortunately, that did not help either.
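For completeness, this is roughly how I applied that suggestion (early at startup, before any other CUDA work touches the device):

// Applied once at startup, before any other CUDA call initializes the device
CUDA_CHECK(cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync));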